MaGGIe introduces an efficient framework using Cross-Attention, Self-Attention, and Sparse Convolutions for mask-guided instance matting, ensuring high accuracy.

MaGGIe Architecture: Efficient Mask-Guided Instance Matting

Abstract and 1. Introduction

  2. Related Works

  3. MaGGIe

    3.1. Efficient Masked Guided Instance Matting

    3.2. Feature-Matte Temporal Consistency

  4. Instance Matting Datasets

    4.1. Image Instance Matting and 4.2. Video Instance Matting

  5. Experiments

    5.1. Pre-training on image data

    5.2. Training on video data

  6. Discussion and References

Supplementary Material

  7. Architecture details

  8. Image matting

    8.1. Dataset generation and preparation

    8.2. Training details

    8.3. Quantitative details

    8.4. More qualitative results on natural images

  9. Video matting

    9.1. Dataset generation

    9.2. Training details

    9.3. Quantitative details

    9.4. More qualitative results

3. MaGGIe

We introduce our efficient instance matting framework, which is guided by binary instance masks and structured into two parts. Sec. 3.1 details our novel architecture, designed to maintain both accuracy and efficiency. Sec. 3.2 describes our approach for ensuring temporal consistency across frames in video processing.

3.1. Efficient Masked Guided Instance Matting

In cross-attention (CA), the queries Q and the key-value pairs (K, V) originate from different sources, whereas in self-attention (SA) they share similar information because all three come from the same source.
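The distinction between the two attention types can be illustrated with a minimal, dependency-free sketch of scaled dot-product attention. This is not the MaGGIe implementation: learned projections and multiple heads are omitted, and the names `instance_tokens` and `image_features` are hypothetical placeholders, since this excerpt does not state which tensors play each role.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q_src, kv_src):
    """Scaled dot-product attention without learned projections.

    q_src:  list of query vectors, shape (n_q, d)
    kv_src: list of vectors used as both keys and values, shape (n_kv, d)
    """
    d = len(q_src[0])
    out = []
    for q in q_src:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in kv_src]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, kv_src)) for j in range(d)])
    return out


def self_attention(x):
    # SA: Q, K and V all come from the same source.
    return attention(x, x)


def cross_attention(queries, context):
    # CA: Q comes from one source, (K, V) from another.
    return attention(queries, context)


# Hypothetical toy inputs, for illustration only.
instance_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

sa_out = self_attention(instance_tokens)
ca_out = cross_attention(instance_tokens, image_features)
```

In both cases the output has one row per query; only the origin of the keys and values changes.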

Figure 2. Overall pipeline of MaGGIe. This framework processes frame sequences I and instance masks M to generate per-instance alpha mattes A′ for each frame. It employs progressive refinement and sparse convolutions to produce accurate mattes in multi-instance scenarios while keeping computation efficient. The subfigures on the right illustrate the Instance Matte Decoder and the Instance Guidance, where we use mask guidance to predict coarse instance mattes and deep features to guide detail refinement, respectively. (Best viewed in color and zoomed in.)

where {; } denotes concatenation along the feature dimension, and G is a series of sparse convolutions with sigmoid activation.
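The idea of concatenating features and passing them through a sparse, sigmoid-activated operator can be sketched as follows. This is a heavily simplified stand-in, not the paper's G: a single pointwise (1x1) layer replaces the series of sparse convolutions, sparsity is modeled by storing features only at active pixel sites in a dictionary, and all names (`feats_a`, `feats_b`, `weights`, `bias`) are hypothetical, since the equation defining the actual inputs is missing from this excerpt.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sparse_pointwise_gate(feats_a, feats_b, weights, bias=0.0):
    """Concatenate two sparse feature maps and apply a sigmoid-activated
    pointwise layer, evaluated only at sites active in both inputs.

    feats_a, feats_b: dict mapping (y, x) -> list of floats (active sites only)
    weights:          list of floats, one per concatenated feature
    """
    out = {}
    # Only sites present in both maps are processed; all others are skipped,
    # which is where the computational saving of sparse operators comes from.
    for site in feats_a.keys() & feats_b.keys():
        joined = feats_a[site] + feats_b[site]  # concatenation along features
        out[site] = sigmoid(sum(w * f for w, f in zip(weights, joined)) + bias)
    return out


# Hypothetical toy inputs, for illustration only.
feats_a = {(0, 0): [1.0, 2.0], (0, 1): [0.5, 0.5]}
feats_b = {(0, 0): [0.0], (3, 3): [1.0]}

gate = sparse_pointwise_gate(feats_a, feats_b, weights=[0.0, 0.0, 0.0])
```

With all-zero weights every active site evaluates to sigmoid(0) = 0.5. A real sparse convolution would also aggregate over a spatial neighborhood rather than a single site.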

:::info Authors:

(1) Chuong Huynh, University of Maryland, College Park (chuonghm@cs.umd.edu);

(2) Seoung Wug Oh, Adobe Research (seoh@adobe.com);

(3) Abhinav Shrivastava, University of Maryland, College Park (abhinav@cs.umd.edu);

(4) Joon-Young Lee, Adobe Research (jolee@adobe.com).

:::


:::info This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.

:::
