Open‑YOLO 3D replaces costly SAM/CLIP steps with 2D detection, LG label‑maps, and parallelized visibility, enabling fast and accurate 3D OV segmentation.

Drop the Heavyweights: YOLO‑Based 3D Segmentation Outpaces SAM/CLIP

2025/08/26 16:20

Abstract and 1 Introduction

  2. Related works
  3. Preliminaries
  4. Method: Open-YOLO 3D
  5. Experiments
  6. Conclusion and References

A. Appendix

3 Preliminaries

Problem formulation: 3D instance segmentation aims to segment individual objects within a 3D scene and assign one class label to each segmented object. In the open-vocabulary (OV) setting, the class label may be either a class seen during training or a novel class. Let P denote a 3D point cloud scene reconstructed from a sequence of RGB-D images. We denote the RGB image frames as I and their corresponding depth frames as D. Similar to recent methods [35, 42, 34], we assume that the camera poses and camera parameters are available for the input 3D scene.
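Because poses and camera parameters are assumed to be available, every point of P can be mapped into any frame. The sketch below illustrates that mapping under assumptions that are not stated in the text: a pinhole camera model, a 3x3 intrinsics matrix `K`, and a 4x4 world-to-camera pose. The function name `project_points` is hypothetical.

```python
import numpy as np

def project_points(points_world, world_to_cam, K):
    """Project N world-space points into one frame (pinhole model, assumed).

    points_world: (N, 3) array of 3D points.
    world_to_cam: (4, 4) rigid transform from world to camera coordinates.
    K:            (3, 3) camera intrinsics.
    Returns pixel coordinates (N, 2) and camera-space depth (N,).
    """
    n = points_world.shape[0]
    homo = np.concatenate([points_world, np.ones((n, 1))], axis=1)  # (N, 4)
    cam = (world_to_cam @ homo.T).T[:, :3]                            # (N, 3)
    depth = cam[:, 2]
    # Guard against division by zero for points lying on the camera plane.
    safe = np.where(np.abs(depth) < 1e-9, 1e-9, depth)
    uv = (K @ cam.T).T[:, :2] / safe[:, None]
    return uv, depth
```

Points with non-positive depth lie behind the camera and would have to be discarded by the caller before their pixel coordinates are used.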

3.1 Baseline Open-Vocabulary 3D Instance Segmentation

We base our approach on OpenMask3D [42], which is the first method that performs open-vocabulary 3D instance segmentation in a zero-shot manner. OpenMask3D has two main modules: a class-agnostic mask proposal head, and a mask-feature computation module. The class-agnostic mask proposal head uses a transformer-based pre-trained 3D instance segmentation model [39] to predict a binary mask for each object in the point cloud. The mask-feature computation module first generates 2D segmentation masks by projecting 3D masks into views in which the 3D instances are highly visible, and refines them using the SAM [23] model. A pre-trained CLIP vision-language model [55] is then used to generate image embeddings for the 2D segmentation masks. The embeddings are then aggregated across all the 2D frames to generate a 3D mask-feature representation.

Limitations: OpenMask3D leverages advances in 2D segmentation (SAM) and vision-language models (CLIP) to generate and aggregate 2D feature representations, which enables querying instances by open-vocabulary concepts. However, this approach carries a high computational burden and is slow at inference, taking 5-10 minutes per scene. The burden originates mainly from two sub-tasks: the 2D segmentation of the large number of objects across the 2D views, and the 3D feature aggregation based on object visibility. We next introduce our proposed method, which aims to reduce the computational burden and improve task accuracy.

4 Method: Open-YOLO 3D

Motivation: We present Open-YOLO 3D, our 3D open-vocabulary instance segmentation method, which aims to generate 3D instance predictions efficiently. The method introduces efficient and improved modules at both the task level and the data level. Task Level: Unlike OpenMask3D, which generates segmentations of the projected 3D masks, we pursue a more efficient approach that relies on 2D object detection. Since the end goal is to generate labels for the 3D masks, the additional computation of 2D segmentation is unnecessary. Data Level: OpenMask3D computes 3D mask visibility in the 2D frames by iteratively counting the visible points of each mask across all frames. This is time-consuming, so we propose an alternative that computes the 3D mask visibility for all frames at once.

4.1 Overall Architecture

4.2 3D Object Proposal

4.3 Low Granularity (LG) Label-Maps

4.4 Accelerated Visibility Computation (VAcc)

In order to associate 2D label maps with 3D proposals, we compute the visibility of each 3D mask. To this end, we propose a fast approach that is able to compute 3D mask visibility within frames via tensor operations which are highly parallelizable.
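As a rough illustration of how visibility could be computed for all masks and all frames with tensor operations only, consider the NumPy sketch below. It is not the authors' implementation. It assumes a pinhole camera, world-to-camera poses, a shared intrinsics matrix, and an occlusion test that compares each projected point's depth with the sensed depth map under a tolerance `tol`. All names (`batched_visibility`, `tol`, the array layouts) are hypothetical.

```python
import numpy as np

def batched_visibility(points, world_to_cam, K, depth_maps, masks, tol=0.05):
    """Fraction of each 3D mask's points visible in each frame, without loops.

    points:       (N, 3) scene point cloud.
    world_to_cam: (F, 4, 4) per-frame poses (assumed world-to-camera).
    K:            (3, 3) shared pinhole intrinsics (assumed).
    depth_maps:   (F, H, W) sensed depth per frame.
    masks:        (M, N) boolean, one row per 3D mask proposal.
    Returns an (M, F) array of visibility ratios in [0, 1].
    """
    F, H, W = depth_maps.shape
    n = points.shape[0]
    homo = np.concatenate([points, np.ones((n, 1))], axis=1)        # (N, 4)
    cam = np.einsum("fij,nj->fni", world_to_cam, homo)[..., :3]     # (F, N, 3)
    z = cam[..., 2]                                                 # (F, N)
    in_front = z > 1e-6
    z_safe = np.where(in_front, z, 1.0)       # avoid dividing by <= 0 depth
    pix = np.einsum("ij,fnj->fni", K, cam)                          # (F, N, 3)
    u = np.rint(pix[..., 0] / z_safe).astype(np.int64)
    v = np.rint(pix[..., 1] / z_safe).astype(np.int64)
    in_image = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    # Clip only so the gather below is valid; in_image masks the bad entries.
    u_c = np.clip(u, 0, W - 1)
    v_c = np.clip(v, 0, H - 1)
    frame_idx = np.arange(F)[:, None]
    sensed = depth_maps[frame_idx, v_c, u_c]                        # (F, N)
    not_occluded = np.abs(sensed - z) <= tol
    visible = in_front & in_image & not_occluded                    # (F, N)
    # One matrix product counts the visible points of every mask per frame.
    counts = masks.astype(np.float64) @ visible.T.astype(np.float64)  # (M, F)
    sizes = np.maximum(masks.sum(axis=1, keepdims=True), 1)
    return counts / sizes
```

The point of the design is that the per-mask, per-frame counting loop collapses into a single (M, N) by (N, F) matrix product, which parallelizes well on a GPU when the same steps are written in a tensor library.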

Figure 3: Multi-View Prompt Distribution (MVPDist). After creating the LG label maps for all frames, we select the top-k label maps based on the 2D projection of the 3D proposal. Using the (x, y) coordinates of the 2D projection, we choose the labels from the LG label maps to generate the MVPDist. This distribution predicts the ID of the text prompt with the highest probability.
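The procedure in the Figure 3 caption can be sketched as a vote over labels. The code below is a minimal illustration, not the authors' code. It assumes that label maps store integer prompt IDs with -1 marking unlabeled pixels, that the top-k frames are chosen by a per-frame visibility score, and that the projected (x, y) pixel coordinates of the proposal are already available per frame. The name `mvp_dist` and all parameters are hypothetical.

```python
import numpy as np

def mvp_dist(label_maps, uv_per_frame, visibility, num_prompts, k=2):
    """Build a prompt-ID distribution for one 3D proposal from its top-k frames.

    label_maps:   (F, H, W) integer prompt IDs, -1 where no label (assumed).
    uv_per_frame: list of F integer arrays of shape (P_f, 2) holding the
                  (x, y) pixels of the proposal's projection in each frame.
    visibility:   (F,) score used to rank frames (assumed selection rule).
    Returns (distribution over prompt IDs, predicted prompt ID or -1).
    """
    top_frames = np.argsort(-np.asarray(visibility), kind="stable")[:k]
    votes = np.zeros(num_prompts, dtype=np.float64)
    for f in top_frames:
        uv = np.asarray(uv_per_frame[f], dtype=np.int64).reshape(-1, 2)
        labels = label_maps[f, uv[:, 1], uv[:, 0]]   # index as [row=y, col=x]
        labels = labels[labels >= 0]                 # drop unlabeled pixels
        votes += np.bincount(labels, minlength=num_prompts)
    total = votes.sum()
    if total == 0:
        return votes, -1                             # no labeled pixel was hit
    dist = votes / total
    return dist, int(np.argmax(dist))
```

The loop runs only over the k selected frames. Label IDs are assumed to be smaller than `num_prompts`.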

4.5 Multi-View Prompt Distribution (MVPDist)

Table 1: State-of-the-art comparison on ScanNet200 validation set. We use Mask3D trained on the ScanNet200 training set to generate class-agnostic mask proposals. Our method demonstrates better performance compared to those that generate 3D proposals by fusing 2D masks and proposals from a 3D network (highlighted in gray in the table). It outperforms state-of-the-art methods by a wide margin under the same conditions using only proposals from a 3D network.

4.6 Instance Prediction Confidence Score

:::info Authors:

(1) Mohamed El Amine Boudjoghra, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) (mohamed.boudjoghra@mbzuai.ac.ae);

(2) Angela Dai, Technical University of Munich (TUM) (angela.dai@tum.de);

(3) Jean Lahoud, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) (jean.lahoud@mbzuai.ac.ae);

(4) Hisham Cholakkal, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) (hisham.cholakkal@mbzuai.ac.ae);

(5) Rao Muhammad Anwer, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) and Aalto University (rao.anwer@mbzuai.ac.ae);

(6) Salman Khan, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) and Australian National University (salman.khan@mbzuai.ac.ae);

(7) Fahad Shahbaz Khan, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) and Australian National University (fahad.khan@mbzuai.ac.ae).

:::


:::info This paper is available on arXiv under the CC BY-NC-SA 4.0 Deed (Attribution-NonCommercial-ShareAlike 4.0 International) license.

:::
