Nvidia Releases Nemotron 3 Super, a 120B Open AI Model Built for Agentic Workloads

2026/04/20 11:33

Key Takeaways:

  • Nvidia released Nemotron 3 Super, a 120B-parameter open MoE model activating only 12.7B parameters per forward pass.
  • Nemotron 3 Super delivers up to 7.5x the throughput of Qwen3.5-122B-A10B in agent workloads at 8k-input/64k-output settings.
  • The model is fully open under the Nvidia Nemotron Open Model License, with checkpoints and training data on Hugging Face.

Nvidia Launches Nemotron 3 Super With 7.5x Throughput Gains Over Qwen3.5-122B

The latest Nvidia model activates only 12.7 billion of its 120 billion parameters per forward pass using a Mixture-of-Experts (MoE) architecture, meaning most of its weights stay idle during inference. That design choice directly targets two problems developers hit when deploying multi-step AI agents: the added cost of extended reasoning chains and ballooning token usage, which can multiply up to 15 times in multi-agent pipelines.
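To put the sparsity in perspective, the share of weights used per forward pass can be worked out from the two figures quoted above. This is a back-of-the-envelope sketch; the function name is illustrative and not from Nvidia.

```python
TOTAL_PARAMS_B = 120.0   # total parameters in billions (from the article)
ACTIVE_PARAMS_B = 12.7   # parameters activated per forward pass (from the article)


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters that take part in one forward pass."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"active: {fraction:.1%}, idle: {1 - fraction:.1%}")
```

Roughly one weight in ten is touched for any given token, which is where the per-token compute saving comes from. Memory use is a separate matter, since all experts still have to be stored.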

Nemotron 3 Super is the second model in Nvidia’s Nemotron 3 family, following Nemotron 3 Nano from December 2025. Nvidia announced the release around March 10, 2026.

The model uses a hybrid Mamba-Transformer backbone across 88 layers. Mamba-2 blocks handle long sequences with linear-time efficiency, while Transformer attention layers preserve precise recall. That combination gives the model native support for context windows up to one million tokens without the memory penalties typical of pure-attention designs.
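The difference between linear-time and attention-style scaling can be illustrated with a simple growth calculation. The sketch below ignores constant factors and the fact that the real model mixes both block types; the 8,000-token short context is an assumption chosen only for comparison.

```python
def growth_factor(short_len: int, long_len: int, exponent: int) -> float:
    """How much a cost proportional to length**exponent grows from short_len to long_len."""
    if short_len <= 0 or long_len <= 0:
        raise ValueError("sequence lengths must be positive")
    return (long_len / short_len) ** exponent


SHORT_CONTEXT = 8_000      # illustrative short prompt length (assumption)
LONG_CONTEXT = 1_000_000   # context window cited in the article

linear_growth = growth_factor(SHORT_CONTEXT, LONG_CONTEXT, 1)     # linear-time blocks
attention_growth = growth_factor(SHORT_CONTEXT, LONG_CONTEXT, 2)  # pairwise attention

print(f"linear-time cost grows {linear_growth:,.0f}x")
print(f"pairwise attention cost grows {attention_growth:,.0f}x")
```

Under these assumptions, stretching the context by a factor of 125 multiplies pairwise attention work by 125 squared. This is why a backbone with mostly linear-time layers is attractive at million-token lengths.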

Nvidia also built in a LatentMoE routing system that compresses token embeddings into a low-rank space before sending them to 512 experts per layer, activating 22 at a time. The company says this allows roughly four times more experts at the same inference cost compared to standard MoE approaches, and enables finer task specialization, such as separating Python logic from SQL handling at the expert level.
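The routing idea described above can be sketched in a few lines of plain Python. This is a toy illustration, not Nvidia's implementation. The expert count and top-k value come from the article, while the embedding width, latent width, random weights, and the choice to compute router scores in the latent space are all assumptions made for the demo.

```python
import math
import random

NUM_EXPERTS = 512   # experts per layer (from the article)
TOP_K = 22          # experts activated per token (from the article)
MODEL_DIM = 64      # hypothetical embedding width, demo only
LATENT_DIM = 16     # hypothetical low-rank width, demo only


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of Gaussian weights (stand-in for learned weights)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def route(token, down_proj, router, top_k):
    """Compress the token, score every expert, and keep the top_k with softmax weights."""
    latent = matvec(down_proj, token)   # low-rank compression
    scores = matvec(router, latent)     # one score per expert
    chosen = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:top_k]
    peak = max(scores[i] for i in chosen)
    exps = [math.exp(scores[i] - peak) for i in chosen]  # shifted for numerical stability
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


rng = random.Random(0)
down_proj = random_matrix(LATENT_DIM, MODEL_DIM, rng)
router = random_matrix(NUM_EXPERTS, LATENT_DIM, rng)
token = [rng.gauss(0.0, 1.0) for _ in range(MODEL_DIM)]

assignments = route(token, down_proj, router, TOP_K)
print(len(assignments), "experts selected out of", NUM_EXPERTS)
```

Only the selected experts would then process the token, with their outputs combined using the returned weights. Working in the smaller latent space is what makes a larger expert pool affordable in this kind of design.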

Image source: Nvidia blog.

Multi-Token Prediction layers, using two shared-weight heads, speed up chain-of-thought generation and allow native speculative decoding. On structured tasks, Nvidia reports up to three times faster generation.
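Speculative decoding in general works by letting a cheap drafter propose several tokens and having the main model confirm them. The sketch below shows a greedy accept-or-correct step under simplifying assumptions: `target_next` is a hypothetical stand-in for the main model, and it is called once per position for clarity, whereas real systems verify all drafted positions in one forward pass. It is not a description of how Nvidia's MTP heads are wired.

```python
from typing import Callable, List, Sequence


def accept_draft(
    prefix: Sequence[int],
    draft: Sequence[int],
    target_next: Callable[[List[int]], int],
) -> List[int]:
    """Keep drafted tokens while they match the target model's own greedy choice."""
    accepted: List[int] = []
    context = list(prefix)
    for token in draft:
        expected = target_next(context)
        if token != expected:
            accepted.append(expected)  # first mismatch: take the target's token and stop
            return accepted
        accepted.append(token)
        context.append(token)
    accepted.append(target_next(context))  # every draft token matched: one bonus token
    return accepted


def toy_target(context: List[int]) -> int:
    """Toy 'model' that always predicts the previous token plus one, modulo 10."""
    return (context[-1] + 1) % 10


print(accept_draft([1, 2, 3], [4, 5, 9], toy_target))
```

In the example call, the first two drafted tokens match the toy model and the third is replaced, so three tokens are produced for one verification round instead of one. The output is identical to what the target model alone would have generated, which is why this style of decoding speeds things up without changing results.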

The model was pre-trained on 25 trillion tokens across two phases. The first phase used 20 trillion tokens of broad data. The second used five trillion high-quality tokens tuned for benchmark performance. A final phase on 51 billion tokens extended the native context to one million tokens. Post-training included supervised fine-tuning on roughly seven million samples and reinforcement learning across 21 environments with more than 1.2 million rollouts.
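The token budget quoted above can be checked with simple arithmetic. The sketch below uses only the figures from the article; the variable names are illustrative.

```python
TRILLION = 10**12
BILLION = 10**9

PHASE_ONE_TOKENS = 20 * TRILLION   # broad data (from the article)
PHASE_TWO_TOKENS = 5 * TRILLION    # high-quality data (from the article)
EXTENSION_TOKENS = 51 * BILLION    # long-context extension (from the article)

pretraining_total = PHASE_ONE_TOKENS + PHASE_TWO_TOKENS
phase_two_share = PHASE_TWO_TOKENS / pretraining_total
extension_share = EXTENSION_TOKENS / pretraining_total

print(f"two-phase total: {pretraining_total / TRILLION:.0f}T tokens")
print(f"phase two share: {phase_two_share:.0%}")
print(f"extension relative to pre-training: {extension_share:.3%}")
```

The two phases add up to the stated 25 trillion tokens. The long-context extension is tiny by comparison, at about a fifth of one percent of the pre-training budget.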

In benchmarks, Nemotron 3 Super scored 83.73 on MMLU-Pro, 90.21 on AIME25, and 60.47 on SWE-Bench using OpenHands. On PinchBench, it reached 85.6 percent, the highest reported score among open models in its class. On long-context evaluation, it scored 91.64 on RULER 1M.

Compared to GPT-OSS-120B, Nemotron 3 Super delivers 2.2 times the throughput at 8k input and 64k output. Against Qwen3.5-122B-A10B, that figure reaches 7.5 times. Nvidia also reports more than five times the throughput and up to two times the accuracy over the prior Nemotron Super generation.
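The two multipliers can be put on a common scale by treating Nemotron 3 Super's throughput as 1.0. This sketch assumes both figures were measured under the same 8k-input/64k-output conditions, as the article suggests. The implied gap between the two baselines is a derived number, not one reported by Nvidia.

```python
# "Up to" multipliers reported in the article at 8k input / 64k output.
NEMOTRON_VS_BASELINE = {
    "GPT-OSS-120B": 2.2,
    "Qwen3.5-122B-A10B": 7.5,
}


def relative_throughput(multipliers):
    """Express each baseline's throughput as a fraction of Nemotron 3 Super's."""
    return {name: 1.0 / factor for name, factor in multipliers.items()}


relative = relative_throughput(NEMOTRON_VS_BASELINE)
implied_gap = NEMOTRON_VS_BASELINE["Qwen3.5-122B-A10B"] / NEMOTRON_VS_BASELINE["GPT-OSS-120B"]

for name, share in relative.items():
    print(f"{name}: {share:.2f}x of Nemotron 3 Super")
print(f"implied GPT-OSS-120B vs Qwen3.5-122B-A10B: {implied_gap:.2f}x")
```

Because both figures are best-case values, the derived ratio should be read as indicative only.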

Nvidia trained the model end-to-end in its NVFP4 four-bit floating-point format, optimized for Blackwell GPUs. On B200 hardware, Nvidia says inference runs up to four times faster compared to FP8 on H100 with no reported accuracy loss. Quantized FP8 and NVFP4 checkpoints retain 99.8 percent or more of full-precision accuracy.
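A block-scaled four-bit float format can be illustrated with a short round-trip sketch. The article only says NVFP4 is a four-bit floating-point format. The value grid (sign plus 2 exponent bits and 1 mantissa bit) and the 16-element block size used below are assumptions based on public descriptions of the format, and the scale is kept at full precision here for simplicity.

```python
import math

# Assumed representable magnitudes for a 4-bit float with 2 exponent bits and 1 mantissa bit.
FP4_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # assumed number of values sharing one scale


def quantize_block(values):
    """Map a block of floats to FP4 grid values plus one shared scale."""
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return 1.0, [0.0] * len(values)
    scale = peak / FP4_MAGNITUDES[-1]  # the largest input lands on the largest grid value
    codes = []
    for v in values:
        target = abs(v) / scale
        magnitude = min(FP4_MAGNITUDES, key=lambda g: abs(target - g))
        codes.append(math.copysign(magnitude, v))
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from grid values and the shared scale."""
    return [scale * c for c in codes]


block = [0.1 * i - 0.8 for i in range(BLOCK_SIZE)]
scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)
worst_error = max(abs(a - b) for a, b in zip(block, restored))
print(f"scale={scale:.4f}, worst absolute error={worst_error:.4f}")
```

Giving each small block its own scale keeps the coarse four-bit grid aligned with the local range of the data. This is the general reason such formats can stay close to higher-precision accuracy.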

The model also powers the Nvidia AI-Q research agent, which reached the top position on the DeepResearch Bench leaderboard.

Nemotron 3 Super is fully open under the Nvidia Nemotron Open Model License. Checkpoints in BF16, FP8, and NVFP4 formats, along with pre-training data, post-training samples, and reinforcement learning environments, are available on Hugging Face. Inference is supported through Nvidia NIM, build.nvidia.com, Perplexity, OpenRouter, Together AI, Google Cloud, AWS, Azure, and CoreWeave, with on-premises options via Dell Enterprise Hub and HPE.

Developers can access training recipes, fine-tuning guides, and inference cookbooks through the NeMo platform using vLLM, SGLang, and TensorRT-LLM.

Source: https://news.bitcoin.com/nvidia-releases-nemotron-3-super-a-120b-open-ai-model-built-for-agentic-workloads/
