Together AI Kernels Team Achieves 3.6x Performance Gains on NVIDIA Hardware

2026/04/02 03:17

Timothy Morano Apr 01, 2026 19:17

Together AI's kernel research team delivers major GPU optimization breakthroughs, cutting inference latency from 281ms to 77ms for enterprise AI deployments.

The team behind FlashAttention has quietly become one of the most consequential groups in AI infrastructure. Together AI's kernel research unit, now about 15 engineers strong, is solving a problem most people don't even know exists: the massive performance gap between AI models and the hardware running them.

Their latest win? Taking a voice AI company's time-to-first-token from 281ms down to 77ms—a 3.6x improvement that translated to 7.2x better unit economics.
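The headline ratio can be checked from the two latencies quoted above. The following is a minimal sketch; the helper names `speedup` and `reduction_pct` are hypothetical and exist only for this illustration. The 7.2x unit-economics figure is not reproduced, because the article does not give the inputs behind it.

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster the optimized latency is."""
    if before_ms <= 0 or after_ms <= 0:
        raise ValueError("latencies must be positive")
    return before_ms / after_ms


def reduction_pct(before_ms: float, after_ms: float) -> float:
    """Return the latency reduction as a percentage of the original."""
    if before_ms <= 0:
        raise ValueError("before_ms must be positive")
    return 100.0 * (before_ms - after_ms) / before_ms


# Figures quoted in the article: 281 ms before, 77 ms after.
ratio = speedup(281.0, 77.0)
saved = reduction_pct(281.0, 77.0)
print(f"{ratio:.2f}x faster, {saved:.1f}% lower latency")
# prints "3.65x faster, 72.6% lower latency"
```

The raw ratio is about 3.65, which the article reports as 3.6x.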

The Hidden Bottleneck

Here's what most AI discourse misses: having great models and expensive GPUs doesn't guarantee performance. The bottleneck sits in between—the kernel layer that translates mathematical operations into actual silicon instructions.

"The gap between what researchers design and what actually runs fast on hardware is vast," explains Dan Fu, who leads a parallel research lab at UCSD. Get kernels right and you unlock hardware's full potential. Get them wrong and your expensive GPUs sit partially idle.

For companies building AI-native products, this isn't academic. When inference costs run 2x higher than necessary, or when latency breaks the user experience, kernel optimization becomes existential.

One Week Versus One Year

The team's capabilities showed clearly when NVIDIA's Blackwell GPUs arrived in March 2025. NVIDIA had spent a year with dozens of engineers optimizing kernels for the new architecture. Together AI had a week.

Their secret weapon: ThunderKittens, a library developed with Stanford researchers that reduces kernel code from 1,000+ lines of CUDA to roughly 100-200 lines. The abstraction layer is built around NVIDIA's tensor cores, the specialized matrix multiplication units on modern GPUs.
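To give a feel for why a tile-oriented abstraction shortens kernel code, here is a small sketch of tiled matrix multiplication. It is plain Python, not ThunderKittens code and not its API; the function name `matmul_tiled` and the tile size are assumptions chosen for illustration. The point is only that the innermost unit of work becomes "multiply one small tile by another and accumulate", which is the shape of operation tensor cores execute.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply matrices a (n x k) and b (k x m) one tile at a time.

    Illustrative only: on a GPU each tile-level multiply-accumulate
    would map to tensor-core instructions rather than Python loops.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of the output
        for j0 in range(0, m, tile):      # tile column of the output
            for k0 in range(0, k, tile):  # walk the shared dimension in tiles
                # One tile operation: C[i0.., j0..] += A[i0.., k0..] @ B[k0.., j0..]
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_tiled(a, b))  # prints [[58, 64], [139, 154]]
```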

Within seven days of hardware access, the team had some of the fastest FP4 and FP8 GEMM (general matrix multiply) kernels available for Blackwell, running up to 2x faster than cuBLAS GEMMs on the previous-generation H100.

Real-World Impact

The voice AI case study illustrates what this means in production. The customer had a hard constraint: time-to-first-64-tokens above roughly 100ms breaks conversational flow. Their B200 deployment was hitting 281ms.

Together's team hand-optimized a "Megakernel" implementation—running an entire model in a single kernel, targeting the HBM bandwidth ceiling of NVIDIA H100s. Results on Llama-3.2-1B: 77ms. On Qwen 2.5 1.5B: 127ms, down from 292ms.
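Why memory bandwidth acts as a ceiling for small-model decoding can be sketched with a back-of-envelope bound: if every weight must be read from HBM once per generated token, the time per token cannot be lower than weight bytes divided by bandwidth. The numbers below are round placeholders, not measured values or hardware specifications, and the function name `weight_read_floor_ms` is hypothetical. The estimate ignores prefill, KV-cache traffic, activations, and kernel launch overhead.

```python
def weight_read_floor_ms(param_count, bytes_per_param, bandwidth_bytes_per_s, tokens=1):
    """Lower bound (ms) on decode time if all weights are read once per token."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    bytes_per_token = param_count * bytes_per_param
    seconds = tokens * bytes_per_token / bandwidth_bytes_per_s
    return seconds * 1000.0


# Assumptions for illustration only (not taken from the article):
ASSUMED_PARAMS = 1.0e9           # a model of roughly one billion parameters
ASSUMED_BYTES_PER_PARAM = 2      # 16-bit weights
ASSUMED_HBM_BYTES_PER_S = 3.0e12 # order-of-magnitude HBM bandwidth, not a spec

per_token = weight_read_floor_ms(
    ASSUMED_PARAMS, ASSUMED_BYTES_PER_PARAM, ASSUMED_HBM_BYTES_PER_S
)
first_64 = weight_read_floor_ms(
    ASSUMED_PARAMS, ASSUMED_BYTES_PER_PARAM, ASSUMED_HBM_BYTES_PER_S, tokens=64
)
print(f"floor per token: {per_token:.2f} ms")
print(f"floor for 64 sequential tokens: {first_64:.1f} ms")
```

Under these assumptions the floor is well under a millisecond per token, so any large gap between that bound and observed latency is overhead that kernel-level work can attack.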

The approach traces back to FlashAttention's original insight. That Memorial Day 2022 paper proved the AI establishment wrong about attention being fully optimized. By applying database systems principles—data locality, memory hierarchies—to transformer attention, the team achieved 2-3x speedups where previous sparsity methods showed only 10% real gains.
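The data-locality idea can be illustrated with the streaming ("online") softmax that tiled attention relies on: scores are processed one block at a time, and earlier partial results are rescaled whenever a larger maximum appears, so the full score row never has to be held at once. This is a simplified sketch for a single query with scalar values, not the FlashAttention kernel; the function names and block size are assumptions for illustration.

```python
import math


def attention_row_reference(scores, values):
    """Standard softmax-weighted sum over the whole row at once."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return sum(e / total * v for e, v in zip(exps, values))


def attention_row_streaming(scores, values, block=2):
    """Same result, computed block by block with a running max and sum."""
    running_max = float("-inf")
    running_sum = 0.0  # sum of exp(score - running_max) seen so far
    acc = 0.0          # weighted sum of values, in the same scaled units
    for start in range(0, len(scores), block):
        s_blk = scores[start:start + block]
        v_blk = values[start:start + block]
        new_max = max(running_max, max(s_blk))
        # Rescale earlier partial results to the new maximum.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc *= scale
        for s, v in zip(s_blk, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc += w * v
        running_max = new_max
    return acc / running_sum


scores = [0.5, 2.0, -1.0, 3.0, 0.0]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
ref = attention_row_reference(scores, values)
out = attention_row_streaming(scores, values, block=2)
print(abs(ref - out) < 1e-12)
```

Because each block is touched only once, a GPU implementation of the same idea can keep the working set in fast on-chip memory instead of repeatedly reading and writing the large attention matrix in HBM.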

Academic-Industry Pipeline

The team operates through an unusual model. Dan Fu runs his UCSD lab on higher-risk fundamental research. Together AI co-founder Tri Dao is at Princeton. Simran Arora is at Caltech. Ideas get de-risked in academia, then productionized at Together AI. PhD students join the company. Interns work on longer-term research in academic labs.

This produces engineers who bridge theory and production—people who, as Fu puts it, "lose sleep over memory access patterns" and "find beauty in data flow diagrams."

The work isn't glamorous. No announcements when a kernel optimization lands. Just faster training times, lower costs, higher throughput. But these margins determine whether AI-native products feel instant or sluggish, whether unit economics work or don't, whether companies scale to millions of users or plateau at thousands.

For enterprise AI deployments where every millisecond matters—and every percentage point of efficiency translates to significant cost savings—this invisible infrastructure layer may be where the real competitive advantage lies.
