
Can AI Beat the Sports Betting Market? 8 of the Top Models Tried

2026/04/16 11:31

In brief

  • Frontier AI models blew up betting on real-world football markets.
  • They knew the right strategy—but failed to execute it.
  • A simple 1990s model was able to best most of them.

General Reasoning just gave frontier AI its worst report card yet. Eight top models, including Claude, Grok, Gemini, and GPT-5.4, were each given a virtual bankroll and asked to build a machine learning betting strategy across a full 2023-24 English Premier League season.

Every single one lost money. Several went completely bankrupt.

The benchmark is called KellyBench, named after the Kelly criterion, a 1956 formula that tells you exactly how much to bet when you have an edge over the market. Every model could recite the Kelly formula. None of them could actually use it.
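For reference, the Kelly criterion for a single bet with win probability p and net odds b (profit per unit staked) is f* = p - (1 - p) / b. A minimal Python sketch follows. The function name and the choice of decimal odds as input are illustrative assumptions, not something taken from the KellyBench paper.

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Fraction of bankroll to stake under the Kelly criterion.

    p_win: estimated probability that the bet wins (0..1).
    decimal_odds: total payout per unit staked, stake included (> 1).
    Returns 0.0 when the estimated edge is not positive.
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be in [0, 1]")
    if decimal_odds <= 1.0:
        raise ValueError("decimal_odds must be greater than 1")
    b = decimal_odds - 1.0             # net profit per unit staked
    f = p_win - (1.0 - p_win) / b      # Kelly formula: f* = p - q / b
    return max(f, 0.0)                 # never stake on a negative edge
```

With a 55% win estimate at even money (decimal odds of 2.0), the formula suggests staking about 10% of the bankroll. With no edge, it suggests staking nothing.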

xAI’s Grok 4.20 failed all three runs, going fully bankrupt in one, forfeiting mid-season in the other two. Google’s Gemini Flash forfeited two of three runs after placing a single wager of roughly £273,000 on a three-percentage-point historical win-rate edge—and losing it. Claude Opus 4.6, Anthropic’s best model, lost 11% on average and somehow came out looking like the responsible adult in the room.

In fact, the research paper notes that the Dixon-Coles model, a statistical baseline from the late 1990s, outperformed most of the frontier models evaluated, finishing ahead of six out of eight even with limited data.

“Dixon-Coles is an outdated 2000s baseline which doesn’t utilise all available data or account for non-stationarity in a principled way,” the researchers note. “It is therefore even more surprising that many frontier models, such as Gemini 3.1 Pro, are unable to beat or match it on KellyBench.”

This matters beyond football. Earlier this year, AI benchmarks showed that Claude could dominate business simulations through price-fixing, cartel agreements, and strategic deception.

Those simulations involved static competition, a limited set of opponents, and clear scoring. KellyBench is the opposite: 120 matchdays, constantly shifting data, a market that gets smarter every week, and promoted teams with zero historical records.

The researchers call the core problem a “knowledge-action gap.” It is exactly what it sounds like.

Business simulations mostly run under fixed conditions, whereas a sports betting market keeps shifting, which makes things difficult for these models. “KellyBench requires agents to maintain coherent intent across potentially thousands of sequential decisions, monitor the consequences of those decisions, and close the loop between observation and action,” researchers argue.

We’re not there yet, obviously.

The models could articulate the right strategy, diagnose when something was broken, and identify the cause of their losses, but then failed to verify their code actually implemented what they planned, failed to notice when execution diverged from intent, and failed to act on their own findings.

GLM-5 wrote three separate self-critique documents during its run. Each one correctly identified that its hardcoded 25% draw rate and overestimation of home advantage were destroying its returns. At one point, with its bankroll around £44,200, it noted that its predicted 40% home win rate was only hitting 30% in reality. It never changed the code. It kept betting the same way until the money was gone.
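The mismatch GLM-5 described, a predicted 40% home win rate against roughly 30% in reality, is a calibration gap. A minimal sketch of such a check is below. The function name and the sample data are hypothetical and only illustrate the comparison, not the model's actual code.

```python
def calibration_gap(predicted_probs, outcomes):
    """Mean predicted probability minus the observed frequency of the event.

    predicted_probs: predicted probability of the event for each match.
    outcomes: 1 if the event happened in that match, else 0.
    A positive result means the predictions were too optimistic on average.
    """
    if not predicted_probs or len(predicted_probs) != len(outcomes):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_predicted = sum(predicted_probs) / len(predicted_probs)
    observed_rate = sum(outcomes) / len(outcomes)
    return mean_predicted - observed_rate


# Hypothetical sample: 40% predicted home wins, 3 home wins in 10 matches.
gap = calibration_gap([0.40] * 10, [1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
```

A persistent gap of this size is a signal to refit or rescale the probabilities before staking on them again.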

Kimi K2.5 did something arguably more impressive and more tragic. It wrote a mathematically correct fractional Kelly staking function—the right formula, properly structured. Then it never called it. A formatting bug caused the model to send a broken bash command roughly 50 times in a row. Its reasoning noted the problem. It then sent the identical broken command again. An accidental £114,000 bet—98% of its remaining bankroll—on a Burnley versus Luton match finished the job.
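As an illustration of what a fractional Kelly staking function with a safety cap might look like, here is a minimal sketch. The quarter-Kelly multiplier, the 5% per-bet cap, and all names are assumptions made for this example; they are not Kimi K2.5's actual code or parameters from the paper.

```python
def fractional_kelly_stake(bankroll, p_win, decimal_odds,
                           fraction=0.25, max_share=0.05):
    """Stake in currency units under fractional Kelly with a hard cap.

    fraction: share of the full Kelly bet to place (assumed 0.25 here).
    max_share: maximum share of bankroll on any single bet (assumed 0.05).
    """
    b = decimal_odds - 1.0                  # net profit per unit staked
    if bankroll <= 0 or b <= 0:
        return 0.0
    full_kelly = p_win - (1.0 - p_win) / b  # full Kelly fraction
    if full_kelly <= 0:
        return 0.0                          # no edge, no bet
    share = min(fraction * full_kelly, max_share)
    return round(bankroll * share, 2)
```

A cap like `max_share` is the kind of guard that would block a single wager amounting to 98% of the bankroll, provided every bet is actually routed through the function.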

GPT-5.4 was the most methodical. It spent 160 tool calls building models before placing a single bet, then calculated that its log-loss (0.974) was barely worse than the market’s (0.971) and concluded it had no edge. It spent the rest of the season placing penny bets to preserve capital. Sound reasoning.
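The comparison GPT-5.4 made can be sketched as follows: score the model's probabilities and the market's implied probabilities with the same log-loss measure, where lower is better. The helper names and the two sample matches are hypothetical, and normalising inverse odds is only one simple way of removing the bookmaker's margin.

```python
import math


def implied_probs(decimal_odds):
    """Turn decimal odds for (home, draw, away) into probabilities summing to 1."""
    raw = [1.0 / o for o in decimal_odds]
    total = sum(raw)  # exceeds 1 by the bookmaker's margin
    return [r / total for r in raw]


def log_loss(prob_rows, outcomes, eps=1e-15):
    """Mean negative log-probability assigned to the observed outcome."""
    total = 0.0
    for probs, outcome in zip(prob_rows, outcomes):
        p = min(max(probs[outcome], eps), 1.0)  # guard against log(0)
        total -= math.log(p)
    return total / len(outcomes)


# Hypothetical data: two matches, outcome index 0=home, 1=draw, 2=away.
outcomes = [0, 2]
model_rows = [[0.50, 0.25, 0.25], [0.30, 0.30, 0.40]]
market_rows = [implied_probs([1.9, 3.6, 4.2]), implied_probs([3.1, 3.4, 2.3])]

# Positive only if the model scores better than the market on this sample.
edge = log_loss(market_rows, outcomes) - log_loss(model_rows, outcomes)
```

When `edge` is zero or negative, as in the comparison reported above (0.974 for the model against 0.971 for the market), the model has no demonstrated advantage and Kelly staking calls for little or no exposure.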

OpenAI’s model lost 13.6% on average. One seed alone cost roughly $2,012 to run.

Ross Taylor, General Reasoning’s CEO and former Meta AI researcher, told the Financial Times that most AI benchmarks operate in “very static environments” that bear little resemblance to the real world. “There’s a lot of excitement about AI automation, but there haven’t been many attempts to evaluate AI in long-term, real-world environments,” he said.

The General Reasoning team didn’t immediately respond to a request for comment from Decrypt.

To measure strategy quality beyond raw returns, the researchers built a 44-point sophistication rubric with quantitative betting fund experts—covering feature development, stake sizing, non-stationarity handling, and execution. Claude Opus 4.6 scored highest at 32.6%. Less than a third of available points. On the best model.

Higher sophistication scores significantly predicted lower bankruptcy rates (p = 0.008) and correlated with better overall returns. The models are not failing because the market is unbeatable. They are failing because they are not using what they have.

This fits a pattern. Research published last year found AI models develop something resembling gambling addiction when told to maximize rewards—going bankrupt up to 48% of the time in simulated slot machine tests. A separate real-money crypto trading competition found the same reliability problems over extended periods.

The best-performing model averaged a final bankroll of £89,035—a net loss of £10,965 on a normalized £100,000 starting stake. Gradient boosting, fractional Kelly staking, months of Premier League football, state of the art performance… all just to get rekt.


Source: https://decrypt.co/364406/ai-beat-sports-betting-market-top-8-models
