When 29 Out of 31 AI Traders Lose Money: What the Benchmark Really Tells Us

2025-12-27 07:00

Written by: Gianni Rossi

If you opened your portfolio today and saw a painful drawdown, you might take a bit of comfort from an unexpected source: most AI traders are doing even worse.

In a recent public benchmark, more than thirty large language models and AI systems were given the same initial capital and asked to trade Bitcoin derivatives under identical conditions. After a little more than a month, the leaderboard looks brutal. Out of 31 models, only two are in profit. The rest are sitting on losses that would make any human trader deeply uncomfortable: many have lost between 70% and 90% of their capital. One high-profile model associated with a well-known tech billionaire is down almost 98%.

On paper, these are some of the most advanced AI models available today. They have read vast amounts of financial commentary, academic research, and historical data. Yet, placed into a live, noisy market, they mostly behave like any overconfident beginner: they trade too often, use too much leverage, and underestimate how quickly small mistakes compound.

This article does not exist to mock the models. Instead, we will take the benchmark seriously and ask a deeper question: what does this experiment actually teach us about AI, markets, and the realistic role of machine intelligence in trading?

1. The Benchmark in Plain Language: Lots of Brainpower, Very Little Alpha

The screenshots from the benchmark tell a simple but important story. Each AI model starts with a similar account value and is allowed to trade Bitcoin perpetual futures over several weeks. The interface tracks metrics such as:

  • Current account value
  • Total profit and loss in dollars
  • Percentage return since the start
  • Number of trades and win rate
  • Largest individual gain and largest loss
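To make these leaderboard columns concrete, here is a minimal Python sketch of how they could be derived from a list of closed trades. The `AccountStats` and `summarize` names and the trade figures are hypothetical illustrations, not the benchmark's own code or data.

```python
from dataclasses import dataclass


@dataclass
class AccountStats:
    account_value: float
    total_pnl: float
    return_pct: float
    num_trades: int
    win_rate: float
    largest_gain: float
    largest_loss: float


def summarize(starting_capital: float, trade_pnls: list[float]) -> AccountStats:
    """Compute leaderboard-style metrics from realized P&L per trade (in dollars)."""
    total_pnl = sum(trade_pnls)
    wins = [p for p in trade_pnls if p > 0]
    losses = [p for p in trade_pnls if p < 0]
    n = len(trade_pnls)
    return AccountStats(
        account_value=starting_capital + total_pnl,
        total_pnl=total_pnl,
        return_pct=100.0 * total_pnl / starting_capital,
        num_trades=n,
        win_rate=len(wins) / n if n else 0.0,
        largest_gain=max(wins, default=0.0),
        largest_loss=min(losses, default=0.0),
    )


# Hypothetical trade history: two small winners, four larger losers.
stats = summarize(10_000.0, [250.0, -400.0, -900.0, 300.0, -1_500.0, -750.0])
print(stats)
```

In this made-up history the win rate is one in three, yet the account ends 30% lower, because the losing trades are much larger than the winning ones.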

By late December, the ranking table shows only two entries above water: a “Mystery-M” model and a “GPT (Max Leverage)” configuration, with returns of around +12% and +1.5% respectively. Everything else is negative, often severely:

  • Several well-known models from different providers show returns between −65% and −75%.
  • One configuration of Grok shows a drawdown of about 97.8%, essentially wiping out its starting capital.
  • DeepSeek, a Chinese model with a strong reputation, is still slightly negative but fares comparatively well, sitting near the top of the table with a small single-digit percentage loss.

Another chart plots the account value of each AI model against the Bitcoin price. The result is telling: while Bitcoin itself has moved sideways with bursts of volatility, the equity curves of most models trend steadily downward. In other words, the market has been choppy but not catastrophic; the disaster mostly comes from the way the models trade it.

This immediately leads to our first key insight.

2. Lesson One: AI Is Not a Shortcut Around Market Noise

There is a persistent myth in retail communities: if only we had access to "real" AI like the big firms, we could simply plug it into an exchange and watch it print money. The benchmark is a useful antidote to that fantasy.

Large language models are extraordinarily good at processing and generating text. They can summarise research reports, explain complex derivatives, or simulate different risk scenarios. But that strength does not automatically translate into robust, live-trading performance. Financial markets, especially highly liquid instruments like Bitcoin futures, are shaped by:

  • Microsecond-level interactions between market makers and arbitrageurs
  • Hidden order flow from large participants
  • Sudden regime changes driven by macro news or policy expectations
  • Transaction costs, slippage, and funding payments that compound over time

Most of this is not visible in the text data that large language models are trained on. When you give such a model direct control over a live derivatives account, you are asking it to operate in a domain that is far noisier, faster, and more adversarial than its training environment. The benchmark results simply show that – left on autopilot – the models struggle to filter signal from noise in this setting.
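The last item in the list above, costs that compound over time, is easy to underestimate. The following sketch shows the arithmetic under loudly simplified assumptions: a trader with no edge at all, a fixed round-trip cost (fees plus slippage) charged on the leveraged notional, and funding payments ignored. The function name and every number are hypothetical.

```python
def equity_after_costs(start: float, n_trades: int, leverage: float,
                       round_trip_cost: float) -> float:
    """Equity of a zero-edge trader after paying costs on every trade.

    round_trip_cost is a fraction of notional (assumed: fees + slippage).
    Notional is leverage * current equity, so each trade shaves
    leverage * round_trip_cost off the account.
    """
    per_trade_drag = leverage * round_trip_cost
    return start * (1.0 - per_trade_drag) ** n_trades


# Hypothetical: 300 trades at 10x leverage with a 0.1% round-trip cost.
remaining = equity_after_costs(10_000.0, n_trades=300, leverage=10.0,
                               round_trip_cost=0.001)
print(f"Equity left from costs alone: {remaining:,.0f} USD")
```

Under these assumptions each trade costs 1% of equity, and after 300 trades only about 5% of the starting capital is left, before a single bad prediction is counted.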

3. Lesson Two: The Regime Matters More Than the Narrative

The benchmark period from late November to late December has not been a straight bull or bear trend. Bitcoin has experienced volatile swings, but the net price change over the period is modest compared with earlier phases of the cycle.

Sideways and choppy markets are notoriously difficult for systematic strategies. Trend-following systems get chopped up by frequent reversals; mean-reversion strategies can be caught by sudden breakouts. Human traders feel this pain; AI models are no exception.
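A deliberately crude toy example makes the whipsaw effect visible. The rule below simply follows the last price move; both price series are invented for illustration and have nothing to do with the benchmark's data or with any model's actual strategy.

```python
def momentum_pnl(prices: list[float]) -> float:
    """Follow the last move: long after an up-tick, short after a down-tick.

    P&L is measured in price points for one unit of position.
    """
    pnl = 0.0
    for t in range(1, len(prices) - 1):
        last_move = prices[t] - prices[t - 1]
        position = 1 if last_move > 0 else -1 if last_move < 0 else 0
        pnl += position * (prices[t + 1] - prices[t])
    return pnl


trending = [100 + 2 * i for i in range(10)]        # 100, 102, 104, ..., 118
choppy = [100 + 2 * (i % 2) for i in range(10)]    # 100, 102, 100, 102, ...

print(momentum_pnl(trending))  # 16.0
print(momentum_pnl(choppy))    # -16.0
```

The same rule earns on every step of the trending series and loses on every step of the choppy one. A rule that fades the last move would show the mirror image: it would profit in the chop and lose steadily in the trend.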

Look at the equity curves on the second chart:

  • At the start, many AI accounts sit near the 10,000 USD mark.
  • Within just a few days of volatile but directionless trading, equity begins to diverge downward.
  • By mid-December, several models have lost nearly half their capital; a few fall off a cliff after a series of rapid losses.

This pattern is a classic signature of strategies that are not calibrated for the current regime. Whatever edge those models may have had in backtests is overwhelmed by a live environment where volatility spikes, liquidity shifts, and funding rates move quickly.
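The size of such a decline is usually summarised as maximum drawdown, the largest peak-to-trough fall in account value. A minimal sketch, using an invented equity curve rather than any model's real results:

```python
def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline as a fraction of the peak.

    Assumes a non-empty list of positive account values.
    """
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Hypothetical account values sampled over a few weeks.
curve = [10_000, 10_400, 9_100, 9_600, 7_800, 8_200, 5_200]
print(f"{max_drawdown(curve):.1%}")  # 50.0%
```

Note that the drawdown is measured from the highest point reached (10,400 here), not from the starting capital, which is why it can be larger than the headline percentage return.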

The takeaway is simple but often ignored: no model is universally strong. An AI strategy that performs well in trending conditions can suffer badly when the market turns sideways. Without an adaptive overlay – or a human supervisor willing to turn it off – the strategy simply keeps trading the wrong regime.

4. Lesson Three: Risk Management Beats Raw Intelligence

Another striking element from the leaderboard is the win rate. Most models show win rates between 28% and 38%. At first glance, that does not look catastrophic. In theory, a strategy can be profitable even with a low win rate if its average gain per winning trade is much larger than its average loss.
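The relationship between win rate and payoff can be written down directly. The sketch below uses the standard expectancy formula and ignores trading costs; the function names are ours, and the win rates are the ones quoted above.

```python
def breakeven_payoff_ratio(win_rate: float) -> float:
    """Average win / average loss needed for zero expectancy (costs ignored)."""
    return (1.0 - win_rate) / win_rate


def expectancy(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Expected P&L per trade; avg_loss is given as a positive number."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


for p in (0.28, 0.33, 0.38):
    ratio = breakeven_payoff_ratio(p)
    print(f"win rate {p:.0%}: winners must average {ratio:.2f}x the losers")
```

At a 28% win rate, the average winner has to be roughly 2.6 times the average loser just to break even; at 38%, roughly 1.6 times. Anything less, and the account shrinks even before costs.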

Yet the P&L tells a different story: despite win rates around one-third, the majority of models are sitting on very large net losses. That suggests a few possible problems:

  • Position sizing may be overly aggressive. A small number of large losing trades can easily erase the gains from dozens of small winners.
  • Use of leverage may magnify volatility. Even if each individual trade is reasonable, leverage multiplies the impact of ordinary price swings.
  • Stop-loss or exit logic may be inconsistent. Models may hold onto losing positions for too long, turning manageable drawdowns into account-level damage.

In other words, the problem is not just “intelligence” in the sense of predicting price direction. It is how that intelligence is wrapped in risk rules. A very simple strategy with conservative position sizing and predefined maximum loss per day can survive long periods of poor prediction. A sophisticated model with weak risk controls can destroy its account in a few bad hours.
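Two pieces of simple arithmetic show why sizing and leverage dominate. This is a sketch under stated simplifications: every losing trade costs the same fixed fraction of current equity, and the leverage figure ignores fees, funding, and maintenance-margin rules. None of the parameters come from the benchmark.

```python
def equity_after_losing_streak(start: float, risk_fraction: float,
                               n_losses: int) -> float:
    """Equity left after n consecutive losses, each costing risk_fraction of equity."""
    return start * (1.0 - risk_fraction) ** n_losses


def adverse_move_to_wipeout(leverage: float) -> float:
    """Price move against the position that consumes the full margin.

    Simplified: ignores fees, funding and maintenance-margin rules.
    """
    return 1.0 / leverage


conservative = equity_after_losing_streak(10_000.0, 0.01, 20)
aggressive = equity_after_losing_streak(10_000.0, 0.20, 20)
print(f"1% risk per trade, 20 losses in a row:  {conservative:,.0f} USD left")
print(f"20% risk per trade, 20 losses in a row: {aggressive:,.0f} USD left")
print(f"At 20x leverage, a {adverse_move_to_wipeout(20):.0%} adverse move wipes the margin")
```

With 1% at risk per trade, twenty straight losses still leave about 82% of the account. With 20% at risk, the same streak leaves a little over 1%. The predictions are equally bad in both cases; only the risk rules differ.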

The fact that only two models are in profit, even though many of the others have win rates that a favourable payoff ratio could make workable, is a powerful reminder: in trading, risk management is not a nice add-on, it is the core product.

5. Lesson Four: Backtests and Leaderboards Are Not the Same as Live Survival

There is another subtle lesson in the numbers. Several of the underperforming models are from organisations that publish strong research, impressive benchmarks on reasoning tasks, and polished marketing material. It is plausible that these models were not thrown into the trading arena completely naïve; prompts, decision rules, and risk parameters may well have been tuned on historical data.

Yet once the benchmark went live, most of that apparent edge evaporated. Why?

Because backtests are a much gentler environment than reality. When we design a strategy using historical data, it is easy – often unconsciously – to adapt it to the quirks of that dataset. We pick thresholds that would have worked well in the past, ignore periods that look “weird”, or forget that real-world execution involves slippage and delays. The result is a system that looks impressive on paper but fragile when faced with fresh information.

AI models amplify this problem. They can discover complex patterns in data, but they can also lock onto patterns that are purely accidental. When the market regime changes, those accidental patterns disappear, and the model is left trading ghosts.
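The effect of picking the best-looking rule from many candidates can be reproduced on pure noise. In the sketch below the "market" is random by construction, so no strategy has a real edge; all names and parameters are hypothetical, and the random strategies stand in for any large search over thresholds or prompts.

```python
import random


def pnl(signals: list[int], returns: list[float]) -> float:
    """Sum of position * return over the period."""
    return sum(s * r for s, r in zip(signals, returns))


def selection_bias_demo(n_strategies: int = 200, n_days: int = 250,
                        seed: int = 0) -> dict[str, float]:
    rng = random.Random(seed)
    # Pure-noise daily returns: by construction there is nothing to learn.
    in_sample = [rng.gauss(0.0, 0.02) for _ in range(n_days)]
    out_sample = [rng.gauss(0.0, 0.02) for _ in range(n_days)]
    # Each "strategy" is a fixed random sequence of long (+1) / short (-1) calls.
    strategies = [[rng.choice((-1, 1)) for _ in range(n_days)]
                  for _ in range(n_strategies)]
    in_scores = [pnl(s, in_sample) for s in strategies]
    best = max(range(n_strategies), key=in_scores.__getitem__)
    return {
        "best_in_sample": in_scores[best],
        "mean_in_sample": sum(in_scores) / n_strategies,
        "best_out_of_sample": pnl(strategies[best], out_sample),
    }


result = selection_bias_demo()
for name, value in result.items():
    print(f"{name}: {value:+.3f}")
```

The winner of the in-sample contest always looks better than the average candidate, simply because it was selected for that. Its result on fresh data has no reason to be positive, which is the backtest-to-live gap in miniature.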

The benchmark therefore plays a valuable role: it exposes which ideas survive contact with the present. In that sense, the poor performance of most AI models is actually healthy. It reminds both developers and users that no amount of clever code eliminates the need for slow, disciplined testing and a willingness to admit when a strategy has stopped working.

6. How Humans Should Actually Use AI in Markets

If the benchmark shows that “let the AI trade my account 24/7” is a risky dream, what is the constructive alternative? How can traders – from individual enthusiasts to professional desks – use AI responsibly?

Several roles stand out as both realistic and valuable:

6.1 Research and information processing

AI models are exceptional at digesting large volumes of information: regulatory filings, macroeconomic data releases, project documentation, or long technical reports. They can summarise, translate, highlight inconsistencies, and surface key risks far faster than a human team working alone.

Used this way, AI does not decide which asset to buy or sell. Instead, it acts like a tireless research assistant, helping human decision-makers stay informed and avoid missing important context.

6.2 Scenario analysis and risk communication

Another productive use is to ask AI to describe possible scenarios rather than to choose a single directional bet. For example:

  • What might happen to Bitcoin volatility if funding rates remain elevated for several weeks?
  • How could a surprise policy announcement from a major central bank spill over into crypto derivatives?
  • Which types of portfolios are most exposed to prolonged sideways markets?

These questions encourage the model to think in distributions and narrative pathways, not in single-point predictions. Humans can then decide how – or whether – to act on those scenarios.

6.3 Tools for discipline, not for impulse

Finally, AI can help traders enforce discipline instead of encouraging impulsive decisions. A model can be asked to check whether a new trade idea is consistent with a written strategy, to calculate the impact on portfolio risk, or to simulate worst-case outcomes. It can act as an always-available second opinion that questions emotional decisions rather than amplifying them.
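As an illustration of what "consistent with a written strategy" can mean in practice, here is a small rule checker of the kind a trader could run, or ask a model to apply, before placing an order. The rule set, field names, and limits are hypothetical examples, not a recommended configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskRules:
    max_leverage: float        # notional / equity
    max_risk_per_trade: float  # fraction of equity lost if the stop is hit
    max_daily_loss: float      # fraction of equity


def check_trade(rules: RiskRules, equity: float, notional: float,
                stop_distance: float, loss_today: float) -> list[str]:
    """Return a list of rule violations; an empty list means the idea passes.

    stop_distance is the fractional price move to the stop (0.02 = 2%).
    loss_today is the dollar loss already realized today (positive number).
    """
    violations = []
    if notional / equity > rules.max_leverage:
        violations.append("leverage above limit")
    risk = notional * stop_distance / equity
    if risk > rules.max_risk_per_trade:
        violations.append("risk per trade above limit")
    if loss_today / equity + risk > rules.max_daily_loss:
        violations.append("would breach daily loss limit if stopped out")
    return violations


rules = RiskRules(max_leverage=3.0, max_risk_per_trade=0.01, max_daily_loss=0.03)
impulsive = check_trade(rules, equity=10_000, notional=50_000,
                        stop_distance=0.02, loss_today=100)
measured = check_trade(rules, equity=10_000, notional=5_000,
                       stop_distance=0.01, loss_today=0)
print(impulsive)
print(measured)
```

The first idea trips all three rules, while the second passes with an empty list. The point of the design is that the limits are written down before the trade idea appears, so the check cannot be bent by the mood of the moment.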

In all these roles, AI is a tool inside a human-designed risk framework, not an autonomous agent with an unrestricted account.

7. What This Means for Everyday Investors

For individual investors looking at the benchmark from the outside, a few practical messages emerge.

7.1 Do not compare your portfolio to unrealistic fantasies

It is common to see social media posts claiming that a new bot or AI model has found a way to “win every day” in crypto derivatives. The benchmark results are a realistic counterweight. If dozens of advanced models, operated under equal conditions, mostly lose money during a difficult regime, it is unlikely that a retail-facing bot has secretly solved market efficiency.

If you are down 30% or 50% during a volatile period, that is painful but not unusual. The key is not to chase losses with ever-higher leverage or impulsive trades. It is to step back, reassess risk, and remember that survival matters more than catching every move.
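The reason survival matters so much is that losses and the gains needed to undo them are not symmetric. A short calculation, using the loss levels mentioned in this article:

```python
def gain_to_recover(loss_fraction: float) -> float:
    """Fractional gain needed to get back to the starting value after a loss.

    Requires 0 <= loss_fraction < 1.
    """
    return loss_fraction / (1.0 - loss_fraction)


for loss in (0.30, 0.50, 0.90):
    print(f"after a {loss:.0%} loss you need a {gain_to_recover(loss):.0%} gain")
```

A 30% loss needs roughly a 43% gain to recover, a 50% loss needs 100%, and a 90% loss needs 900%. The deeper the hole, the faster the required return grows, which is why chasing losses with more leverage tends to make things worse.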

7.2 Be sceptical of “hands-off” promises

Any product that suggests you can simply hand over your capital to an algorithm and forget about it deserves extra scrutiny. The benchmark shows that even sophisticated AI struggles without ongoing oversight. A healthy approach is to assume that every model will eventually go through a difficult phase and to plan position sizes accordingly.

7.3 Focus on what you can control

You cannot control macro events, liquidity shocks, or how AI models will trade in a benchmark. You can control:

  • Your maximum allocation to high-volatility assets
  • Your use (or avoidance) of leverage
  • Your time horizon and rebalancing rules
  • Your choice to treat AI as a research helper rather than an automatic decision-maker

These levers often have more impact on long-term outcomes than any single predictive model.

8. A Humbling but Healthy Reality Check

The idea of AI traders dominating markets captures the imagination. It is easy to picture fleets of algorithms reading every news headline, reacting in milliseconds, and extracting value from each tiny mispricing. The benchmark results present a more grounded picture: today’s general-purpose AI models are powerful at language, but markets remain stubbornly hard to master.

That does not mean AI has no place in finance. On the contrary, its strengths in analysis, automation, and communication are already transforming research desks and risk teams. But the myth of an effortless “AI trading autopilot” is, at least for now, not supported by live data.

For traders and investors, perhaps the most constructive way to read the leaderboard is this: even the smartest models in the world are learning the same hard lessons humans have faced for decades – that noise dominates the short term, that discipline beats excitement, and that risk management is the real edge.

If sophisticated AI can still lose 70–90% of its account in a few weeks of difficult conditions, then it is perfectly acceptable for a human investor to step back, slow down, and treat capital preservation as a success. Markets will always offer new opportunities. The point is to still be around when those opportunities arrive.

Disclaimer: This article is for educational and analytical purposes only. It does not constitute investment advice, trading guidance, or a recommendation to use any specific AI system or financial product. Digital assets and derivatives carry significant risk and may not be suitable for every investor. Always conduct your own research and consider consulting a qualified professional before making financial decisions.
