AI Models · 9 min read · May 6, 2026

SubQ by Subquadratic: 12M Context LLM Reviewed (2026)

By Loïc Jané · Founder, Fleece AI


At a Glance (Updated May 2026): SubQ is a new large language model from Miami startup Subquadratic, launched May 5, 2026 with a $29M seed round. According to SiliconANGLE's launch coverage, SubQ uses a novel Subquadratic Selective Attention (SSA) architecture and claims a 12-million-token context window — roughly 9 million words or 120 books — while running 52× faster than dense attention at 1M tokens. The catch: as VentureBeat reports, benchmarks are self-reported, no public technical paper exists, and researchers are demanding independent proof. This review explains what's real, what's unverified, and what it means for AI agent platforms like Fleece AI.


Key Takeaways

  • SubQ is the first publicly launched LLM built on a fully subquadratic attention architecture — the company calls it Subquadratic Selective Attention (SSA) — meaning compute scales linearly, not quadratically, with context length.
  • The launch model SubQ 1M-Preview ships a 12-million-token context window (50M planned), which Subquadratic claims is roughly 9 million words or ~120 books in a single context.
  • Performance claims include 52× faster inference than dense attention at 1M tokens, 92.1% needle-in-haystack at 12M tokens, and 83 on MRCR v2 — but as VentureBeat notes, every benchmark is self-reported with no third-party verification or public technical paper.
  • Subquadratic raised $29M seed from investors including Justin Mateen (Tinder co-founder) and early backers of Anthropic, OpenAI, Stripe, and Brex; products in private beta include the API, SubQ Code (coding agent), and SubQ Search.
  • Fleece AI tracks frontier model launches but evaluates new models on real-world agent reliability before adding them to plan tiers — SubQ is not on Fleece AI as of May 2026 and won't be until independent benchmarks confirm the architectural claims.

What Is SubQ?

SubQ is the first frontier large language model from Subquadratic, a Miami-based AI startup that emerged from stealth on May 5, 2026 with a $29M seed round. The launch model — SubQ 1M-Preview — claims to be the first commercial LLM built on a fully subquadratic attention architecture, meaning compute requirements grow linearly with context length instead of quadratically.

According to Subquadratic's launch announcement, the model handles a 12-million-token context window with 92.1% accuracy on needle-in-a-haystack retrieval, runs at 52× the speed of dense attention at 1M tokens, and reduces compute requirements by nearly 1,000× at the full 12M context length. The company has positioned itself as challenging "the mathematical constraint that has defined every major AI system since 2017" — the quadratic cost of self-attention in transformer architectures.
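The scaling argument is easy to sanity-check with back-of-envelope arithmetic. In the sketch below, the quadratic and linear cost models and the per-token state size `d = 4096` are illustrative assumptions on our part (Subquadratic has published no such constants), and the words-per-token and words-per-book ratios are rough conventions chosen to match the "9 million words / ~120 books" framing.

```python
# Back-of-envelope scaling sketch. The cost models and d are illustrative
# assumptions, not published SubQ constants.
def dense_cost(n: int) -> int:
    return n * n            # dense self-attention: ~n^2 work per layer

def linear_cost(n: int, d: int = 4096) -> int:
    return n * d            # linear attention: ~n * (fixed state size d)

for n in (1_000_000, 12_000_000):
    print(f"n={n:>10,}: dense/linear ~ {dense_cost(n) / linear_cost(n):,.0f}x")

# Rough arithmetic behind "12M tokens ~ 9M words ~ 120 books", assuming
# ~0.75 words per token and ~75,000 words per book:
words = 12_000_000 * 0.75
books = words / 75_000
print(f"{words:,.0f} words ~ {books:.0f} books")
```

The point is not the exact multiplier but the trend: the dense/linear gap widens in proportion to context length, which is why quadratic attention becomes untenable long before 12M tokens.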

What Is Subquadratic Selective Attention (SSA)?

Subquadratic Selective Attention is the company's proprietary attention architecture. According to Subquadratic's technical post on SSA, SSA scales linearly in both compute and memory with respect to context length — sidestepping the O(n²) cost that makes a 12M-token window prohibitively expensive on standard transformers.

The promise of subquadratic attention is not new — Mamba, RWKV, Hyena, and other linear-attention architectures have explored variants since 2023. What's distinctive about Subquadratic's claim is the combination of (a) full subquadratic scaling, (b) competitive accuracy on long-context retrieval, and (c) frontier-class benchmark performance. If the architecture delivers as advertised, it would be the first time a sub-quadratic model competes with dense transformers across the board — not just in latency-critical niches.

The skepticism in the research community is not about whether subquadratic attention can work — it's about whether this particular implementation matches what's claimed. There is currently no public paper describing SSA in enough detail to reproduce.
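Since no SSA paper exists, the best available mental model is the generic kernelized linear-attention family that architectures like Mamba and RWKV descend from. The sketch below is that generic recipe, not Subquadratic's method: by summarizing all keys and values into a fixed d × d state, it never materializes the n × n attention matrix, which is where the linear scaling comes from.

```python
import numpy as np

# Generic kernelized linear attention (non-causal form). This is NOT SSA,
# which is unpublished; it only shows how this family achieves O(n) cost.
def phi(x):
    # ELU(x) + 1 feature map: keeps features positive so the normalizer is safe.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    Qf, Kf = phi(Q), phi(K)               # (n, d) feature-mapped queries/keys
    kv = Kf.T @ V                         # (d, d) fixed-size key-value summary
    z = Kf.sum(axis=0)                    # (d,)  normalizer state
    return (Qf @ kv) / (Qf @ z)[:, None]  # (n, d); no n x n matrix is built

rng = np.random.default_rng(0)
n, d = 256, 16
Q, K, V = (rng.standard_normal((n, d)) * 0.1 for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (256, 16)
```

A causal variant maintains `kv` and `z` as running prefix sums over tokens; either way the per-token state stays fixed, so memory no longer grows with the square of context length.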

Performance Claims (Self-Reported)

Subquadratic has published the following self-reported benchmarks alongside the SubQ launch:

| Benchmark | SubQ Claim | Status |
| --- | --- | --- |
| Speed at 1M tokens | 52× faster than dense attention | Self-reported, no third-party run |
| Compute reduction at 12M | ~1,000× lower than frontier models | Self-reported |
| Needle-in-haystack at 12M | 92.1% accuracy | Self-reported |
| MRCR v2 | 83 (claimed +9 points vs OpenAI) | Self-reported |
| Context window | 12M tokens (50M roadmap) | Reported by The New Stack |
| Funding | $29M seed | Confirmed by SiliconANGLE |
| Best for | Long-document QA, codebase understanding, persistent agent memory (if claims hold) | Marketing positioning |
| Pricing | Not disclosed at launch | Private beta only |

Until independent labs run SubQ on standardized benchmarks, treat every performance number as a Subquadratic claim, not a verified fact.
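Verification here is not exotic; a needle-in-a-haystack run is a few dozen lines. The harness below is a hypothetical sketch of what an independent replication would look like: `ask_model` is a stand-in callable, not a real Subquadratic client, and the toy reader exists only to exercise the scoring loop.

```python
import random

# Hypothetical needle-in-a-haystack harness; `ask_model` is any callable
# that takes a prompt string and returns an answer string.
def make_haystack(n_sentences: int, needle: str, depth: float) -> str:
    filler = ["The sky was gray over the harbor that morning."] * n_sentences
    filler.insert(int(depth * n_sentences), needle)  # bury at a random depth
    return " ".join(filler)

def needle_score(ask_model, trials: int = 20, n_sentences: int = 5000) -> float:
    hits = 0
    for i in range(trials):
        secret = f"magic-{i}"
        needle = f"The secret code is {secret}."
        prompt = make_haystack(n_sentences, needle, depth=random.random())
        answer = ask_model(prompt + "\nWhat is the secret code?")
        hits += secret in answer
    return hits / trials

# Trivial "model" that actually reads the context, to exercise the harness:
def toy_model(prompt: str) -> str:
    for word in prompt.split():
        if word.startswith("magic-"):
            return word.rstrip(".")
    return "unknown"

print(needle_score(toy_model))  # 1.0 for the toy reader
```

A real run would sweep context lengths and needle depths and report the full grid, which is exactly the data Subquadratic has not yet released.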

Need a working AI agent platform today? Fleece AI runs Mistral Medium 3.5, GPT-5.2, GPT-5.4, and Claude Opus 4.7 — all production-tested. Start at fleeceai.app.

Why the Research Community Is Skeptical

Three reasons skepticism is louder than usual for SubQ:

  1. No public technical paper. Frontier model launches typically ship a paper or model card with enough detail for outside researchers to reason about. SubQ launched with a marketing post and benchmark numbers — that is not how the field validates claims.
  2. The Magic.dev precedent. As VentureBeat reminds readers, Magic.dev announced a 100M-token context model (LTM-2-mini) with similar 1,000× efficiency claims in August 2024 and raised ~$500M on the strength of those claims; as of early 2026, there is no public evidence of LTM-2-mini being used outside Magic.
  3. Self-reported benchmarks beating frontier labs. SubQ's claim of beating OpenAI by 9 points on MRCR v2 is the kind of result that, if real, would dominate AI conferences. For extraordinary claims like this, the rational default is to wait for replication.

None of this means SubQ isn't real. It means responsible coverage — and responsible model selection for production agents — requires waiting for verification.

SubQ vs Other Frontier Models

| Dimension | SubQ 1M-Preview | GPT-5.2 | Claude Opus 4.7 | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- |
| Best for | Long-document workflows (if validated) | Agent tool calling | Reasoning + coding | Multimodal long-context |
| Released | May 5, 2026 (private beta) | Public 2025 | April 16, 2026 | 2025+ |
| Context window | 12M tokens (claimed) | 1M tokens | 1M tokens | 2M tokens |
| Architecture | SSA (subquadratic) | Transformer | Transformer | Transformer |
| Independent verification | None as of May 2026 | Standard | Standard | Standard |
| Tool calling reliability | Unknown | Industry-leading | Strong | Improving |
| Fleece AI availability | Not available | Pro tier | Business tier | Not on Fleece AI |
| Pricing transparency | Not disclosed | Public | Public | Public |
| Production track record | None | 18+ months | 1+ year (4.x line) | 1+ year |
| Pricing | TBA | $/1M input output | $5/$25 per 1M | Public |

For more on each model, see our reviews of GPT-5.2, Claude Opus 4.7, and Gemini 3.1 Pro.

What 12M Tokens Means for AI Agents

If SubQ's claims hold, a 12M-token context window genuinely changes how AI agents are built. Today, agent platforms work around small windows with Retrieval-Augmented Generation (RAG): documents get chunked, embedded, indexed, and the agent retrieves the top-k chunks at inference. RAG works, but it adds latency, infrastructure (vector DBs, embedders, re-rankers), and brittleness — agents miss context that isn't surfaced by the retriever.
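For concreteness, the RAG loop described above (chunk, embed, retrieve top-k) compresses to a few lines. The hashed bag-of-words "embedding" below is a deterministic toy stand-in for a real embedding model; everything else mirrors the standard pipeline shape.

```python
import math
import zlib
from collections import Counter

# Toy RAG retrieval: the hashed bag-of-words "embedding" is a deterministic
# stand-in for a real embedding model, not a recommendation.
def embed(text: str, dim: int = 64) -> list[float]:
    v = [0.0] * dim
    words = [w.strip(".,?!") for w in text.lower().split()]
    for word, count in Counter(words).items():
        v[zlib.crc32(word.encode()) % dim] += count
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]           # unit vector for cosine similarity

def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    sim = lambda c: sum(a * b for a, b in zip(q, embed(c)))
    return sorted(chunks, key=sim, reverse=True)[:k]

chunks = [
    "Invoices are due within 30 days of receipt.",
    "The office cafeteria serves lunch from noon.",
    "Refunds require a signed approval form.",
]
print(top_k("When are invoices due?", chunks, k=1))
```

A 12M-token window would let an agent skip this machinery for corpora under roughly 9M words; above that, or when latency matters, retrieval still earns its keep.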

A genuine 12M-token native context could:

  • Drop entire knowledge bases into the prompt at run time, eliminating retrieval brittleness for many use cases.
  • Hold a full quarter of CRM activity for an account in a single agent call — letting a sales agent reason over the actual history rather than a summary.
  • Persist agent memory inline across a long-running session — your manager agent can hold a full project's context without offloading to external state.
  • Replace some multi-agent hierarchies with a single long-context agent — when the tool list is small but the context is huge.

If the claims don't hold, or if SubQ is subquadratic only on niche tasks, the field reverts to RAG + transformer baselines and the long-context narrative loses another year.

How Long-Context Models Fit Into Fleece AI

Fleece AI agents already use long-context models in production. Claude Opus 4.7 ships with a 1M-token window on the Business tier; GPT-5.2 supports 1M tokens on the Pro tier. For most workflows — daily digests, multi-app automation, scheduled reporting — 200K to 1M tokens is more than enough.

The use cases where 12M+ would be transformative inside Fleece AI:

  • Codebase-aware coding agents — drop an entire repo in context and reason holistically. Pairs with GitHub automation.
  • Cross-document research — analyze ~120 books or thousands of PDFs in one agent call.
  • Persistent agent memory — replace knowledge files + conversation history with native context.
  • Cross-quarter sales analysis — hold every customer interaction across multiple years inline.

These are real wins — if the model is reliable enough for production. Fleece AI does not add models to plan tiers based on benchmarks alone. Reliability on tool calling, latency on real agent runs, and stability of structured output all matter.

Will SubQ Land on Fleece AI?

Probably not in May 2026. Fleece AI's model selection criteria — which we apply quarterly — require: (a) public availability beyond invite-only beta, (b) independent benchmark verification, (c) tool-calling reliability above 95% on standard agent suites, and (d) a pricing model that maps to our credit system. SubQ as of May 2026 fails at least three of those.
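Those four gates reduce to a short checklist. In this hypothetical sketch, the criteria and SubQ's May 2026 statuses come from this article; the dataclass shape and the choice to treat unknown reliability as failing are our assumptions.

```python
from dataclasses import dataclass

# Hypothetical encoding of the four selection gates described above.
@dataclass
class ModelStatus:
    publicly_available: bool       # (a) beyond invite-only beta
    independently_verified: bool   # (b) third-party benchmark confirmation
    tool_call_reliability: float   # (c) fraction correct on agent suites
    pricing_disclosed: bool        # (d) maps to a credit system

def passes_selection(m: ModelStatus) -> bool:
    return (m.publicly_available
            and m.independently_verified
            and m.tool_call_reliability >= 0.95
            and m.pricing_disclosed)

subq_may_2026 = ModelStatus(
    publicly_available=False,      # private beta only
    independently_verified=False,  # self-reported numbers only
    tool_call_reliability=0.0,     # unknown, treated as failing
    pricing_disclosed=False,       # no pricing at launch
)
print(passes_selection(subq_may_2026))  # False
```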

The timeline we'd expect:

  • Q3 2026 — independent labs publish reproductions or refutations of SSA claims.
  • Q4 2026 — public API with disclosed pricing and tool-calling support.
  • Early 2027 — if claims hold and tool-calling is competitive, a Fleece AI evaluation cycle.

Until then, Mistral Medium 3.5, GPT-5.2, GPT-5.4, and Claude Opus 4.7 are the production-tested options for autonomous AI agents on Fleece AI.

FAQ

Is SubQ available to use today?

Only via private beta as of May 2026. According to SiliconANGLE's launch coverage, Subquadratic launched three products into private beta — the API, SubQ Code, and SubQ Search. There is no public pricing or self-serve access at launch.

How does SubQ compare to GPT-5.2 or Claude Opus 4.7?

In context window, SubQ's claimed 12M tokens dramatically exceeds GPT-5.2 and Opus 4.7 (both 1M). On reasoning, tool calling, and production reliability, GPT-5.2 and the Claude Opus 4.x line have 12+ months of public track record; SubQ has zero. See the best AI models for workflow automation for the production options.

What is Subquadratic Selective Attention (SSA)?

SSA is Subquadratic's proprietary attention architecture that the company says scales linearly in compute and memory with context length, instead of quadratically. There is no public technical paper describing SSA in enough detail to reproduce as of May 2026.

Should I switch to SubQ for my AI agents?

Not yet. Pre-public-API, pre-independent-verification models are appropriate for research; production agent platforms — including Fleece AI — wait for replication and stability data. Track the situation; don't move workloads until at least one independent lab publishes benchmarks.

Why is the AI research community skeptical of SubQ?

Three reasons: (1) no public technical paper, (2) the precedent of Magic.dev's similar 100M-token claims in 2024 that have not been independently verified, and (3) self-reported benchmarks that beat frontier labs by margins large enough to merit caution. As VentureBeat reports, researchers are openly demanding independent proof.

The Bottom Line

SubQ may turn out to be the most important AI model release of 2026 — or it may turn out to be the most ambitious vaporware since Magic.dev. There is no way to know in May 2026 because the architecture has not been independently verified and the model is not publicly accessible. Production agent platforms — Fleece AI included — will watch closely and add SubQ when the evidence catches up to the claims. For now, the production-tested model lineup is what runs your real workflows.



Build with production-tested models on Fleece AI — Mistral Medium 3.5, GPT-5.2, GPT-5.4, and Claude Opus 4.7 ready today, no invite list.
