SONNET CODE
← Back to all articles
AI DevelopmentJune 29, 2026·10 min read

DeepSeek V4 Ships Open Weights at 1/20 the Cost of Opus 4.7

What DeepSeek V4 actually ships and the cost-tier substrate the FY27 routing matrix grades against

The China-frontier-track convergence that anchored Q1 2026 — six Chinese frontier labs shipping inside the same two-week window — settled into a single load-bearing entry inside Q2: DeepSeek V4, shipped open-weights under the MIT license in two variants — V4 Pro at 1.6T total / 49B active parameters and V4 Flash at 284B total / 13B active parameters — both supporting a 1 million token context window by default. The pricing landed at $1.74 per million input tokens and $3.48 per million output tokens at the Pro tier (Flash at $0.14/$0.28), against the $5/$25 Opus 4.7 standard tier the FY27 standing-contract slot has been anchored against — roughly 1/20 the per-output-token cost at the production-grade frontier band the regulated-industry buyer was reading against the per-token-budget cap the FY27 plan was written against.

The operationally important pieces:

  • The cost-tier substrate just collapsed by a factor of 20 against the per-output-token band the FY27 standing contract was written against. The standing-contract per-output-token cap the FY27 plan encoded against the Opus 4.7-class frontier band ($25 per million output tokens) was the cap the per-workload model-routing policy ran the per-prompt routing decision against. The DeepSeek V4 Pro entry at $3.48 per million output tokens against the same accuracy band on the verifier-shape workloads the team grades against (math, code-emission, structured extraction, schema compliance) collapses the cost-tier substrate by 7-20x against the same per-workload-grade surface. The honest read isn't the per-workload routing decision moves entirely to DeepSeek; it is the per-output-token cost-tier the routing policy grades against just shifted enough that the per-workload routing decision against the verifier-shape workload band has a per-token-cost arbitrage the standing contract was not written against.
  • Open-weights under MIT closes the on-prem and fine-tuning slot the closed-weights frontier could not fill. The MIT license — the most permissive of the standard open-source licenses — closes the on-premises deployment, regulated-industry-buyer-controlled inference substrate, and per-workload fine-tuning slot the closed-weights frontier could not fill against the regulated-industry compliance surface. The compliance committee that could not underwrite the closed-weights frontier against the per-environment data-residency requirement the FY27 production-deployment plan was written against now has a frontier-band open-weights substrate inside the per-environment compliance envelope; the per-workload fine-tuning slot the AI-training-services engagement runs against now has an open-weights frontier-band base model the standing fine-tuning contract can be written against.
  • The 1.6T-parameter MoE architecture with 49B active is the production-inference shape the standing-contract per-token-cost-grading runs against. A 1.6T-total / 49B-active mixture-of-experts shape is the substrate the per-token-inference-cost-curve grades against — the per-token cost the team's standing-contract inference-cost-cap underwrites is bounded by the per-token compute against the 49B-active fraction, not against the 1.6T-total parameter count. The per-token-cost-curve the FY27 standing contract was written against was the per-token-cost-curve at the closed-weights inference substrate; the per-token-cost-curve at the open-weights MoE inference substrate against the 49B-active fraction is the per-token-cost-curve the FY27 plan has to re-encode the per-token-budget cap against.
  • The Huawei-chip integration is the regulatory-and-supply-chain signal the per-workload procurement diligence has to encode against. DeepSeek V4 is engineered against close integration with Huawei's chip stack — the supply-chain surface the US-government-export-control regime grades against. The honest read is the open-weights MIT-licensed substrate is portable across the per-environment inference substrate the team chooses to run against (the per-environment inference substrate the regulated-industry buyer chooses runs the open-weights substrate against the buyer-controlled hardware, not against the Huawei integration the model was trained against). The per-workload-procurement diligence has to encode the per-environment inference-substrate decision against the regulatory-and-supply-chain surface explicitly; the substrate is portable, but the per-environment inference-substrate decision is the artifact the compliance committee underwrites against, not the model-training supply-chain surface.

The structural read isn't DeepSeek V4 replaces the closed-weights frontier. It is that the cost-tier substrate against the verifier-shape production workload band just collapsed by 1/20 against the per-output-token cap the FY27 standing contract was written against, the open-weights MIT-licensed surface closes the on-prem-and-fine-tuning slot the closed-weights frontier could not fill, and the FY27 per-workload model-routing matrix has to encode the cost-tier-and-open-weights slot the substrate just landed inside, not the per-prompt routing policy the closed-weights frontier band the standing contract anchored against.

What the open-weights cost-tier substrate restructures about FY27 model-routing procurement

Four concrete shifts that follow when the per-token-cost-curve collapses by 1/20 against the verifier-shape production workload band.

The per-prompt model-routing matrix adds a per-workload cost-tier-and-open-weights slot, with the closed-weights frontier band routing only the workload classes the cost-tier arbitrage does not survive. The closed-weights frontier band the FY27 plan anchored against — Opus 4.7-class, GPT-5.5-class, Gemini 3.5 Flash-class — was the per-workload routing-policy default against every workload class the team's coding-agent loop runs against. The open-weights cost-tier slot inverts the default against the verifier-shape production workload band (math, code-emission, structured extraction, schema compliance, format-following — the workload classes the deterministic verifier closes the per-output accuracy gap against): the default routes to the open-weights cost-tier slot, and the closed-weights frontier band carries the per-workload escalation path for the verifier-coverage-gap workloads the open-weights substrate cannot underwrite. The team that ships the per-prompt routing policy update against the verifier-shape workload band buys the 1/20 per-output-token cost-tier arbitrage; the team that defers the update pays the per-token-cost overhead the prior standing-contract per-token-cap did not provision against.

The on-prem-and-fine-tuning slot becomes the load-bearing standing-vendor slot the FY27 AI-training engagement runs against. The closed-weights frontier band the FY27 AI-training engagement was written against could not underwrite the per-environment data-residency, per-workload fine-tuning, or per-buyer-controlled inference substrate slot the regulated-industry buyer's compliance committee requires the AI-training engagement to run against. The open-weights MIT-licensed substrate at the frontier band closes the slot — the per-workload fine-tuning engagement against the production-workload verifier coverage map can be written against the open-weights base model, the per-environment inference substrate can be written against the buyer-controlled hardware, and the per-environment data-residency surface can be written against the per-environment inference deployment. The standing AI-training-services contract that was scoped against the closed-weights frontier band needs the per-workload open-weights fine-tuning slot added as a first-class line item; the team that re-scopes the contract this quarter takes the open-weights-fine-tuning workload-class slot the AI-training engagement carries the per-cycle revenue against.

The per-environment inference-substrate decision becomes the new per-workload procurement diligence artifact the compliance committee underwrites against. The closed-weights frontier band's per-token inference substrate was the closed-weights vendor's hosted-API surface — the per-environment inference-substrate decision was trivial (call the API). The open-weights MIT-licensed substrate at the frontier band makes the per-environment inference-substrate decision load-bearing — the substrate is portable across the team's choice of hosted inference (Anyscale, Fireworks, Together), the buyer-controlled on-prem inference, the per-environment-controlled edge inference, and the per-workload-grade-controlled fine-tuned inference deployment. The per-environment-inference-substrate decision is the new per-workload procurement diligence artifact the compliance committee underwrites the FY27 standing contract against; the team that ships the per-environment inference-substrate decision implicit ships the compliance-committee-defensibility gap the per-environment substrate decision should have surfaced.

The standing-contract per-token-budget cap re-encodes against the per-workload-cost-curve, not against the per-prompt-cost-cap the closed-weights frontier band anchored. The per-prompt-cost-cap the FY27 standing contract was anchored against was the per-token-cost cap at the closed-weights frontier band — a single number against a single substrate. The per-workload-cost-curve the per-prompt-routing policy runs against the open-weights cost-tier-and-frontier-band substrate is the load-bearing per-workload-cost-grading artifact — a per-workload-band number against a per-workload-band substrate, with the per-prompt routing policy grading against the per-workload-band cost-curve, not against the per-prompt-cost cap. The team that re-encodes the standing-contract per-token-budget cap against the per-workload-cost-curve buys the per-workload-cost-tier arbitrage the substrate provides; the team that defers the re-encoding ships the FY27 standing-contract gap the closed-weights-per-prompt-cost-cap has already surfaced against the substrate.

Where the open-weights cost-tier substrate is signal and where it is noise

Four honest reads on what DeepSeek V4 actually tells the buyer at the FY27 model-routing diligence review.

Signal: the per-workload cost-tier arbitrage on the verifier-shape production workload band is the substrate's load-bearing operational property. The 1/20 per-output-token cost-tier collapse against the verifier-shape production workload band is the operational property the per-workload routing policy translates into the per-workload-cost-tier arbitrage. The team that ships the per-prompt routing policy update against the verifier-shape workload band buys the per-workload cost-tier arbitrage at the FY27 standing-contract per-token-budget cap; the team that does not ship the update ships the per-token-cost overhead the closed-weights frontier band was anchored against.

Signal: the MIT license is the on-prem-and-fine-tuning slot's load-bearing portability artifact, not the marketing-surface signal. The MIT license is the most permissive of the standard open-source licenses — the per-environment deployment, per-workload fine-tuning, per-buyer-controlled inference substrate surface is open against the substrate the licensee chooses to deploy against. The compliance committee that could not underwrite the closed-weights frontier band's hosted-API surface against the per-environment data-residency requirement has an underwriteable open-weights substrate at the frontier band; the per-environment inference-substrate decision the substrate makes load-bearing is the artifact the compliance committee grades against, and the artifact the per-environment-inference-substrate diligence runs against.

Noise: the per-prompt-cost-cap on the per-prompt routing policy is not the per-workload-cost-curve the substrate grades against. The per-prompt-cost-cap on the standing-contract per-token-budget cap was the closed-weights-frontier-band-anchored per-token cost cap; the per-workload-cost-curve the open-weights substrate grades against is the per-workload-band cost curve, with the per-prompt routing policy running against the per-workload-band cost curve, not against the per-prompt-cost cap. The team that runs the substrate against the per-prompt-cost-cap misreads the per-workload-cost-curve the substrate grades against and pays the per-prompt-cost overhead the per-workload-cost-curve translates against; the team that runs against the per-workload-cost-curve translates the per-workload cost-tier arbitrage into the per-workload-band-routing decision the substrate underwrites.

Noise: the Huawei-chip-integration supply-chain signal is not a per-environment inference-substrate disqualifier. The Huawei-chip-integration supply-chain signal is the model-training supply-chain signal — the per-environment inference substrate is portable across the per-environment-inference hardware the team chooses to deploy against. The honest read is the open-weights substrate is portable, the per-environment inference-substrate decision is the artifact the compliance committee underwrites against, and the per-workload-procurement diligence encodes the per-environment inference-substrate decision against the regulatory-and-supply-chain surface explicitly — not the substrate is disqualified against the per-environment inference-substrate slot the regulated-industry buyer's compliance committee underwrites against.

What the engineering team should do inside the next quarter

Four concrete actions that close the gap between the open-weights cost-tier substrate and the FY27 per-prompt routing policy the substrate restructures against.

Update the per-prompt routing policy against the verifier-shape production workload band with the open-weights cost-tier-and-frontier-band slot as the default-route inside the next sprint. The routing-policy artifact in the team's repo is the load-bearing artifact the per-workload cost-tier arbitrage lands inside. Update the default-route against the verifier-shape production workload band (math, code-emission, structured extraction, schema compliance, format-following) from the closed-weights frontier band to the open-weights cost-tier-and-frontier-band slot, write the per-workload-class escalation path against the closed-weights frontier band for the verifier-coverage-gap workloads, and ship the policy with the per-cycle review cadence the next substrate-cycle re-validates against. The team that ships the routing-policy update inside the next sprint translates the 1/20 per-output-token cost-tier arbitrage into the per-prompt-cost surface; the team that defers the update ships the per-token-cost overhead the prior standing-contract per-token-cap did not provision.

Run a per-environment inference-substrate diligence inside the next two weeks against the per-workload-cost-curve and the regulatory-and-supply-chain surface. The per-environment inference-substrate decision is the new per-workload-procurement diligence artifact. For each of the team's top-three verifier-shape production workload classes, run the per-environment inference-substrate diligence input — hosted-inference vendor map (Anyscale, Fireworks, Together), buyer-controlled on-prem inference cost curve, per-environment-controlled edge inference latency surface, per-workload-grade fine-tuned inference deployment cost-tier — against the per-workload-cost-curve and the regulatory-and-supply-chain surface. The artifact the diligence produces is the per-environment inference-substrate decision map the FY27 standing contract underwrites against; the team that runs the diligence this quarter takes the per-environment inference-substrate-decision-shaped procurement decision against the per-workload-cost-curve.

Re-scope the standing AI-training-services contract against the per-workload open-weights fine-tuning slot inside the next quarter. The standing AI-training-services contract that was scoped against the closed-weights frontier band needs the per-workload open-weights fine-tuning slot added as a first-class line item. Re-scope the contract against the per-workload-fine-tuning-cycle cadence (per-quarter base-model refresh, per-quarter verifier-coverage-gap-curation cycle, per-quarter per-workload fine-tuning deployment), the per-workload-cost-tier on the per-workload fine-tuning compute budget, the per-environment fine-tuning deployment surface (per-environment data-residency, per-environment inference-substrate, per-environment per-workload-grade routing), and the per-output verification-trace artifact the per-workload fine-tuning cycle grades against. The team that re-scopes the contract this quarter takes the open-weights fine-tuning workload-class slot the per-cycle AI-training engagement carries the revenue against; the team that does not re-scope pays the per-cycle per-workload fine-tuning slot gap the standing contract was not written against.

Re-encode the standing-contract per-token-budget cap against the per-workload-cost-curve, not against the per-prompt-cost-cap the closed-weights frontier band anchored against. The standing-contract per-token-budget cap re-encodes against the per-workload-cost-curve. Write the per-workload-cost-curve as a first-class diligence input — per-workload-band per-output-token cost cap, per-workload-band per-input-token cost cap, per-workload-band per-workload-grade routing-policy cost surface, per-workload-band per-environment inference-substrate cost-tier — and ship the per-workload-cost-curve as the artifact the FY27 standing-contract negotiation grades against, not the per-prompt-cost cap the closed-weights frontier band was anchored against. The team that re-encodes the standing-contract per-token-budget cap against the per-workload-cost-curve buys the per-workload cost-tier arbitrage the substrate provides; the team that defers ships the per-cycle per-token-cost overhead the per-workload-cost-curve translates against.

The senior-judgment work the open-weights cost-tier substrate makes operationally cheap but does not replace

The open-weights cost-tier substrate compresses the cost of paying the closed-weights frontier band per-output-token cap against the verifier-shape production workload band, running the per-environment inference-substrate decision against the closed-weights hosted-API surface, scoping the standing AI-training engagement against the closed-weights frontier band's no-fine-tuning surface, and grading the per-prompt routing policy against the per-prompt-cost cap the closed-weights frontier band anchored. It does not compress the senior judgment of deciding which workload classes are verifier-shape-and-open-weights-routable and which are not, writing the per-workload verifier against the open-weights substrate the per-prompt routing policy grades against, owning the per-environment inference-substrate decision the compliance committee underwrites against, and re-scoping the standing AI-training engagement against the per-workload open-weights fine-tuning slot the substrate makes load-bearing. The teams that confuse the cheapened per-token cost for the cheapened judgment are the teams that route the per-workload free-form-generation surface against the open-weights substrate the verifier coverage gap does not close, and read the per-cycle production-reliability post-mortem on the routing-policy gap the per-workload verifier coverage map would have surfaced. The teams that keep the senior judgment at the center of the per-prompt routing-policy decision are the teams that translate the 1/20 per-output-token cost-tier arbitrage into the per-workload-cost-tier surface the closed-weights frontier band could not produce. The substrate is the leverage; the senior judgment is the load-bearing wall.

The model-routing question is no longer which closed-weights frontier-band vendor anchors the standing contract; it is which verifier-shape production workload classes the open-weights cost-tier substrate is the default-route for, which per-environment inference-substrate decision the compliance committee underwrites, which per-workload open-weights fine-tuning slot the standing AI-training engagement runs against, and which per-workload-cost-curve the standing-contract per-token-budget cap re-encodes against. The teams that ask the right question this quarter translate the 1/20 per-output-token cost-tier arbitrage into the per-workload-cost-tier surface; the teams that ask the wrong one ship the per-cycle per-token-cost overhead to the FY28 procurement review against a closed-weights-frontier-band per-prompt-cost cap the substrate has already collapsed.


At SONNET CODE we run the AI Development and AI Training engagements against the open-weights cost-tier substrate — per-prompt routing-policy updates against the verifier-shape production workload band, per-environment inference-substrate diligence against the per-workload-cost-curve, and per-workload open-weights fine-tuning cycles against the per-output verification-trace artifact. If your team's per-prompt routing policy is still anchored against the closed-weights frontier band per-token-cost cap, schedule a call — we'll walk you through the per-workload cost-tier-and-open-weights routing-matrix update we ship inside one sprint.