Sonnet Code
The Sonnet Code Blog · Page 9

Engineering notes from the field.

Essays and field notes on AI, software engineering, design, and the craft of building product teams that ship. Written by the engineers who do the work. Posts are in English.

AI Development · 10 min read

Anthropic Shipped Self-Hosted Sandboxes and MCP Tunnels for Claude Managed Agents on May 19. The Enterprise Hybrid-Orchestration Pattern Just Became a Real Product — and the "Our Data Can't Leave the VPC" Blocker That Killed Three of Your Agent Pilots Last Year No Longer Applies.

On May 19, 2026, the second day of Anthropic's Code with Claude London event, Anthropic shipped two infrastructure features for Claude Managed Agents that quietly close the largest enterprise deployment gap in the agentic-AI space. Self-Hosted Sandboxes (public beta) move tool execution to the customer's own infrastructure (their VPC, or a managed provider like Cloudflare, Daytona, Modal, or Vercel) while keeping the agent orchestration loop on Anthropic's. MCP Tunnels (research preview) let those agents reach MCP servers inside the customer's private network through a single outbound gateway, with no inbound firewall rules and no public endpoints. The structural read isn't "Anthropic added two features." It's that the deployment pattern most large enterprises have wanted since the first wave of agentic-AI pilots (orchestration on the vendor's plane, sensitive data and tool execution on the customer's plane, with a real cryptographic boundary between them) is now a shipping product, albeit still in beta and research preview, just weeks before the EU AI Act's high-risk deployer obligations go live on August 2. Here's what "hybrid orchestration" actually means at the architecture layer, why the agent pilots you killed for security reasons are now buildable, and what to put in front of your platform team this quarter so the procurement conversation that starts next month doesn't catch you flat-footed.

Sonnet Code Editorial Team · June 3, 2026

Developer Tools · 10 min read

Microsoft Shipped MAI-Code-1-Flash at Build 2026 — Its First Coding Model Trained Without OpenAI Data, Now Rolling Out Inside GitHub Copilot. The "Hyperscalers Rent Frontier Capability" Assumption Just Officially Died — and the Multi-Provider Routing Layer You've Been Deferring Is the Q3 Project You No Longer Get to Skip.

At Microsoft Build 2026 on June 2, Satya Nadella unveiled seven new in-house MAI models — including MAI-Code-1-Flash, a 5-billion-parameter coding model already rolling out inside GitHub Copilot and Visual Studio Code. The headline benchmark: MAI-Code-1-Flash outperforms Claude Haiku 4.5 across all four core coding benchmarks tested, with a 16-point lead on SWE-Bench Pro (51.2% vs. 35.2%) and the ability to solve harder coding tasks with up to 60% fewer tokens on SWE-Bench Verified. None of the seven MAI models — coding, reasoning, image, voice, transcription — were trained on OpenAI data. The structural read isn't "Microsoft built a small fast coding model." It's that the largest single customer of OpenAI's API just announced, in front of the entire developer ecosystem, that it is no longer dependent on OpenAI to ship Copilot-tier coding capability — which means every other organization that's been deferring its own multi-provider routing layer because "the hyperscaler will handle it" just lost the cover story. Here's what changes about how AI coding spend gets architected when the hyperscaler stops being a passthrough and starts being one supplier among many.
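For readers who have not built one, a multi-provider routing layer can start very small. The sketch below is one possible shape, not a reference design: `Provider`, `route`, the provider names, and the prices are all hypothetical, and the `complete` callables are stubs standing in for real client calls. It tries providers cheapest-first and falls back to the next one when a call fails.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Provider:
    name: str                        # hypothetical label, not a real vendor
    usd_per_m_tokens: float          # placeholder price used only for ordering
    complete: Callable[[str], str]   # stand-in for a real client call; raises on failure


class AllProvidersFailed(Exception):
    """Raised when every configured provider returned an error."""


def route(prompt: str, providers: List[Provider]) -> Tuple[str, str]:
    """Try providers cheapest-first; fall back to the next one on any error."""
    errors = []
    for provider in sorted(providers, key=lambda p: p.usd_per_m_tokens):
        try:
            return provider.name, provider.complete(prompt)
        except Exception as exc:  # a real router would catch narrower error types
            errors.append(f"{provider.name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def flaky(prompt: str) -> str:
    """Stub that simulates an outage at the cheapest provider."""
    raise TimeoutError("stub outage")


def steady(prompt: str) -> str:
    """Stub that simulates a healthy provider."""
    return f"echo: {prompt}"


if __name__ == "__main__":
    providers = [
        Provider("provider-b", 3.0, steady),
        Provider("provider-a", 1.0, flaky),
    ]
    print(route("refactor this function", providers))
```

Ordering by price alone is a simplification. A production policy would also weigh task difficulty, latency, and measured success rate per provider.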

Sonnet Code Editorial Team · June 3, 2026

AI Development · 10 min read

The EU AI Act's High-Risk Deployer Deadline Is August 2, 2026 — Sixty Days Away. Audit Trails, Human Oversight, and Incident Reporting Are About to Stop Being Best Practices and Start Being Legally Binding. Most Teams Building Agentic AI Are Not Ready.

On August 2, 2026, sixty days from now, the EU AI Act's high-risk deployer obligations (Articles 8–17, 26, 27, and 73) become legally binding for any organization deploying a high-risk AI system to users in the EU, with penalties up to €15M or 3% of worldwide turnover for non-prohibited violations and €35M or 7% for prohibited practices. The obligations are concrete: human oversight tooling that lets a qualified person intervene before or after a model action; automated event logging across the lifetime of the system; risk and quality management systems documented per ISO/IEC 42001; serious-incident reporting to the relevant national authority within 15 days; conformity assessments before market placement. The standard reading is that this only matters to the high-risk categories: credit scoring, hiring, critical infrastructure, education, law enforcement. The structural reading is that the *engineering discipline the regulation demands* (auditable agent traces, real human-in-the-loop checkpoints, eval harnesses that grade safety properties, incident pipelines that work at the speed of a 15-day deadline) is exactly the discipline every serious AI deployment needs anyway, and the organizations that get the EU-mandated version of it stood up by August 2 are the ones that have, by accident, built the operating model for safe agentic deployment across every other jurisdiction that will follow. Here's what "compliance-ready" actually requires at the engineering layer, and why the teams treating it as a paperwork exercise are the teams that will pay the most for it.
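To make "auditable agent traces" and "a 15-day deadline" concrete, here is a minimal sketch using only the Python standard library. The record fields, the function names, and the choice to start the clock at `detected_at` are illustrative assumptions, not wording from the regulation. The only figure taken from the text above is the 15-day reporting window.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Reporting window quoted in the teaser above.
REPORTING_WINDOW = timedelta(days=15)


@dataclass
class AgentEvent:
    """One audit-trail record; the field names are hypothetical."""
    timestamp: str              # ISO 8601, UTC
    agent_id: str
    action: str
    approved_by: Optional[str]  # None means no human signed off on this action


def log_event(agent_id: str, action: str,
              approved_by: Optional[str] = None,
              now: Optional[datetime] = None) -> str:
    """Serialize one event as a JSON line suitable for an append-only log."""
    now = now or datetime.now(timezone.utc)
    event = AgentEvent(now.isoformat(), agent_id, action, approved_by)
    return json.dumps(asdict(event), sort_keys=True)


def report_due(detected_at: datetime) -> datetime:
    """Latest reporting time, assuming the clock starts at detection."""
    return detected_at + REPORTING_WINDOW


def days_remaining(detected_at: datetime, today: datetime) -> int:
    """Whole days left before the reporting deadline (negative if overdue)."""
    return (report_due(detected_at) - today).days


if __name__ == "__main__":
    detected = datetime(2026, 8, 3, 9, 0, tzinfo=timezone.utc)
    print(log_event("agent-1", "send_email", approved_by=None, now=detected))
    print(report_due(detected).date())
```

Keeping `approved_by` explicit, even when it is empty, is the design choice that matters here: an absent human sign-off is then a queryable fact in the trace and not a gap in it.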

Sonnet Code Editorial Team · June 2, 2026

AI Development · 10 min read

Anthropic Confirmed Mythos Will Roll Out to All Customers "in the Coming Weeks." The Frontier Tier Just Stopped Being a 50-Customer Beta — and Every Production Eval Harness Now Needs a New Top Row.

When Anthropic shipped Opus 4.8 on May 28, 2026, the headline was a $965B valuation and a $47B run-rate. The line that mattered more, buried near the end of the announcement, was that Mythos-class models will be made available to all customers "in the coming weeks." Mythos has been live since April 7 under Project Glasswing (twelve founding partner organizations plus roughly forty vetted critical-infrastructure operators), priced at $25/M input and $125/M output, and scoring 93.9% on SWE-bench Verified and 97.6% on USAMO 2026, the latter 55.3 points above Opus 4.6 on the same olympiad. For two months the model has been a closed enterprise product with a vetting gate; in the next two months it becomes a standard option on the API. The structural read isn't "a smarter model is shipping." It's that the gap between the model your team actually evaluates against and the most capable model that exists for your workload is about to close, for the first time in eighteen months, and the eval harness, the routing policy, and the human-review queue all need a new top row to handle a model whose failure modes are different in kind, not just in rate, from the Opus tier underneath it. Here's what to set up this month so the rollout doesn't catch your stack flat-footed.

Sonnet Code Editorial Team · June 2, 2026

Developer Tools · 10 min read

GitHub Copilot Flipped to Usage-Based Billing on June 1. Power Users' Agentic Bills Are Spiking 10–50×, and "Cost per Successful Task" Just Stopped Being an Engineering Curiosity and Became a Finance-Team Line Item.

On June 1, 2026, every paid GitHub Copilot plan moved to usage-based billing. Pro at $10/mo, Pro+ at $39/mo, Business at $19/seat, and Enterprise at $39/seat all keep their headline price, but every interaction beyond simple completions and Next Edit suggestions now meters against a monthly GitHub AI Credit allotment at 1 credit = $0.01, billed by tokens at the listed API rates of whichever model the request routed to. The fallback model is gone. Developers running heavy agentic sessions are reporting cost increases of 10× to 50× versus the flat-rate era, and the comments on GitHub's own announcement thread tell the story: a workflow that cost $39/mo in April is metering into the hundreds of dollars per developer per month in June. The structural read isn't "Copilot got more expensive." It's that the AI-coding-tools industry just did what every cloud-infrastructure category did before it (moved from flat-rate to consumption-based pricing), and the financial-operations discipline that grew up around AWS spend over the last fifteen years is now the discipline every engineering organization needs to apply to its agentic-coding spend, this quarter. Here's what changes about how AI dev work gets budgeted, measured, and routed when the per-task cost is suddenly visible on the line item.
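The metering arithmetic is simple enough to sketch. The example below converts token usage into credits at the stated 1 credit = $0.01 and derives a cost per successful task. The per-million-token rates and the token counts are made-up placeholders, not GitHub's or any model provider's actual prices, and every function name here is hypothetical.

```python
from dataclasses import dataclass
from typing import List

CREDIT_USD = 0.01  # conversion quoted in the teaser above: 1 credit = $0.01


@dataclass
class Request:
    input_tokens: int
    output_tokens: int
    succeeded: bool  # did this agentic task end in an accepted result?


def request_cost_usd(req: Request, usd_per_m_input: float,
                     usd_per_m_output: float) -> float:
    """Token-metered cost of one request at the given placeholder rates."""
    return (req.input_tokens * usd_per_m_input
            + req.output_tokens * usd_per_m_output) / 1_000_000


def total_cost_usd(requests: List[Request], usd_per_m_input: float,
                   usd_per_m_output: float) -> float:
    """Sum of metered cost across all requests, failed ones included."""
    return sum(request_cost_usd(r, usd_per_m_input, usd_per_m_output)
               for r in requests)


def credits_used(requests: List[Request], usd_per_m_input: float,
                 usd_per_m_output: float) -> float:
    """Convert dollars of metered usage into credits."""
    return total_cost_usd(requests, usd_per_m_input, usd_per_m_output) / CREDIT_USD


def cost_per_successful_task(requests: List[Request], usd_per_m_input: float,
                             usd_per_m_output: float) -> float:
    """Total spend divided by the number of tasks that actually succeeded."""
    successes = sum(1 for r in requests if r.succeeded)
    if successes == 0:
        return float("inf")
    return total_cost_usd(requests, usd_per_m_input, usd_per_m_output) / successes


if __name__ == "__main__":
    session = [
        Request(200_000, 20_000, True),
        Request(400_000, 40_000, False),  # failed attempts still cost money
        Request(100_000, 10_000, True),
    ]
    # Placeholder rates: $3 per million input tokens, $15 per million output tokens.
    print(credits_used(session, 3.0, 15.0))
    print(cost_per_successful_task(session, 3.0, 15.0))
```

Dividing by successes and not by requests is the point of the metric: failed agentic attempts are billed like any other, so they raise the effective price of each task that lands.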

Sonnet Code Editorial Team · June 2, 2026

AI Development · 10 min read

Subquadratic Emerged From Stealth With $29M, a 12M-Token Context Window, and 300× Cost-Reduction Claims on RULER 128K. The Quadratic-Attention Tax Era of LLM Economics May Be Ending — and Architectural Lock-In Is Suddenly a Real Category of Risk Again.

On May 5, 2026, a stealth lab called Subquadratic emerged with $29M in seed funding (Javier Villamizar of the former SoftBank Vision Fund, Justin Mateen of JAM, and early backers of Anthropic, OpenAI, Stripe, and Brex) and a frontier model, SubQ, built on what they describe as a truly subquadratic sparse-attention architecture rather than the dense quadratic-attention Transformer that has defined the cost curve of every frontier LLM for the last seven years. The published claims: 95% accuracy on RULER 128K at $8 per evaluation versus Claude Opus at 94% and roughly $2,600 (roughly a 300× cost reduction), 52× faster than FlashAttention at 1M tokens, ~1,000× less compute at 12M tokens, and a 12M-token context window shipped in production with a CLI coding agent (SubQ Code) and a search product on top. None of the headline numbers are independently reproduced yet: the weights are closed, the training details are private, and the eval surface is narrow. The structural read isn't "one new model." It's that if the architectural premise pans out on any meaningful workload, the entire cost calculation behind "what's worth attempting at scale?" changes, and a category of risk most teams stopped worrying about in 2024, architectural lock-in, is suddenly real again. Here's what to watch for over the next 90 days, what changes for your roadmap if the claims hold, and what to do about a stack you've been building on the assumption that quadratic attention is a permanent fact.
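Two pieces of arithmetic sit behind these claims, and both are easy to check. The sketch below computes the cost ratio from the quoted dollar figures and shows how the pairwise work of dense attention grows between a 128K and a 12M context. It treats "128K" and "12M" as round numbers and counts only token pairs, which is an assumption that ignores constants, heads, and layers. It says nothing about how SubQ itself works, since those details are not public.

```python
from typing import Callable

# Figures quoted in the teaser above (vendor claims, not independently reproduced).
SUBQ_COST_USD = 8.0
OPUS_COST_USD = 2_600.0

# Nominal context lengths, treated as round numbers for illustration.
SHORT_CONTEXT = 128_000
LONG_CONTEXT = 12_000_000


def cost_ratio(baseline_usd: float, candidate_usd: float) -> float:
    """How many times cheaper the candidate is than the baseline."""
    return baseline_usd / candidate_usd


def dense_attention_pairs(n_tokens: int) -> int:
    """Token pairs scored by dense self-attention: every token against every token."""
    return n_tokens * n_tokens


def linear_work(n_tokens: int) -> int:
    """Reference curve: work that grows in direct proportion to context length."""
    return n_tokens


def growth(short: int, long: int, work: Callable[[int], int]) -> float:
    """Factor by which `work` grows when the context goes from `short` to `long`."""
    return work(long) / work(short)


if __name__ == "__main__":
    print(cost_ratio(OPUS_COST_USD, SUBQ_COST_USD))
    print(growth(SHORT_CONTEXT, LONG_CONTEXT, dense_attention_pairs))
    print(growth(SHORT_CONTEXT, LONG_CONTEXT, linear_work))
```

The quoted $2,600 and $8 work out to 325×, so the headline "300×" is a rounded figure. A context 93.75 times longer means about 8,789 times as many token pairs under dense attention, which is the quadratic cost the post's title refers to.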

Sonnet Code Editorial Team · June 1, 2026