SONNET CODE
AI Training · July 4, 2026 · 8 min read

Rapidata's Online RLHF Compresses Human-Feedback Loops to Hours

What Rapidata's emergence signals about where the RLHF frontier is going

Rapidata emerged from stealth with €7.2 million in seed funding to build what it calls the global human-feedback network: infrastructure that moves human judgment out of the batch-labeling backlog and directly into the training loop. The company's framing is explicit: feedback cycles that previously took weeks or months shrink to hours or even minutes, and the next major scarcity in AI won't be compute; it will be high-quality human signal. The technical shift the platform anchors on is online RLHF. Instead of collecting human preferences offline as a dataset before a training run starts, the training loop requests them through a per-training-step API integration, so human judgment arrives at roughly the cadence of the weight updates.
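To make the per-training-step idea concrete, here is a minimal sketch, assuming a Bradley-Terry style preference model and a simulated annotator standing in for a live human-feedback call. The function names, the learning rate, and the simulated annotator are illustrative assumptions, not Rapidata's API or method. It covers only the preference-signal side of the loop, not policy optimization. Each step requests one comparison and applies one update immediately, rather than waiting for a batch of rankings.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def simulated_preference(quality_a: float, quality_b: float, rng: random.Random) -> bool:
    """Hypothetical stand-in for a live human comparison (not a real API).

    Returns True when the simulated annotator prefers response A.
    """
    return rng.random() < sigmoid(quality_a - quality_b)


def online_preference_loop(true_quality: list[float], steps: int = 2000,
                           lr: float = 0.1, seed: int = 0) -> list[float]:
    """Learn one score per response from comparisons that arrive one at a time."""
    rng = random.Random(seed)
    scores = [0.0] * len(true_quality)
    for _ in range(steps):
        # Pick two distinct responses to compare at this step.
        a, b = rng.sample(range(len(true_quality)), 2)
        a_preferred = simulated_preference(true_quality[a], true_quality[b], rng)
        # Bradley-Terry: predicted probability that A is preferred over B.
        p_a = sigmoid(scores[a] - scores[b])
        # Gradient of the log-likelihood for this single comparison.
        grad = (1.0 if a_preferred else 0.0) - p_a
        scores[a] += lr * grad
        scores[b] -= lr * grad
    return scores


if __name__ == "__main__":
    learned = online_preference_loop([-2.0, 0.0, 2.0])
    print([round(s, 2) for s in learned])
```

In a batch setup the same update would run over a stored dataset long after the comparisons were collected; here the comparison and the update share one loop iteration, which is the cadence change the article describes.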

The operationally important reads:

  • Online RLHF collapses the alignment-iteration cycle the way compute collapsed the pre-training cycle. The prior assumption was offline batch collection of ranked responses, per-quarter labeler-crew mobilization, and per-release-cycle RLHF retraining. Online RLHF flips the axis: per-training-step human signal integration, per-week alignment iteration, per-release-candidate live-population re-scoring. An FY27 alignment plan drafted against the batch-cycle assumption rests on an iteration cadence that stops holding by Q4.
  • The load-bearing scarcity axis on frontier alignment shifts from compute to human signal. Compute is a well-priced commodity at the FY27 procurement window — the standing contract on it has been renegotiated three cycles running. High-quality human signal at the online-RLHF cadence is a supply-constrained resource whose price is set by the depth and quality of the labeler-population network, not the API's rate card. The FY27 human-signal contract needs a per-population-quality clause the way the compute contract holds a per-tier-availability clause.
  • RLHF-trained models produce 40% fewer toxic outputs than synthetic-data-only-trained models. The operational read is not "toxic-output rate is a nice-to-have safety metric." It is that workload classes graded on toxic-output rate for regulatory compliance, brand safety, or user trust can only pass their FY27 shipping gate on an RLHF-trained substrate. A synthetic-data-only substrate does not close the shipping gate on those workload classes.
  • 96% of companies say human-in-the-loop is essential or nice to have; 86% say strictly essential. The market-adoption axis has moved past the "do we need HITL?" debate; the question is now at what cadence, on which workload classes, and against which per-population-quality envelope. An FY27 alignment plan whose HITL line item was scoped for quarterly labeler-crew mobilization now faces a per-population-quality requirement whose cadence is online RLHF, not batch RLHF.

The structural read isn't "a startup raised seed money for an RLHF platform." It is threefold: the RLHF cadence axis just moved from batch-quarterly to online-per-step; high-quality human signal replaces compute as the load-bearing scarcity on frontier alignment; and the FY27 alignment plan drafted against the batch-cycle assumption needs a per-workload-class re-shootout against the online-RLHF substrate.

What online RLHF restructures for the FY27 alignment plan

The alignment-iteration cadence unlocks the per-release-candidate re-scoring pattern the batch cadence blocked. The prior alignment-plan cadence had one RLHF re-training per major release, with alignment drift accumulating across the release cycle. Online RLHF lets the plan run per-release-candidate live-population re-scoring, so the alignment envelope re-closes at each release candidate rather than once per quarter. The substrate now supports that cadence; an FY27 release-cycle plan drafted against the batch-cadence assumption does not yet.

The domain-expert labeler crew and the crowdsourced labeler crew stop being interchangeable inputs. At the online-RLHF cadence, the per-population-quality-envelope directly shows up as the per-step alignment-signal quality. The regulated-industry workload class (medical / legal / financial / high-stakes-agent) grades against a domain-expert-labeler substrate; the general-consumer workload class grades against a well-audited crowdsourced substrate. The FY27 labeler-crew contract that treats both populations as the same procurement line item is running against a per-workload-class per-population-quality requirement the substrate now enforces.
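The routing rule in the paragraph above can be sketched as a small lookup. This is a minimal illustration, assuming hypothetical workload-class labels and population names; none of these identifiers come from a real vendor API.

```python
from enum import Enum


class LabelerPopulation(Enum):
    DOMAIN_EXPERT = "domain_expert"
    AUDITED_CROWD = "audited_crowd"


# Hypothetical labels for the regulated classes the article lists
# (medical / legal / financial / high-stakes-agent).
REGULATED_CLASSES = frozenset({"medical", "legal", "financial", "high_stakes_agent"})


def required_population(workload_class: str) -> LabelerPopulation:
    """Return the labeler population a workload class should be routed to."""
    if workload_class in REGULATED_CLASSES:
        return LabelerPopulation.DOMAIN_EXPERT
    return LabelerPopulation.AUDITED_CROWD


def is_valid_assignment(workload_class: str, population: LabelerPopulation) -> bool:
    """Check a proposed assignment against the routing rule above."""
    return population is required_population(workload_class)
```

Encoding the split as data rather than as a single headcount number is the point: a procurement line item that cannot express `required_population` cannot express the requirement either.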

The per-cycle safety-evaluation gate compresses from pre-release batch audit to per-training-step live audit. The AI-safety function's operating cadence shifts from quarterly pre-release safety batches to per-training-step live audit against the online-signal stream. The compliance-officer function's FY27 headcount plan has to be sized for that cadence: safety evaluation is no longer a per-release-batch line item but a continuous, per-training-step workload, and staffing and reporting lines need to reflect that.
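One way to picture a live audit against a stream of human judgments is a rolling-window gate. The sketch below is an assumption about how a team might implement it; the class name, window size, and threshold are hypothetical and not taken from the article or any specific tool.

```python
from collections import deque


class RollingSafetyAudit:
    """Tracks the flagged-output rate over the most recent `window` judgments."""

    def __init__(self, window: int, max_flag_rate: float) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.max_flag_rate = max_flag_rate
        # deque with maxlen drops the oldest judgment automatically.
        self._recent = deque(maxlen=window)

    def flag_rate(self) -> float:
        """Share of judgments in the window that were flagged as unsafe."""
        if not self._recent:
            return 0.0
        return sum(self._recent) / len(self._recent)

    def record(self, flagged: bool) -> bool:
        """Add one human judgment; return True if the gate is still passing."""
        self._recent.append(flagged)
        return self.flag_rate() <= self.max_flag_rate
```

A batch audit computes the same rate once, before release; the rolling version recomputes it on every incoming judgment, so a regression can surface during training rather than at the release gate.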

The regulatory-compliance envelope on frontier alignment gets a per-jurisdiction population-of-labelers requirement. The FY27 regulatory calendar tracks per-jurisdiction alignment-artifact requirements — the EU AI Act audit trail on high-risk workload classes, the sectoral-regulator sign-off on regulated workload classes, the per-industry safety-standard grading on domain-specific workload classes. The online-RLHF substrate lets the per-jurisdiction population of labelers show up in the alignment-artifact trail as a first-class attribute. The compliance-envelope closes around per-jurisdiction population attributes the batch substrate could not produce as an artifact.
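Treating the labeler's jurisdiction as a first-class attribute could look like the following sketch. The record fields, the jurisdiction codes, and the per-jurisdiction minimums are all hypothetical; actual regulatory requirements would have to come from counsel, not from this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackRecord:
    training_step: int
    labeler_id: str
    jurisdiction: str  # e.g. "EU" or a country code; the coding scheme is an assumption
    preferred: str     # "a" or "b"


def jurisdiction_gaps(records: list[FeedbackRecord],
                      required: dict[str, int]) -> dict[str, int]:
    """Return how many more judgments each required jurisdiction still needs."""
    counts = Counter(r.jurisdiction for r in records)
    # Counter returns 0 for jurisdictions with no records at all.
    return {j: need - counts[j] for j, need in required.items() if counts[j] < need}
```

Because every record carries its jurisdiction, the audit trail can answer "which populations judged this model" directly, which an aggregate labeler count cannot.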

Where the online-RLHF signal is real and where it is hype

Real: the hours-not-months cadence delta is the load-bearing operational shift. The batch-cadence assumption locked FY27 alignment plans into per-quarter iteration; the online-cadence unlocks per-week (and eventually per-day) alignment iteration. Every alignment workload class whose FY27 plan grades against the batch-cadence assumption is a candidate for re-shootout against the online-cadence substrate.

Real: the "human signal is the new scarcity" framing tracks the FY27 procurement axis. Compute is well-priced; high-quality human signal at online-RLHF cadence is supply-constrained. The FY27 standing contract on human signal needs a per-population-quality clause, a per-jurisdiction-availability clause, and a per-workload-class response-time clause. The batch-labeler contract the team has been negotiating carries none of the three.

Hype: online RLHF replaces batch RLHF. It does not. Batch RLHF remains on the alignment-cycle for foundational re-training runs whose signal-collection cost the online-cadence substrate cannot amortize. Online RLHF is the per-release-candidate re-scoring substrate; batch RLHF is the per-foundational-run alignment substrate. The FY27 alignment plan holds both cadences on the substrate map, not one replacing the other.

Hype: the crowdsourced labeler population closes the alignment envelope for every workload class. It does not. The regulated-industry workload class grades against a domain-expert-labeler substrate the crowdsourced population does not produce. The FY27 labeler-crew contract splits by workload-class per-population-quality requirement; the aggregate labeler-headcount metric does not encode the split.

What the alignment team should do inside the next two weeks

Run the per-workload-class shootout on online-RLHF against batch-RLHF for the team's alignment-critical workload classes inside two weeks. For the team's top-three alignment-critical workload classes (regulated-industry agent surface, brand-safety-envelope consumer surface, high-stakes-decision agent surface), measure per-class alignment-iteration cadence, per-class per-cycle alignment-drift envelope, per-class per-population-quality-envelope closure, and per-class per-jurisdiction population-of-labelers availability. The output is the alignment-substrate update artifact the FY27 plan runs against.
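The four measurements named above can be recorded in a small scorecard and compared metric by metric. This is a sketch under stated assumptions: the field names, units, and the "lower or higher is better" direction of each metric are illustrative choices, and any numbers a team plugs in are its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShootoutMeasurement:
    iteration_days: float       # wall-clock days per alignment iteration (lower is better)
    drift_per_cycle: float      # team-defined drift metric (lower is better)
    quality_closure: float      # share of population-quality checks passed (higher is better)
    jurisdictions_covered: int  # jurisdictions with available labelers (higher is better)


LOWER_IS_BETTER = ("iteration_days", "drift_per_cycle")
HIGHER_IS_BETTER = ("quality_closure", "jurisdictions_covered")


def compare(online: ShootoutMeasurement, batch: ShootoutMeasurement) -> dict[str, str]:
    """Name the winning substrate for each metric, or 'tie'."""
    verdict = {}
    for name in LOWER_IS_BETTER + HIGHER_IS_BETTER:
        o, b = getattr(online, name), getattr(batch, name)
        if o == b:
            verdict[name] = "tie"
        elif (o < b) == (name in LOWER_IS_BETTER):
            # Online wins when it is smaller on a lower-is-better metric,
            # or larger on a higher-is-better metric.
            verdict[name] = "online"
        else:
            verdict[name] = "batch"
    return verdict
```

Running `compare` once per workload class gives a per-class verdict table, which fits the article's later point that both cadences stay on the map: a class can favor online on cadence and batch on population quality at the same time.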

Split the labeler-crew contract by workload-class per-population-quality requirement. The single-line-item labeler-crew procurement contract the team has been running against does not encode the per-workload-class per-population-quality requirement the online-RLHF substrate now enforces. Split the contract by domain-expert-labeler substrate for regulated-industry workload classes and crowdsourced-labeler substrate for general-consumer workload classes; the aggregate labeler-headcount line item stops being the negotiation input.

Shift the safety-evaluation cadence from per-release batch audit to per-training-step live audit on alignment-critical workload classes. The AI-safety function's operating cadence needs to shift with the substrate. Update the per-cycle safety-evaluation runbook against the online-signal-stream input, and re-scope the FY27 headcount plan against the per-training-step live-audit workload class.

Add a per-jurisdiction population-of-labelers clause to the FY27 human-signal standing contract. The EU AI Act and sectoral-regulator artifact-trail requirements grade against per-jurisdiction population-of-labelers attributes on alignment-critical workload classes. The FY27 human-signal contract needs the per-jurisdiction clause as a first-class attribute; the aggregate global-labeler-headcount clause does not close the compliance envelope.

What online RLHF cheapens but does not replace

Online RLHF compresses the alignment-iteration cadence on the RLHF-substrate default-routing tier. It does not compress the senior judgment involved in deciding which workload classes are online-RLHF-shaped, writing the per-workload-class alignment-envelope verifier the training loop grades against, owning the per-jurisdiction population-of-labelers envelope on the FY27 human-signal standing contract, and running the per-cycle alignment-drift code review against the team's RLHF substrate. Teams that mistake a compressed iteration cadence for compressed judgment route the regulated-industry workload class to a crowdsourced-labeler substrate that does not close the envelope, then read about the population-quality mismatch in a post-mortem when the shootout would have surfaced it. Teams that keep senior judgment at the center of the substrate decision translate the cadence compression into per-week alignment improvements the batch substrate could not produce.

The alignment-substrate question is no longer which RLHF vendor is cheapest; it is which per-workload-class per-population-quality envelope the FY27 human-signal standing contract underwrites against the online-RLHF and batch-RLHF cadence map, which per-jurisdiction population-of-labelers envelope the contract retains for the regulated-industry workload classes, and which per-cycle alignment-drift code review the AI-safety function commits to against the online-signal-stream substrate.


At SONNET CODE we run the AI Training engagement against the per-workload-class alignment-substrate routing artifact — per-workload-class shootouts against the online-RLHF and batch-RLHF cadence map, per-population-quality envelopes on the FY27 human-signal standing contract, and per-cycle alignment-drift code reviews against the team's RLHF substrate. If your team's alignment plan is still drafted against the batch-cadence assumption, schedule a call — we'll walk you through the online-RLHF substrate re-shootout we ship inside one sprint, with domain-expert labeler crews on the workload classes whose per-population-quality envelope the substrate needs to close.