Stop Waiting, Start Shipping: The Open AI Stack Grew Up in 2026

2026-04-27

Tags: open-source, ai-strategy, digital-sovereignty, enterprise-ai, post-training, eu-ai-act, llms

Sebastian Raschka and Alexander C. S. Hendorf in the fireside chat "Stop Waiting, Start Shipping" at PyCon DE & PyData 2026, Darmstadt

At PyCon DE & PyData 2026, I had the opportunity to host a fireside chat with Sebastian Raschka: author of Build a Large Language Model (From Scratch), former statistics professor, and today one of the few voices who can translate between LLM architecture and practitioner reality without watering down either side. What crystallised in the conversation wasn't a set of model benchmarks but three strategic lines that matter more to decision-makers than the next release wave.

There Is No "Winner Takes All", and That's an Architecture Question

The mainstream story still runs as a horse race: OpenAI versus Anthropic versus Google versus the Chinese players. Who wins? Wrong question.

Raschka puts it plainly: models are increasingly post-trained for their harness, the specific environment in which they run. Cursor (the US coding startup, last valued at around $30 billion) runs Composer in its own agent; at its core, Composer is a post-trained Kimi K2.5, a Chinese open-source base model. Claude Code, Codex, and every other serious coding agent operate inside a purpose-built harness, with the model post-trained specifically for that tool interface.

"There's no general model that is gonna do well in all the harnesses." Sebastian Raschka

That shifts the build-vs-buy logic. "Which model?" is the wrong opening question in most cases. The right one: what harness are we building, and which model fits inside it? Picking a model before settling the harness puts the cart before the horse.

Europe's Allocation Question: Not the Next Base Model

The most prominent European AI-strategy debate ("we need our own European base model") is problematic on two counts.

First, capable open-weight base models already exist: DeepSeek-V3 and Qwen3 (both from China) and Llama (Meta, USA). Even Mistral Large 3, Europe's most prominent model, is built at its core on the DeepSeek-V3 architecture. Second, the actual competitive advantage sits one layer up: post-training, tool integration, domain data, and harness design.

When a CIO with budget and a twelve-month delivery window asks me where the next euro should go, the answer is clear: not into pre-training. It belongs in the layers where domain-specific value that holds up under regulatory scrutiny is created, the layers that can actually carry data-protection, on-premises and audit requirements. That is the only lever that delivers technological sovereignty and compliance at the same time.

This investment logic only works on open weights; post-training on a closed API isn't really an option. Open source is therefore not a preference but a prerequisite. And one of the most underrated reasons even the labs themselves treat it that way is talent. Asked why Google releases Gemma or OpenAI releases GPT-OSS, Sebastian didn't point to marketing or generosity. He pointed to hiring:

"If you hire people who never worked on an LLM because there's no LLM you can work on if it's all proprietary, well, you have to train them from scratch." Sebastian Raschka

If even the labs treat open weights as a structural prerequisite for their own talent pipeline, the same logic flows downstream: enterprises building on those weights (and hiring people who've worked with them) depend on exactly that precondition.

Agents on a Short Leash

The second wave of coding agents has gone mainstream. Raschka mentioned a friend whose startup codebase was essentially built with Claude Code: work that would previously have taken a thirty-person team two years. That is the optimistic reading.

The other reading: in large codebases (Raschka and I discussed the example of New York's financial firms), code volume suddenly grows by a factor of ten. 25,000 lines become 250,000. All of a sudden there aren't enough of the senior reviewers whose sign-off a regulated industry requires. The bottleneck moves from writing code to reviewing it.

Experienced engineers and data scientists do not become less important in this world. They become more valuable.

My recommendation is therefore unchanged: agents are great, but they need a short leash. In concrete terms, don't aim for end-to-end automation; aim to make existing work better. Raschka puts it crisply:

"Making your work better rather than replacing it." Sebastian Raschka

The use cases that scale in regulated contexts follow this exact pattern: enrich existing tests with agent suggestions rather than generate them outright; widen code review with a second pair of eyes rather than automate it; make documentation searchable rather than auto-produce reports.

The Replacement Debate Is the Wrong Debate

Public discussion is dominated by the replacement question: which jobs disappear? Which tasks does AI take over? Sebastian's perspective from research turns the argument inside out. The work PhD students used to spend days on (hyperparameter sweeps, batch-script wrangling, log parsing, plotting results), the grunt work the ML community half-jokingly calls graduate student descent (a play on gradient descent), is now largely being automated away.

"The students now actually get to do science instead of doing busy work." Sebastian Raschka

That is not replacement. It is liberation. The creative capacity of researchers (formulating hypotheses, designing experiments, interpreting results) only becomes available once the tedious layer is automated. The same pattern applies in industry: equipping a team with AI agents does not replace people; it relieves them of the share of work that never required human judgement in the first place. Value migrates upward, not away.

Stage setup for the fireside chat "Stop Waiting, Start Shipping: Real-World Strategy for Open-Source LLMs" with Sebastian Raschka and Alexander C. S. Hendorf, PyCon DE & PyData 2026

Practical Consequences

Three recommendations I take from the conversation and continue to sharpen in active engagements:

First: Harness Before Model

Move the model-selection discussion to the end of the architecture conversation, not the start. Settle the harness first: tool interfaces, data wiring, security boundaries, audit trail. Which model runs inside is the last decision, not the first.

Second: Invest in Post-Training, Not Pre-Training

Building proprietary base models would be wasted capital for 99 percent of European companies. Investment in post-training capability, MLOps, data pipelines and harness engineering, by contrast, is where domain value actually compounds. Anyone who takes sovereignty seriously doesn't build the next GPT; they build the layers where domain value is defended.

Third: Experiment Instead of Wait

Raschka's closing advice is also mine: three small experiments this week beat one big plan next quarter.

"If you plan something very thoroughly it is probably irrelevant tomorrow." Sebastian Raschka

Waiting is not an option. Neither is unstructured action. The difference lies in the discipline with which experiments are framed (small models, sandboxed environments, narrowly defined use cases) and in the reflex to feed what you learn straight into the next architectural decision.

Where does the biggest open-source lever sit in your roadmap?
