8 weeks, one founder, one model provider — and it's not OpenAI. The cost, latency, and sovereignty math behind Linkette's all-Mistral stack.
I shipped Linkette — an EU-sovereign link-in-bio for creators — in 8 weeks, solo, with one model provider wired into every AI surface: Mistral. No OpenAI fallback. No Anthropic for the "harder" prompts. Just mistral-small-latest for the streaming bits and mistral-large-latest for the structured generation.
People keep asking why. Here's the actual math, the actual latency, the actual outputs, and the parts where Mistral genuinely isn't there yet.
The cost math (which is not the headline)
Linkette has three AI surfaces:
- Onboarding: one structured generation per signup (mistral-large, ~1.2k in / 600 out).
- Inline assists: bio rewrites, link enrichment, optimization tips (mistral-small, ~400 in / 150 out, called many times per active session).
- Weekly analytics brief: one structured generation per Pro user per Monday (mistral-small, ~600 in / 120 out).
Mistral's published pricing, which is the actual basis for our cost model:
// apps/web/lib/ai/usage.ts
const MISTRAL_PRICING_EUR_PER_1K_TOKENS: Record<
  string,
  { in: number; out: number }
> = {
  "mistral-small-latest": { in: 0.0002, out: 0.0006 },
  "mistral-large-latest": { in: 0.002, out: 0.006 },
};

export function estimateCostEur(
  model: string,
  tokensIn: number,
  tokensOut: number,
): number {
  const p = MISTRAL_PRICING_EUR_PER_1K_TOKENS[model];
  if (!p) return 0;
  return (tokensIn / 1000) * p.in + (tokensOut / 1000) * p.out;
}
A weekly brief: 0.6 × 0.0002 + 0.12 × 0.0006 = €0.000192 per email. Round it up to €0.0002. At one thousand Pro users that's €0.20/week, or €10.40/year of model cost for the headline AI feature. The Brevo send costs me more than the model.
The same brief on GPT-4-turbo (input €0.0095/1k, output €0.029/1k at current EU pricing): 0.6 × 0.0095 + 0.12 × 0.029 = €0.0092. ~48x more expensive. At 1k users: €478/year.
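The arithmetic above can be reproduced in a few lines. This is a sketch, not Linkette's code: the per-1k prices and token counts are the ones quoted in this post, while `PRICES`, `costPerCallEur`, `yearly`, and the 52-briefs-per-year figure are illustrative assumptions.

```typescript
// Worked version of the cost arithmetic. Prices (EUR per 1k tokens) and token
// counts are copied from the post; every name here is illustrative.
type Price = { in: number; out: number };

const PRICES: Record<string, Price> = {
  "mistral-small-latest": { in: 0.0002, out: 0.0006 },
  "gpt-4-turbo": { in: 0.0095, out: 0.029 },
};

function costPerCallEur(model: string, tokensIn: number, tokensOut: number): number {
  const p = PRICES[model];
  if (!p) throw new Error(`no price for ${model}`);
  return (tokensIn / 1000) * p.in + (tokensOut / 1000) * p.out;
}

// Weekly brief: ~600 tokens in, ~120 tokens out, one call per Pro user per week.
const briefMistral = costPerCallEur("mistral-small-latest", 600, 120);
const briefGpt4 = costPerCallEur("gpt-4-turbo", 600, 120);

// Assumption: 1,000 Pro users and 52 briefs per user per year.
const yearly = (perCall: number, users = 1000, weeks = 52): number =>
  perCall * users * weeks;

console.log("per brief (mistral-small):", briefMistral.toFixed(6));
console.log("per brief (gpt-4-turbo):", briefGpt4.toFixed(5));
console.log("ratio:", (briefGpt4 / briefMistral).toFixed(1));
console.log("yearly:", yearly(briefMistral).toFixed(2), yearly(briefGpt4).toFixed(2));
```

Run as written, the unrounded yearly totals land at roughly €9.98 and €477, slightly below the €10.40 and €478 quoted above, because the post rounds the per-brief cost up before multiplying.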
That's not a small win, but it's also not the reason. The reason is latency.
The latency math (which is the headline)
Linkette runs on a Scaleway VPS in Paris. Mistral's la Plateforme inference is in Paris too. OpenAI's nearest region from the EU is, at the time of writing, Ireland for the chat completions endpoint, and even there the time-to-first-token tail is much heavier than on la Plateforme.
A back-of-envelope I measured during dev (50 requests each, p50):
| Model | Region | TTFT (p50) | Total (200-token output) |
|---|---|---|---|
| mistral-small-latest | Paris → Paris | ~280 ms | ~1.1 s |
| mistral-large-latest | Paris → Paris | ~520 ms | ~2.4 s |
| gpt-4-turbo | Paris → Ireland | ~640 ms | ~3.1 s |
| gpt-4o-mini | Paris → Ireland | ~480 ms | ~2.0 s |
For an inline assist — the user clicks "rewrite my bio" and waits for tokens to start streaming — sub-300ms TTFT is the difference between "magical" and "is it broken." Anything over 500ms and people start clicking the button a second time.
I didn't tune for this. It's just what you get when your inference provider is in the same metro as your servers.
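For anyone who wants to reproduce a table like this, here is a minimal sketch of how TTFT and total time could be measured over a token stream. It is not the harness behind the numbers above: `measureStream`, `median`, `demoStream`, and `runDemo` are hypothetical names, and modeling the stream as a plain `AsyncIterable<string>` is an assumption about how your SDK exposes streamed text.

```typescript
// Sketch: time-to-first-token (TTFT) and total duration for one streamed reply.
// Works on any AsyncIterable<string>; wiring it to a real SDK is left out.
type Timing = { ttftMs: number; totalMs: number; chunks: number };

async function measureStream(stream: AsyncIterable<string>): Promise<Timing> {
  const start = performance.now();
  let ttftMs = -1; // stays -1 if the stream yields nothing
  let chunks = 0;
  for await (const _chunk of stream) {
    if (chunks === 0) ttftMs = performance.now() - start;
    chunks += 1;
  }
  return { ttftMs, totalMs: performance.now() - start, chunks };
}

// p50 of a list of samples (mean of the two middle values for even lengths).
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("median of empty list");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Stand-in for a model stream, so the sketch runs without network access.
async function* demoStream(delayMs: number, tokens: string[]): AsyncGenerator<string> {
  for (const t of tokens) {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    yield t;
  }
}

// Sequential requests, then the p50 of their TTFTs.
async function runDemo(requests = 5): Promise<number> {
  const ttfts: number[] = [];
  for (let i = 0; i < requests; i++) {
    const timing = await measureStream(demoStream(5, ["Hel", "lo", "!"]));
    ttfts.push(timing.ttftMs);
  }
  return median(ttfts);
}
```

Swapping `demoStream` for a real streamed response and running the loop 50 times per model would give comparable p50 figures; running requests sequentially keeps them from competing for bandwidth.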
The architecture: one client, one import
I refuse to scatter import { openai } from "@ai-sdk/openai" and import { mistral } from "@ai-sdk/mistral" around a codebase. It's the single most common way LLM apps end up with seven model providers, half of them stale, none of them traced.
The whole AI subsystem in Linkette goes through one file, one entrypoint:
// apps/web/lib/ai/client.ts
import { mistral } from "@ai-sdk/mistral";

/**
 * The single Mistral client. Do not import @ai-sdk/mistral anywhere else.
 * Model selection by task: small for inline assists, large for generation.
 */
export const MODELS = {
  small: mistral("mistral-small-latest"),
  large: mistral("mistral-large-latest"),
} as const;

export type ModelKey = keyof typeof MODELS;

export function pickModel(
  task: "onboard" | "bio" | "enrich" | "other",
): ModelKey {
  return task === "onboard" ? "large" : "small";
}

export function modelIdFor(key: ModelKey): string {
  return key === "small" ? "mistral-small-latest" : "mistral-large-latest";
}
Then every feature consumes the same streamCompletion wrapper, which folds in the brand-voice system prompt, the Langfuse trace, the usage row, and the model-pick decision:
// apps/web/lib/ai/index.ts (excerpt)
import { streamText } from "ai";
import { MODELS, modelIdFor, pickModel } from "./client";

export function streamCompletion(input: CompletionInput) {
  const modelKey = pickModel(input.task);
  const modelId = modelIdFor(modelKey);
  const system = buildSystemPrompt({
    language: input.language,
    style: input.style,
  });

  // One Langfuse trace per call. Tracing is optional, hence the `?.` below.
  const langfuse = getLangfuse();
  const trace = langfuse?.trace({
    name: `ai.${input.task}`,
    userId: input.userId,
    input: { prompt: input.prompt, language: input.language ?? "en" },
    metadata: { model: modelId },
  });

  return streamText({
    model: MODELS[modelKey],
    system,
    prompt: input.prompt,
    // Runs once the stream has finished: close the trace, then log usage.
    onFinish: async ({ usage, text }) => {
      trace?.update({ output: text });
      trace?.generation({
        name: `mistral.${modelKey}`,
        model: modelId,
        input: input.prompt,
        output: text,
        usage: {
          input: usage.inputTokens ?? 0,
          output: usage.outputTokens ?? 0,
        },
      });
      await langfuse?.flushAsync();
      await recordUsage({
        userId: input.userId,
        task: input.task,
        model: modelId,
        tokensIn: usage.inputTokens ?? 0,
        tokensOut: usage.outputTokens ?? 0,
        langfuseTraceId: trace?.id,
      });
    },
  });
}
If I ever want to A/B a non-Mistral model, there's exactly one file to edit. No leaks. ESLint enforces the no-import rule project-wide.
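The lint rule itself is not shown in this post. One way to get that behavior is ESLint's built-in `no-restricted-imports` rule in a flat config. The sketch below is an assumption about how such a setup might look, not Linkette's actual config: the file name, the glob patterns, the `restricted` helper, and the choice to also ban `@ai-sdk/openai` are all illustrative, and linting TypeScript files additionally needs a TypeScript-aware parser, which is omitted here.

```typescript
// eslint.config.ts (hypothetical file name; flat config format)
// Bans direct provider imports everywhere, then re-allows them in the one
// client file. Later entries override earlier ones for the files they match.
const restricted = (name: string) => ({
  name,
  message: "Import models from apps/web/lib/ai/client.ts instead.",
});

const config = [
  {
    files: ["**/*.ts", "**/*.tsx"],
    rules: {
      "no-restricted-imports": [
        "error",
        { paths: [restricted("@ai-sdk/mistral"), restricted("@ai-sdk/openai")] },
      ],
    },
  },
  {
    // The single allowed entry point keeps its provider import.
    files: ["apps/web/lib/ai/client.ts"],
    rules: { "no-restricted-imports": "off" },
  },
];

export default config;
```

With a rule like this, a stray provider import in a feature file fails lint with a message pointing back at the client module, so the "one file to edit" property is checked rather than remembered.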
Sovereignty as a product feature
European creators care about this in a way US founders consistently underestimate. The /privacy page on Linkette doesn't say "we may use US sub-processors with SCCs" — it says "Nothing leaves the EU." That sentence is load-bearing. It's the single most-quoted line when I get emails from German librarians and French illustrators asking why they should pick Linkette over Linktree.
Sovereignty isn't a marketing layer. It's wired into the code:
// apps/web/lib/ai/client.ts
// Langfuse Cloud regions:
//   EU (Frankfurt) → https://cloud.langfuse.com
//   US (Oregon)    → https://us.cloud.langfuse.com
// We default to EU and REFUSE to initialize against the US region (or
// any non-Langfuse host), keeping the "nothing leaves the EU" claim
// on /privacy honest.
const LANGFUSE_EU_HOST = "https://cloud.langfuse.com";

function resolvedLangfuseHost(): string | null {
  const raw = process.env.LANGFUSE_HOST?.trim();
  if (!raw) return LANGFUSE_EU_HOST;
  // Hard block on the US region: never silently send EU user data west.
  if (/^https?:\/\/us\.cloud\.langfuse\.com/i.test(raw)) {
    console.error("[ai] LANGFUSE_HOST points at US region; refusing to init Langfuse");
    return null;
  }
  // Anything else that is not the EU host (a typo, an unknown URL) is refused as well.
  if (raw.replace(/\/+$/, "").toLowerCase() !== LANGFUSE_EU_HOST) {
    console.error("[ai] LANGFUSE_HOST is not the EU Langfuse host; refusing to init Langfuse");
    return null;
  }
  return LANGFUSE_EU_HOST;
}
If you typo the env var, the observability stack refuses to start. That's the only way the "EU only" claim stays true in two years' time when somebody else maintains this.
Side-by-side: bio drafting
I gave both models the same prompt: "Quiet French ceramicist, posts in English, makes objects you eat off."
Mistral Large:
Ceramics from a small studio in Lyon. Bowls, beakers, the kind of objects that quietly improve a morning. New pieces drop on the first Friday of each month — usually thirty seconds before they sell out.
GPT-4-turbo:
Hand-thrown ceramics from Lyon, France. Functional pottery — bowls, mugs, plates — designed for everyday rituals. Monthly drops on the first Friday. Follow the studio diary on Instagram, shop the latest collection below.
Mistral wins this one for taste. It doesn't telegraph the obvious ("Follow on Instagram, shop below"), it doesn't fall back to "designed for everyday rituals" which is the AI-content-mill house style for any product on Earth. It writes like a person who's read Aesop's website three times.
This isn't universally true. For dense reasoning, GPT-4-turbo still pulls ahead. But for short editorial prose — the actual job of a link-in-bio AI — Mistral is closer to the European publishing voice that Linkette is built around.
What Mistral genuinely can't do well (yet)
Three honest gaps from 8 weeks of production use:
- Long-chain reasoning. A 6-step "given these analytics, here are 3 prioritized changes" prompt drifts. We solved it by doing the math in SQL and letting the model only narrate: in the weekly brief module, compactStats() does all arithmetic server-side, and the model receives a pre-digested JSON object and writes 2 sentences.
- Code generation. I tried generating CSS for custom themes from a natural-language brief. Mistral Large was fine but consistently behind GPT-4 on syntax edge cases. We pulled the feature and shipped 10 hand-tuned palettes instead.
- Function-calling reliability under load. Around the 50th concurrent request the structured-output validation rate dipped noticeably. We added a Zod-validated retry and a deterministic fallback for the weekly brief:
// apps/web/lib/ai/weekly-brief.ts (excerpt)
try {
  result = await generateObject({
    model: MODELS.small,
    schema: weeklyBriefSchema,
    system,
    prompt: /* ... */,
  });
} catch (err) {
  trace?.update({ output: { error: String(err) } });
  await langfuse?.flushAsync().catch(() => {});
  // Graceful fallback: never block the analytics page on Mistral.
  return {
    brief: `${stats.views} visits and ${stats.clicks} clicks over the last ${stats.period_days} days.${stats.top_link ? ` ${stats.top_link.label} led with ${stats.top_link.clicks}.` : ""}`,
    surprise: "",
  };
}
Notice the fallback is deterministic prose — not "AI temporarily unavailable." A user looking at their analytics page should never see the words "AI temporarily unavailable." That's a bug.
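The excerpt shows the fallback but not the retry. Below is a minimal sketch of what a validated retry with a deterministic fallback could look like. It is not Linkette's implementation: `withValidatedRetry`, the `validate` callback (standing in for a Zod schema check), the attempt count, and the linear backoff are all assumptions, and the generator is a stub so the example runs without a model.

```typescript
// Sketch: retry a structured generation until it validates, else fall back.
// `validate` stands in for a schema check (for example a Zod safeParse) and
// returns the typed value or null. All names here are illustrative.
type Brief = { brief: string; surprise: string };

async function withValidatedRetry<T>(
  generate: () => Promise<unknown>,
  validate: (raw: unknown) => T | null,
  fallback: () => T,
  maxAttempts = 2,
  backoffMs = 200,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const valid = validate(await generate());
      if (valid !== null) return valid;
    } catch {
      // Network or provider error: treated the same as a failed validation.
    }
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, backoffMs * attempt));
    }
  }
  return fallback(); // deterministic prose, never an error banner
}

// Hand-rolled validator so the sketch has no dependencies.
function validateBrief(raw: unknown): Brief | null {
  if (typeof raw !== "object" || raw === null) return null;
  const r = raw as Record<string, unknown>;
  const brief = r["brief"];
  const surprise = r["surprise"];
  return typeof brief === "string" && typeof surprise === "string"
    ? { brief, surprise }
    : null;
}

// Stub generator: the first call returns malformed output, the second is valid.
let calls = 0;
const flakyGenerate = async (): Promise<unknown> =>
  ++calls === 1 ? { brief: 42 } : { brief: "120 visits, 30 clicks.", surprise: "" };

// Hypothetical stats object, shaped like the one in the excerpt above.
const stats = { views: 120, clicks: 30, period_days: 7 };

const demo = withValidatedRetry(
  flakyGenerate,
  validateBrief,
  () => ({
    brief: `${stats.views} visits and ${stats.clicks} clicks over the last ${stats.period_days} days.`,
    surprise: "",
  }),
  2,
  10,
);
```

Keeping the fallback as a plain function of the stats means the page can always render something true, whether the model answered on the first try, the second, or not at all.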
Things I tried that didn't work
Routing to mistral-large for "harder" prompts
I had a heuristic: prompt > 1k tokens → use large. In practice almost every Linkette prompt is short editorial output and large was both slower (~2x TTFT) and more verbose. Reverted to: large only for the onboarding pass, small for everything else. Output quality on inline assists improved measurably because small was less prone to over-explanation.
Streaming the weekly brief
The brief is generated server-side, persisted, then emailed. I tried streaming it to the UI for the "generate now" button. Removed it because generateObject with a Zod schema returns the whole structured output atomically, and streaming partial JSON into a UI that needs the whole object to render added complexity for ~600ms of perceived latency. Just await it.
Self-hosted Mistral via vLLM on Scaleway H100s
For about 4 hours. The fixed cost of an H100 vastly exceeds the token spend even at 10k users, the cold-start was brutal, and Mistral's hosted la Plateforme is already in Paris. Self-hosting is the right answer at a scale Linkette is years away from.