Prompt Engineering Patterns That Survive Production
Five battle-tested prompting patterns — structure, examples, decomposition, self-checking, and output contracts — that hold up when real traffic hits.
A prompt that works in a playground and a prompt that works under production traffic are different artifacts. The demo prompt handles the happy path; the production prompt handles the long tail of weird inputs without falling over. Here are five patterns that consistently make the difference.
1. Structure the prompt, don’t write a paragraph
In practice, models follow structured instructions more reliably than the same instructions buried in prose. Give the prompt clear, labelled sections (role, task, constraints, format) instead of one dense paragraph the model has to parse.
# Role
You are a support-ticket classifier.
# Task
Assign each ticket exactly one category.
# Categories
billing | bug | feature_request | other
# Rules
- If unsure, choose "other".
- Never invent a category.
# Output
Return only the category, lowercase, no punctuation.
This isn’t cosmetic. Sections act as anchors the model can attend to, and they make the prompt diffable when you iterate.
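One way to keep those sections diffable is to store them as data and render them, rather than hand-editing one long string. The sketch below assumes a hypothetical `buildPrompt` helper and `PromptSections` type of our own (not a library API). Each section renders under a `# Heading` line, so changing one rule shows up as a one-line diff.

```typescript
// Hypothetical shape for the labelled sections of a classifier prompt.
type PromptSections = {
  role: string;
  task: string;
  categories: readonly string[];
  rules: readonly string[];
  output: string;
};

// Render sections in a fixed order: a "# Heading" line, then the body.
function buildPrompt(s: PromptSections): string {
  const sections: Array<[string, string]> = [
    ["Role", s.role],
    ["Task", s.task],
    ["Categories", s.categories.join(" | ")],
    ["Rules", s.rules.map((r) => `- ${r}`).join("\n")],
    ["Output", s.output],
  ];
  return sections.map(([title, body]) => `# ${title}\n${body}`).join("\n");
}

const classifierPrompt = buildPrompt({
  role: "You are a support-ticket classifier.",
  task: "Assign each ticket exactly one category.",
  categories: ["billing", "bug", "feature_request", "other"],
  rules: ['If unsure, choose "other".', "Never invent a category."],
  output: "Return only the category, lowercase, no punctuation.",
});

console.log(classifierPrompt);
```

Printing `classifierPrompt` reproduces the prompt above line for line.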
2. Show, don’t just tell (few-shot)
One concrete example outweighs a paragraph of description. When the output is at all nuanced, include 2–4 examples that span the edges of the task, not just the obvious case.
# Examples
Ticket: "Charged twice this month" → billing
Ticket: "App crashes on upload" → bug
Ticket: "Add dark mode please" → feature_request
Ticket: "thanks!" → other
Pick examples that resolve ambiguity the instructions can’t. The “thanks!” example above teaches the model what not to over-classify.
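Keeping the examples as data makes it easy to render them consistently and to check them. A minimal sketch, assuming a hypothetical `FewShotExample` type and `renderExamples` / `missingCategories` helpers; the arrow format mirrors the block above.

```typescript
type Category = "billing" | "bug" | "feature_request" | "other";

const ALL_CATEGORIES: readonly Category[] = ["billing", "bug", "feature_request", "other"];

// Hypothetical shape for one labelled example.
interface FewShotExample {
  ticket: string;
  label: Category;
}

const examples: FewShotExample[] = [
  { ticket: "Charged twice this month", label: "billing" },
  { ticket: "App crashes on upload", label: "bug" },
  { ticket: "Add dark mode please", label: "feature_request" },
  { ticket: "thanks!", label: "other" }, // shows what NOT to over-classify
];

function renderExamples(items: readonly FewShotExample[]): string {
  // JSON.stringify quotes the ticket and escapes any quotes inside it.
  const lines = items.map((ex) => `Ticket: ${JSON.stringify(ex.ticket)} → ${ex.label}`);
  return ["# Examples", ...lines].join("\n");
}

// Flag categories that no example demonstrates.
function missingCategories(items: readonly FewShotExample[], all: readonly Category[]): Category[] {
  const seen = new Set(items.map((ex) => ex.label));
  return all.filter((c) => !seen.has(c));
}

console.log(renderExamples(examples));
```

`missingCategories` is only a cheap proxy for "span the edges": a set can cover every label and still miss the ambiguous inputs that matter.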
3. Decompose multi-step reasoning
If a task has several stages, don’t ask for the final answer in one shot. Either ask the model to work through the steps explicitly, or split the task into multiple calls. A single call that classifies, extracts, and formats will usually do all three worse than three focused calls would.
When accuracy matters more than latency, several small calls usually beat one big, clever one.
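A sketch of the split-into-calls option, assuming a hypothetical `Model` function type that sends one prompt and resolves to the raw text reply (wire it to whichever client you use). Each stage gets its own narrow prompt, and the second stage is told what the first one decided.

```typescript
// Hypothetical: one prompt in, raw model text out.
type Model = (prompt: string) => Promise<string>;

interface TicketResult {
  category: string;
  summary: string;
}

// Stage 1: a prompt that only classifies.
function classifyPrompt(ticket: string): string {
  return [
    "# Task",
    "Assign the ticket exactly one category: billing | bug | feature_request | other.",
    "# Output",
    "Return only the category, lowercase, no punctuation.",
    "# Ticket",
    ticket,
  ].join("\n");
}

// Stage 2: a prompt that only extracts, given the category from stage 1.
function summarizePrompt(ticket: string, category: string): string {
  return [
    "# Task",
    `This ${category} ticket needs a one-sentence summary of what the customer wants.`,
    "# Output",
    "Return only the sentence.",
    "# Ticket",
    ticket,
  ].join("\n");
}

// Two focused model calls; the final object is assembled in plain code.
async function processTicket(model: Model, ticket: string): Promise<TicketResult> {
  const category = (await model(classifyPrompt(ticket))).trim().toLowerCase();
  const summary = (await model(summarizePrompt(ticket, category))).trim();
  return { category, summary };
}

// A canned stand-in model so the pipeline runs without an API key.
const fakeModel: Model = async (prompt) =>
  prompt.includes("Return only the category") ? " Billing \n" : "Customer was charged twice.";

processTicket(fakeModel, "Charged twice this month").then((r) => console.log(r));
```

Here the formatting step is ordinary code, since building an object needs no model; classification and extraction each get their own call.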
4. Give the model an escape hatch
One of the most common production failures is the confident wrong answer. Models tend to hallucinate when the prompt implies an answer must exist, so permit uncertainty explicitly:
If the context does not contain the answer, respond exactly:
"I don't have enough information."
Do not guess.
This single instruction prevents a large class of fabrications. Pair it with retrieval (see RAG vs Fine-Tuning) so the model usually does have the information it needs.
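The escape hatch only pays off if the calling code treats the refusal as its own outcome rather than as a normal answer. A minimal sketch, assuming the exact sentence from the prompt above and hypothetical `withEscapeHatch` / `interpretAnswer` helpers:

```typescript
// Must match the sentence the prompt asks for, character for character.
const NO_ANSWER = "I don't have enough information.";

type Answer = { kind: "answer"; text: string } | { kind: "no_answer" };

function withEscapeHatch(context: string, question: string): string {
  return [
    "# Context",
    context,
    "# Question",
    question,
    "# Rules",
    "If the context does not contain the answer, respond exactly:",
    `"${NO_ANSWER}"`,
    "Do not guess.",
  ].join("\n");
}

// Map the raw reply to a typed outcome so "no answer" can't leak out as an answer.
function interpretAnswer(raw: string): Answer {
  const trimmed = raw.trim();
  // Models sometimes echo the quotes from the prompt; ignore them for the comparison.
  const unquoted =
    trimmed.length >= 2 && trimmed.startsWith('"') && trimmed.endsWith('"')
      ? trimmed.slice(1, -1)
      : trimmed;
  return unquoted === NO_ANSWER ? { kind: "no_answer" } : { kind: "answer", text: trimmed };
}

const reply = interpretAnswer('  "I don\'t have enough information."\n');
if (reply.kind === "no_answer") console.log("route to fallback");
```

What the fallback does (ask a clarifying question, widen retrieval, hand off to a person) is an application decision; the point is that it is an explicit branch.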
5. Enforce an output contract
If downstream code parses the output, the output is an API — so specify it like one and validate it. Ask for JSON, give the schema, and fail loudly when the model deviates rather than silently passing garbage downstream.
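import { z } from "zod";
// The contract: the exact shape downstream code is allowed to rely on.
const Classification = z.object({
  category: z.enum(["billing", "bug", "feature_request", "other"]),
  confidence: z.number().min(0).max(1),
});
type Classification = z.infer<typeof Classification>;
function parse(raw: string): Classification {
  let json: unknown;
  try {
    json = JSON.parse(raw); // Non-JSON replies (prose, code fences) throw here.
  } catch {
    throw new Error(`Model output is not valid JSON: ${raw.slice(0, 200)}`);
  }
  const result = Classification.safeParse(json);
  if (!result.success) {
    // Retry, fall back, or alert; never trust un-validated model output.
    throw new Error(`Bad model output: ${result.error.message}`);
  }
  return result.data;
}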
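The comment in `parse` leaves the failure policy open. One common policy is to retry a bounded number of times and then fail loudly, letting the caller fall back. A sketch, assuming a hypothetical `CallModel` function that resolves to the raw reply and any `validate` function that throws on bad output (such as `parse` above); it uses a hand-rolled validator instead of zod so it runs on its own.

```typescript
// Hypothetical: resolves to the model's raw text reply.
type CallModel = () => Promise<string>;

// Retry until validate() accepts the output; after maxAttempts, throw the last error.
async function callWithContract<T>(
  callModel: CallModel,
  validate: (raw: string) => T,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel();
    try {
      return validate(raw);
    } catch (err) {
      lastError = err; // a good place to log, so drift in model output is visible
    }
  }
  throw new Error(`No valid output after ${maxAttempts} attempts: ${String(lastError)}`);
}

// Dependency-free stand-in for the zod schema above.
const ALLOWED = ["billing", "bug", "feature_request", "other"];
function validateCategory(raw: string): { category: string } {
  const data: unknown = JSON.parse(raw); // throws on non-JSON replies
  const category =
    typeof data === "object" && data !== null ? (data as { category?: unknown }).category : undefined;
  if (typeof category !== "string" || !ALLOWED.includes(category)) {
    throw new Error(`Bad model output: ${raw}`);
  }
  return { category };
}

// Usage: a fake model that answers in prose first, then in valid JSON.
const replies = ["Sure! The category is billing.", '{"category": "billing"}'];
let i = 0;
const flakyModel: CallModel = async () => replies[Math.min(i++, replies.length - 1)] ?? "";

callWithContract(flakyModel, validateCategory).then((r) => console.log(r.category)); // logs "billing"
```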
The pattern behind the patterns
Notice the through-line: treat the model like an unreliable but capable collaborator. Be explicit, show examples, break work down, allow “I don’t know,” and verify the output. None of these are clever tricks — they’re the same disciplines you’d apply to any system with a fallible component.
The final discipline is measurement. Once these patterns are in place, build a small eval set — a handful of tricky inputs with known-good outputs — and run it on every prompt change. A prompt you can’t measure is a prompt you can’t safely improve. That’s where production AI engineering really begins; we’ll cover building eval harnesses in a future post.
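A minimal sketch of that eval loop, assuming a hypothetical `classify` function under test (prompt, model call, and parsing together) and a handful of hand-labelled cases. The cases below are illustrative only; they deliberately differ from the few-shot examples so the eval doesn't just replay the prompt.

```typescript
interface EvalCase {
  input: string;
  expected: string;
}

interface EvalReport {
  accuracy: number;
  failures: Array<EvalCase & { actual: string }>;
}

// Run every case; a thrown error counts as a failure instead of aborting the run.
async function runEval(
  classify: (input: string) => Promise<string>,
  cases: readonly EvalCase[],
): Promise<EvalReport> {
  const failures: EvalReport["failures"] = [];
  for (const c of cases) {
    let actual: string;
    try {
      actual = await classify(c.input);
    } catch (err) {
      actual = `ERROR: ${String(err)}`;
    }
    if (actual !== c.expected) failures.push({ ...c, actual });
  }
  const accuracy = cases.length === 0 ? 0 : (cases.length - failures.length) / cases.length;
  return { accuracy, failures };
}

// Tricky inputs with known-good labels; grow this set when production surprises you.
const evalSet: EvalCase[] = [
  { input: "Why was my card charged twice?", expected: "billing" },
  { input: "ok cool", expected: "other" },
  { input: "Login button does nothing on Safari", expected: "bug" },
];

// Stand-in classifier; replace with the real prompt + model + parse pipeline.
const naiveClassify = async (input: string) => (/charged|refund|invoice/i.test(input) ? "billing" : "other");

runEval(naiveClassify, evalSet).then((report) => {
  console.log(`accuracy: ${report.accuracy.toFixed(2)}`);
  for (const f of report.failures) {
    console.log(`FAIL ${JSON.stringify(f.input)}: expected ${f.expected}, got ${f.actual}`);
  }
});
```

Run it before and after every prompt edit and compare both the accuracy and the list of failures; a change that fixes one case while breaking another shows up immediately.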