Prompt Engineering Patterns That Survive Production

A prompt that works in a playground and a prompt that works under production traffic are different artifacts. The demo prompt handles the happy path; the production prompt handles the long tail of weird inputs without falling over. Here are five patterns that consistently make the difference.

1. Structure the prompt, don’t write a paragraph

Models follow structured instructions far more reliably than prose. Give the prompt clear, labelled sections — role, task, constraints, format — instead of one dense paragraph the model has to parse.

# Role
You are a support-ticket classifier.

# Task
Assign each ticket exactly one category.

# Categories
billing | bug | feature_request | other

# Rules
- If unsure, choose "other".
- Never invent a category.

# Output
Return only the category, lowercase, no punctuation.

This isn’t cosmetic. Sections act as anchors the model can attend to, and they make the prompt diffable when you iterate.
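One way to keep those sections diffable is to build the prompt from data rather than from a hand-edited string. The following is a minimal sketch, not a prescribed implementation: `PromptSection` and `renderPrompt` are hypothetical names invented for this example, and the section text is copied from the prompt above.

```typescript
// Hypothetical helper types and names; nothing here comes from a real library.
type PromptSection = { heading: string; body: string };

// Render labelled sections in order, separated by blank lines.
function renderPrompt(sections: PromptSection[]): string {
  return sections
    .map(({ heading, body }) => {
      if (body.trim() === "") {
        // An empty section is almost always an editing mistake, so surface it early.
        throw new Error(`Prompt section "${heading}" is empty`);
      }
      return `# ${heading}\n${body.trim()}`;
    })
    .join("\n\n");
}

const classifierPrompt = renderPrompt([
  { heading: "Role", body: "You are a support-ticket classifier." },
  { heading: "Task", body: "Assign each ticket exactly one category." },
  { heading: "Categories", body: "billing | bug | feature_request | other" },
  { heading: "Rules", body: '- If unsure, choose "other".\n- Never invent a category.' },
  { heading: "Output", body: "Return only the category, lowercase, no punctuation." },
]);
```

With one array entry per section, a change to a single rule shows up as a one-line diff in review.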

2. Show, don’t just tell (few-shot)

One concrete example outweighs a paragraph of description. When the output is at all nuanced, include 2–4 examples that span the edges of the task, not just the obvious case.

# Examples
Ticket: "Charged twice this month" → billing
Ticket: "App crashes on upload"     → bug
Ticket: "Add dark mode please"      → feature_request
Ticket: "thanks!"                   → other

Pick examples that resolve ambiguity the instructions can’t. The “thanks!” example above teaches the model what not to over-classify.
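If the examples live in code, you can also check that they cover every category before the prompt ships. This is a rough sketch under that assumption: `Example`, `renderExamples`, and `uncoveredCategories` are hypothetical names, and the data mirrors the examples above.

```typescript
// Hypothetical names for illustration; the data mirrors the examples above.
type Example = { ticket: string; category: string };

const CATEGORIES = ["billing", "bug", "feature_request", "other"];

const examples: Example[] = [
  { ticket: "Charged twice this month", category: "billing" },
  { ticket: "App crashes on upload", category: "bug" },
  { ticket: "Add dark mode please", category: "feature_request" },
  { ticket: "thanks!", category: "other" },
];

// Render the few-shot block; JSON.stringify quotes and escapes the ticket text.
function renderExamples(items: Example[]): string {
  const lines = items.map((e) => `Ticket: ${JSON.stringify(e.ticket)} → ${e.category}`);
  return ["# Examples", ...lines].join("\n");
}

// Return the categories that no example demonstrates.
function uncoveredCategories(categories: string[], items: Example[]): string[] {
  return categories.filter((c) => !items.some((e) => e.category === c));
}
```

A non-empty result from `uncoveredCategories` is a cheap signal that an edge of the task has no example yet.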

3. Decompose multi-step reasoning

If a task has several stages, don’t ask for the final answer in one shot. Either ask the model to work through the steps explicitly, or split the task into multiple calls. A single call that classifies, extracts, and formats at once tends to do each job worse than three focused calls would.

When accuracy matters more than latency, several small calls usually beat one big clever one.
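The split into focused calls might look something like the sketch below. It assumes a `callModel` function that takes a prompt and resolves to the model's text reply. That function, the `ModelCall` and `TicketResult` types, and the order-ID extraction step are all hypothetical stand-ins, not part of any real SDK.

```typescript
// Assumption: callModel wraps whatever model client you use; it is not a real SDK function.
type ModelCall = (prompt: string) => Promise<string>;

type TicketResult = { category: string; orderId: string | null; summary: string };

async function processTicket(ticket: string, callModel: ModelCall): Promise<TicketResult> {
  // Step 1: classification only.
  const category = (
    await callModel(`Classify this ticket as billing, bug, feature_request, or other.\nTicket: ${ticket}`)
  ).trim();

  // Step 2: extraction only (the order ID is a made-up field for illustration).
  const rawOrderId = (
    await callModel(`Extract the order ID from this ticket, or reply "none".\nTicket: ${ticket}`)
  ).trim();
  const orderId = rawOrderId.toLowerCase() === "none" ? null : rawOrderId;

  // Step 3: formatting only, using the category from step 1.
  const summary = (
    await callModel(`Write a one-line summary of this ${category} ticket.\nTicket: ${ticket}`)
  ).trim();

  return { category, orderId, summary };
}
```

Each step has its own short prompt, so each can be tested and tuned without touching the others. The cost is three round trips instead of one.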

4. Give the model an escape hatch

One of the most common production failures is a confidently wrong answer. Models tend to hallucinate when the prompt implies an answer must exist. Explicitly permit uncertainty:

If the context does not contain the answer, respond exactly:
"I don't have enough information."
Do not guess.

This single instruction removes much of the pressure to fabricate an answer. Pair it with retrieval (see RAG vs Fine-Tuning) so the model usually does have the information it needs.
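The escape hatch only helps if calling code recognises it. Here is a minimal sketch of that check, assuming the sentinel sentence from the prompt above; `NO_ANSWER`, `Answer`, and `interpretReply` are hypothetical names for this example.

```typescript
// The exact sentence the prompt tells the model to use when it cannot answer.
const NO_ANSWER = "I don't have enough information.";

// Hypothetical result type: either a real answer or an explicit "no answer".
type Answer = { kind: "answer"; text: string } | { kind: "no_answer" };

function interpretReply(reply: string): Answer {
  // Tolerate surrounding whitespace or quotes the model may add around the sentinel.
  const normalized = reply.trim().replace(/^["']+|["']+$/g, "");
  if (normalized.toLowerCase() === NO_ANSWER.toLowerCase()) {
    return { kind: "no_answer" };
  }
  return { kind: "answer", text: reply.trim() };
}
```

Downstream code can then branch on `kind` and route `no_answer` to a fallback such as a human review queue, instead of treating the sentinel as ordinary answer text.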

Where this bit me — Josh: I shipped a ticket classifier without this escape hatch because every ticket in my test set cleanly fit a category. In production a user pasted a wall of emoji and the model confidently tagged it billing. Nobody noticed until a finance teammate asked why an empty ticket had landed in their queue. Adding the “I don’t have enough information” clause turned roughly 3% of daily volume from confident-wrong into a clean fallback — and it was a one-line change I should have written on day one.

5. Enforce an output contract

If downstream code parses the output, the output is an API — so specify it like one and validate it. Ask for JSON, give the schema, and fail loudly when the model deviates rather than silently passing garbage downstream.

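import { z } from "zod";

// The output contract: one known category plus a confidence in [0, 1].
const Classification = z.object({
  category: z.enum(["billing", "bug", "feature_request", "other"]),
  confidence: z.number().min(0).max(1),
});

function parse(raw: string) {
  let json: unknown;
  try {
    json = JSON.parse(raw);
  } catch {
    // Malformed JSON is a contract violation too, so report it the same way.
    throw new Error(`Bad model output: not valid JSON: ${raw}`);
  }
  const result = Classification.safeParse(json);
  if (!result.success) {
    // Retry, fall back, or alert; never trust un-validated model output.
    throw new Error(`Bad model output: ${result.error.message}`);
  }
  return result.data;
}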
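The comment in `parse` names retrying as one option. A minimal sketch of a retry loop follows, assuming a `callModel` function that resolves to the model's raw text. `ModelCall`, `Validator`, and `callWithRetry` are hypothetical names, and the validator parameter stands in for a function like `parse` so the snippet has no dependencies.

```typescript
// Assumption: callModel wraps whatever model client you use; it is not a real SDK function.
type ModelCall = (prompt: string) => Promise<string>;

// A validator returns the parsed value or throws, the same shape as parse() above.
type Validator<T> = (raw: string) => T;

async function callWithRetry<T>(
  prompt: string,
  callModel: ModelCall,
  validate: Validator<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(prompt);
    try {
      return validate(raw); // First output that satisfies the contract wins.
    } catch (err) {
      lastError = err; // Remember why this attempt failed, then try again.
    }
  }
  // Still fail loudly once retries are exhausted; the caller picks the fallback.
  throw new Error(`Model output failed validation after ${maxAttempts} attempts: ${String(lastError)}`);
}
```

Keeping the final `throw` means a persistent contract violation surfaces as an error rather than as a silently substituted default.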

The pattern behind the patterns

Notice the through-line: treat the model like an unreliable but capable collaborator. Be explicit, show examples, break work down, allow “I don’t know,” and verify the output. None of these are clever tricks — they’re the same disciplines you’d apply to any system with a fallible component.

The final discipline is measurement. Once these patterns are in place, build a small eval set — a handful of tricky inputs with known-good outputs — and run it on every prompt change. A prompt you can’t measure is a prompt you can’t safely improve. That’s where production AI engineering really begins; we’ll cover building eval harnesses in a future post.
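As a starting point, an eval set can be as small as an array of inputs with expected outputs and a loop that reports mismatches. The sketch below assumes a `classify` function that wraps the prompt and model call. `EvalCase`, `Classify`, `EvalReport`, and `runEval` are hypothetical names, and the emoji case is an invented stand-in for the kind of input described in the anecdote above.

```typescript
// Hypothetical types for illustration; classify wraps your prompt and model call.
type EvalCase = { input: string; expected: string };
type Classify = (input: string) => Promise<string>;
type EvalReport = {
  passed: number;
  total: number;
  failures: { input: string; expected: string; actual: string }[];
};

// A few tricky inputs with known-good outputs.
const evalCases: EvalCase[] = [
  { input: "Charged twice this month", expected: "billing" },
  { input: "App crashes on upload", expected: "bug" },
  { input: "thanks!", expected: "other" },
  { input: "🔥🔥🔥", expected: "other" },
];

async function runEval(cases: EvalCase[], classify: Classify): Promise<EvalReport> {
  const failures: EvalReport["failures"] = [];
  for (const c of cases) {
    const actual = (await classify(c.input)).trim();
    if (actual !== c.expected) {
      failures.push({ input: c.input, expected: c.expected, actual });
    }
  }
  return { passed: cases.length - failures.length, total: cases.length, failures };
}
```

Running this on every prompt change and reading the `failures` list shows which inputs regressed, not just that the score moved.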
