Few-Shot Prompting: Show, Don't Tell AI

Describe a task to an AI and you get the model's average answer. Show it two to five worked examples and you get yours: your format, your labels, and your rules locked in. That swap is the whole of few-shot prompting. Instead of writing more instructions, you paste a handful of input/output pairs that demonstrate exactly what a correct answer looks like, and the model copies the pattern. It is one of the fastest ways to stop fighting an AI's default behavior and start steering it.

What is few-shot prompting?

Few-shot prompting is the practice of putting a few worked examples in your prompt so the model imitates them. It works through in-context learning. The model reads your examples at the moment you send the prompt and generalizes from them on the spot, with no training, no fine-tuning, and no weights updated. The pattern lives only inside the prompt, which is why you paste your examples in again each time you need that behavior.

Insight

The one line to remember: describe it and you get the average; show it and you get yours.

Insight

Here is what most prompting guides will not tell you: on modern reasoning models, piling on examples can make answers worse, not better. More is not always more.

Zero-shot, one-shot, few-shot: the difference

The number in the name is just the count of examples you provide. Zero-shot means you give instructions and no examples. One-shot means you give exactly one example. Few-shot means more than one, and in practice that usually lands between two and five. There is nothing magic about the numbers; they describe how much you are showing versus telling.

Zero-shot: 'Classify this review as positive or negative.' No example. The model uses its general knowledge.
One-shot: One labeled example, then the new input. Enough to show the format, often not enough to pin down edge cases.
Few-shot: Two to five labeled examples, then the new input. Enough to demonstrate the format and the tricky decisions.
More is not always better: past a handful, extra examples add cost and sometimes hurt. Quality and coverage beat quantity.

The 2020 GPT-3 paper, titled 'Language Models are Few-Shot Learners,' popularized the technique, showing that a large enough model could pick up a new task from a few demonstrations placed directly in the prompt, without any parameter updates. That finding is why few-shot prompting has stayed relevant across model generations: it is a property of how these models read context, not a quirk of one product.

Walkthrough 1: the normal case

Start with a task where the format matters more than the reasoning: turning messy product feedback into a structured tag. Here is the zero-shot attempt, which leaves the model to guess your conventions.

Zero-shot prompt: 'Read this support message and give me the category and urgency. Message: The checkout button does nothing on my phone.' The model might reply with a paragraph, or 'Category: Bug, Urgency: High,' or 'This looks like a technical issue that should be prioritized.' All defensible, none consistent. Feed that into a spreadsheet or script and the variation breaks everything downstream.

Now the few-shot version. You show two complete examples in the exact shape you want, then add the real message with its output left blank for the model to fill.

Message: My invoice has the wrong VAT number. -> category=billing | urgency=medium
Message: The whole site is down and I cannot log in. -> category=outage | urgency=high
Message: The checkout button does nothing on my phone. ->

The model should now return 'category=bug | urgency=high' in the same pipe-delimited shape, with lowercase labels, because that is what your examples demonstrated. You never wrote a rule that said 'use lowercase' or 'separate fields with a pipe.' The examples carried those rules implicitly. Format, label vocabulary, and structure transfer from the demonstrations without a single explicit instruction.
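
A fixed shape is what makes the reply safe to hand to a script. The following is a minimal sketch of that downstream step, assuming the pipe-delimited key=value shape used in the examples above; the function name parse_tag_line is hypothetical and not part of any library.

```python
def parse_tag_line(line: str) -> dict[str, str]:
    """Split a reply like 'category=bug | urgency=high' into a dict of fields."""
    fields = {}
    for part in line.split("|"):
        key, sep, value = part.partition("=")
        if not sep:
            # A free-text reply (the zero-shot failure mode) has no key=value pair.
            raise ValueError(f"field without '=': {part.strip()!r}")
        fields[key.strip()] = value.strip()
    return fields


print(parse_tag_line("category=bug | urgency=high"))
```

A paragraph-style reply raises ValueError instead of silently producing a bad row, which is usually the behavior you want in a pipeline.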

Walkthrough 2: the tricky edge case

Normal examples teach the format. Edge-case examples teach the judgment calls. A list of rules can never cover every ambiguous input, but one well-chosen example can. Suppose your support tagger keeps mislabeling angry-but-minor complaints as high urgency. The fix is not a longer instruction. It is an example that draws the line for you.

Message: My invoice has the wrong VAT number. -> category=billing | urgency=medium
Message: The whole site is down and I cannot log in. -> category=outage | urgency=high
Message: This is the THIRD time I am writing, your typo on the pricing page is embarrassing!! -> category=content | urgency=low
Message: The checkout button does nothing on my phone. ->

The third example does the heavy lifting. It is loud, it has capitals and exclamation marks, and a naive classifier reads that tone as high urgency. By labeling it 'urgency=low,' you teach the model that emotional volume is not the same as operational severity. Now an angry message about a cosmetic typo gets tagged low, while a calm message about a broken checkout gets tagged high. You drew the label boundary with a single demonstration, which works because the contrast between a loud tone and a low label points at the decision rule and not just at one right answer.

Pro Tip

Pick edge-case examples by looking at where the model already fails. Run a few real inputs zero-shot, find the wrong answers, and turn each mistake into a corrected example. Your failures are your best demonstrations.
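
The failure-mining loop above can be sketched in a few lines. This is an illustration under assumptions: the Trial record and the failures_to_examples helper are hypothetical names, and the zero-shot outputs below are invented for the demo, not real model results.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    message: str       # the real input you ran zero-shot
    model_output: str  # what the model returned (invented here for illustration)
    expected: str      # the tag you actually wanted


def failures_to_examples(trials: list[Trial]) -> list[str]:
    """Keep only the wrong answers and format each as a corrected example line."""
    return [
        f"Message: {t.message} -> {t.expected}"
        for t in trials
        if t.model_output.strip() != t.expected
    ]


trials = [
    Trial("The whole site is down and I cannot log in.",
          "category=outage | urgency=high", "category=outage | urgency=high"),
    Trial("This is the THIRD time I am writing, your typo on the pricing page is embarrassing!!",
          "category=content | urgency=high", "category=content | urgency=low"),
]
examples = failures_to_examples(trials)
print("\n".join(examples))
```

Only the mislabeled trial survives, already written in the shape the prompt expects.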

A copy-paste few-shot template

Use this skeleton for almost any classify, extract, or reformat task. Replace the bracketed parts with your own task, keep the structure identical across every example, and end with one blank input for the model to complete.

Line 1, the task: 'Convert each [input type] into [output format]. Use only these labels: [list].'
Example 1, a clean normal case: 'Input: [...] -> Output: [...]'
Example 2, a second normal case with different values: 'Input: [...] -> Output: [...]'
Example 3, the edge case the model usually gets wrong: 'Input: [...] -> Output: [...]'
The real query, left open: 'Input: [your actual input] -> Output:'
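
If you reuse the skeleton often, it can be rendered from data instead of typed by hand. A minimal sketch, assuming the 'Input: ... -> Output: ...' shape from the template above; build_few_shot_prompt is a hypothetical helper, and the label list in the task line is taken from the walkthrough examples.

```python
def build_few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Render the skeleton: task line, one line per example, then the open query."""
    lines = [task]
    lines += [f"Input: {text} -> Output: {output}" for text, output in examples]
    lines.append(f"Input: {query} -> Output:")  # left open for the model to complete
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    task="Convert each support message into a tag. "
         "Use only these labels: billing, outage, content, bug.",
    examples=[
        ("My invoice has the wrong VAT number.", "category=billing | urgency=medium"),
        ("The whole site is down and I cannot log in.", "category=outage | urgency=high"),
        ("This is the THIRD time I am writing, your typo on the pricing page is embarrassing!!",
         "category=content | urgency=low"),
    ],
    query="The checkout button does nothing on my phone.",
)
print(prompt)
```

Because every example line comes from the same format string, the separator and field order cannot drift between examples.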

Insight

Keep the separator, the field order, and the spacing byte-identical across every example. The model copies what it sees, including formatting you did not mean to teach. Inconsistent examples produce inconsistent output.
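
Formatting drift is easy to miss by eye, so a mechanical check can help. The sketch below assumes the target shape from the walkthrough examples and encodes it as a regular expression; the pattern and the off_pattern helper are illustrative and should be adapted to your own format.

```python
import re

# Assumed target shape, taken from the walkthrough examples; adjust to your format.
EXAMPLE_SHAPE = re.compile(r"Message: .+ -> category=[a-z]+ \| urgency=[a-z]+")


def off_pattern(examples: list[str]) -> list[str]:
    """Return the examples that do not match the expected shape exactly."""
    return [ex for ex in examples if EXAMPLE_SHAPE.fullmatch(ex) is None]


examples = [
    "Message: My invoice has the wrong VAT number. -> category=billing | urgency=medium",
    "Message: The whole site is down and I cannot log in. -> category=outage|urgency=High",
]
print(off_pattern(examples))  # lists the second example only
```

The second example fails on two counts: the pipe has no surrounding spaces and the label is capitalized.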

Where few-shot helps and where it does not

Few-shot is strongest at locking output format, pinning down label vocabulary, and disambiguating edge cases. It is weak at teaching genuinely new reasoning. If a task needs multi-step logic the model cannot already perform, a few example answers will not install that capability; the model copies the shape of your answers without learning the path to them.

The Prompt Engineering Guide shows this directly: on an arithmetic word problem requiring several logical steps, few-shot examples are not enough to get reliable answers. For that class of task, the technique to reach for is chain-of-thought prompting, which shows the model the intermediate steps rather than only the final answer. A practical rule: if the gap between input and output is a format gap, use few-shot; if it is a thinking gap, use chain-of-thought, or combine the two.

Recent reasoning-tuned models add a wrinkle: stacking on examples can sometimes lower accuracy compared to a clean zero-shot instruction, because these models already have a strong internal problem-solving path and extra demonstrations can pull them off it. The takeaway is not 'stop using few-shot.' It is: test both. Few-shot remains the right default for format and labels; for heavy reasoning on a modern reasoning model, check whether a plain, well-specified instruction beats your examples.
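
'Test both' can be as simple as scoring each prompt style on the same labeled inputs. The harness below is a sketch under assumptions: stub_model stands in for a real model call and its behavior is arbitrary, so the scores it produces say nothing about any real model. Swap in your own client where the stub is.

```python
from typing import Callable

INSTRUCTION = "Classify urgency as low or high.\n"
FEW_SHOT_EXAMPLES = (
    "Message: I cannot log in at all.\nUrgency: high\n"
    "Message: The footer has a typo.\nUrgency: low\n"
)


def zero_shot(text: str) -> str:
    return f"{INSTRUCTION}Message: {text}\nUrgency:"


def few_shot(text: str) -> str:
    return f"{INSTRUCTION}{FEW_SHOT_EXAMPLES}Message: {text}\nUrgency:"


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call; the logic is arbitrary and for demo only."""
    last_message = prompt.rsplit("Message:", 1)[1]
    label = "high" if "down" in last_message else "low"
    # Pretend the label style only matches once demonstrations are present.
    return label if prompt.count("Urgency:") > 1 else label.capitalize()


def accuracy(model: Callable[[str], str],
             make_prompt: Callable[[str], str],
             labeled: list[tuple[str, str]]) -> float:
    """Fraction of inputs whose reply exactly matches the expected label."""
    hits = sum(model(make_prompt(text)).strip() == expected for text, expected in labeled)
    return hits / len(labeled)


labeled = [("The whole site is down.", "high"), ("Typo on the pricing page.", "low")]
print("zero-shot:", accuracy(stub_model, zero_shot, labeled))
print("few-shot:", accuracy(stub_model, few_shot, labeled))
```

With a real model, run this on a few dozen held-out inputs and keep whichever prompt style scores higher for that task.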

Situation	Few-shot prompting	Zero-shot prompting
You need exact output format	Strong: examples lock structure	Weak: format drifts run to run
You need fixed label vocabulary	Strong: examples show the allowed set	Mixed: model invents synonyms
Recurring edge cases	Strong: one example draws the line	Weak: rules miss cases
Multi-step novel reasoning	Limited: copies shape, not logic	Also limited: use chain-of-thought instead
Token cost and prompt length	Higher: examples add tokens	Lower: instruction only

Common mistakes that quietly break few-shot

Inconsistent formatting between examples, which teaches the model that the format is optional.
All examples being the easy case, so the model never sees how to handle the hard one.
Examples that are subtly mislabeled; the model will faithfully copy your error.
An unbalanced label mix, for example five positive and one negative example, which biases the output toward the majority label.
Too many examples for a reasoning model, which can crowd out the model's own better approach.
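
The label-mix mistake in particular can be caught before the prompt is sent. A minimal sketch: the is_unbalanced helper is hypothetical, and the 2x threshold is an arbitrary assumption rather than an established rule.

```python
from collections import Counter


def label_counts(examples: list[tuple[str, str]]) -> Counter:
    """Count how many examples carry each label."""
    return Counter(label for _, label in examples)


def is_unbalanced(examples: list[tuple[str, str]], max_ratio: float = 2.0) -> bool:
    """Flag a set whose most common label outnumbers the rarest by more than max_ratio."""
    counts = label_counts(examples)
    if not counts:
        return False
    return max(counts.values()) > max_ratio * min(counts.values())


skewed = [(f"review {i}", "positive") for i in range(5)] + [("review 5", "negative")]
print(label_counts(skewed))
print(is_unbalanced(skewed))
```

The check only sees labels that appear at least once, so a label with zero examples needs a separate comparison against your allowed list.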

Pro Tip

Order matters more than people expect. Models can weight the last example heavily, so put your most representative or most important example closest to the real query, and vary the labels so the final example is not always the same class.
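
One way to see whether order is swaying your results is to run the same query under several orderings and compare the answers. The sketch below only generates the orderings; the rotations helper is a hypothetical name, and sending each variant to a model is left to your own client.

```python
def rotations(examples: list[str]) -> list[list[str]]:
    """Return every rotation of the list, so each example ends up last exactly once."""
    return [examples[i:] + examples[:i] for i in range(len(examples))]


examples = [
    "Message: My invoice has the wrong VAT number. -> category=billing | urgency=medium",
    "Message: The whole site is down and I cannot log in. -> category=outage | urgency=high",
    "Message: Your pricing page has a typo. -> category=content | urgency=low",
]
for ordering in rotations(examples):
    print("last example:", ordering[-1])
```

If the model's answer changes with the rotation, the last example is carrying too much weight and the set likely needs a clearer edge case or a more balanced label mix.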

Few-shot prompting and your own knowledge

Few-shot examples teach format and judgment, but they cannot supply facts the model does not have. If you need an AI to answer using your documents, notes, or messages, examples alone will not put that information in the prompt. That is the line between demonstrating a pattern and supplying knowledge, and the two solve different problems.

MemX is a consumer AI memory app that handles the knowledge side: an external memory layer over your own documents, photos, and notes across Android, iOS, and WhatsApp, so the context you care about is available when you ask. It is private by architecture, with per-user keys, encryption at rest, and an on-device first pass. Pair the two ideas: use few-shot examples to control how the answer looks, and a memory layer to control what the answer knows. Examples shape the output; memory grounds it.

Frequently Asked Questions

01. How many examples should I use for few-shot prompting?

Two to five is the common range, and more than one is what makes it 'few-shot.' Start with two or three that cover your normal case, then add one edge-case example where the model usually fails. Past five, extra examples raise cost and can hurt, so test rather than pile on.

02. What is the difference between few-shot and zero-shot prompting?

Zero-shot gives instructions and no examples, relying on the model's general knowledge. Few-shot includes two or more worked input/output pairs that demonstrate your exact format and rules. Use zero-shot for simple tasks; use few-shot when output format, labels, or edge cases need to be pinned down.

03. Does few-shot prompting train or change the AI model?

No. Few-shot works through in-context learning, meaning the model generalizes from your examples at inference time without updating any weights. The pattern exists only inside that one prompt. Send a new chat without the examples and the behavior is gone, which is why you re-paste them each time.

04. Is few-shot prompting the same as fine-tuning?

No. Few-shot prompting is in-context learning: the examples live in the prompt and change nothing about the model. Fine-tuning updates the model's weights with a training run, which is permanent and costly. Few-shot is instant and disposable; fine-tuning is durable but heavyweight. For format and label control, few-shot is usually enough.

05. When should I not use few-shot prompting?

Skip it when the task needs genuinely new multi-step reasoning, since examples copy the shape of an answer, not the logic. Use chain-of-thought prompting instead, or combine both. On some modern reasoning models, a clean zero-shot instruction can beat examples, so test both approaches.

Few-shot prompting is one of the cheapest, most reliable levers a beginner has for steering an AI. You are not writing better instructions; you are showing the answer you want and letting the model match it. Keep your examples consistent, include the edge case that usually trips the model up, and reserve chain-of-thought for the tasks that need real reasoning. Describe it and you get the average. Show it and you get yours.
