What is the difference between zero-shot and few-shot prompting?

Zero-shot prompting asks the model to perform a task using only your instruction — no examples. The model relies entirely on knowledge baked in during pre-training. Few-shot prompting adds 2–20 worked examples before the actual task. These examples act as in-context demonstrations that show the model exactly what output format, label space, and tone you want. Zero-shot is faster and cheaper; few-shot is more reliable when the output format is unusual or the task is ambiguous. In practice: start zero-shot, switch to few-shot if outputs are consistently off-format or wrong-labelled.

Why does chain-of-thought prompting improve accuracy on math and logic?

Chain-of-thought prompting gets the model to generate intermediate reasoning steps before producing a final answer. This matters because language models predict tokens sequentially: if you ask for the answer directly, the model must "compress" all the reasoning into one step, which often introduces errors. When the model writes out its reasoning, each step provides a scaffold for the next one. The model is computing rather than recalling. Wei et al. (2022) showed that few-shot examples with written-out reasoning sharply improved large-model accuracy on benchmarks like GSM8K, and Kojima et al. (2022) showed that simply adding "Let's think step by step" before the answer produced large gains with no examples at all.

How do I pick good few-shot examples?

Four factors matter most: label distribution (if 80% of your examples belong to one class, the model is biased toward predicting that class, so keep the mix close to the distribution of your real inputs), diversity (cover edge cases and boundary cases, not just the easy centre of the distribution), format consistency (every example should have identical structure: same delimiters, same key names, same output schema), and length (long examples eat context budget; keep them tight). Five excellent examples usually beat 20 mediocre ones.

When should I use self-consistency instead of a single chain-of-thought?

Self-consistency is worth using when accuracy on a specific question matters more than speed or cost: exam-style math problems, legal or financial classification, or any task where you want a rough confidence signal. You run the same prompt multiple times at a slightly raised temperature, collect 5–20 reasoning chains, and take the majority vote on the final answer. Because each chain takes a different reasoning path, an answer that many independent chains agree on is more reliable than the answer from any single chain. It is expensive but measurably improves accuracy, especially where a single CoT pass is error-prone.

Do modern models like Claude or GPT-5 still need chain-of-thought prompting?

Large frontier models in 2026 have built-in thinking or extended reasoning modes that apply chain-of-thought-style reasoning automatically before producing a response. In many cases you do not need to manually add "Let's think step by step" — turning on the model's thinking mode achieves the same effect with less prompt engineering overhead. However, few-shot CoT (where you provide worked examples with explicit reasoning chains) still helps for domain-specific tasks where the model's default reasoning path diverges from your domain's conventions. Manual CoT remains useful for smaller models, fine-tuned models, or any deployment where you cannot access extended thinking.

Zero-Shot vs Few-Shot vs Chain-of-Thought Prompting: Complete Guide 2026

There is a gap between knowing that prompt techniques exist and knowing which one to reach for in a specific situation. Zero-shot, few-shot, chain-of-thought, self-consistency, Tree of Thoughts: the vocabulary has expanded faster than the practical guidance around it.

This guide closes that gap. Each technique is explained from first principles, illustrated with concrete prompts you can copy, and placed in a decision framework that tells you when to use which. By the end, you will understand not just what each technique does but why it works — which is the only knowledge that transfers to new tasks.

Hands-on techniques for zero-shot, few-shot, and chain-of-thought prompting across real tasks.

What In-Context Learning Actually Is

Before comparing techniques, it is worth being precise about the mechanism underlying all of them.

In-context learning is the ability of a sufficiently large language model to adapt its behaviour based on examples provided within the prompt itself — without any change to the model's weights. You do not need to retrain, fine-tune, or even call a training API. You include examples, and the model adjusts.

This emerged as a surprise at scale. Early language models did not exhibit it. GPT-3 (2020) was among the first to demonstrate it convincingly: add a few labelled examples to the prompt, and performance on classification tasks jumps dramatically — sometimes matching fine-tuned models that saw thousands of examples.

Why does it happen? A common explanation is that during pre-training on enormous text corpora, the model encountered millions of patterns where a sequence of examples was followed by a natural continuation. It learned, at a statistical level, to infer the task from the examples and produce the right continuation. The examples in your prompt trigger this latent capability.

Critically: the model is not learning in the training sense. Weights do not update. What happens is pattern-matching at inference time. The examples narrow the model's probability distribution over outputs, steering it toward the format and label space you want. This distinction matters because it tells you what in-context learning can and cannot do: it can nudge a capable model toward a format; it cannot teach a model something it has no pre-training signal for.

Every prompting technique in this guide is a different way of exploiting this mechanism.

Zero-Shot Prompting

What It Is

Zero-shot prompting means giving the model a task description and asking it to perform the task — with no examples. The model relies entirely on knowledge and capability accumulated during pre-training.

text

Classify the sentiment of the following customer review as Positive, Negative, or Neutral.

Review: "The packaging was damaged but the product itself works perfectly."

Sentiment:

That is a zero-shot prompt. No examples of what Positive, Negative, or Neutral reviews look like. No sample outputs. Just the task and the input.

Why It Works (When It Does)

Zero-shot works when the task is well-represented in the model's pre-training data. Sentiment classification, translation between major languages, summarisation of standard documents, named-entity extraction, basic code generation — these are tasks the model has encountered in thousands of variations. The instruction alone is enough to activate the right behaviour.

Zero-shot also has a practical advantage: no examples means no token overhead, faster iteration, and no risk of examples biasing the model toward patterns that do not generalise.

When It Fails

Zero-shot fails in three predictable scenarios:

Unusual output format. If you want JSON with a specific schema — {"sentiment": "Negative", "confidence": 0.8, "reason": "..."} — zero-shot often produces something close but not exact. The model has a strong prior toward prose output, and without an example, it approximates rather than follows precisely.

Non-standard label space. Standard classification labels like Positive/Negative/Neutral work zero-shot because they are canonical. But if your categories are company-specific (BILLING_DISPUTE, FEATURE_REQUEST_PREMIUM, CHURN_RISK), the model has no training signal for them. Zero-shot will guess, usually poorly.

Strong conflicting priors. If the model's pre-training strongly associates a surface pattern with a different output, zero-shot cannot override that. Asking the model to classify a clearly sarcastic negative review as "Positive" because it is from a loyalty customer requires a nuanced instruction that zero-shot often cannot sustain.

Zero-Shot Summarisation Example

text

Summarise the following technical incident report in exactly three bullet points.
Each bullet must start with a bolded category label: **Cause**, **Impact**, **Resolution**.
Do not include any additional text before or after the bullets.

Incident report:
On March 14 at 02:17 UTC, the authentication service began returning 503 errors
due to a misconfigured load balancer rule deployed at 01:45 UTC. Approximately
12,000 users were unable to log in for 43 minutes. The on-call engineer identified
the faulty rule at 02:58 UTC and rolled it back. Service restored at 03:00 UTC.
Full post-mortem scheduled for March 17.

Summary:

This will work reliably zero-shot because summarisation and bullet formatting are deeply represented in pre-training. But notice that even here, the prompt is precise about count, format, and constraints — zero-shot does not mean vague.

One-Shot Prompting

One-shot prompting adds exactly one example before the task. It occupies the space between zero-shot and few-shot: cheaper than a full few-shot battery, but more reliable than no examples at all.

The example functions as a template, not just an instruction. You are showing the model the exact shape of the output you want.

text

Classify the sentiment of the customer review. Output a JSON object with keys
"sentiment" (Positive/Negative/Neutral), "confidence" (0.0–1.0), and "reason" (one sentence).

Example:
Review: "Arrived two days late but the quality is outstanding."
Output: {"sentiment": "Positive", "confidence": 0.75, "reason": "Product quality outweighs delivery delay in the customer's assessment."}

Now classify:
Review: "The customer service team refused to process my refund despite the item being defective."
Output:

The single example has done something the instruction alone could not: it established the exact JSON schema, the confidence range, and the expected length and tone of the reason field. The model now has a concrete target to match.

Use one-shot when:

The output format is unusual or precise
The task is straightforward but the schema needs anchoring
You cannot afford the token cost of multiple examples
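
Once a one-shot example has anchored a schema, it helps to check each reply against that schema before using it. The following is a minimal sketch, not a definitive validator: the function name `validate_sentiment_output` and the constants are hypothetical, and the reply string stands in for real model output.

```python
import json

# Assumed schema, copied from the one-shot example above.
ALLOWED_SENTIMENTS = ("Positive", "Negative", "Neutral")
EXPECTED_KEYS = {"sentiment", "confidence", "reason"}

def validate_sentiment_output(raw: str) -> dict:
    """Parse one model reply and check it against the schema from the example."""
    data = json.loads(raw)  # json.JSONDecodeError (a ValueError) on non-JSON text
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        raise ValueError("reply must be a JSON object with exactly the expected keys")
    if data["sentiment"] not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unknown sentiment label: {data['sentiment']!r}")
    confidence = data["confidence"]
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be a number between 0.0 and 1.0")
    if not isinstance(data["reason"], str) or not data["reason"].strip():
        raise ValueError("reason must be a non-empty string")
    return data

# Stand-in for a model reply to the refund review.
example_reply = (
    '{"sentiment": "Negative", "confidence": 0.9, '
    '"reason": "Refund was refused for a defective item."}'
)
checked = validate_sentiment_output(example_reply)
print(checked["sentiment"], checked["confidence"])
```

If replies fail this check repeatedly, that is a signal to add more examples rather than to loosen the validator.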

Few-Shot Prompting

What It Is

Few-shot prompting provides 2–20 worked examples before the task. Each example shows an input and the correct output, letting the model infer the task, label space, format, and edge-case handling simultaneously.

Few-shot is the workhorse of production prompt engineering. For classification tasks, extraction tasks, and any task with a rigid output schema, few-shot consistently outperforms zero-shot by a wide margin.

Concrete Few-Shot Classification Prompt

text

You are a support ticket router. Classify each ticket into exactly one category:
BILLING, TECHNICAL, FEATURE_REQUEST, or ACCOUNT_ACCESS.

---
Ticket: "I was charged twice for my subscription this month."
Category: BILLING

Ticket: "The export to CSV function crashes when there are more than 500 rows."
Category: TECHNICAL

Ticket: "It would be great if we could schedule reports to run automatically."
Category: FEATURE_REQUEST

Ticket: "I can't log in — it says my account has been suspended but I haven't received any email."
Category: ACCOUNT_ACCESS

Ticket: "My invoice shows a different amount than what I agreed to in the contract."
Category: BILLING

Ticket: "The API keeps returning a 429 error even though I'm well under my rate limit."
Category: TECHNICAL
---

Now classify the following ticket:
Ticket: "I need to transfer my account to a different email address."
Category:

Six examples cover all four categories. BILLING and TECHNICAL each get a second, less obvious example (a contract-amount dispute and a rate-limit error), which helps mark the boundary with neighbouring categories. The model now has both the label space and a sense of where one category ends and the next begins.
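
In practice a prompt like this is usually assembled from a list of labelled examples rather than typed by hand, which keeps the layout identical across examples. A minimal sketch, assuming a hypothetical helper named `build_few_shot_prompt` and using only two of the tickets above for brevity:

```python
def build_few_shot_prompt(instruction, examples, new_ticket):
    """Render every example with the same 'Ticket:' / 'Category:' layout."""
    blocks = [
        f'Ticket: "{ticket}"\nCategory: {category}'
        for ticket, category in examples
    ]
    return (
        f"{instruction}\n\n---\n"
        + "\n\n".join(blocks)
        + "\n---\n\nNow classify the following ticket:\n"
        + f'Ticket: "{new_ticket}"\nCategory:'
    )

# (ticket text, label) pairs taken from the prompt above.
EXAMPLES = [
    ("I was charged twice for my subscription this month.", "BILLING"),
    ("The export to CSV function crashes when there are more than 500 rows.", "TECHNICAL"),
]

prompt = build_few_shot_prompt(
    "You are a support ticket router. Classify each ticket into exactly one category:\n"
    "BILLING, TECHNICAL, FEATURE_REQUEST, or ACCOUNT_ACCESS.",
    EXAMPLES,
    "I need to transfer my account to a different email address.",
)
print(prompt)
```

Because the layout lives in one place, changing a delimiter or key name changes it for every example at once.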

What Makes a Good Few-Shot Example

This is where most practitioners make mistakes. The quality of examples matters far more than the quantity.

Label distribution must reflect reality. If 70% of your actual inputs are TECHNICAL tickets, your examples should be roughly 70% TECHNICAL. If you show equal examples for each label but the real distribution is skewed, the model will be poorly calibrated. It will spread its predictions more evenly than reality warrants.

Diversity beats repetition. Two identical BILLING examples teach the model less than one standard billing example and one edge-case billing example (e.g., a refund request vs a pricing dispute). Cover the range of your input space, not just the easy centre.

Format consistency is non-negotiable. If your first three examples use Category: as the output label and example four uses Label:, the model will notice the inconsistency and may waver on which to use. Every example must follow identical formatting — same delimiters, same key names, same capitalisation.

Length should be controlled. Long examples consume context window budget that could be spent on the actual task. If an input can be shortened without losing the feature that makes it a good example, shorten it. On long-context models this matters less, but it is still good hygiene.

Include edge cases deliberately. The easy examples are the ones the model would get right anyway. The examples that move the needle are the ones at the decision boundary: a customer complaint that is simultaneously a billing issue and an account access issue, or a review that is sarcastically positive. These boundary examples teach the model where you draw the line.
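
Two of these criteria, label distribution and duplicate examples, are easy to check mechanically. The sketch below is illustrative only: `label_shares`, `lint_examples`, the tolerance of 0.15, and the expected shares are all assumptions rather than measured values.

```python
from collections import Counter

def label_shares(examples):
    """Return each label's share of the example set."""
    counts = Counter(label for _, label in examples)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def lint_examples(examples, expected_shares, tolerance=0.15):
    """Collect human-readable warnings; an empty list means nothing was flagged."""
    warnings = []
    shares = label_shares(examples)
    for label, expected in expected_shares.items():
        actual = shares.get(label, 0.0)
        if abs(actual - expected) > tolerance:
            warnings.append(f"{label}: {actual:.0%} of examples vs {expected:.0%} expected")
    seen = set()
    for text, _ in examples:
        key = " ".join(text.lower().split())  # ignore case and extra whitespace
        if key in seen:
            warnings.append(f"duplicate example: {text!r}")
        seen.add(key)
    return warnings

# Labels of the six router examples, with shortened ticket text.
examples = [
    ("charged twice", "BILLING"),
    ("CSV export crashes", "TECHNICAL"),
    ("schedule reports", "FEATURE_REQUEST"),
    ("account suspended", "ACCOUNT_ACCESS"),
    ("invoice differs from contract", "BILLING"),
    ("429 under rate limit", "TECHNICAL"),
]
# Hypothetical production mix; replace with your own ticket statistics.
expected = {"BILLING": 0.3, "TECHNICAL": 0.5, "FEATURE_REQUEST": 0.1, "ACCOUNT_ACCESS": 0.1}
warnings = lint_examples(examples, expected)
print(warnings)
```

Diversity and boundary coverage still need human judgment; a check like this only catches the mechanical problems.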

Few-Shot Extraction Prompt

text

Extract structured data from the job posting. Output JSON only, no prose.

---
Posting: "Senior Backend Engineer at Moonshot Labs — Remote (US only) — $180k–$220k — 7+ years Python, Postgres, Kafka required — Apply by July 31"
Output: {"title": "Senior Backend Engineer", "company": "Moonshot Labs", "location": "Remote (US only)", "salary_range": "$180k–$220k", "required_skills": ["Python", "Postgres", "Kafka"], "min_experience_years": 7, "application_deadline": "July 31"}

Posting: "Product Designer, Fintech — Hybrid (London) — Competitive salary — 3–5 yrs experience, Figma expert, fintech background preferred — No specified deadline"
Output: {"title": "Product Designer", "company": null, "location": "Hybrid (London)", "salary_range": null, "required_skills": ["Figma"], "min_experience_years": 3, "application_deadline": null}
---

Posting: "Staff ML Engineer at VectorIQ — San Francisco, CA — $250k–$300k total comp — 10+ years experience, PyTorch, LLM fine-tuning, distributed systems — Rolling applications"
Output:

Notice that the second example deliberately includes nulls: no company name, no salary, no deadline. Without that example, the model is more likely to invent values than to return null. Edge cases in examples reduce hallucination of missing fields.
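
One way to keep extraction examples consistent is to render them from data, so that every example carries the same keys in the same order and missing values always appear as `null`. This is a sketch under assumptions: `FIELDS` mirrors the schema used above, and `render_extraction_example` is a hypothetical helper.

```python
import json

# Field order copied from the examples above.
FIELDS = [
    "title", "company", "location", "salary_range",
    "required_skills", "min_experience_years", "application_deadline",
]

def render_extraction_example(posting: str, extracted: dict) -> str:
    """Emit every field in a fixed order; anything not supplied becomes JSON null."""
    unknown = set(extracted) - set(FIELDS)
    if unknown:
        raise KeyError(f"fields outside the schema: {sorted(unknown)}")
    record = {field: extracted.get(field) for field in FIELDS}
    return f'Posting: "{posting}"\nOutput: {json.dumps(record, ensure_ascii=False)}'

rendered = render_extraction_example(
    "Product Designer, Fintech — Hybrid (London) — Competitive salary",
    {
        "title": "Product Designer",
        "location": "Hybrid (London)",
        "required_skills": ["Figma"],
        "min_experience_years": 3,
    },
)
print(rendered)
```

Rejecting unknown keys catches typos such as `salary` instead of `salary_range` before they reach the prompt.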

Chain-of-Thought Prompting

The Core Idea

Chain-of-thought (CoT) prompting, introduced by Wei et al. in 2022, works on a simple insight: if you want the model to reason correctly, make it show its reasoning.

The original finding was striking. On multi-step math and logic benchmarks, large models performed dramatically better when prompted to reason step by step before giving the final answer. On some benchmarks, accuracy more than doubled. Follow-up work by Kojima et al. (2022) showed that the technique works not just with few-shot examples but with a single zero-shot addition: "Let's think step by step."

Why does generating intermediate steps help? Several mechanisms are at work:

Sequential correction. Each generated token conditions all subsequent tokens. When the model writes out a correct intermediate step, the next step is more likely to be correct. Errors propagate less because each step is grounded in a written, verifiable intermediate.
Computation vs recall. A model producing a direct answer to a math problem must "compress" all arithmetic into a single prediction. A model writing out calculations can effectively perform the arithmetic token by token, which is far more reliable.
Longer effective computation. Transformer models have fixed depth — the number of layers limits how much "thinking" can happen for a single forward pass. But when the model generates a long reasoning chain, it effectively performs deeper computation across many forward passes, one per token.

Zero-Shot Chain-of-Thought

The simplest CoT technique requires no examples at all:

text

A store sells apples for $0.50 each and oranges for $0.75 each.
Emily buys 8 apples and 5 oranges. She pays with a $10 bill.
How much change does she receive?

Let's think step by step.

The phrase "Let's think step by step" is the entire intervention. The model will produce something like:

snippet

Step 1: Calculate the cost of apples.
8 apples × $0.50 = $4.00

Step 2: Calculate the cost of oranges.
5 oranges × $0.75 = $3.75

Step 3: Calculate the total cost.
$4.00 + $3.75 = $7.75

Step 4: Calculate the change.
$10.00 − $7.75 = $2.25

Emily receives $2.25 in change.

Without "Let's think step by step", the model is more likely to collapse the arithmetic and make an error. With it, the model writes out each step, which makes slips less likely and easier to spot before they compound.
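
A written-out chain also makes the answer easy to verify independently. As a small illustration, the arithmetic in the trace above can be recomputed exactly with Python's `decimal` module; the variable names are only for this sketch.

```python
from decimal import Decimal

# Recompute each step of the worked example with exact decimal arithmetic.
apple_cost = 8 * Decimal("0.50")       # Step 1: apples
orange_cost = 5 * Decimal("0.75")      # Step 2: oranges
total_cost = apple_cost + orange_cost  # Step 3: total
change = Decimal("10.00") - total_cost # Step 4: change from $10

print(apple_cost, orange_cost, total_cost, change)
```

Each intermediate value matches the corresponding step in the model output, which is the kind of check a direct one-token answer does not allow.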

Alternatives that produce similar effects:

"Walk me through your reasoning before answering."
"Show your work."
"First, think about what information you have. Then reason to the answer."
"Think carefully before giving your final answer."

Few-Shot Chain-of-Thought

Few-shot CoT combines example provision with explicit reasoning chains. You write out not just the correct answer but the correct reasoning path, and the model learns to follow it.

text

Answer the following logic questions. Think step by step before giving the final answer.

---
Question: All mammals are warm-blooded. Dolphins are mammals. Are dolphins warm-blooded?
Reasoning: The first premise tells us all mammals share the property of being warm-blooded. The second premise establishes that dolphins belong to the category of mammals. Since dolphins are mammals, and all mammals are warm-blooded, dolphins must be warm-blooded.
Answer: Yes, dolphins are warm-blooded.

Question: If it rains, the ground gets wet. The ground is wet. Did it rain?
Reasoning: The premise only tells us that rain causes wet ground — if rain then wet ground. But wet ground can have other causes (sprinklers, flooding, spilled water). Wet ground is consistent with rain but does not prove rain was the cause. This is a logical fallacy known as affirming the consequent.
Answer: Not necessarily. The ground being wet is consistent with rain but does not prove it rained.
---

Question: No reptiles are warm-blooded. Snakes are reptiles. Are snakes warm-blooded?
Reasoning:

The few-shot reasoning chains show the model how to handle both valid syllogisms and logical fallacies. The model learns both the answer format and the type of reasoning to apply.

Few-shot CoT is usually stronger than zero-shot CoT, but it requires more work: you must write correct, explicit reasoning chains for each example. On complex domains such as medical reasoning, legal analysis, and multi-step code debugging, this investment pays off.

When to Use CoT vs Standard Prompting

Task type	CoT helpful?	Reason
Multi-step arithmetic	Yes, strongly	Sequential correction of arithmetic errors
Logical deduction	Yes, strongly	Forces structured reasoning over recall
Factual recall	No	CoT adds tokens without improving accuracy
Simple classification	No	No reasoning steps involved
Creative writing	Rarely	Reasoning chains disrupt fluency
Code generation	Sometimes	Helps for algorithmic problems, not boilerplate
Summarisation	No	Task doesn't require stepwise reasoning

The rule of thumb: use CoT whenever the task requires multi-step computation or logical dependency between steps. Avoid it when the task is primarily retrieval or generation without inter-step dependencies.

Self-Consistency: CoT With a Vote

A single chain-of-thought pass can still go wrong — the model commits to a reasoning path early and follows it even when it leads astray. Self-consistency, introduced by Wang et al. (2022), fixes this by generating multiple reasoning chains and taking the majority vote on the final answer.

The procedure:

Run the same prompt 5–20 times at a slightly elevated temperature (0.5–0.8)
Each run produces a different reasoning chain
Extract the final answer from each chain
Return the most common final answer

python

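# Self-consistency sketch. `generate(prompt, temperature)` and `extract_answer(text)`
# are placeholders for your own model call and answer parser.
from collections import Counter

def self_consistency(prompt, generate, extract_answer, n_samples=10, temperature=0.7):
    answers = []
    for _ in range(n_samples):
        # Sample one reasoning chain; a raised temperature makes chains differ.
        response = generate(prompt + "\nLet's think step by step.", temperature)
        answers.append(extract_answer(response))
    # Majority vote: return the most common final answer.
    return Counter(answers).most_common(1)[0][0]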

Why does this work? A problem usually has one correct answer but admits several valid ways of reaching it, so sound reasoning chains converge on the same final answer. Faulty chains tend to go wrong in different ways and scatter across different wrong answers. Majority voting across diverse chains therefore amplifies the answer that sound reasoning agrees on.

Reported gains over single-chain CoT on math and reasoning benchmarks are typically in the range of 5–15 percentage points, varying with model and task. The cost is real, since you are running the model 5–20x, so reserve it for high-stakes single questions where latency and cost are acceptable tradeoffs.
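
The vote can also report how strongly the chains agree, which gives a rough signal for when to escalate a question to a human or a larger sample. This is a sketch, assuming the extracted answers are already normalised to comparable strings; `vote_with_share` is a hypothetical helper and the sample answers are made up for illustration.

```python
from collections import Counter

def vote_with_share(answers):
    """Return the most common answer and the fraction of chains that produced it."""
    if not answers:
        raise ValueError("need at least one answer to vote on")
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / len(answers)

# Illustrative final answers extracted from five sampled chains.
sampled_answers = ["2.25", "2.25", "2.75", "2.25", "1.25"]
winner, share = vote_with_share(sampled_answers)
print(winner, share)
```

A low share does not prove the winning answer is wrong, but it is a reasonable trigger for drawing more samples before trusting the result.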

Least-to-Most Prompting

CoT works well for single-question reasoning but struggles when a problem has dependencies between subproblems — where you cannot solve step 4 without the answer to step 2.

Least-to-most prompting (Zhou et al., 2022) decomposes complex problems explicitly:

First, prompt the model to decompose the problem into ordered subproblems
Solve the easiest/earliest subproblem
Use that answer as context to solve the next subproblem
Continue until the full problem is solved

text

Problem: A company's revenue grew 20% in year 1, then fell 15% in year 2,
then grew 30% in year 3. If the starting revenue was $500,000,
what was the revenue at the end of year 3?

Step 1: What are the subproblems we need to solve in order?

Model output:

snippet

Subproblem 1: Calculate revenue after year 1 (20% growth from $500,000)
Subproblem 2: Calculate revenue after year 2 (15% decline from year 1 result)
Subproblem 3: Calculate revenue after year 3 (30% growth from year 2 result)

Then you feed each subproblem sequentially, using the previous answer as input to the next. The result is significantly more reliable than asking the model to solve the full problem in one CoT pass, because each subproblem is simpler and the intermediate answers are grounded in explicit, verified prior steps.
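
The sequential feeding described above can be sketched as a simple loop. Everything here is illustrative: `solve_least_to_most` is a hypothetical driver, and `apply_percent_change` is a deterministic stand-in for the model call so that the example runs on its own. For the revenue problem the intermediate answers are $600,000 and $510,000, and the final answer is $663,000.

```python
from decimal import Decimal

def solve_least_to_most(subproblems, solve_step, initial):
    """Solve subproblems in order, passing each answer forward as context."""
    answer = initial
    history = []
    for subproblem in subproblems:
        # In a real pipeline this would be one model call per subproblem.
        answer = solve_step(subproblem, answer)
        history.append((subproblem["name"], answer))
    return answer, history

def apply_percent_change(subproblem, previous):
    """Stand-in solver: apply a percentage change to the previous answer."""
    return previous * (Decimal(100) + Decimal(subproblem["percent_change"])) / Decimal(100)

SUBPROBLEMS = [
    {"name": "Revenue after year 1", "percent_change": 20},
    {"name": "Revenue after year 2", "percent_change": -15},
    {"name": "Revenue after year 3", "percent_change": 30},
]

final_revenue, history = solve_least_to_most(
    SUBPROBLEMS, apply_percent_change, Decimal("500000")
)
for name, value in history:
    print(f"{name}: {value}")
```

Keeping the history makes it possible to inspect or re-run a single step when an intermediate answer looks wrong.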

Least-to-most excels at:

Multi-step word problems with numerical dependencies
Code that requires designing a solution before implementing it
Research questions that require answering sub-questions in order
Planning tasks where early decisions constrain later options

Tree of Thoughts: The Model as a Search Algorithm

Tree of Thoughts (ToT) (Yao et al., 2023) extends CoT from a linear chain to a tree structure. Instead of following one reasoning path to completion, the model:

Generates multiple candidate next steps at each decision point
Evaluates which candidates are most promising
Expands promising branches further
Backtracks from dead ends
Returns the best complete path found

This turns the model into a search algorithm over reasoning space — closer to how humans solve hard problems (exploring options, abandoning bad approaches, returning to forks in the road) than to how standard CoT works (committing to a path and following it straight through).

ToT is overkill for most tasks but genuinely valuable for:

Creative writing with specific constraints (word games, constrained poetry)
Mathematical proofs where multiple approaches must be evaluated
Planning problems with many interdependent decisions
Any task where a single wrong turn early in reasoning invalidates the whole answer

The implementation complexity is higher than other techniques — you need to orchestrate multiple model calls, implement the branching and evaluation logic, and manage a growing tree of partial solutions. In 2026, several agent frameworks expose ToT-style reasoning natively, reducing the engineering overhead.
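
To make the orchestration concrete, here is a minimal sketch of the search loop using breadth-first expansion with a fixed beam width, which is one of several possible search strategies. It prunes weak branches rather than backtracking explicitly, and it returns the first complete path it finds, not necessarily the best one. The callables `propose`, `score`, and `is_goal` would be model calls in a real system; here they are toy functions (reach a target sum using steps of 1, 3, or 4) so the example runs on its own.

```python
def tree_of_thoughts(root, propose, score, is_goal, beam_width=2, max_depth=5):
    """Breadth-first search over partial solutions, keeping the best few per level."""
    if is_goal(root):
        return root
    frontier = [root]
    for _ in range(max_depth):
        # Expand every kept state into its candidate next steps.
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            return None  # every branch hit a dead end
        for candidate in candidates:
            if is_goal(candidate):
                return candidate
        # Keep only the most promising partial solutions.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return None

# Toy problem standing in for model-generated thoughts.
TARGET = 10
STEPS = (1, 3, 4)

def propose(state):
    return [state + (step,) for step in STEPS if sum(state) + step <= TARGET]

def score(state):
    return -abs(TARGET - sum(state))  # closer to the target is better

def is_goal(state):
    return sum(state) == TARGET

path = tree_of_thoughts((), propose, score, is_goal)
print(path)
```

In a model-driven version, `propose` and `score` are where most of the cost goes, since each call to them is one or more model requests.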

Decision Table: Which Technique for Which Task

Task Type	Recommended Technique	Why
Sentiment classification	Zero-shot	Well-represented in pre-training; no special format
Custom-category classification	Few-shot	Model needs examples of non-standard labels
Extraction with rigid JSON schema	One-shot or few-shot	Schema needs anchoring via example
Summarisation	Zero-shot	Strong pre-training signal; format is flexible
Multi-step arithmetic	Zero-shot CoT	"Let's think step by step" is sufficient
Complex logic / syllogisms	Few-shot CoT	Domain reasoning chains clarify the reasoning style
Translation (major languages)	Zero-shot	Abundant pre-training data
Translation (rare languages or style)	Few-shot	Examples anchor the target register and style
High-stakes single question	Self-consistency	Majority vote over multiple chains
Multi-step with dependencies	Least-to-most	Sequential subproblem solving
Open-ended problem with many approaches	Tree of Thoughts	Branching exploration over reasoning space
Format-sensitive output	Few-shot	Examples are the most reliable format anchor
Code generation (algorithmic)	Zero-shot or CoT	Problem decomposition helps; examples often unnecessary
Named-entity extraction	Zero-shot	Standard task; add few-shot only if entity types are unusual

Technique Comparison at a Glance

Dimension	Zero-Shot	One-Shot	Few-Shot	CoT (Zero-Shot)	Few-Shot CoT	Self-Consistency
Token cost	Lowest	Low	Medium	Low	Medium-High	Very High
Latency	Lowest	Low	Low	Medium	Medium-High	Very High
Format control	Poor	Good	Excellent	Poor	Good	Good
Reasoning accuracy	Baseline	Baseline	Baseline	+++	++++	+++++
Example writing effort	None	Minimal	Medium	None	High	None
Best for	Standard tasks	Anchoring format	Custom labels/schemas	Multi-step reasoning	Domain reasoning	High-stakes single Q

The 2026 Context: When the Model Does It For You

Modern frontier models have changed the calculus around manual prompting techniques. Claude's extended thinking mode, GPT-5's reasoning settings, and similar features in other models apply chain-of-thought-style reasoning automatically before producing a response. For many tasks, turning on the model's thinking mode eliminates the need to manually add "Let's think step by step."

This does not make understanding CoT obsolete — it makes it more important. When you understand why CoT works, you can:

Diagnose when a thinking mode is failing (wrong reasoning style for the domain)
Write few-shot CoT examples for domain-specific reasoning the model otherwise handles poorly
Choose the right effort level / thinking budget for a given task

For more on selecting between reasoning modes and model variants, see Claude Effort Parameter and Model Selection.

The broader picture is context engineering — assembling everything the model sees (examples, retrieved documents, tool definitions, constraints) into the tightest, most signal-dense package possible. Few-shot prompting is one component of that assembly. See Context Engineering: Why Clean Prompts Matter for the full stack.

If you are building agent systems that call models in loops — where each call's output becomes the next call's input — the prompting techniques here are applied at every node of the loop. The Agent Harness Complete Guide covers how to structure those loops so examples and reasoning chains are passed effectively across steps.

For Claude-specific prompting patterns including the 4-block structure and XML conventions, Master Prompt Engineering with Claude covers the model-specific conventions that amplify everything discussed here.

Common Mistakes and How to Fix Them

Using few-shot when zero-shot is sufficient. If a task is standard (summarisation, translation, simple classification), few-shot examples add token cost without meaningful accuracy gains. Try zero-shot first; only switch to few-shot if quality is consistently off.

Picking examples that are too similar to each other. Five examples of the same type of BILLING ticket teach the model less than one each of five different billing scenarios. Diversity in examples matters more than quantity.

Inconsistent example formatting. Mixing Answer:, Output:, and Result: as output labels across examples confuses the model about what to produce. Pick one and use it consistently across every example.

Adding CoT to tasks that don't need reasoning. Chain-of-thought adds tokens and sometimes hurts performance on tasks that are primarily retrieval (factual questions where the answer is a single entity) or creative generation (where reasoning chains disrupt fluency). Apply CoT selectively.

Forgetting to extract the final answer in CoT. When using CoT, specify where the final answer should appear: "After your reasoning, provide the final answer on a line beginning with 'Final answer:'". Without this, parsing the answer from the reasoning chain programmatically is brittle.

Using self-consistency for every query. Self-consistency is 5–20x more expensive than a single call. It is worth the cost for high-stakes single questions; it is not worth it for bulk classification, extraction, or any task where a single well-prompted pass is already reliable.
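
For the final-answer convention mentioned above, a small parser is usually enough. This sketch assumes the model was told to end with a line beginning with "Final answer:"; `extract_final_answer` is a hypothetical helper, and it takes the last such line in case the phrase also appears earlier in the reasoning.

```python
import re

# Match a whole line that starts with "Final answer:" and capture the rest.
FINAL_ANSWER_RE = re.compile(
    r"^Final answer:[ \t]*(.+?)[ \t]*$", re.MULTILINE | re.IGNORECASE
)

def extract_final_answer(response: str):
    """Return the text after the last 'Final answer:' line, or None if absent."""
    matches = FINAL_ANSWER_RE.findall(response)
    return matches[-1].strip() if matches else None

response = (
    "Step 1: 8 apples x $0.50 = $4.00\n"
    "Step 2: 5 oranges x $0.75 = $3.75\n"
    "Step 3: $10.00 - $7.75 = $2.25\n"
    "Final answer: $2.25\n"
)
answer = extract_final_answer(response)
print(answer)
```

Returning `None` instead of guessing lets the caller decide whether to retry the request or flag the response.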

Putting It Together: A Practical Workflow

When approaching a new task, work through this sequence:

1. Start zero-shot. Write the clearest possible instruction and try it on 10–20 representative inputs. If output quality is consistently good and format is correct, stop here.
2. Add one example if format is wrong. If the model produces the right content but wrong format, one-shot is usually enough to anchor the schema. If one-shot doesn't fix it, move to few-shot.
3. Move to few-shot if labels or categories are wrong. If the model is misclassifying, extracting wrong fields, or ignoring your label space, add 5–10 carefully selected examples covering the full distribution.
4. Add CoT if multi-step reasoning is failing. If the task requires arithmetic, logic, or sequential dependencies, add "Let's think step by step" (zero-shot CoT) or write explicit reasoning chains into your examples (few-shot CoT).
5. Add self-consistency for high-stakes single questions. If you need maximum accuracy on a specific question and cost/latency are acceptable, run self-consistency with 5–10 samples and take the majority vote.
6. Consider least-to-most or ToT for complex structured problems. If CoT still fails because the problem has deep dependency structure or requires exploration, move to decomposition-based approaches.

This workflow avoids over-engineering. Most production tasks resolve at step 1 or 2. The more exotic techniques (self-consistency, ToT, least-to-most) are precision tools for specific hard problems, not defaults.

Example Prompt Library

A compact reference of ready-to-use prompts for each technique.

Zero-Shot Translation

text

Translate the following English text to formal Brazilian Portuguese.
Preserve technical terms in English.

Text: "The API endpoint accepts a JSON payload with a required 'query' field and an optional 'filters' array."

Translation:

Zero-Shot CoT for Logic

text

Determine whether the following argument is logically valid or invalid. Explain your reasoning step by step before giving your final verdict.

Argument: "All engineers know Python. Sarah knows Python. Therefore, Sarah is an engineer."

Let's think step by step.

Few-Shot Classification

text

Classify the priority of the following bug reports as P0 (production down), P1 (major feature broken), P2 (minor issue), or P3 (cosmetic).

Report: "Users cannot complete checkout — payment form throws a 500 error."
Priority: P0

Report: "The bulk export feature crashes for files over 100MB."
Priority: P1

Report: "Tooltip text on the settings page is truncated on smaller screens."
Priority: P3

Report: "Dark mode toggle doesn't persist across sessions."
Priority: P2

Now classify:
Report: "Search results return in random order instead of relevance order."
Priority:

Few-Shot CoT for Reasoning

text

Solve the following rate problems. Show your work step by step before giving the final answer.

Problem: A train travels at 60 mph. How long does it take to travel 150 miles?
Reasoning: Time = Distance ÷ Speed = 150 miles ÷ 60 mph = 2.5 hours.
Answer: 2.5 hours

Problem: A worker completes a task in 4 hours. A second worker completes the same task in 6 hours. How long does it take them working together?
Reasoning: Worker 1 completes 1/4 of the task per hour. Worker 2 completes 1/6 of the task per hour. Together: 1/4 + 1/6 = 3/12 + 2/12 = 5/12 per hour. Time = 1 ÷ (5/12) = 12/5 = 2.4 hours.
Answer: 2.4 hours

Problem: A pump fills a tank in 8 hours. A drain empties the same tank in 12 hours. If both are running simultaneously with the tank starting empty, how long until the tank is full?
Reasoning:
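
Worked examples in a few-shot CoT prompt are only as good as their arithmetic, since the model imitates whatever the demonstrations do. A minimal sketch for checking the two solved examples with exact fractions, using Python's standard `fractions` module (the variable names are illustrative and not part of the prompt):

```python
from fractions import Fraction

# Example 1: time = distance / speed.
train_hours = Fraction(150, 60)

# Example 2: individual rates add, and time is the reciprocal of the combined rate.
combined_rate = Fraction(1, 4) + Fraction(1, 6)
together_hours = 1 / combined_rate

# Unsolved problem: pump rate minus drain rate, in tanks per hour.
net_rate = Fraction(1, 8) - Fraction(1, 12)

print(float(train_hours), float(together_hours))  # 2.5 2.4
print(net_rate)  # 1/24
```

The net rate is positive: with these figures the pump adds water faster than the drain removes it, gaining one tank's worth every 24 hours. That is worth keeping in mind when wording the final problem.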

Understanding these techniques at the mechanism level — not just as named methods but as specific manipulations of the model's probability distribution — is what separates practitioners who can debug failing prompts from those who can only follow recipes. The techniques are composable: few-shot with CoT, self-consistency over few-shot CoT chains, least-to-most with CoT at each subproblem step. The decision about which combination to apply follows from understanding what each one actually does.

Update — July 7, 2026: Wharton's Prompting Science Report 2 finds explicit CoT adds marginal gains on reasoning models and 20–80% latency cost — pair this guide with specs, rubrics, and eval loops, not more "think step by step" boilerplate.

Zero-Shot vs Few-Shot vs Chain-of-Thought Prompting: Complete Guide 2026

Hands-on techniques for zero-shot, few-shot, and chain-of-thought prompting across real tasks.

What In-Context Learning Actually Is

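Before comparing techniques, it is worth being precise about the mechanism underlying all of them.

In-context learning is the model's ability to pick up a task from the instructions and examples in its prompt at inference time, with no change to its weights.

Every prompting technique in this guide is a different way of exploiting this mechanism.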

Zero-Shot Prompting

What It Is

text

Classify the sentiment of the following customer review as Positive, Negative, or Neutral.

Review: "The packaging was damaged but the product itself works perfectly."

Sentiment:

That is a zero-shot prompt. No examples of what Positive, Negative, or Neutral reviews look like. No sample outputs. Just the task and the input.
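
In code, a zero-shot prompt is just a template with one slot for the input, plus a small parser that maps the model's reply back onto the label space. A minimal sketch, assuming the reply starts with the label; `build_zero_shot_prompt` and `parse_label` are hypothetical helper names, and no model call is made here:

```python
from typing import Optional

ALLOWED_LABELS = ("Positive", "Negative", "Neutral")

def build_zero_shot_prompt(review: str) -> str:
    """Render the instruction-only prompt: task, input, and an answer slot."""
    return (
        "Classify the sentiment of the following customer review as "
        "Positive, Negative, or Neutral.\n\n"
        f'Review: "{review}"\n\n'
        "Sentiment:"
    )

def parse_label(completion: str) -> Optional[str]:
    """Map a raw completion onto one allowed label, or None if it is off-format."""
    words = completion.strip().split()
    if not words:
        return None
    first = words[0].strip(".,:;!\"'").capitalize()
    return first if first in ALLOWED_LABELS else None

print(build_zero_shot_prompt("The packaging was damaged but the product itself works perfectly."))
print(parse_label(" positive."))          # Positive
print(parse_label("I think it is mixed")) # None
```

A `None` from the parser is a useful signal in its own right: if it shows up often, the zero-shot instruction is not anchoring the output format well enough.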

Why It Works (When It Does)

Zero-shot also has a practical advantage: no examples means no token overhead, faster iteration, and no risk of examples biasing the model toward patterns that do not generalise.

When It Fails

Zero-shot fails in three predictable scenarios:

Zero-Shot Summarisation Example

text

Summarise the following technical incident report in exactly three bullet points.
Each bullet must start with a bolded category label: **Cause**, **Impact**, **Resolution**.
Do not include any additional text before or after the bullets.

Incident report:
On March 14 at 02:17 UTC, the authentication service began returning 503 errors
due to a misconfigured load balancer rule deployed at 01:45 UTC. Approximately
12,000 users were unable to log in for 43 minutes. The on-call engineer identified
the faulty rule at 02:58 UTC and rolled it back. Service restored at 03:00 UTC.
Full post-mortem scheduled for March 17.

Summary:
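
Format constraints like the ones in this prompt can be checked mechanically, which makes it easy to measure how often a zero-shot instruction is actually followed. A rough sketch, assuming the bullets must appear in the order Cause, Impact, Resolution (the prompt does not state an order) and that a bullet starts with `-`, `*`, or `•`; `check_summary` is a hypothetical helper:

```python
import re
from typing import List

REQUIRED_LABELS = ["Cause", "Impact", "Resolution"]
# A bullet marker, whitespace, then a bolded label such as **Cause** or **Cause:**.
BULLET = re.compile(r"^[-*•]\s+\*\*(\w+):?\*\*")

def check_summary(output: str) -> List[str]:
    """Return a list of format problems; an empty list means the output conforms."""
    lines = [ln.strip() for ln in output.strip().splitlines() if ln.strip()]
    problems = []
    if len(lines) != 3:
        problems.append(f"expected 3 non-empty lines, got {len(lines)}")
    labels = []
    for ln in lines:
        match = BULLET.match(ln)
        if match:
            labels.append(match.group(1))
        else:
            problems.append(f"not a labelled bullet: {ln!r}")
    if labels != REQUIRED_LABELS:
        problems.append(f"labels {labels} do not match {REQUIRED_LABELS}")
    return problems

good = (
    "- **Cause**: Misconfigured load balancer rule deployed at 01:45 UTC.\n"
    "- **Impact**: About 12,000 users could not log in for 43 minutes.\n"
    "- **Resolution**: The rule was rolled back and service was restored at 03:00 UTC."
)
bad = "Here is the summary:\n" + good

print(check_summary(good))  # []
print(check_summary(bad))   # two problems: an extra line, and a line that is not a bullet
```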

One-Shot Prompting

One-shot prompting adds exactly one example before the task. It occupies the space between zero-shot and few-shot: cheaper than a full few-shot battery, but more reliable than no examples at all.

The example functions as a template, not just an instruction. You are showing the model the exact shape of the output you want.

text

Classify the sentiment of the customer review. Output a JSON object with keys
"sentiment" (Positive/Negative/Neutral), "confidence" (0.0–1.0), and "reason" (one sentence).

Example:
Review: "Arrived two days late but the quality is outstanding."
Output: {"sentiment": "Positive", "confidence": 0.75, "reason": "Product quality outweighs delivery delay in the customer's assessment."}

Now classify:
Review: "The customer service team refused to process my refund despite the item being defective."
Output:

Use one-shot when:

The output format is unusual or precise
The task is straightforward but the schema needs anchoring
You cannot afford the token cost of multiple examples
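
Since the one-shot example pins down a JSON schema, the reply can be validated before anything downstream consumes it. A minimal sketch using only the standard `json` module; `parse_sentiment_json` is a hypothetical helper, and the checks mirror the keys and ranges named in the prompt:

```python
import json

SENTIMENTS = {"Positive", "Negative", "Neutral"}
EXPECTED_KEYS = {"sentiment", "confidence", "reason"}

def parse_sentiment_json(completion: str) -> dict:
    """Parse the model's reply and raise ValueError if it breaks the schema."""
    try:
        data = json.loads(completion)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        raise ValueError("expected exactly the keys sentiment, confidence, reason")
    if data["sentiment"] not in SENTIMENTS:
        raise ValueError(f"unknown sentiment: {data['sentiment']!r}")
    conf = data["confidence"]
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        raise ValueError("confidence must be a number between 0.0 and 1.0")
    if not isinstance(data["reason"], str) or not data["reason"].strip():
        raise ValueError("reason must be a non-empty string")
    return data

reply = '{"sentiment": "Negative", "confidence": 0.9, "reason": "The refund was refused."}'
print(parse_sentiment_json(reply)["sentiment"])  # Negative
```

If validation fails repeatedly with one example in the prompt, that is the cue to move up to few-shot.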

Few-Shot Prompting

What It Is

Concrete Few-Shot Classification Prompt

text

You are a support ticket router. Classify each ticket into exactly one category:
BILLING, TECHNICAL, FEATURE_REQUEST, or ACCOUNT_ACCESS.

---
Ticket: "I was charged twice for my subscription this month."
Category: BILLING

Ticket: "The export to CSV function crashes when there are more than 500 rows."
Category: TECHNICAL

Ticket: "It would be great if we could schedule reports to run automatically."
Category: FEATURE_REQUEST

Ticket: "I can't log in — it says my account has been suspended but I haven't received any email."
Category: ACCOUNT_ACCESS

Ticket: "My invoice shows a different amount than what I agreed to in the contract."
Category: BILLING

Ticket: "The API keeps returning a 429 error even though I'm well under my rate limit."
Category: TECHNICAL
---

Now classify the following ticket:
Ticket: "I need to transfer my account to a different email address."
Category:

What Makes a Good Few-Shot Example

This is where most practitioners make mistakes. The quality of examples matters far more than the quantity.
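
Two of these quality properties, label balance and format consistency, are easy to enforce in code: render every example through one template and count examples per label before sending anything. A sketch under those assumptions; the tickets are taken from the classification prompt above with the ACCOUNT_ACCESS one left out on purpose, and `label_counts` and `render_prompt` are hypothetical helpers:

```python
from collections import Counter
from typing import Dict, List, Tuple

LABELS = ["BILLING", "TECHNICAL", "FEATURE_REQUEST", "ACCOUNT_ACCESS"]

# Deliberately incomplete example set: no ACCOUNT_ACCESS ticket.
EXAMPLES: List[Tuple[str, str]] = [
    ("I was charged twice for my subscription this month.", "BILLING"),
    ("The export to CSV function crashes when there are more than 500 rows.", "TECHNICAL"),
    ("It would be great if we could schedule reports to run automatically.", "FEATURE_REQUEST"),
    ("My invoice shows a different amount than what I agreed to in the contract.", "BILLING"),
    ("The API keeps returning a 429 error even though I'm well under my rate limit.", "TECHNICAL"),
]

def label_counts(examples: List[Tuple[str, str]], labels: List[str]) -> Dict[str, int]:
    """Count examples per label, including labels that have none."""
    counts = Counter(label for _, label in examples)
    return {label: counts[label] for label in labels}

def render_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Render all shots and the query through one template so the format never drifts."""
    shots = [f'Ticket: "{ticket}"\nCategory: {label}' for ticket, label in examples]
    return "\n\n".join(shots) + f'\n\nTicket: "{query}"\nCategory:'

counts = label_counts(EXAMPLES, LABELS)
uncovered = [label for label, n in counts.items() if n == 0]
print(counts)
print("labels with no example:", uncovered)  # ['ACCOUNT_ACCESS']
```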

Few-Shot Extraction Prompt

text

Extract structured data from the job posting. Output JSON only, no prose.

---
Posting: "Senior Backend Engineer at Moonshot Labs — Remote (US only) — $180k–$220k — 7+ years Python, Postgres, Kafka required — Apply by July 31"
Output: {"title": "Senior Backend Engineer", "company": "Moonshot Labs", "location": "Remote (US only)", "salary_range": "$180k–$220k", "required_skills": ["Python", "Postgres", "Kafka"], "min_experience_years": 7, "application_deadline": "July 31"}

Posting: "Product Designer, Fintech — Hybrid (London) — Competitive salary — 3–5 yrs experience, Figma expert, fintech background preferred — No specified deadline"
Output: {"title": "Product Designer", "company": null, "location": "Hybrid (London)", "salary_range": null, "required_skills": ["Figma"], "min_experience_years": 3, "application_deadline": null}
---

Posting: "Staff ML Engineer at VectorIQ — San Francisco, CA — $250k–$300k total comp — 10+ years experience, PyTorch, LLM fine-tuning, distributed systems — Rolling applications"
Output:
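
With extraction prompts, the example outputs are the schema, so it pays to confirm that every example uses the same keys in the same order before relying on them. A small sketch using the two example outputs from this prompt; `shared_keys` and `missing_or_extra` are hypothetical helpers, and the truncated reply at the end is made up for illustration:

```python
import json
from typing import List

EXAMPLE_OUTPUTS = [
    '{"title": "Senior Backend Engineer", "company": "Moonshot Labs", '
    '"location": "Remote (US only)", "salary_range": "$180k–$220k", '
    '"required_skills": ["Python", "Postgres", "Kafka"], '
    '"min_experience_years": 7, "application_deadline": "July 31"}',
    '{"title": "Product Designer", "company": null, '
    '"location": "Hybrid (London)", "salary_range": null, '
    '"required_skills": ["Figma"], "min_experience_years": 3, '
    '"application_deadline": null}',
]

def shared_keys(outputs: List[str]) -> List[str]:
    """Return the key list common to all example outputs, or raise if they differ."""
    key_lists = [list(json.loads(o)) for o in outputs]
    if any(keys != key_lists[0] for keys in key_lists):
        raise ValueError("few-shot examples do not share one schema")
    return key_lists[0]

def missing_or_extra(completion: str, expected: List[str]) -> List[str]:
    """List keys the completion is missing or has added relative to the examples."""
    got = set(json.loads(completion))
    return sorted(set(expected) ^ got)

schema = shared_keys(EXAMPLE_OUTPUTS)
print(schema)
# Illustrative incomplete reply: five of the seven keys are missing.
print(missing_or_extra('{"title": "Staff ML Engineer", "company": "VectorIQ"}', schema))
```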

Chain-of-Thought Prompting

The Core Idea

Chain-of-thought (CoT) prompting, introduced by Wei et al. in 2022, works on a simple insight: if you want the model to reason correctly, make it show its reasoning.

Why does generating intermediate steps help? Several mechanisms are at work:

Sequential correction. Each generated token conditions all subsequent tokens. When the model writes out a correct intermediate step, the next step is more likely to be correct. Errors propagate less because each step is grounded in a written, verifiable intermediate.
Computation vs recall. A model producing a direct answer to a math problem must "compress" all arithmetic into a single prediction. A model writing out calculations can effectively perform the arithmetic token by token, which is far more reliable.
Longer effective computation. Transformer models have fixed depth: the number of layers limits how much "thinking" can happen in a single forward pass. When the model generates a long reasoning chain, it spreads the computation across many forward passes, one per token, so the effective depth is much greater.

Zero-Shot Chain-of-Thought

The simplest CoT technique requires no examples at all:

text

A store sells apples for $0.50 each and oranges for $0.75 each.
Emily buys 8 apples and 5 oranges. She pays with a $10 bill.
How much change does she receive?

Let's think step by step.

The phrase "Let's think step by step" is the entire intervention. The model will produce something like:

snippet

Step 1: Calculate the cost of apples.
8 apples × $0.50 = $4.00

Step 2: Calculate the cost of oranges.
5 oranges × $0.75 = $3.75

Step 3: Calculate the total cost.
$4.00 + $3.75 = $7.75

Step 4: Calculate the change.
$10.00 − $7.75 = $2.25

Emily receives $2.25 in change.

Without "Let's think step by step", the model is more likely to collapse the arithmetic into a single prediction and make an error. With it, the model writes out each step, so a slip is less likely to be carried silently into the final answer.
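
When you evaluate CoT outputs like this one, it helps to compute the reference answer independently and compare it with the last figure in the chain. A minimal sketch using Python's `decimal` module to avoid floating-point noise with currency; `final_dollar_amount` is a hypothetical helper, and the chain is an abridged copy of the sample output above:

```python
import re
from decimal import Decimal
from typing import Optional

# Ground truth, computed independently of the model.
expected = Decimal("10.00") - (8 * Decimal("0.50") + 5 * Decimal("0.75"))

def final_dollar_amount(chain: str) -> Optional[Decimal]:
    """Return the last dollar figure mentioned in a reasoning chain, or None."""
    amounts = re.findall(r"\$(\d+(?:\.\d{1,2})?)", chain)
    return Decimal(amounts[-1]) if amounts else None

chain = (
    "Step 3: Calculate the total cost.\n$4.00 + $3.75 = $7.75\n"
    "Step 4: Calculate the change.\n$10.00 - $7.75 = $2.25\n"
    "Emily receives $2.25 in change."
)

print(expected, final_dollar_amount(chain) == expected)  # 2.25 True
```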

Alternatives that produce similar effects:

"Walk me through your reasoning before answering."
"Show your work."
"First, think about what information you have. Then reason to the answer."
"Think carefully before giving your final answer."

Few-Shot Chain-of-Thought

Few-shot CoT combines example provision with explicit reasoning chains. You write out not just the correct answer but the correct reasoning path, and the model learns to follow it.

text

Answer the following logic questions. Think step by step before giving the final answer.

---
Question: All mammals are warm-blooded. Dolphins are mammals. Are dolphins warm-blooded?
Reasoning: The first premise tells us all mammals share the property of being warm-blooded. The second premise establishes that dolphins belong to the category of mammals. Since dolphins are mammals, and all mammals are warm-blooded, dolphins must be warm-blooded.
Answer: Yes, dolphins are warm-blooded.

Question: If it rains, the ground gets wet. The ground is wet. Did it rain?
Reasoning: The premise only tells us that rain causes wet ground — if rain then wet ground. But wet ground can have other causes (sprinklers, flooding, spilled water). Wet ground is consistent with rain but does not prove rain was the cause. This is a logical fallacy known as affirming the consequent.
Answer: Not necessarily. The ground being wet is consistent with rain but does not prove it rained.
---

Question: No reptiles are warm-blooded. Snakes are reptiles. Are snakes warm-blooded?
Reasoning:

The few-shot reasoning chains show the model how to handle both valid syllogisms and logical fallacies. The model learns both the answer format and the type of reasoning to apply.
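
The difference between the two worked examples can be verified by brute force. Reduced to propositional form, the first is modus ponens (P implies Q, P, therefore Q) and the second is affirming the consequent (P implies Q, Q, therefore P). A short truth-table sketch, where `is_valid` is an illustrative helper that searches for a counterexample:

```python
from itertools import product
from typing import Callable, List

Formula = Callable[[bool, bool], bool]

def implies(a: bool, b: bool) -> bool:
    """Material implication: false only when a is true and b is false."""
    return (not a) or b

def is_valid(premises: List[Formula], conclusion: Formula) -> bool:
    """Valid means no assignment makes every premise true and the conclusion false."""
    for p, q in product([True, False], repeat=2):
        if all(prem(p, q) for prem in premises) and not conclusion(p, q):
            return False
    return True

# Modus ponens: P -> Q, P, therefore Q.
modus_ponens = is_valid([lambda p, q: implies(p, q), lambda p, q: p], lambda p, q: q)
# Affirming the consequent: P -> Q, Q, therefore P.
affirming = is_valid([lambda p, q: implies(p, q), lambda p, q: q], lambda p, q: p)

print(modus_ponens, affirming)  # True False
```

The counterexample for the second form is P false, Q true: the ground is wet, but it did not rain.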

When to Use CoT vs Standard Prompting

Task type	CoT helpful?	Reason
Multi-step arithmetic	Yes, strongly	Sequential correction of arithmetic errors
Logical deduction	Yes, strongly	Forces structured reasoning over recall
Factual recall	No	CoT adds tokens without improving accuracy
Simple classification	No	No reasoning steps involved
Creative writing	Rarely	Reasoning chains disrupt fluency
Code generation	Sometimes	Helps for algorithmic problems, not boilerplate
Summarisation	No	Task doesn't require stepwise reasoning

Self-Consistency: CoT With a Vote

The procedure:

Run the same prompt 5–20 times at a slightly elevated temperature (0.5–0.8)
Each run produces a different reasoning chain
Extract the final answer from each chain
Return the most common final answer

python

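# Pseudocode for self-consistency. `model`, `prompt`, `extract_answer`, and
# `majority_vote` are placeholders for your own client, prompt, and helpers.
answers = []
for _ in range(10):
    # Sample one independent reasoning chain at a raised temperature.
    response = model.generate(prompt + "\nLet's think step by step.", temperature=0.7)
    # Keep only the final answer; the reasoning text itself is discarded.
    final_answer = extract_answer(response)
    answers.append(final_answer)

# The most frequent final answer across all chains is returned.
result = majority_vote(answers)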
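
One way to fill in the answer-extraction and voting steps, as a runnable sketch. It assumes each chain ends with a line starting `Answer:` (a convention you would have to request in the prompt), and the chains below are hand-written stand-ins for sampled model outputs:

```python
import re
from collections import Counter
from typing import List, Optional

ANSWER_LINE = re.compile(r"^Answer:[ \t]*(.+?)[ \t]*$", re.MULTILINE)

def extract_answer(chain: str) -> Optional[str]:
    """Return the text of the last 'Answer:' line, or None if the chain has none."""
    matches = ANSWER_LINE.findall(chain)
    return matches[-1] if matches else None

def majority_vote(answers: List[Optional[str]]) -> Optional[str]:
    """Return the most common parsed answer; ties go to the answer seen first."""
    parsed = [a for a in answers if a is not None]
    if not parsed:
        return None
    return Counter(parsed).most_common(1)[0][0]

chains = [
    "8 x 0.50 = 4.00; 5 x 0.75 = 3.75; total 7.75.\nAnswer: $2.25",
    "Apples 4.00, oranges 3.75, so 10 - 7.75.\nAnswer: $2.25",
    "Apples 4.00, oranges 3.25 (a slip), so 10 - 7.25.\nAnswer: $2.75",
    "I am not sure how to proceed.",
]
answers = [extract_answer(c) for c in chains]
result = majority_vote(answers)
print(result, f"({answers.count(result)} of {len(answers)} chains agree)")
# $2.25 (2 of 4 chains agree)
```

The agreement count doubles as a rough confidence signal: a narrow majority suggests the question deserves more samples or a human look.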

Least-to-Most Prompting

CoT works well for single-question reasoning but struggles when a problem has dependencies between subproblems — where you cannot solve step 4 without the answer to step 2.

Least-to-most prompting (Zhou et al., 2022) decomposes complex problems explicitly:

First, prompt the model to decompose the problem into ordered subproblems
Solve the easiest/earliest subproblem
Use that answer as context to solve the next subproblem
Continue until the full problem is solved

text

Problem: A company's revenue grew 20% in year 1, then fell 15% in year 2,
then grew 30% in year 3. If the starting revenue was $500,000,
what was the revenue at the end of year 3?

Step 1: What are the subproblems we need to solve in order?

Model output:

snippet

Subproblem 1: Calculate revenue after year 1 (20% growth from $500,000)
Subproblem 2: Calculate revenue after year 2 (15% decline from year 1 result)
Subproblem 3: Calculate revenue after year 3 (30% growth from year 2 result)
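
The defining move in least-to-most is that each subproblem's answer is written into the prompt for the next one. A minimal sketch of that loop: `least_to_most` is a hypothetical helper, and `scripted_model` stands in for a real model call by returning pre-written answers so the example runs offline:

```python
from typing import Callable, List, Tuple

def least_to_most(problem: str, subproblems: List[str],
                  ask: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Solve subproblems in order, feeding every earlier answer into the next prompt."""
    solved: List[Tuple[str, str]] = []
    for sub in subproblems:
        history = "".join(f"Q: {q}\nA: {a}\n" for q, a in solved)
        prompt = f"Problem: {problem}\n{history}Q: {sub}\nA:"
        solved.append((sub, ask(prompt).strip()))
    return solved

prompts_seen: List[str] = []
scripted = iter(["$600,000", "$510,000", "$663,000"])

def scripted_model(prompt: str) -> str:
    """Stand-in for a model call: records the prompt and returns the next scripted answer."""
    prompts_seen.append(prompt)
    return next(scripted)

steps = least_to_most(
    "Starting revenue $500,000; +20% in year 1, -15% in year 2, +30% in year 3.",
    ["Revenue after year 1?", "Revenue after year 2?", "Revenue after year 3?"],
    scripted_model,
)

# Exact integer check of the scripted answers: 500,000 x 1.20 x 0.85 x 1.30.
assert 500_000 * 120 * 85 * 130 // 100**3 == 663_000
print(steps[-1][1])      # $663,000
print(prompts_seen[-1])  # the final prompt carries both earlier answers
```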

Least-to-most excels at:

Multi-step word problems with numerical dependencies
Code that requires designing a solution before implementing it
Research questions that require answering sub-questions in order
Planning tasks where early decisions constrain later options

Tree of Thoughts: The Model as a Search Algorithm

Tree of Thoughts (ToT) (Yao et al., 2023) extends CoT from a linear chain to a tree structure. Instead of following one reasoning path to completion, the model:

Generates multiple candidate next steps at each decision point
Evaluates which candidates are most promising
Expands promising branches further
Backtracks from dead ends
Returns the best complete path found

ToT is overkill for most tasks but genuinely valuable for:

Creative writing with specific constraints (word games, constrained poetry)
Mathematical proofs where multiple approaches must be evaluated
Planning problems with many interdependent decisions
Any task where a single wrong turn early in reasoning invalidates the whole answer
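
The generate, evaluate, expand, and backtrack loop can be illustrated on a toy problem without any model at all. In the sketch below, deterministic functions play the roles a model would play in real ToT: proposing next steps and judging which are promising. The puzzle (reach 14 from 2 using `+3` and `*2`) and the `search` function are made up for illustration, and this version returns the first complete path it finds instead of comparing several:

```python
from typing import Callable, Dict, List, Optional

OPS: Dict[str, Callable[[int], int]] = {
    "+3": lambda x: x + 3,
    "*2": lambda x: x * 2,
}

def search(value: int, target: int, path: List[str]) -> Optional[List[str]]:
    """Depth-first search over candidate steps, most promising first."""
    if value == target:
        return path
    # Generate candidate next steps.
    candidates = [(name, op(value)) for name, op in OPS.items()]
    # Evaluate: overshoots are dead ends; rank the rest by distance to the target.
    viable = [(name, v) for name, v in candidates if v <= target]
    viable.sort(key=lambda item: target - item[1])
    # Expand the best branch first; a None result means backtrack and try the next.
    for name, v in viable:
        found = search(v, target, path + [name])
        if found is not None:
            return found
    return None

print(search(2, 14, []))  # ['+3', '+3', '+3', '+3']
```

The search first follows the greedy branch 2, 5, 10, 13, hits a dead end because both moves overshoot 14, and backtracks to 5 before finding a path. A single linear chain committed to that first branch would have failed.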

Decision Table: Which Technique for Which Task

Task Type	Recommended Technique	Why
Sentiment classification	Zero-shot	Well-represented in pre-training; no special format
Custom-category classification	Few-shot	Model needs examples of non-standard labels
Extraction with rigid JSON schema	One-shot or few-shot	Schema needs anchoring via example
Summarisation	Zero-shot	Strong pre-training signal; format is flexible
Multi-step arithmetic	Zero-shot CoT	"Let's think step by step" is sufficient
Complex logic / syllogisms	Few-shot CoT	Domain reasoning chains clarify the reasoning style
Translation (major languages)	Zero-shot	Abundant pre-training data
Translation (rare languages or style)	Few-shot	Examples anchor the target register and style
High-stakes single question	Self-consistency	Majority vote over multiple chains
Multi-step with dependencies	Least-to-most	Sequential subproblem solving
Open-ended problem with many approaches	Tree of Thoughts	Branching exploration over reasoning space
Format-sensitive output	Few-shot	Examples are the most reliable format anchor
Code generation (algorithmic)	Zero-shot or CoT	Problem decomposition helps; examples often unnecessary
Named-entity extraction	Zero-shot	Standard task; add few-shot only if entity types are unusual

Technique Comparison at a Glance

Dimension	Zero-Shot	One-Shot	Few-Shot	CoT (Zero-Shot)	Few-Shot CoT	Self-Consistency
Token cost	Lowest	Low	Medium	Low	Medium-High	Very High
Latency	Lowest	Low	Low	Medium	Medium	Very High
Format control	Poor	Good	Excellent	Poor	Good	Good
Reasoning accuracy	Baseline	Baseline	Baseline	+++	++++	+++++
Example writing effort	None	Minimal	Medium	None	High	None
Best for	Standard tasks	Anchoring format	Custom labels/schemas	Multi-step reasoning	Domain reasoning	High-stakes single Q

The 2026 Context: When the Model Does It For You

Models with built-in thinking modes now produce reasoning chains without being asked. That does not make understanding CoT obsolete; it makes it more important. When you understand why CoT works, you can:

Diagnose when a thinking mode is failing (wrong reasoning style for the domain)
Write few-shot CoT examples for domain-specific reasoning the model otherwise handles poorly
Choose the right effort level / thinking budget for a given task

For more on selecting between reasoning modes and model variants, see Claude Effort Parameter and Model Selection.

Common Mistakes and How to Fix Them

Putting It Together: A Practical Workflow

When approaching a new task, work through this sequence:

Start zero-shot. Write the clearest possible instruction and try it on 10–20 representative inputs. If output quality is consistently good and format is correct, stop here.
Add one example if format is wrong. If the model produces the right content but wrong format, one-shot is usually enough to anchor the schema. If one-shot doesn't fix it, move to few-shot.
Move to few-shot if labels or categories are wrong. If the model is misclassifying, extracting wrong fields, or ignoring your label space, add 5–10 carefully selected examples covering the full distribution.
Add CoT if multi-step reasoning is failing. If the task requires arithmetic, logic, or sequential dependencies, add "Let's think step by step" (zero-shot CoT) or write explicit reasoning chains into your examples (few-shot CoT).
Add self-consistency for high-stakes single questions. If you need maximum accuracy on a specific question and cost/latency are acceptable, run self-consistency with 5–10 samples and take the majority vote.
Consider least-to-most or ToT for complex structured problems. If CoT still fails because the problem has deep dependency structure or requires exploration, move to decomposition-based approaches.

