
Zero-Shot vs Few-Shot vs Chain-of-Thought Prompting: Complete Guide 2026

A comprehensive deep-dive into zero-shot, one-shot, few-shot, and chain-of-thought prompting techniques — with concrete examples, decision tables, and 2026 context on self-consistency, Tree of Thought, and least-to-most prompting.

19 min read · Yash Thakker
Prompt Engineering · Zero-Shot · Few-Shot · Chain-of-Thought · AI Fundamentals

There is a gap between knowing that prompt techniques exist and knowing which one to reach for in a specific situation. Zero-shot, few-shot, chain-of-thought, self-consistency, Tree of Thought — the vocabulary has expanded faster than the practical guidance around it.

This guide closes that gap. Each technique is explained from first principles, illustrated with concrete prompts you can copy, and placed in a decision framework that tells you when to use which. By the end, you will understand not just what each technique does but why it works — which is the only knowledge that transfers to new tasks.

What In-Context Learning Actually Is

Before comparing techniques, it is worth being precise about the mechanism underlying all of them.

In-context learning is the ability of a sufficiently large language model to adapt its behaviour based on examples provided within the prompt itself — without any change to the model's weights. You do not need to retrain, fine-tune, or even call a training API. You include examples, and the model adjusts.

This emerged as a surprise at scale. Early language models did not exhibit it. GPT-3 (2020) was among the first to demonstrate it convincingly: add a few labelled examples to the prompt, and performance on classification tasks jumps dramatically — sometimes matching fine-tuned models that saw thousands of examples.

Why does it happen? One widely cited explanation is that during pre-training on enormous text corpora, the model encountered millions of patterns where a sequence of examples was followed by a natural continuation. It learned, at a statistical level, to infer the task from the examples and produce the right continuation. The examples in your prompt trigger this latent capability.

Critically: the model is not learning in the training sense. Weights do not update. What happens is pattern-matching at inference time. The examples narrow the model's probability distribution over outputs, steering it toward the format and label space you want. This distinction matters because it tells you what in-context learning can and cannot do: it can nudge a capable model toward a format; it cannot teach a model something it has no pre-training signal for.

Every prompting technique in this guide is a different way of exploiting this mechanism.


Zero-Shot Prompting

What It Is

Zero-shot prompting means giving the model a task description and asking it to perform the task — with no examples. The model relies entirely on knowledge and capability accumulated during pre-training.

Classify the sentiment of the following customer review as Positive, Negative, or Neutral.

Review: "The packaging was damaged but the product itself works perfectly."

Sentiment:

That is a zero-shot prompt. No examples of what Positive, Negative, or Neutral reviews look like. No sample outputs. Just the task and the input.

Why It Works (When It Does)

Zero-shot works when the task is well-represented in the model's pre-training data. Sentiment classification, translation between major languages, summarisation of standard documents, named-entity extraction, basic code generation — these are tasks the model has encountered in thousands of variations. The instruction alone is enough to activate the right behaviour.

Zero-shot also has a practical advantage: no examples means no token overhead, faster iteration, and no risk of examples biasing the model toward patterns that do not generalise.

When It Fails

Zero-shot fails in three predictable scenarios:

Unusual output format. If you want JSON with a specific schema — {"sentiment": "Negative", "confidence": 0.8, "reason": "..."} — zero-shot often produces something close but not exact. The model has a strong prior toward prose output, and without an example, it approximates rather than follows precisely.

Non-standard label space. Standard classification labels like Positive/Negative/Neutral work zero-shot because they are canonical. But if your categories are company-specific (BILLING_DISPUTE, FEATURE_REQUEST_PREMIUM, CHURN_RISK), the model has no training signal for where you draw their boundaries. Unless the instruction defines each category precisely, zero-shot will guess, usually poorly.

Strong conflicting priors. If the model's pre-training strongly associates a surface pattern with a different output, a zero-shot instruction struggles to override it. For example, a rule such as "label clearly sarcastic, negative reviews as Positive when they come from loyalty-programme customers" runs against the model's priors, and an instruction alone often fails to apply it consistently.

Zero-Shot Summarisation Example

Summarise the following technical incident report in exactly three bullet points.
Each bullet must start with a bolded category label: **Cause**, **Impact**, **Resolution**.
Do not include any additional text before or after the bullets.

Incident report:
On March 14 at 02:17 UTC, the authentication service began returning 503 errors
due to a misconfigured load balancer rule deployed at 01:45 UTC. Approximately
12,000 users were unable to log in for 43 minutes. The on-call engineer identified
the faulty rule at 02:58 UTC and rolled it back. Service restored at 03:00 UTC.
Full post-mortem scheduled for March 17.

Summary:

This will work reliably zero-shot because summarisation and bullet formatting are deeply represented in pre-training. But notice that even here, the prompt is precise about count, format, and constraints — zero-shot does not mean vague.
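Because the constraints are this precise, they can also be checked mechanically before a summary is used downstream. A minimal sketch in Python, assuming the model returns markdown-style bullets (lines starting with `-`, `*` or `•`) and writes bold labels as `**Label**`; `follows_summary_format` is a hypothetical helper name:

```python
import re

LABELS = ("Cause", "Impact", "Resolution")


def follows_summary_format(text: str) -> bool:
    """True if text is exactly three bullets labelled **Cause**, **Impact**, **Resolution**, in order."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if len(lines) != len(LABELS):
        return False  # extra preamble, trailing commentary, or a missing bullet
    # Each line must start with a bullet marker followed by its bold label
    return all(
        re.match(rf"[-*•]\s+\*\*{label}\*\*", line) is not None
        for line, label in zip(lines, LABELS)
    )


good = (
    "- **Cause**: misconfigured load balancer rule deployed at 01:45 UTC.\n"
    "- **Impact**: about 12,000 users could not log in for 43 minutes.\n"
    "- **Resolution**: rule rolled back at 02:58 UTC; service restored at 03:00 UTC."
)
print(follows_summary_format(good))                            # True
print(follows_summary_format("Here is the summary:\n" + good))  # False
```

A failing check is a cheap signal that the instruction needs tightening, or that the task needs an example.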


One-Shot Prompting

One-shot prompting adds exactly one example before the task. It occupies the space between zero-shot and few-shot: cheaper than a full few-shot battery, but more reliable than no examples at all.

The example functions as a template, not just an instruction. You are showing the model the exact shape of the output you want.

Classify the sentiment of the customer review. Output a JSON object with keys
"sentiment" (Positive/Negative/Neutral), "confidence" (0.0–1.0), and "reason" (one sentence).

Example:
Review: "Arrived two days late but the quality is outstanding."
Output: {"sentiment": "Positive", "confidence": 0.75, "reason": "Product quality outweighs delivery delay in the customer's assessment."}

Now classify:
Review: "The customer service team refused to process my refund despite the item being defective."
Output:

The single example has done something the instruction alone could not: it established the exact JSON schema, the confidence range, and the expected length and tone of the reason field. The model now has a concrete target to match.
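One way to confirm the example is doing its job is to validate every output against the schema it anchors. A minimal sketch in Python, assuming the three keys and label set from the prompt above; `schema_problems` is a hypothetical helper, not part of any SDK:

```python
import json

SENTIMENTS = {"Positive", "Negative", "Neutral"}
KEYS = {"sentiment", "confidence", "reason"}


def schema_problems(raw: str) -> list[str]:
    """Return the ways raw deviates from the schema; an empty list means it matches."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON ({exc.msg})"]
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    if set(obj) != KEYS:
        problems.append(f"keys are {sorted(obj)}, expected {sorted(KEYS)}")
    if obj.get("sentiment") not in SENTIMENTS:
        problems.append("sentiment is not Positive, Negative or Neutral")
    conf = obj.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        problems.append("confidence is not a number between 0.0 and 1.0")
    reason = obj.get("reason")
    if not isinstance(reason, str) or not reason.strip():
        problems.append("reason is missing or empty")
    return problems


print(schema_problems('{"sentiment": "Negative", "confidence": 0.9, "reason": "Refund refused for a defective item."}'))  # []
print(schema_problems("Negative"))  # one problem: not valid JSON
```

If outputs from the zero-shot version of the prompt regularly fail this check while the one-shot version passes, the example is earning its tokens.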

Use one-shot when:

  • The output format is unusual or precise
  • The task is straightforward but the schema needs anchoring
  • You cannot afford the token cost of multiple examples

Few-Shot Prompting

What It Is

Few-shot prompting provides 2–20 worked examples before the task. Each example shows an input and the correct output, letting the model infer the task, label space, format, and edge-case handling simultaneously.

Few-shot is the workhorse of production prompt engineering. For classification tasks, extraction tasks, and any task with a rigid output schema, few-shot usually outperforms zero-shot, often by a wide margin.

Concrete Few-Shot Classification Prompt

You are a support ticket router. Classify each ticket into exactly one category:
BILLING, TECHNICAL, FEATURE_REQUEST, or ACCOUNT_ACCESS.

---
Ticket: "I was charged twice for my subscription this month."
Category: BILLING

Ticket: "The export to CSV function crashes when there are more than 500 rows."
Category: TECHNICAL

Ticket: "It would be great if we could schedule reports to run automatically."
Category: FEATURE_REQUEST

Ticket: "I can't log in — it says my account has been suspended but I haven't received any email."
Category: ACCOUNT_ACCESS

Ticket: "My invoice shows a different amount than what I agreed to in the contract."
Category: BILLING

Ticket: "The API keeps returning a 429 error even though I'm well under my rate limit."
Category: TECHNICAL
---

Now classify the following ticket:
Ticket: "I need to transfer my account to a different email address."
Category:

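Six examples cover all four categories, with two each for BILLING and TECHNICAL (assumed here to be the most common ticket types), and include near-boundary cases such as a contract-amount dispute (BILLING) alongside a suspended account (ACCOUNT_ACCESS). The model now has both the label space and a sense of where one category ends and the next begins.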
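Assembling the prompt from data rather than by hand is one way to guarantee the identical formatting discussed below. A minimal sketch in Python; `build_few_shot_prompt` is a hypothetical helper, and the `---` delimiters and key names simply mirror the prompt above:

```python
def build_few_shot_prompt(instruction, examples, query, input_key="Ticket", output_key="Category"):
    """Render every (input, label) example with the same delimiters and key names, then add the query."""
    shots = "\n\n".join(f'{input_key}: "{text}"\n{output_key}: {label}' for text, label in examples)
    return (
        f"{instruction}\n\n---\n{shots}\n---\n\n"
        f"Now classify the following {input_key.lower()}:\n"
        f'{input_key}: "{query}"\n{output_key}:'
    )


examples = [
    ("I was charged twice for my subscription this month.", "BILLING"),
    ("The export to CSV function crashes when there are more than 500 rows.", "TECHNICAL"),
    ("It would be great if we could schedule reports to run automatically.", "FEATURE_REQUEST"),
]
prompt = build_few_shot_prompt(
    "You are a support ticket router. Classify each ticket into exactly one category:\n"
    "BILLING, TECHNICAL, FEATURE_REQUEST, or ACCOUNT_ACCESS.",
    examples,
    "I need to transfer my account to a different email address.",
)
print(prompt)
```

Because every example passes through the same template, a mismatched label such as `Label:` in place of `Category:` cannot creep in.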

What Makes a Good Few-Shot Example

This is where most practitioners make mistakes. The quality of examples matters far more than the quantity.

Label distribution should roughly reflect reality. If 70% of your actual inputs are TECHNICAL tickets, your examples should be roughly 70% TECHNICAL. If you show equal examples for each label but the real distribution is skewed, the model may be poorly calibrated, spreading its predictions more evenly than reality warrants.
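If you have a labelled sample of real traffic, examples can be drawn in proportion to it. A minimal sketch in Python, assuming `pool` is a list of `(text, label)` pairs from that sample; the one-per-label floor and the final shuffle are design choices of this sketch, not rules from the original:

```python
import random
from collections import Counter


def pick_examples(pool, k, seed=0):
    """Pick about k (text, label) examples whose label mix mirrors the pool's.

    Rounding and the one-per-label floor mean the result can differ from k slightly.
    """
    rng = random.Random(seed)
    counts = Counter(label for _, label in pool)
    chosen = []
    for label, n in counts.items():
        quota = max(1, round(k * n / len(pool)))  # proportional share, but every label appears
        members = [ex for ex in pool if ex[1] == label]
        chosen.extend(rng.sample(members, min(quota, len(members))))
    rng.shuffle(chosen)  # interleave labels instead of grouping them
    return chosen


pool = (
    [(f"technical ticket {i}", "TECHNICAL") for i in range(70)]
    + [(f"billing ticket {i}", "BILLING") for i in range(20)]
    + [(f"feature request {i}", "FEATURE_REQUEST") for i in range(10)]
)
picked = pick_examples(pool, k=10)
print(Counter(label for _, label in picked))  # Counter({'TECHNICAL': 7, 'BILLING': 2, 'FEATURE_REQUEST': 1})
```

Random sampling within each label only handles proportions; choosing diverse and boundary cases, as described next, still needs a human eye.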

Diversity beats repetition. Two identical BILLING examples teach the model less than one standard billing example and one edge-case billing example (e.g., a refund request vs a pricing dispute). Cover the range of your input space, not just the easy centre.

Format consistency is non-negotiable. If your first three examples use Category: as the output label and example four uses Label:, the model will notice the inconsistency and may waver on which to use. Every example must follow identical formatting — same delimiters, same key names, same capitalisation.

Length should be controlled. Long examples consume context window budget that could be spent on the actual task. If an input can be shortened without losing the feature that makes it a good example, shorten it. On long-context models this matters less, but it is still good hygiene.

Include edge cases deliberately. The easy examples are the ones the model would get right anyway. The examples that move the needle are the ones at the decision boundary: a customer complaint that is simultaneously a billing issue and an account access issue, or a sarcastic review that sounds positive but is really a complaint. These boundary examples teach the model where you draw the line.

Few-Shot Extraction Prompt

Extract structured data from the job posting. Output JSON only, no prose.

---
Posting: "Senior Backend Engineer at Moonshot Labs — Remote (US only) — $180k–$220k — 7+ years Python, Postgres, Kafka required — Apply by July 31"
Output: {"title": "Senior Backend Engineer", "company": "Moonshot Labs", "location": "Remote (US only)", "salary_range": "$180k–$220k", "required_skills": ["Python", "Postgres", "Kafka"], "min_experience_years": 7, "application_deadline": "July 31"}

Posting: "Product Designer, Fintech — Hybrid (London) — Competitive salary — 3–5 yrs experience, Figma expert, fintech background preferred — No specified deadline"
Output: {"title": "Product Designer", "company": null, "location": "Hybrid (London)", "salary_range": null, "required_skills": ["Figma"], "min_experience_years": 3, "application_deadline": null}
---

Posting: "Staff ML Engineer at VectorIQ — San Francisco, CA — $250k–$300k total comp — 10+ years experience, PyTorch, LLM fine-tuning, distributed systems — Rolling applications"
Output:

Notice the second example deliberately includes nulls: no company name, no salary, no deadline. Without that example, the model is more likely to invent plausible values than to return null. Edge cases in examples reduce hallucination of missing fields.


Chain-of-Thought Prompting

The Core Idea

Chain-of-thought (CoT) prompting, introduced by Wei et al. in 2022, works on a simple insight: if you want the model to reason correctly, make it show its reasoning.

The original finding was striking. On multi-step math and logic benchmarks, large models performed dramatically better when few-shot examples demonstrated step-by-step reasoning before the final answer. On some benchmarks, accuracy more than doubled. A follow-up by Kojima et al. (2022) showed that much of the effect survives with no examples at all, using a single zero-shot addition: "Let's think step by step."

Why does generating intermediate steps help? Several complementary explanations have been proposed:

  1. Sequential correction. Each generated token conditions all subsequent tokens. When the model writes out a correct intermediate step, the next step is more likely to be correct. Errors propagate less because each step is grounded in a written, verifiable intermediate.

  2. Computation vs recall. A model producing a direct answer to a math problem must "compress" all arithmetic into a single prediction. A model writing out calculations can effectively perform the arithmetic token by token, which is far more reliable.

  3. Longer effective computation. Transformer models have fixed depth — the number of layers limits how much "thinking" can happen for a single forward pass. But when the model generates a long reasoning chain, it effectively performs deeper computation across many forward passes, one per token.

Zero-Shot Chain-of-Thought

The simplest CoT technique requires no examples at all:

A store sells apples for $0.50 each and oranges for $0.75 each.
Emily buys 8 apples and 5 oranges. She pays with a $10 bill.
How much change does she receive?

Let's think step by step.

The phrase "Let's think step by step" is the entire intervention. The model will produce something like:

Step 1: Calculate the cost of apples.
8 apples × $0.50 = $4.00

Step 2: Calculate the cost of oranges.
5 oranges × $0.75 = $3.75

Step 3: Calculate the total cost.
$4.00 + $3.75 = $7.75

Step 4: Calculate the change.
$10.00 − $7.75 = $2.25

Emily receives $2.25 in change.

Without "Let's think step by step", the model is more likely to collapse the arithmetic into a single guess and make an error. With it, the model writes out each intermediate result, so a mistake in one step is less likely to be buried in the final number.
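When a chain is plain arithmetic, its intermediate steps can be recomputed outside the model. Here is the same chain in Python, using `Decimal` so that currency values are exact rather than binary floats:

```python
from decimal import Decimal

apples = 8 * Decimal("0.50")       # Step 1: cost of apples
oranges = 5 * Decimal("0.75")      # Step 2: cost of oranges
total = apples + oranges           # Step 3: total cost
change = Decimal("10.00") - total  # Step 4: change from a $10 bill
print(apples, oranges, total, change)  # 4.00 3.75 7.75 2.25
```

Comparing a recomputed value against the model's stated answer is a cheap consistency check wherever the steps are deterministic.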

Alternatives that produce similar effects:

  • "Walk me through your reasoning before answering."
  • "Show your work."
  • "First, think about what information you have. Then reason to the answer."
  • "Think carefully before giving your final answer."

Few-Shot Chain-of-Thought

Few-shot CoT combines example provision with explicit reasoning chains. You write out not just the correct answer but the correct reasoning path, and the model learns to follow it.

Answer the following logic questions. Think step by step before giving the final answer.

---
Question: All mammals are warm-blooded. Dolphins are mammals. Are dolphins warm-blooded?
Reasoning: The first premise tells us all mammals share the property of being warm-blooded. The second premise establishes that dolphins belong to the category of mammals. Since dolphins are mammals, and all mammals are warm-blooded, dolphins must be warm-blooded.
Answer: Yes, dolphins are warm-blooded.

Question: If it rains, the ground gets wet. The ground is wet. Did it rain?
Reasoning: The premise only tells us that rain causes wet ground — if rain then wet ground. But wet ground can have other causes (sprinklers, flooding, spilled water). Wet ground is consistent with rain but does not prove rain was the cause. This is a logical fallacy known as affirming the consequent.
Answer: Not necessarily. The ground being wet is consistent with rain but does not prove it rained.
---

Question: No reptiles are warm-blooded. Snakes are reptiles. Are snakes warm-blooded?
Reasoning:

The few-shot reasoning chains show the model how to handle both valid syllogisms and logical fallacies. The model learns both the answer format and the type of reasoning to apply.

Few-shot CoT is generally stronger than zero-shot CoT, but it requires more work: you must write correct, explicit reasoning chains for each example. In complex domains (medical reasoning, legal analysis, multi-step code debugging), this investment usually pays off.

When to Use CoT vs Standard Prompting

| Task type | CoT helpful? | Reason |
| --- | --- | --- |
| Multi-step arithmetic | Yes, strongly | Sequential correction of arithmetic errors |
| Logical deduction | Yes, strongly | Forces structured reasoning over recall |
| Factual recall | No | CoT adds tokens without improving accuracy |
| Simple classification | No | No reasoning steps involved |
| Creative writing | Rarely | Reasoning chains disrupt fluency |
| Code generation | Sometimes | Helps for algorithmic problems, not boilerplate |
| Summarisation | No | Task doesn't require stepwise reasoning |

The rule of thumb: use CoT whenever the task requires multi-step computation or logical dependency between steps. Avoid it when the task is primarily retrieval or generation without inter-step dependencies.


Self-Consistency: CoT With a Vote

A single chain-of-thought pass can still go wrong: the model commits to a reasoning path early and follows it even when it leads astray. Self-consistency, introduced by Wang et al. (2022), mitigates this by generating multiple reasoning chains and taking a majority vote on the final answer.

The procedure:

  1. Run the same prompt 5–20 times at a slightly elevated temperature (0.5–0.8)
  2. Each run produces a different reasoning chain
  3. Extract the final answer from each chain
  4. Return the most common final answer
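# Self-consistency: sample several CoT chains at elevated temperature, then majority-vote.
# `generate(prompt, temperature)` is a hypothetical stand-in for your model client.
import re
from collections import Counter

def extract_answer(response):
    # Assumes the prompt asks for a final line such as "Final answer: 2.25"
    match = re.search(r"Final answer:\s*(.+)", response)
    return match.group(1).strip() if match else None

def self_consistency(generate, prompt, n=10, temperature=0.7):
    cot_prompt = prompt + "\nLet's think step by step."
    answers = [extract_answer(generate(cot_prompt, temperature=temperature)) for _ in range(n)]
    answers = [a for a in answers if a is not None]  # drop chains with no parseable answer
    return Counter(answers).most_common(1)[0][0] if answers else None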

Why does this work? A problem usually admits several different reasoning paths that reach the same correct answer, while faulty paths tend to scatter across many different wrong answers. So even when no single chain is reliable, the correct answer is the one most likely to recur, and majority voting across diverse chains amplifies it.

Self-consistency improves accuracy over single-chain CoT on math and reasoning benchmarks; the original paper reported gains ranging from about 4 percentage points (ARC-Challenge) to about 18 (GSM8K). The cost is real: you are running the model 5–20x, so reserve it for high-stakes single questions where latency and cost are acceptable tradeoffs.


Least-to-Most Prompting

CoT works well when the problem resembles the worked examples, but it tends to struggle when the problem is harder than those examples or has dependencies between subproblems, where you cannot solve step 4 without the answer to step 2.

Least-to-most prompting (Zhou et al., 2022) decomposes complex problems explicitly:

  1. First, prompt the model to decompose the problem into ordered subproblems
  2. Solve the easiest/earliest subproblem
  3. Use that answer as context to solve the next subproblem
  4. Continue until the full problem is solved
Problem: A company's revenue grew 20% in year 1, then fell 15% in year 2,
then grew 30% in year 3. If the starting revenue was $500,000,
what was the revenue at the end of year 3?

Step 1: What are the subproblems we need to solve in order?

Model output:

Subproblem 1: Calculate revenue after year 1 (20% growth from $500,000)
Subproblem 2: Calculate revenue after year 2 (15% decline from year 1 result)
Subproblem 3: Calculate revenue after year 3 (30% growth from year 2 result)

Then you feed each subproblem to the model in order, including the previous answers as context for the next. This is typically more reliable than asking the model to solve the full problem in one CoT pass, because each subproblem is simpler and every intermediate answer is written down explicitly, where it can be checked before the next step builds on it.
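The sequential loop can be sketched in a few lines of Python. `solve(context, subproblem)` is a hypothetical stand-in for a model call; here it is replaced by a deterministic stub that reads the previous answer from the context, so the example runs without any API. The subproblem strings with `+20%`-style rates are likewise an assumption made for the stub:

```python
import re


def least_to_most(subproblems, solve):
    """Solve subproblems in order; each call sees every earlier question and answer."""
    solved = []
    for sub in subproblems:
        context = "\n".join(f"Q: {q}\nA: {a}" for q, a in solved)
        solved.append((sub, solve(context, sub)))
    return solved


def stub_solve(context, subproblem):
    # Stand-in for a model: take the last answer from the context (or the $500,000
    # starting revenue), then apply the percentage change named in the subproblem.
    previous = int(context.rsplit("A: ", 1)[1]) if context else 500_000
    pct = int(re.search(r"([+-]\d+)%", subproblem).group(1))
    return round(previous * (100 + pct) / 100)


steps = [
    "Revenue after year 1 (+20%)",
    "Revenue after year 2 (-15%)",
    "Revenue after year 3 (+30%)",
]
for question, answer in least_to_most(steps, stub_solve):
    print(f"{question}: ${answer:,}")
```

This prints $600,000, $510,000 and $663,000. With a real model, `solve` would send the accumulated context plus the next subproblem as its prompt.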

Least-to-most excels at:

  • Multi-step word problems with numerical dependencies
  • Code that requires designing a solution before implementing it
  • Research questions that require answering sub-questions in order
  • Planning tasks where early decisions constrain later options

Tree of Thoughts: The Model as a Search Algorithm

Tree of Thoughts (ToT; Yao et al., 2023) extends CoT from a linear chain to a tree structure. Instead of following one reasoning path to completion, the model:

  1. Generates multiple candidate next steps at each decision point
  2. Evaluates which candidates are most promising
  3. Expands promising branches further
  4. Backtracks from dead ends
  5. Returns the best complete path found

This turns the model into a search algorithm over reasoning space — closer to how humans solve hard problems (exploring options, abandoning bad approaches, returning to forks in the road) than to how standard CoT works (committing to a path and following it straight through).

ToT is overkill for most tasks but genuinely valuable for:

  • Creative writing with specific constraints (word games, constrained poetry)
  • Mathematical proofs where multiple approaches must be evaluated
  • Planning problems with many interdependent decisions
  • Any task where a single wrong turn early in reasoning invalidates the whole answer

The implementation complexity is higher than other techniques — you need to orchestrate multiple model calls, implement the branching and evaluation logic, and manage a growing tree of partial solutions. In 2026, several agent frameworks expose ToT-style reasoning natively, reducing the engineering overhead.
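To make the branching and evaluation logic concrete, here is a minimal breadth-first variant with a fixed beam width, written in Python. `propose` (generate candidate next steps) and `score` (rate a partial solution) would normally be model calls; here they are hypothetical toy functions for reaching 10 from 1 using only "add 3" and "double" steps, chosen so the example runs without a model. None of the names come from a real framework:

```python
def tree_of_thoughts(root, propose, score, is_goal, beam_width=2, max_depth=5):
    """Breadth-first search that keeps only the beam_width most promising partial paths."""
    frontier = [root]
    for _ in range(max_depth):
        # 1. Generate candidate next steps from every path still on the frontier
        candidates = [child for path in frontier for child in propose(path)]
        if not candidates:
            return None  # every branch hit a dead end
        # 2. Stop as soon as a candidate is a complete solution
        for path in candidates:
            if is_goal(path):
                return path
        # 3. Evaluate and keep the best branches; the rest are abandoned
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return None


TARGET = 10


def propose(path):
    # Toy "thoughts": add 3 or double, discarding moves that overshoot the target
    last = path[-1]
    return [path + (nxt,) for nxt in (last + 3, last * 2) if nxt <= TARGET]


def score(path):
    return -abs(TARGET - path[-1])  # closer to the target = more promising


print(tree_of_thoughts((1,), propose, score, lambda p: p[-1] == TARGET))  # (1, 4, 7, 10)
```

After two steps the branch ending in 8 scores best but has no valid continuation, so it drops out while its sibling through 7 reaches the target. That is the beam-search analogue of backtracking; a depth-first variant with explicit backtracking is the other common way to organise the search.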


Decision Table: Which Technique for Which Task

| Task Type | Recommended Technique | Why |
| --- | --- | --- |
| Sentiment classification | Zero-shot | Well-represented in pre-training; no special format |
| Custom-category classification | Few-shot | Model needs examples of non-standard labels |
| Extraction with rigid JSON schema | One-shot or few-shot | Schema needs anchoring via example |
| Summarisation | Zero-shot | Strong pre-training signal; format is flexible |
| Multi-step arithmetic | Zero-shot CoT | "Let's think step by step" is sufficient |
| Complex logic / syllogisms | Few-shot CoT | Domain reasoning chains clarify the reasoning style |
| Translation (major languages) | Zero-shot | Abundant pre-training data |
| Translation (rare languages or style) | Few-shot | Examples anchor the target register and style |
| High-stakes single question | Self-consistency | Majority vote over multiple chains |
| Multi-step with dependencies | Least-to-most | Sequential subproblem solving |
| Open-ended problem with many approaches | Tree of Thoughts | Branching exploration over reasoning space |
| Format-sensitive output | Few-shot | Examples are the most reliable format anchor |
| Code generation (algorithmic) | Zero-shot or CoT | Problem decomposition helps; examples often unnecessary |
| Named-entity extraction | Zero-shot | Standard task; add few-shot only if entity types are unusual |

Technique Comparison at a Glance

| Dimension | Zero-Shot | One-Shot | Few-Shot | CoT (Zero-Shot) | Few-Shot CoT | Self-Consistency |
| --- | --- | --- | --- | --- | --- | --- |
| Token cost | Lowest | Low | Medium | Low | Medium-High | Very High |
| Latency | Lowest | Low | Low | Medium | Medium | Very High |
| Format control | Poor | Good | Excellent | Poor | Good | Good |
| Reasoning accuracy | Baseline | Baseline | Baseline | +++ | ++++ | +++++ |
| Example writing effort | None | Minimal | Medium | None | High | None |
| Best for | Standard tasks | Anchoring format | Custom labels/schemas | Multi-step reasoning | Domain reasoning | High-stakes single Q |

The 2026 Context: When the Model Does It For You

Modern frontier models have changed the calculus around manual prompting techniques. Claude's extended thinking mode, GPT-5's reasoning settings, and similar features in other models apply chain-of-thought-style reasoning automatically before producing a response. For many tasks, turning on the model's thinking mode eliminates the need to manually add "Let's think step by step."

This does not make understanding CoT obsolete — it makes it more important. When you understand why CoT works, you can:

  • Diagnose when a thinking mode is failing (wrong reasoning style for the domain)
  • Write few-shot CoT examples for domain-specific reasoning the model otherwise handles poorly
  • Choose the right effort level / thinking budget for a given task

For more on selecting between reasoning modes and model variants, see Claude Effort Parameter and Model Selection.

The broader picture is context engineering — assembling everything the model sees (examples, retrieved documents, tool definitions, constraints) into the tightest, most signal-dense package possible. Few-shot prompting is one component of that assembly. See Context Engineering: Why Clean Prompts Matter for the full stack.

If you are building agent systems that call models in loops — where each call's output becomes the next call's input — the prompting techniques here are applied at every node of the loop. The Agent Harness Complete Guide covers how to structure those loops so examples and reasoning chains are passed effectively across steps.

For Claude-specific prompting patterns including the 4-block structure and XML conventions, Master Prompt Engineering with Claude covers the model-specific conventions that amplify everything discussed here.


Common Mistakes and How to Fix Them

Using few-shot when zero-shot is sufficient. If a task is standard (summarisation, translation, simple classification), few-shot examples add token cost without meaningful accuracy gains. Try zero-shot first; only switch to few-shot if quality is consistently off.

Picking examples that are too similar to each other. Five examples of the same type of BILLING ticket teach the model less than one each of five different billing scenarios. Diversity in examples matters more than quantity.

Inconsistent example formatting. Mixing Answer:, Output:, and Result: as output labels across examples confuses the model about what to produce. Pick one and use it consistently across every example.

Adding CoT to tasks that don't need reasoning. Chain-of-thought adds tokens and sometimes hurts performance on tasks that are primarily retrieval (factual questions where the answer is a single entity) or creative generation (where reasoning chains disrupt fluency). Apply CoT selectively.

Forgetting to extract the final answer in CoT. When using CoT, specify where the final answer should appear: "After your reasoning, provide the final answer on a line beginning with 'Final answer:'". Without this, parsing the answer from the reasoning chain programmatically is brittle.
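Pairing that instruction with a strict parser keeps extraction predictable. A minimal sketch in Python; `parse_final_answer` is a hypothetical helper, and taking the last match is a design choice that guards against a model mentioning the marker mid-reasoning:

```python
import re

FINAL_ANSWER_INSTRUCTION = (
    "After your reasoning, provide the final answer on a line beginning with 'Final answer:'"
)


def parse_final_answer(response: str):
    """Return the text after the last line starting with 'Final answer:', or None if absent."""
    matches = re.findall(r"^Final answer:[ \t]*(.+?)[ \t]*$", response, flags=re.MULTILINE)
    return matches[-1] if matches else None


response = (
    "Step 1: 8 x $0.50 = $4.00\n"
    "Step 2: 5 x $0.75 = $3.75\n"
    "Step 3: $4.00 + $3.75 = $7.75\n"
    "Step 4: $10.00 - $7.75 = $2.25\n"
    "Final answer: $2.25\n"
)
print(parse_final_answer(response))                  # $2.25
print(parse_final_answer("Emily gets $2.25 back."))  # None
```

Returning `None` instead of guessing lets the caller retry the request or flag the response for review.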

Using self-consistency for every query. Self-consistency is 5–20x more expensive than a single call. It is worth the cost for high-stakes single questions; it is not worth it for bulk classification, extraction, or any task where a single well-prompted pass is already reliable.


Putting It Together: A Practical Workflow

When approaching a new task, work through this sequence:

  1. Start zero-shot. Write the clearest possible instruction and try it on 10–20 representative inputs. If output quality is consistently good and format is correct, stop here.

  2. Add one example if format is wrong. If the model produces the right content but wrong format, one-shot is usually enough to anchor the schema. If one-shot doesn't fix it, move to few-shot.

  3. Move to few-shot if labels or categories are wrong. If the model is misclassifying, extracting wrong fields, or ignoring your label space, add 5–10 carefully selected examples covering the full distribution.

  4. Add CoT if multi-step reasoning is failing. If the task requires arithmetic, logic, or sequential dependencies, add "Let's think step by step" (zero-shot CoT) or write explicit reasoning chains into your examples (few-shot CoT).

  5. Add self-consistency for high-stakes single questions. If you need maximum accuracy on a specific question and cost/latency are acceptable, run self-consistency with 5–10 samples and take the majority vote.

  6. Consider least-to-most or ToT for complex structured problems. If CoT still fails because the problem has deep dependency structure or requires exploration, move to decomposition-based approaches.

This workflow avoids over-engineering. Most production tasks resolve at step 1 or 2. The more exotic techniques (self-consistency, ToT, least-to-most) are precision tools for specific hard problems, not defaults.


Example Prompt Library

A compact reference of ready-to-use prompts for each technique.

Zero-Shot Translation

Translate the following English text to formal Brazilian Portuguese.
Preserve technical terms in English.

Text: "The API endpoint accepts a JSON payload with a required 'query' field and an optional 'filters' array."

Translation:

Zero-Shot CoT for Logic

Determine whether the following argument is logically valid or invalid. Explain your reasoning step by step before giving your final verdict.

Argument: "All engineers know Python. Sarah knows Python. Therefore, Sarah is an engineer."

Let's think step by step.

Few-Shot Classification

Classify the priority of the following bug reports as P0 (production down), P1 (major feature broken), P2 (minor issue), or P3 (cosmetic).

Report: "Users cannot complete checkout — payment form throws a 500 error."
Priority: P0

Report: "The bulk export feature crashes for files over 100MB."
Priority: P1

Report: "Tooltip text on the settings page is truncated on smaller screens."
Priority: P3

Report: "Dark mode toggle doesn't persist across sessions."
Priority: P2

Now classify:
Report: "Search results return in random order instead of relevance order."
Priority:

Few-Shot CoT for Reasoning

Solve the following rate problems. Show your work step by step before giving the final answer.

Problem: A train travels at 60 mph. How long does it take to travel 150 miles?
Reasoning: Time = Distance ÷ Speed = 150 miles ÷ 60 mph = 2.5 hours.
Answer: 2.5 hours

Problem: A worker completes a task in 4 hours. A second worker completes the same task in 6 hours. How long does it take them working together?
Reasoning: Worker 1 completes 1/4 of the task per hour. Worker 2 completes 1/6 of the task per hour. Together: 1/4 + 1/6 = 3/12 + 2/12 = 5/12 per hour. Time = 1 ÷ (5/12) = 12/5 = 2.4 hours.
Answer: 2.4 hours

Problem: A pump fills a tank in 8 hours. A drain empties the same tank in 12 hours. If both are running simultaneously with the tank starting empty, how long until the tank is full?
Reasoning:
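Reference answers in few-shot CoT examples must be correct, since the model imitates them. For rate problems, exact fractions make them easy to verify. In this Python sketch, `hours_to_complete` is an illustrative helper (not from the original) that adds per-hour rates, counting a drain as a negative rate:

```python
from fractions import Fraction


def hours_to_complete(*rates):
    """Hours to finish one job given per-hour rates; negative rates work against the job."""
    net = sum(rates, Fraction(0))
    if net <= 0:
        return None  # no net progress, so the job never completes
    return 1 / net


print(hours_to_complete(Fraction(1, 4), Fraction(1, 6)))    # 12/5, i.e. 2.4 hours
print(hours_to_complete(Fraction(1, 8), -Fraction(1, 12)))  # 24
```

The pump-and-drain net rate is 1/8 - 1/12 = 1/24 of a tank per hour. Because it is positive, the tank gains water overall: starting empty it fills in 24 hours, and starting full it would never empty.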

Understanding these techniques at the mechanism level — not just as named methods but as specific manipulations of the model's probability distribution — is what separates practitioners who can debug failing prompts from those who can only follow recipes. The techniques are composable: few-shot with CoT, self-consistency over few-shot CoT chains, least-to-most with CoT at each subproblem step. The decision about which combination to apply follows from understanding what each one actually does.
