Topic hub · 10 claims
Prompt engineering — patterns that work
The prompting patterns that survived contact with production systems between 2022 and 2025. Each traces to published research rather than a Medium-post folk recipe: Chain-of-Thought, ReAct, Tree of Thoughts, instruction tuning, few-shot prompting, and in-context learning.
Why prompting still matters
Frontier models in 2025-2026 are far more capable than their 2022-2023 predecessors, but prompt structure can still change output quality substantially. A plausible reason is that training rewards certain shapes of input (step-by-step reasoning, structured examples, explicit role assignments), so prompt patterns that match those shapes tend to outperform raw queries.
The foundational patterns
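- Chain-of-Thought (Wei et al., 2022): include a few worked examples whose answers spell out the intermediate reasoning steps, and large models improve markedly on reasoning benchmarks. The zero-shot variant, which appends "Let's think step by step" with no examples, comes from Kojima et al. (2022).
- ReAct (Yao et al., 2022): interleave reasoning and action steps for tool-use agents.
- Tree of Thoughts (Yao et al., 2023): generalize CoT from a single chain to branching exploration for deliberate problem-solving.
- InstructGPT (Ouyang et al., 2022): supervised fine-tuning on instruction demonstrations followed by RLHF on human preference rankings. This line of work is a large part of why current models follow instructions reliably.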
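The two Chain-of-Thought variants differ only in how the prompt is assembled, which a short sketch can make concrete. The following is a minimal illustration, not code from any of the cited papers: the `Exemplar` type, both builder functions, the `Q:`/`A:` layout, and the sample exemplar are assumptions chosen for readability.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """One worked example (hypothetical structure, for illustration only)."""
    question: str
    reasoning: str  # intermediate steps written out in prose
    answer: str


def build_few_shot_cot_prompt(exemplars: list[Exemplar], question: str) -> str:
    """Few-shot CoT: every exemplar shows its reasoning before its answer."""
    blocks = [
        f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}."
        for ex in exemplars
    ]
    # The new question ends with an open "A:" so the model continues
    # in the same reasoning-then-answer shape as the exemplars.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


def build_zero_shot_cot_prompt(question: str) -> str:
    """Zero-shot CoT: no exemplars, only a reasoning trigger phrase."""
    return f"Q: {question}\nA: Let's think step by step."


# A made-up exemplar, not taken from any benchmark.
EXEMPLARS = [
    Exemplar(
        question="A shelf holds 3 boxes with 4 books each. 5 books are removed. How many books remain?",
        reasoning="3 boxes of 4 books is 12 books. 12 - 5 = 7.",
        answer="7",
    )
]

if __name__ == "__main__":
    print(build_few_shot_cot_prompt(EXEMPLARS, "A train has 6 cars with 10 seats each. How many seats in total?"))
```

The resulting string would be sent to whichever model API is in use; that call is deliberately left out because it varies by provider.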
What still doesn't work reliably
Self-evaluation (asking the model 'are you sure?') is poorly calibrated. Few-shot prompting tends to beat zero-shot for narrow extraction tasks but helps much less with open-ended generation. 'Adversarial' prompts that try to bypass safety training increasingly fail on aligned models. Prompt engineering is not jailbreaking; the patterns that survive are the ones grounded in published research.
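For the narrow-extraction case mentioned above, a few-shot prompt is usually just an instruction followed by input/output pairs and then the new input. Here is a minimal sketch of that layout; the function name, the `Text:`/`JSON:` labels, and the invoice examples are all assumptions made for illustration.

```python
import json


def build_extraction_prompt(
    instruction: str,
    examples: list[tuple[str, dict]],
    text: str,
) -> str:
    """Assemble instruction, then (text, fields) pairs, then the new text."""
    parts = [instruction.strip(), ""]
    for source, fields in examples:
        parts.append(f"Text: {source}")
        # sort_keys keeps the key order identical across examples, so the
        # model sees one consistent output shape to imitate.
        parts.append(f"JSON: {json.dumps(fields, sort_keys=True)}")
        parts.append("")
    parts.append(f"Text: {text}")
    parts.append("JSON:")
    return "\n".join(parts)


# Made-up examples for illustration.
EXAMPLES = [
    ("Invoice total: 120 EUR, due 2024-03-01.", {"amount": 120, "currency": "EUR"}),
    ("Please pay 75 USD by Friday.", {"amount": 75, "currency": "USD"}),
]

if __name__ == "__main__":
    print(build_extraction_prompt(
        "Extract the amount and currency as JSON.",
        EXAMPLES,
        "The fee is 300 GBP.",
    ))
```

Keeping the output format identical across examples matters more here than the number of examples, since the model is being asked to copy a shape rather than to reason.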
Defined terms (4)
- Chain-of-Thought (CoT): Prompting technique that elicits step-by-step reasoning before the final answer. Wei et al. (Google Brain, 2022) reported large gains on arithmetic, commonsense, and symbolic reasoning benchmarks, mainly for sufficiently large models.
- ReAct: Pattern that interleaves reasoning and acting (Yao et al., Princeton and Google, 2022). Foundational to agent loops: the model emits Thought → Action → Observation cycles.
- In-context learning: The capability of LLMs to pick up new patterns from examples in the prompt without weight updates. It became prominent at GPT-3 scale (Brown et al., 2020) and remains the primary mechanism behind few-shot prompting.
- Instruction tuning: Fine-tuning a pretrained LM on instruction-response pairs (often followed by RLHF) so the model follows natural-language instructions. The InstructGPT paper (Ouyang et al., 2022) is the canonical reference.
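The Thought → Action → Observation cycle in the ReAct definition above maps onto a small control loop. The sketch below is one possible shape for it, under stated assumptions: the `tool[argument]` action syntax, the `finish` action, the `react_loop` function, and the scripted stand-in for the model are hypothetical and chosen so the example runs without any LLM API.

```python
import re
from typing import Callable

Tool = Callable[[str], str]

# Assumed action syntax: a line of the form "Action: tool_name[argument]".
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)


def react_loop(
    question: str,
    model: Callable[[str], str],
    tools: dict[str, Tool],
    max_steps: int = 5,
) -> str:
    """Run Thought -> Action -> Observation cycles until a finish action."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)  # expected: "Thought: ...\nAction: name[arg]"
        transcript += step.rstrip() + "\n"
        match = ACTION_RE.search(step)
        if match is None:
            raise ValueError(f"no Action line in model output: {step!r}")
        name, arg = match.group(1), match.group(2)
        if name == "finish":
            return arg
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name!r}"
        # The observation is appended so the next model call can see it.
        transcript += f"Observation: {observation}\n"
    raise RuntimeError("step budget exhausted without a finish action")


def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM call; replies based on what the transcript holds."""
    if "Observation:" not in transcript:
        return "Thought: I need the product first.\nAction: calculator[6 * 7]"
    return "Thought: The observation gives the result.\nAction: finish[42]"


def calculator(expression: str) -> str:
    """Tiny 'a op b' integer calculator used as the only tool."""
    left, op, right = expression.split()
    a, b = int(left), int(right)
    return str({"+": a + b, "-": a - b, "*": a * b}[op])


if __name__ == "__main__":
    print(react_loop("What is 6 times 7?", scripted_model, {"calculator": calculator}))
```

In a real agent the `model` callable would wrap an API call and the step budget guards against loops that never emit a finish action.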
All claims in this topic (10)
- Chain-of-Thought (CoT) · introduced in Wei et al. 2022, Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (1.00 · 2 sources)
- Chain-of-Thought prompting · introduced in paper Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022) (1.00 · 2 sources)
- DeepSeek-R1 · released on 2025-01-20 with reasoning chain-of-thought capabilities (1.00 · 2 sources)
- Flamingo · introduced in Alayrac et al. 2022, DeepMind few-shot vision-language model (1.00 · 2 sources)
- GPT-3 · introduced in paper Language Models are Few-Shot Learners (Brown et al., 2020) (1.00 · 2 sources)
- ReAct (Reasoning + Acting) · introduced in paper ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022) (1.00 · 2 sources)
- ReAct prompting pattern · introduced in Yao et al. 2022, synergizing reasoning and acting in language models (1.00 · 2 sources)
- Stanford Alpaca · publicly released on 2023-03-13, instruction-tuned LLaMA 7B from Stanford CRFM (1.00 · 2 sources)
- Tree of Thoughts · introduced in Yao et al. 2023, deliberate problem solving with LLMs (1.00 · 2 sources)
- Vercel v0 · publicly released on 2023-10-31 by Vercel, AI tool generating React + Tailwind UI from text prompts (1.00 · 2 sources)