
LLM Parameters Explained: Temperature, Top-P, Top-K, Max Tokens


If you’ve ever used ChatGPT, Claude, or any AI chatbot, you’ve probably noticed something interesting: sometimes they give you creative, unexpected answers, and other times they’re very precise and factual. What’s the secret? It’s all about LLM parameters – the hidden settings that control how AI thinks and responds.

Think of these parameters like the dials on a mixing board. Adjust them, and you completely change the output. In this guide, we’ll break down temperature, top-p, top-k, max tokens, frequency penalty, and presence penalty in plain English, with real examples you can use today.

What Are LLM Parameters? (The Basics)

Before diving into specific settings, let’s understand what’s happening. When an AI model generates text, it doesn’t just pick one word – it considers thousands of possible next words, each with a probability score.

For example, if you type “The sky is…”, the AI might think:

  • “blue” – 40% probability
  • “clear” – 30% probability
  • “cloudy” – 15% probability
  • “dark” – 10% probability
  • “purple” – 5% probability

Parameters control which word the AI picks from this list. Let’s explore each one.

Temperature: The Creativity Dial

What Is Temperature?

Temperature controls randomness and creativity. It’s like adjusting how adventurous the AI should be when choosing words.

  • Low temperature (0.0 – 0.3): The AI becomes very predictable and focused. It always picks the most likely word.
  • Medium temperature (0.4 – 0.7): Balanced. The AI is reliable but occasionally surprising.
  • High temperature (0.8 – 2.0): The AI gets creative and unpredictable. It might pick unusual words.
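Under the hood, temperature rescales the model's raw scores (logits) before they are turned into probabilities. A minimal sketch (the logit values are made up for illustration):

```python
import math

def apply_temperature(logits, temperature):
    """Rescale raw scores by temperature, then softmax into probabilities.

    Lower temperature sharpens the distribution (the top word dominates);
    higher temperature flattens it (unlikely words get a real chance).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for "blue", "clear", "cloudy", "dark", "purple"
logits = [2.0, 1.7, 1.0, 0.6, -0.1]

cold = apply_temperature(logits, 0.2)  # near-deterministic
hot = apply_temperature(logits, 1.5)   # much flatter

print(f"T=0.2 top-word probability: {cold[0]:.2f}")
print(f"T=1.5 top-word probability: {hot[0]:.2f}")
```

Run it and you'll see the top word's probability shoot up at low temperature and shrink at high temperature – that single division is the whole "creativity dial."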

Real-World Example

Prompt: “Write a story about a robot.”

Temperature = 0.2 (Factual)
“The robot was programmed to clean floors. It moved systematically from room to room, using sensors to detect obstacles.”

Temperature = 0.9 (Creative)
“The robot dreamed in binary. Each night, as humans slept, it pondered the meaning of dust and whether cleanliness was truly next to godliness.”

When to Use Each Setting

| Task | Recommended Temperature | Why |
|------|------------------------|-----|
| Code generation | 0.1 – 0.3 | You want precise, working code |
| Technical docs | 0.2 – 0.4 | Accuracy matters more than style |
| Blog posts | 0.5 – 0.7 | Balance between facts and engagement |
| Creative writing | 0.7 – 1.0 | Originality and surprises are good |
| Brainstorming | 0.8 – 1.2 | You want wild, unexpected ideas |

Top-P (Nucleus Sampling): The Smart Filter

What Is Top-P?

Top-P (also called nucleus sampling) is like setting a probability threshold. Instead of looking at all possible words, the AI only considers words whose combined probability adds up to your chosen percentage.

For example, with Top-P = 0.9 (90%), the AI looks at the most likely words until their probabilities total 90%, then ignores the rest.

How It Works

Going back to our “The sky is…” example:

  • “blue” (40%) + “clear” (30%) + “cloudy” (15%) = 85% – still below the 90% threshold
  • Add “dark” (10%) = 95% ✓ – threshold crossed, stop here

With Top-P = 0.9, the AI chooses only from: blue, clear, cloudy, or dark. The word “purple” (5%) is excluded because the 90% threshold was already crossed without it.
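That cumulative cut is simple enough to write out directly. A sketch of the filtering step, using the same example distribution:

```python
def nucleus(probs, top_p):
    """Keep the smallest set of words whose cumulative probability
    reaches top_p; everything after the threshold is excluded."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, p in ranked:
        kept.append(word)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

probs = {"blue": 0.40, "clear": 0.30, "cloudy": 0.15, "dark": 0.10, "purple": 0.05}
print(nucleus(probs, 0.9))  # ['blue', 'clear', 'cloudy', 'dark']
```

Notice how "purple" never survives at Top-P = 0.9, and how shrinking Top-P to 0.3 leaves only "blue" – the filter adapts to however the probability mass is spread.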

Top-P vs Temperature

  • Temperature changes how confident the AI is about each word
  • Top-P limits which words are even considered

Pro tip: Most experts recommend adjusting either temperature or top-p, not both. OpenAI and Anthropic suggest leaving one at default.

Best Settings

  • Top-P = 0.1 – 0.3: Very focused, only the safest choices
  • Top-P = 0.5 – 0.7: Balanced, filters out nonsense
  • Top-P = 0.9 – 0.95: Default for most tasks
  • Top-P = 0.99: Almost no filtering, very diverse

Top-K: The Fixed Word Limit

What Is Top-K?

Top-K is simpler than Top-P. It says: “Only consider the K most likely words,” where K is a fixed number.

  • Top-K = 1: Always pick the most likely word (same as temperature = 0)
  • Top-K = 10: Choose from the 10 most likely words
  • Top-K = 50: Common default, balances quality and variety
  • Top-K = 100+: Very diverse, might include weird words
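Top-K is even easier to sketch – it's just a fixed-size slice of the ranked list, regardless of how the probability is distributed:

```python
def top_k_filter(probs, k):
    """Keep only the k most likely words, regardless of how confident
    the model is -- the fixed-size cut described above."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:k]]

probs = {"blue": 0.40, "clear": 0.30, "cloudy": 0.15, "dark": 0.10, "purple": 0.05}
print(top_k_filter(probs, 2))   # ['blue', 'clear']
print(top_k_filter(probs, 50))  # asking for 50 still returns all 5 that exist
```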

Top-K vs Top-P: Which Is Better?

Top-K problem: It picks a fixed number of words regardless of their quality. If the AI is very confident (one word has 90% probability), Top-K = 50 still forces it to consider 49 other unlikely words.

Top-P advantage: It adapts. When the AI is confident, it narrows down choices. When uncertain, it expands options.

Most modern LLM APIs favor Top-P for this reason, but the two filters can also be used together.

Max Tokens: Setting Length Limits

What Are Tokens?

A token is not always one word. It can be:

  • A full word: “hello” = 1 token
  • Part of a word: “understanding” might be 2 tokens (e.g., “under” + “standing” – exact splits vary by tokenizer)
  • A punctuation mark: “!” = 1 token

Rule of thumb: 1 English word ≈ 1.3 tokens on average.
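That rule of thumb is easy to turn into a quick planning helper. This is only a rough heuristic – real tokenizers split text differently per model – but it's useful for budgeting prompts and responses:

```python
def estimate_tokens(text):
    """Rough token estimate using the ~1.3 tokens-per-word rule of thumb.
    Real BPE tokenizers vary by model; treat this as a planning heuristic only."""
    return round(len(text.split()) * 1.3)

print(estimate_tokens("Write a short paragraph about coffee and its history"))  # 9 words -> roughly 12 tokens
```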

What Is Max Tokens?

Max tokens controls how long the AI’s response can be. If you set max tokens = 100, the AI will stop after generating about 75-80 words, even if the answer is incomplete.

Context Window vs Max Tokens

  • Context window: Total tokens the AI can “see” (your prompt + its response combined)
  • Max tokens: Maximum tokens for the AI’s response only

Modern models (as of November 2025) have large context windows: GPT-4o and GPT-5 support 128K tokens, Claude Sonnet 4.5 handles 200K tokens, and Gemini 2.5 Pro can process up to 1 million tokens. This means you can include entire documents in your prompts.

Recommended Settings

| Use Case | Max Tokens |
|----------|-----------|
| Short answers (Q&A) | 50 – 150 |
| Social media posts | 100 – 280 |
| Paragraph summaries | 150 – 300 |
| Blog sections | 400 – 800 |
| Long-form articles | 1,000 – 2,000 |
| Code generation | 500 – 1,500 |

Frequency Penalty: Fighting Repetition

What Is Frequency Penalty?

Frequency penalty punishes words that appear too often. The more a word is used, the less likely the AI will use it again.

Range: -2.0 to 2.0

  • 0.0: No penalty (default)
  • 0.5 – 1.0: Moderate penalty, reduces repetition
  • 1.5 – 2.0: Strong penalty, forces variety
  • Negative values: Encourage repetition (rarely used)

Example

Prompt: “Write about good leadership qualities.”

Frequency Penalty = 0.0
“A good leader is someone who inspires. A good leader listens. A good leader makes decisions. A good leader…”

Frequency Penalty = 1.5
“A good leader inspires their team. Effective managers listen actively. Great executives make decisive choices. Strong supervisors…”

Notice how the second version uses different words (leader → manager → executive → supervisor) instead of repeating “leader.”

Presence Penalty: Introducing New Topics

What Is Presence Penalty?

Presence penalty is similar to frequency penalty, but simpler. It applies the same penalty to any word that has appeared before, regardless of how many times.

  • A word used once gets penalized the same as a word used ten times
  • Range: -2.0 to 2.0

Frequency vs Presence: Key Difference

  • Frequency penalty: “You said ‘robot’ five times, so I’ll really avoid it now.”
  • Presence penalty: “You said ‘robot’ once, so I’ll try something else.”

Use presence penalty when you want the AI to cover more topics. Use frequency penalty when you want to reduce word repetition.
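The difference shows up clearly in the score adjustment itself. A simplified sketch of how both penalties modify a word's raw score (modeled on the adjustment OpenAI's API docs describe; the numbers are made up for illustration):

```python
def penalize(logit, count, frequency_penalty, presence_penalty):
    """Adjust a word's raw score based on how often it has appeared so far.

    Frequency penalty scales with the count; presence penalty is a flat
    hit applied once the word has appeared at all.
    """
    return (logit
            - count * frequency_penalty                         # grows with every repeat
            - (1.0 if count > 0 else 0.0) * presence_penalty)   # flat, applied once

# "robot" already used 5 times: frequency penalty hits it hard
print(penalize(2.0, 5, frequency_penalty=0.5, presence_penalty=0.0))  # -0.5

# Presence penalty treats 1 use and 10 uses identically
print(penalize(2.0, 1, frequency_penalty=0.0, presence_penalty=0.5))  # 1.5
```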

When to Use

  • Brainstorming: High presence penalty (1.0 – 2.0) → generates diverse ideas
  • Creative writing: Moderate (0.5 – 1.0) → avoids boring repetition
  • Technical docs: Low (0.0 – 0.3) → can repeat terms like “API” or “database”

Best Parameter Configurations for Different Tasks

1. Code Generation

  • Temperature: 0.1 – 0.3
  • Top-P: 0.1 – 0.3
  • Top-K: 10 – 30
  • Max Tokens: 500 – 1,500
  • Frequency Penalty: 0.0
  • Presence Penalty: 0.0

Why: You want precise, reliable code that works, not creative experiments.

2. Creative Writing (Stories, Novels)

  • Temperature: 0.7 – 1.0
  • Top-P: 0.8 – 0.95
  • Top-K: 50 – 100
  • Max Tokens: 800 – 2,000
  • Frequency Penalty: 0.5 – 1.0
  • Presence Penalty: 0.3 – 0.7

Why: Creativity matters. You want unique phrases, unexpected twists, and varied vocabulary.

3. Business Emails & Professional Writing

  • Temperature: 0.3 – 0.5
  • Top-P: 0.5 – 0.7
  • Top-K: 30 – 50
  • Max Tokens: 150 – 400
  • Frequency Penalty: 0.2 – 0.5
  • Presence Penalty: 0.1 – 0.3

Why: Professional tone is key. Slightly varied but not too creative.

4. Brainstorming & Idea Generation

  • Temperature: 0.8 – 1.2
  • Top-P: 0.9 – 0.99
  • Top-K: 80 – 150
  • Max Tokens: 300 – 800
  • Frequency Penalty: 1.0 – 1.5
  • Presence Penalty: 1.0 – 2.0

Why: You want wild, unexpected ideas. High penalties prevent the AI from circling back to the same concepts.

5. Q&A / Customer Support Chatbots

  • Temperature: 0.2 – 0.4
  • Top-P: 0.3 – 0.5
  • Top-K: 20 – 40
  • Max Tokens: 100 – 300
  • Frequency Penalty: 0.0 – 0.3
  • Presence Penalty: 0.0 – 0.2

Why: Accuracy and helpfulness matter. Users want correct answers, not creative experiments.

6. Blog Posts & Content Marketing

  • Temperature: 0.5 – 0.7
  • Top-P: 0.7 – 0.9
  • Top-K: 40 – 60
  • Max Tokens: 600 – 1,200
  • Frequency Penalty: 0.3 – 0.7
  • Presence Penalty: 0.2 – 0.5

Why: Balance between engaging writing and factual accuracy. You want readable content that doesn’t sound robotic.
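If you call models programmatically, it helps to keep these recommendations as named presets. A sketch using midpoints of the ranges above – the preset names are just illustrative labels, and the keys follow the common `temperature`/`top_p` parameter convention:

```python
# Parameter presets built from the task recommendations above
# (midpoint values; adjust to taste).
PRESETS = {
    "code":       {"temperature": 0.2,  "top_p": 0.2,  "max_tokens": 1000,
                   "frequency_penalty": 0.0, "presence_penalty": 0.0},
    "creative":   {"temperature": 0.85, "top_p": 0.9,  "max_tokens": 1500,
                   "frequency_penalty": 0.7, "presence_penalty": 0.5},
    "email":      {"temperature": 0.4,  "top_p": 0.6,  "max_tokens": 300,
                   "frequency_penalty": 0.3, "presence_penalty": 0.2},
    "brainstorm": {"temperature": 1.0,  "top_p": 0.95, "max_tokens": 500,
                   "frequency_penalty": 1.2, "presence_penalty": 1.5},
    "support":    {"temperature": 0.3,  "top_p": 0.4,  "max_tokens": 200,
                   "frequency_penalty": 0.1, "presence_penalty": 0.1},
    "blog":       {"temperature": 0.6,  "top_p": 0.8,  "max_tokens": 900,
                   "frequency_penalty": 0.5, "presence_penalty": 0.3},
}

print(PRESETS["code"]["temperature"])  # 0.2
```

You can then splat a preset into whatever client library you use (e.g. `**PRESETS["blog"]`) instead of re-deciding the numbers every call.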

Common Mistakes to Avoid

1. Adjusting Both Temperature AND Top-P

OpenAI and most AI companies recommend changing one or the other, not both. They affect similar aspects of the output.

Better approach: Start with temperature. If results aren’t right, try top-p instead.

2. Setting Temperature Too High

Above 1.5, outputs become chaotic and nonsensical. Even creative tasks rarely need temperature above 1.0.

3. Using Both Frequency AND Presence Penalty

Like temperature and top-p, these work similarly. Adjust one at a time.

4. Ignoring Max Tokens

If you get cut-off answers, increase max tokens. If you’re paying per token and getting unnecessarily long responses, lower it.

Quick Reference Chart

| Parameter | What It Does | Range | Default |
|-----------|--------------|-------|---------|
| Temperature | Controls randomness/creativity | 0.0 – 2.0 | 1.0 |
| Top-P | Limits word choices by probability | 0.0 – 1.0 | 0.9 – 0.95 |
| Top-K | Limits to K most likely words | 1 – 100+ | 40 – 50 |
| Max Tokens | Maximum response length | 1 – context limit | Varies by use |
| Frequency Penalty | Reduces word repetition | -2.0 – 2.0 | 0.0 |
| Presence Penalty | Encourages new topics | -2.0 – 2.0 | 0.0 |

Model-Specific Tuning Tips (2025)

Different AI models respond differently to the same parameters. Here’s what works best with current models:

OpenAI Models

  • GPT-4o & GPT-5: Handle higher temperatures well (up to 0.9) without degrading quality. Great for balanced creative + factual tasks.
  • o1 & o3 (Reasoning models): Use lower temperature (0.1-0.3) for complex logic, math, and coding. These models already “think” deeply, so high creativity isn’t needed.
  • GPT-4.1: Best kept at 0.5-0.7 for most applications. Very stable and consistent.

Anthropic Claude

  • Claude Sonnet 4.5: Excellent at 0.5-0.8 for coding and agentic tasks. Can sustain complex multi-step workflows for 30+ hours.
  • Claude Opus 4.1: Premium model for specialized reasoning. Use 0.3-0.6 for best results on complex analysis.
  • Claude Haiku 4.5: Fast and cost-effective. Works great at 0.4-0.7 for routine tasks.

Google Gemini

  • Gemini 2.5 Pro: Best all-rounder. Use 0.5-0.8 for most tasks. Excellent multimodal capabilities.
  • Gemini 2.0 Flash: Most cost-effective option. Keep temperature at 0.4-0.7 for quality results.
  • Gemini 2.0 Flash Thinking: Reasoning model – use 0.2-0.4 for logic-heavy tasks.

Open-Source Models

  • DeepSeek R1 & V3: Keep temperature lower (0.3-0.6) for stability. These models are more sensitive to high temperatures.
  • Llama 3.1+: Works well at 0.4-0.7. Good balance of performance and cost.
  • Qwen, Mistral: Best results at 0.3-0.6 range. More conservative settings produce better outputs.

Special Note: Reasoning Models

New reasoning models (OpenAI o1/o3, Gemini Thinking, DeepSeek R1) work differently. They spend time “thinking” before responding, which means:

  • Lower temperature is better (0.1-0.4) – they already explore solution spaces internally
  • Use them for: Complex math, advanced coding, multi-step logic, research analysis
  • Don’t use them for: Simple Q&A, creative writing, quick responses (they’re slower and more expensive)

Testing Your Settings: A Practical Exercise

Want to see these parameters in action? Try this experiment:

Prompt: “Write a paragraph about coffee.”

Test 1 (Robot Mode)

  • Temperature: 0.2
  • Top-P: 0.3
  • Frequency Penalty: 0.0

Test 2 (Creative Mode)

  • Temperature: 0.9
  • Top-P: 0.95
  • Frequency Penalty: 1.0

Compare the results. You’ll immediately see how these parameters change the AI’s “personality.”
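If you want to see the same mechanics without an API key, the whole pipeline – temperature softmax, then a nucleus cut, then a weighted draw – can be simulated in a few lines. An illustrative sketch, not any vendor's actual sampler; the word scores are invented:

```python
import math
import random

def sample_next(logits, temperature=1.0, top_p=1.0):
    """Toy end-to-end sampler: temperature softmax -> nucleus cut -> weighted draw."""
    # 1. Temperature-scaled softmax
    scaled = [l / temperature for l in logits.values()]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = {w: e / total for w, e in zip(logits, exps)}

    # 2. Nucleus (Top-P) cut
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, p in ranked:
        kept.append((word, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # 3. Weighted random choice among survivors
    words, weights = zip(*kept)
    return random.choices(words, weights=weights)[0]

logits = {"aroma": 2.1, "caffeine": 1.8, "ritual": 1.2, "bitterness": 0.8}
print(sample_next(logits, temperature=0.2, top_p=0.3))   # only "aroma" survives this nucleus
print(sample_next(logits, temperature=0.9, top_p=0.95))  # noticeably more varied across runs
```

Call the second variant in a loop and you'll see different words come back – the same "personality" shift the two test configurations above produce in a real chatbot.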

Conclusion: Finding Your Perfect Settings

Understanding LLM parameters is like learning to drive. At first, there are many knobs and buttons. But once you get the hang of it, you’ll instinctively know which settings work for your needs.

Remember:

  • Start with default settings
  • Change ONE parameter at a time
  • Test with the same prompt to see differences
  • Save your favorite configurations for different tasks

The best part? There’s no “wrong” setting – only what works best for your specific use case. Experiment, learn, and adjust. That’s how you become an AI power user.

Now go forth and fine-tune those models like a pro!