If you’ve ever used ChatGPT, Claude, or any AI chatbot, you’ve probably noticed something interesting: sometimes they give you creative, unexpected answers, and other times they’re very precise and factual. What’s the secret? It’s all about LLM parameters – the hidden settings that control how AI thinks and responds.
Think of these parameters like the dials on a mixing board. Adjust them, and you completely change the output. In this guide, we’ll break down temperature, top-p, top-k, max tokens, frequency penalty, and presence penalty in plain English, with real examples you can use today.
What Are LLM Parameters? (The Basics)
Before diving into specific settings, let’s understand what’s happening. When an AI model generates text, it doesn’t just pick one word – it considers thousands of possible next words, each with a probability score.
For example, if you type “The sky is…”, the AI might think:
- “blue” – 40% probability
- “clear” – 30% probability
- “cloudy” – 15% probability
- “dark” – 10% probability
- “purple” – 5% probability
Parameters control which word the AI picks from this list. Let’s explore each one.
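To make this concrete, here's a toy sketch of what "picking from a probability list" looks like, using the made-up numbers from the example above (real models work over tens of thousands of tokens, not five words):

```python
import random

# Candidate next words and their probabilities
# (the toy numbers from the "The sky is..." example above)
candidates = ["blue", "clear", "cloudy", "dark", "purple"]
probs = [0.40, 0.30, 0.15, 0.10, 0.05]

# Sampling: each call may return a different word, weighted by probability
word = random.choices(candidates, weights=probs, k=1)[0]
print(word)

# Greedy decoding: always take the single most likely word
greedy = candidates[probs.index(max(probs))]
print(greedy)  # "blue"
```

Run the sampling line a few times and you'll see "blue" most often, but occasionally "purple" sneaks through. The parameters below control exactly how this pick happens.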
Temperature: The Creativity Dial
What Is Temperature?
Temperature controls randomness and creativity. It’s like adjusting how adventurous the AI should be when choosing words.
- Low temperature (0.0 – 0.3): The AI becomes very predictable and focused, almost always picking the most likely word (at 0.0 it effectively always does).
- Medium temperature (0.4 – 0.7): Balanced. The AI is reliable but occasionally surprising.
- High temperature (0.8 – 2.0): The AI gets creative and unpredictable. It might pick unusual words.
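Under the hood, temperature divides the model's raw scores (logits) before they're converted to probabilities. Here's a minimal sketch with made-up logit values, showing how a low temperature sharpens the distribution and a high one flattens it:

```python
import math

def apply_temperature(logits, temperature):
    """Rescale raw model scores (logits) and convert them to probabilities.

    Low temperature sharpens the distribution toward the top word;
    high temperature flattens it, giving unlikely words a real chance.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy scores for blue / clear / cloudy / dark / purple
logits = [2.0, 1.5, 0.8, 0.4, -0.3]

cold = apply_temperature(logits, 0.2)  # the top word dominates
hot = apply_temperature(logits, 1.5)   # probabilities spread out
print(round(cold[0], 3), round(hot[0], 3))
```

With temperature 0.2, "blue" gets over 90% of the probability mass; at 1.5 it drops well below half, so other words get picked far more often.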
Real-World Example
Prompt: “Write a story about a robot.”
Temperature = 0.2 (Factual)
“The robot was programmed to clean floors. It moved systematically from room to room, using sensors to detect obstacles.”
Temperature = 0.9 (Creative)
“The robot dreamed in binary. Each night, as humans slept, it pondered the meaning of dust and whether cleanliness was truly next to godliness.”
When to Use Each Setting
| Task | Recommended Temperature | Why |
|---|---|---|
| Code generation | 0.1 – 0.3 | You want precise, working code |
| Technical docs | 0.2 – 0.4 | Accuracy matters more than style |
| Blog posts | 0.5 – 0.7 | Balance between facts and engagement |
| Creative writing | 0.7 – 1.0 | Originality and surprises are good |
| Brainstorming | 0.8 – 1.2 | You want wild, unexpected ideas |
Top-P (Nucleus Sampling): The Smart Filter
What Is Top-P?
Top-P (also called nucleus sampling) is like setting a probability threshold. Instead of looking at all possible words, the AI only considers words whose combined probability adds up to your chosen percentage.
For example, with Top-P = 0.9 (90%), the AI looks at the most likely words until their probabilities total 90%, then ignores the rest.
How It Works
Going back to our “The sky is…” example:
- “blue” (40%) + “clear” (30%) + “cloudy” (15%) = 85% – not enough yet
- Add “dark” (10%) = 95% – threshold reached ✓
With Top-P = 0.9, the AI would only choose from: blue, clear, cloudy, or dark. The word “purple” (5%) gets excluded because the cumulative probability crossed the 90% threshold once “dark” was added.
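The cutoff logic above fits in a few lines. A toy sketch (assumes candidates arrive sorted most likely first, which is how samplers process them):

```python
def top_p_filter(candidates, p):
    """Keep the most likely words until cumulative probability reaches p.

    `candidates` is a list of (word, probability) pairs, most likely first.
    """
    kept, cumulative = [], 0.0
    for word, prob in candidates:
        kept.append(word)
        cumulative += prob
        if cumulative >= p:  # threshold reached: stop adding words
            break
    return kept

sky = [("blue", 0.40), ("clear", 0.30), ("cloudy", 0.15),
       ("dark", 0.10), ("purple", 0.05)]

print(top_p_filter(sky, 0.9))  # ['blue', 'clear', 'cloudy', 'dark']
print(top_p_filter(sky, 0.5))  # ['blue', 'clear']
```

Notice how a lower p shrinks the pool: at 0.5, only the two safest words survive. The model then samples from whatever remains.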
Top-P vs Temperature
- Temperature changes how confident the AI is about each word
- Top-P limits which words are even considered
Pro tip: Most experts recommend adjusting either temperature or top-p, not both. OpenAI and Anthropic suggest leaving one at default.
Best Settings
- Top-P = 0.1 – 0.3: Very focused, only the safest choices
- Top-P = 0.5 – 0.7: Balanced, filters out nonsense
- Top-P = 0.9 – 0.95: Default for most tasks
- Top-P = 0.99: Almost no filtering, very diverse
Top-K: The Fixed Word Limit
What Is Top-K?
Top-K is simpler than Top-P. It says: “Only consider the K most likely words,” where K is a fixed number.
- Top-K = 1: Always pick the most likely word (greedy decoding – the same effect as temperature = 0)
- Top-K = 10: Choose from the 10 most likely words
- Top-K = 50: Common default, balances quality and variety
- Top-K = 100+: Very diverse, might include weird words
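Top-K is even simpler to sketch than Top-P: sort by probability and truncate. A toy version using the same example words:

```python
def top_k_filter(candidates, k):
    """Keep only the k most likely words; everything else is discarded."""
    ranked = sorted(candidates, key=lambda wp: wp[1], reverse=True)
    return [word for word, _ in ranked[:k]]

sky = [("blue", 0.40), ("clear", 0.30), ("cloudy", 0.15),
       ("dark", 0.10), ("purple", 0.05)]

print(top_k_filter(sky, 3))  # ['blue', 'clear', 'cloudy']
print(top_k_filter(sky, 1))  # ['blue'] -- equivalent to greedy decoding
```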
Top-K vs Top-P: Which Is Better?
Top-K problem: It picks a fixed number of words regardless of their quality. If the AI is very confident (one word has 90% probability), Top-K = 50 still forces it to consider 49 other unlikely words.
Top-P advantage: It adapts. When the AI is confident, it narrows down choices. When uncertain, it expands options.
Most modern LLM APIs favor Top-P for this reason, though the two can be used together.
Max Tokens: Setting Length Limits
What Are Tokens?
A token is not always one word. It can be:
- A full word: “hello” = 1 token
- Part of a word: “understanding” might be split into 2 tokens (e.g., “under” + “standing”), depending on the tokenizer
- A punctuation mark: “!” = 1 token
Rule of thumb: 1 English word ≈ 1.3 tokens on average.
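That rule of thumb is enough for quick budgeting. Here's a tiny estimator based on it (only a planning heuristic for English prose; for exact counts, use your provider's real tokenizer, such as OpenAI's tiktoken library):

```python
def estimate_tokens(text):
    """Rough token estimate using the ~1.3 tokens-per-word rule of thumb.

    Real tokenizers give exact counts; this is only a planning
    heuristic for English prose.
    """
    words = len(text.split())
    return round(words * 1.3)

prompt = "Write a short paragraph about coffee and why people love it."
print(estimate_tokens(prompt))  # 14
```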
What Is Max Tokens?
Max tokens controls how long the AI’s response can be. If you set max tokens = 100, the AI will stop after generating about 75-80 words, even if the answer is incomplete.
Context Window vs Max Tokens
- Context window: Total tokens the AI can “see” (your prompt + its response combined)
- Max tokens: Maximum tokens for the AI’s response only
Modern models (as of November 2025) have large context windows: GPT-4o supports 128K tokens (newer GPT-5 models go higher still), Claude Sonnet 4.5 handles 200K tokens, and Gemini 2.5 Pro can process up to 1 million tokens. This means you can include entire documents in your prompts.
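Since the prompt and the response share one context window, a quick budget check looks like this (a sketch with illustrative numbers, not any provider's actual validation logic):

```python
def fits_in_context(prompt_tokens, max_response_tokens, context_window):
    """The prompt and the response budget share one context window."""
    return prompt_tokens + max_response_tokens <= context_window

# A 127K-token prompt in a 128K-token window leaves little response room
print(fits_in_context(127_000, 500, 128_000))    # True: fits
print(fits_in_context(127_000, 2_000, 128_000))  # False: response budget overflows
```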
Recommended Settings
| Use Case | Max Tokens |
|---|---|
| Short answers (Q&A) | 50 – 150 |
| Social media posts | 100 – 280 |
| Paragraph summaries | 150 – 300 |
| Blog sections | 400 – 800 |
| Long-form articles | 1,000 – 2,000 |
| Code generation | 500 – 1,500 |
Frequency Penalty: Fighting Repetition
What Is Frequency Penalty?
Frequency penalty punishes words that appear too often. The more a word is used, the less likely the AI will use it again.
Range: -2.0 to 2.0
- 0.0: No penalty (default)
- 0.5 – 1.0: Moderate penalty, reduces repetition
- 1.5 – 2.0: Strong penalty, forces variety
- Negative values: Encourage repetition (rarely used)
Example
Prompt: “Write about good leadership qualities.”
Frequency Penalty = 0.0
“A good leader is someone who inspires. A good leader listens. A good leader makes decisions. A good leader…”
Frequency Penalty = 1.5
“A good leader inspires their team. Effective managers listen actively. Great executives make decisive choices. Strong supervisors…”
Notice how the second version uses different words (leader → manager → executive → supervisor) instead of repeating “leader.”
Presence Penalty: Introducing New Topics
What Is Presence Penalty?
Presence penalty is similar to frequency penalty, but simpler. It applies the same penalty to any word that has appeared before, regardless of how many times.
- A word used once gets penalized the same as a word used ten times
- Range: -2.0 to 2.0
Frequency vs Presence: Key Difference
- Frequency penalty: “You said ‘robot’ five times, so I’ll really avoid it now.”
- Presence penalty: “You said ‘robot’ once, so I’ll try something else.”
Use presence penalty when you want the AI to cover more topics. Use frequency penalty when you want to reduce word repetition.
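The distinction shows up clearly in the adjustment formula OpenAI documents for its API: frequency penalty scales with how many times a word has appeared, while presence penalty is a flat, one-time deduction. A sketch with toy numbers:

```python
def apply_penalties(logit, count, frequency_penalty, presence_penalty):
    """Adjust a word's raw score based on how often it has already appeared.

    frequency penalty scales with the count; presence penalty is a flat,
    one-time deduction for any word that has appeared at all.
    """
    penalty = count * frequency_penalty + (1 if count > 0 else 0) * presence_penalty
    return logit - penalty

# "robot" has already appeared 5 times; its base score is 2.0
print(apply_penalties(2.0, 5, frequency_penalty=0.5, presence_penalty=0.0))  # -0.5
print(apply_penalties(2.0, 5, frequency_penalty=0.0, presence_penalty=0.5))  # 1.5
# A word used once gets the same presence penalty as one used five times
print(apply_penalties(2.0, 1, frequency_penalty=0.0, presence_penalty=0.5))  # 1.5
```

With frequency penalty, five repetitions drive the score below zero; with presence penalty alone, the first use and the fifth use are penalized identically.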
When to Use
- Brainstorming: High presence penalty (1.0 – 2.0) → generates diverse ideas
- Creative writing: Moderate (0.5 – 1.0) → avoids boring repetition
- Technical docs: Low (0.0 – 0.3) → can repeat terms like “API” or “database”
Best Parameter Configurations for Different Tasks
1. Code Generation
- Temperature: 0.1 – 0.3
- Top-P: 0.1 – 0.3
- Top-K: 10 – 30
- Max Tokens: 500 – 1,500
- Frequency Penalty: 0.0
- Presence Penalty: 0.0
Why: You want precise, reliable code that works, not creative experiments.
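Here's what the code-generation profile looks like as a request payload. This is a sketch in the common chat-completions style – the exact client call varies by SDK, and `your-model-name` is a placeholder. Note that, following the pro tip above, only temperature is tuned here and top_p stays at its default:

```python
# Settings for the code-generation profile (picked from the ranges above)
code_generation_settings = {
    "temperature": 0.2,        # precise, deterministic output
    "top_p": 1.0,              # left at default; we tuned temperature instead
    "max_tokens": 1000,        # room for a full function plus comments
    "frequency_penalty": 0.0,  # repeating identifiers is fine in code
    "presence_penalty": 0.0,
}

# Hypothetical request payload in the common chat-completions shape
request = {
    "model": "your-model-name",  # placeholder
    "messages": [
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
    **code_generation_settings,
}
print(sorted(request.keys()))
```

You can keep a dictionary like this per task profile and merge it into each request, rather than re-typing parameters everywhere.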
2. Creative Writing (Stories, Novels)
- Temperature: 0.7 – 1.0
- Top-P: 0.8 – 0.95
- Top-K: 50 – 100
- Max Tokens: 800 – 2,000
- Frequency Penalty: 0.5 – 1.0
- Presence Penalty: 0.3 – 0.7
Why: Creativity matters. You want unique phrases, unexpected twists, and varied vocabulary.
3. Business Emails & Professional Writing
- Temperature: 0.3 – 0.5
- Top-P: 0.5 – 0.7
- Top-K: 30 – 50
- Max Tokens: 150 – 400
- Frequency Penalty: 0.2 – 0.5
- Presence Penalty: 0.1 – 0.3
Why: Professional tone is key. Slightly varied but not too creative.
4. Brainstorming & Idea Generation
- Temperature: 0.8 – 1.2
- Top-P: 0.9 – 0.99
- Top-K: 80 – 150
- Max Tokens: 300 – 800
- Frequency Penalty: 1.0 – 1.5
- Presence Penalty: 1.0 – 2.0
Why: You want wild, unexpected ideas. High penalties prevent the AI from circling back to the same concepts.
5. Q&A / Customer Support Chatbots
- Temperature: 0.2 – 0.4
- Top-P: 0.3 – 0.5
- Top-K: 20 – 40
- Max Tokens: 100 – 300
- Frequency Penalty: 0.0 – 0.3
- Presence Penalty: 0.0 – 0.2
Why: Accuracy and helpfulness matter. Users want correct answers, not creative experiments.
6. Blog Posts & Content Marketing
- Temperature: 0.5 – 0.7
- Top-P: 0.7 – 0.9
- Top-K: 40 – 60
- Max Tokens: 600 – 1,200
- Frequency Penalty: 0.3 – 0.7
- Presence Penalty: 0.2 – 0.5
Why: Balance between engaging writing and factual accuracy. You want readable content that doesn’t sound robotic.
Common Mistakes to Avoid
1. Adjusting Both Temperature AND Top-P
OpenAI and most AI companies recommend changing one or the other, not both. They affect similar aspects of the output.
Better approach: Start with temperature. If results aren’t right, try top-p instead.
2. Setting Temperature Too High
Above 1.5, outputs become chaotic and nonsensical. Even creative tasks rarely need temperature above 1.0.
3. Using Both Frequency AND Presence Penalty
Like temperature and top-p, these work similarly. Adjust one at a time.
4. Ignoring Max Tokens
If you get cut-off answers, increase max tokens. If you’re paying per token and getting unnecessarily long responses, lower it.
Quick Reference Chart
| Parameter | What It Does | Range | Default |
|---|---|---|---|
| Temperature | Controls randomness/creativity | 0.0 – 2.0 | 1.0 |
| Top-P | Limits word choices by probability | 0.0 – 1.0 | 0.9 – 0.95 |
| Top-K | Limits to K most likely words | 1 – 100+ | 40 – 50 |
| Max Tokens | Maximum response length | 1 – context limit | Varies by use |
| Frequency Penalty | Reduces word repetition | -2.0 – 2.0 | 0.0 |
| Presence Penalty | Encourages new topics | -2.0 – 2.0 | 0.0 |
Model-Specific Tuning Tips (2025)
Different AI models respond differently to the same parameters. Here’s what works best with current models:
OpenAI Models
- GPT-4o & GPT-5: Handle higher temperatures well (up to 0.9) without degrading quality. Great for balanced creative + factual tasks.
- o1 & o3 (Reasoning models): Use lower temperature (0.1-0.3) for complex logic, math, and coding. These models already “think” deeply, so high creativity isn’t needed.
- GPT-4.1: Best kept at 0.5-0.7 for most applications. Very stable and consistent.
Anthropic Claude
- Claude Sonnet 4.5: Excellent at 0.5-0.8 for coding and agentic tasks. Can sustain complex multi-step workflows for 30+ hours.
- Claude Opus 4.1: Premium model for specialized reasoning. Use 0.3-0.6 for best results on complex analysis.
- Claude Haiku 4.5: Fast and cost-effective. Works great at 0.4-0.7 for routine tasks.
Google Gemini
- Gemini 2.5 Pro: Best all-rounder. Use 0.5-0.8 for most tasks. Excellent multimodal capabilities.
- Gemini 2.0 Flash: Most cost-effective option. Keep temperature at 0.4-0.7 for quality results.
- Gemini 2.0 Flash Thinking: Reasoning model – use 0.2-0.4 for logic-heavy tasks.
Open-Source Models
- DeepSeek R1 & V3: Keep temperature lower (0.3-0.6) for stability. These models are more sensitive to high temperatures.
- Llama 3.1+: Works well at 0.4-0.7. Good balance of performance and cost.
- Qwen, Mistral: Best results at 0.3-0.6 range. More conservative settings produce better outputs.
Special Note: Reasoning Models
New reasoning models (OpenAI o1/o3, Gemini Thinking, DeepSeek R1) work differently. They spend time “thinking” before responding, which means:
- Lower temperature is better (0.1-0.4) – they already explore solution spaces internally
- Use them for: Complex math, advanced coding, multi-step logic, research analysis
- Don’t use them for: Simple Q&A, creative writing, quick responses (they’re slower and more expensive)
Testing Your Settings: A Practical Exercise
Want to see these parameters in action? Try this experiment:
Prompt: “Write a paragraph about coffee.”
Test 1 (Robot Mode)
- Temperature: 0.2
- Top-P: 0.3
- Frequency Penalty: 0.0
Test 2 (Creative Mode)
- Temperature: 0.9
- Top-P: 0.95
- Frequency Penalty: 1.0
Compare the results. You’ll immediately see how these parameters change the AI’s “personality.”
Conclusion: Finding Your Perfect Settings
Understanding LLM parameters is like learning to drive. At first, there are many knobs and buttons. But once you get the hang of it, you’ll instinctively know which settings work for your needs.
Remember:
- Start with default settings
- Change ONE parameter at a time
- Test with the same prompt to see differences
- Save your favorite configurations for different tasks
The best part? There’s no “wrong” setting – only what works best for your specific use case. Experiment, learn, and adjust. That’s how you become an AI power user.
Now go forth and fine-tune those models like a pro!