LLM optimization | Amir Teymoori

Context Engineering: Mastering the 200K Token Era

With Claude 3.5 Sonnet supporting 200K tokens and Gemini 2.5 reaching 2M tokens, context engineering has become as important as prompt engineering....

PROMPT AND CONTEXT ENGINEERING

Fine-Tuning LLMs with LoRA: 2025 Guide

Low-Rank Adaptation (LoRA) has revolutionized how we fine-tune large language models in 2025. This technique allows developers to adapt models like Llama...

LLM MODELS, PROVIDERS AND TRAINING

LLM Inference: Cut AI Costs by 80%

AI costs are crushing startups. One company I talked to was spending $47,000/month on LLM API calls—more than their entire engineering payroll....

INFERENCE, SERVING AND COST CONTROL