Beyond the Tutorial: Production Challenges
Every OpenAI tutorial shows you how to make a single API call. But in production, you face rate limits, token budgets, error handling, and costs that can spiral. This guide covers what I learned building an AI chatbot for e-commerce and a multi-model AI aggregator.
Basic Integration Setup
```python
import logging

import openai
from django.conf import settings


class ServiceUnavailable(Exception):
    """Placeholder: in a DRF project this would be an APIException
    subclass with status_code = 503."""


logger = logging.getLogger(__name__)
client = openai.OpenAI(api_key=settings.OPENAI_API_KEY)


def chat_completion(messages, model="gpt-4o-mini", max_tokens=500):
    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            max_tokens=max_tokens,
            temperature=0.7,
        )
        return {
            'content': response.choices[0].message.content,
            'tokens': response.usage.total_tokens,
            'model': model,
        }
    except openai.RateLimitError:
        raise ServiceUnavailable("AI service rate limited")
    except openai.APIError as e:
        logger.error("OpenAI API error: %s", e)
        raise ServiceUnavailable("AI service temporarily unavailable")
```
Rate Limiting and Retries
OpenAI enforces separate requests-per-minute and tokens-per-minute limits, and they vary by model and usage tier. Implement exponential backoff:
```python
import time
from functools import wraps


def retry_with_backoff(max_retries=3, base_delay=1):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except openai.RateLimitError:
                    if attempt == max_retries - 1:
                        raise
                    delay = base_delay * (2 ** attempt)
                    logger.warning(
                        "Rate limited, retrying in %ds (attempt %d/%d)",
                        delay, attempt + 1, max_retries
                    )
                    time.sleep(delay)
        return wrapper
    return decorator


@retry_with_backoff(max_retries=3)
def chat_completion(messages, **kwargs):
    ...  # same body as above
```
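One refinement worth noting: when many workers back off on the same fixed schedule, they all retry in lockstep and hit the rate limit again together. Adding jitter (randomizing each delay under the exponential cap) spreads the retries out. A sketch of a full-jitter schedule that could replace the fixed `base_delay * (2 ** attempt)` line; the function name and cap are illustrative:

```python
import random


def backoff_delays(max_retries=3, base_delay=1, max_delay=30):
    """Yield full-jitter sleep times: random, capped by an exponential curve.

    Randomizing each delay keeps concurrent workers from retrying
    at the same instant after a shared rate-limit event.
    """
    for attempt in range(max_retries):
        cap = min(max_delay, base_delay * (2 ** attempt))
        yield random.uniform(0, cap)
```

Inside the decorator, `time.sleep(delay)` would consume these values instead of the deterministic ones.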
Token Budget Management
Tokens directly translate to cost. Track and budget them:
```python
from django.contrib.auth.models import User
from django.db import models


class AIUsage(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    model = models.CharField(max_length=50)
    tokens_input = models.IntegerField()
    tokens_output = models.IntegerField()
    cost_usd = models.DecimalField(max_digits=8, decimal_places=6)
    created_at = models.DateTimeField(auto_now_add=True)


# Pricing per 1M tokens (as of 2026)
MODEL_PRICING = {
    'gpt-4o-mini': {'input': 0.15, 'output': 0.60},
    'gpt-4o': {'input': 2.50, 'output': 10.00},
}


def calculate_cost(model, input_tokens, output_tokens):
    pricing = MODEL_PRICING[model]
    return (
        input_tokens / 1_000_000 * pricing['input'] +
        output_tokens / 1_000_000 * pricing['output']
    )
```
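With costs recorded per row, enforcing a per-user daily cap comes down to a `Sum` aggregate over today's `AIUsage` rows plus a gate before each call. A minimal sketch of the gate, kept ORM-free by passing the aggregated spend in; the helper names and the $50 cap are made-up examples:

```python
class BudgetExceeded(Exception):
    """Raised when a request would push a user past their daily cap."""


def check_daily_budget(spent_today_usd, estimated_cost_usd, daily_cap_usd=50.0):
    """Refuse an AI call that would exceed the remaining daily budget.

    `spent_today_usd` would come from an aggregate over AIUsage rows
    created today (e.g. Sum('cost_usd') filtered on created_at__date).
    """
    if spent_today_usd + estimated_cost_usd > daily_cap_usd:
        raise BudgetExceeded(
            f"${spent_today_usd:.2f} spent today; cap is ${daily_cap_usd:.2f}"
        )
```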
Caching Strategies
Cache identical or similar queries to save both latency and money:
```python
import hashlib

from django.core.cache import cache


def cached_completion(messages, model="gpt-4o-mini", ttl=3600):
    cache_key = "ai:" + hashlib.md5(
        f"{model}:{str(messages)}".encode()
    ).hexdigest()
    cached = cache.get(cache_key)
    if cached is not None:
        return cached
    result = chat_completion(messages, model=model)
    cache.set(cache_key, result, ttl)
    return result
```
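Note that the md5 key above only hits on byte-identical requests. For FAQ-style traffic it can pay to normalize the text before hashing so near-duplicate phrasings share one cache entry; a sketch, where the normalization rules (whitespace collapsing, lowercasing) are assumptions to tune, not part of the original:

```python
import hashlib
import json


def normalized_cache_key(messages, model):
    """Hash a normalized form of the conversation so trivially different
    phrasings (extra whitespace, letter case) map to the same key."""
    canonical = [
        {'role': m['role'], 'content': ' '.join(m['content'].lower().split())}
        for m in messages
    ]
    payload = json.dumps({'model': model, 'messages': canonical}, sort_keys=True)
    return 'ai:' + hashlib.md5(payload.encode()).hexdigest()
```

Lowercasing is only safe when casing does not change the answer; anything smarter (paraphrase matching) needs semantic caching via embeddings.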
Streaming Responses
For chat interfaces, streaming provides a much better UX:
```python
from django.http import StreamingHttpResponse


def stream_chat(request):
    messages = build_messages(request)

    def generate():
        stream = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            stream=True,
        )
        for chunk in stream:
            # Some chunks (e.g. the final one) carry no content delta.
            if chunk.choices and chunk.choices[0].delta.content:
                yield f"data: {chunk.choices[0].delta.content}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingHttpResponse(
        generate(), content_type='text/event-stream'
    )
```
Cost Optimization Tips
- Use gpt-4o-mini for most tasks — by the pricing table above it is roughly 17x cheaper than gpt-4o and handles 90% of use cases
- Truncate conversation history — send only the last 10 messages, not the full history
- Cache FAQ-like queries — many users ask the same questions
- Set max_tokens limits — prevent runaway responses
- Monitor daily spend — set alerts at budget thresholds
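The history-truncation tip can be as simple as keeping the system prompt and slicing off everything but the most recent turns; a sketch (the helper name is my own):

```python
def truncate_history(messages, max_messages=10):
    """Keep the system prompt (if any) plus only the most recent turns."""
    system = [m for m in messages if m['role'] == 'system']
    recent = [m for m in messages if m['role'] != 'system'][-max_messages:]
    return system + recent
```

For long sessions where old context still matters, the usual next step is summarizing the dropped turns into the system prompt rather than discarding them outright.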
Production Monitoring
```python
def log_ai_request(user, model, tokens_input, tokens_output, latency_ms, success):
    logger.info(
        "AI request: user=%s model=%s tokens=%d latency=%dms success=%s",
        user.id, model, tokens_input + tokens_output, latency_ms, success,
    )
    AIUsage.objects.create(
        user=user, model=model,
        tokens_input=tokens_input, tokens_output=tokens_output,
        cost_usd=calculate_cost(model, tokens_input, tokens_output),
    )
```
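`log_ai_request` needs a `latency_ms` value from somewhere; a small timing wrapper can supply it. This helper is hypothetical (not from the original), and it deliberately swallows the exception so failed requests still get logged — callers should check `success`, or re-raise if they need the error:

```python
import time


def timed_call(func, *args, **kwargs):
    """Run an AI call and return (result, latency_ms, success) for logging.

    On failure the result is None and success is False; the latency of
    the failed attempt is still measured and returned.
    """
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        success = True
    except Exception:
        result, success = None, False
    latency_ms = int((time.perf_counter() - start) * 1000)
    return result, latency_ms, success
```

Typical use: `result, latency_ms, ok = timed_call(chat_completion, messages)` followed by the `log_ai_request` call.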
The key to production OpenAI integration is treating it like any other external service: expect failures, budget resources, cache aggressively, and monitor everything. Need help integrating AI into your product? Check out my AI integration services or get in touch.