Get Claude API Running in Your First 5 Minutes
I spent almost an entire morning digging through the docs of various AI APIs before trying Claude. The verdict: Claude API is the easiest to get started with — a clean SDK, consistent responses, and few unexpected quirks. If you’ve used the OpenAI SDK before, you’ll feel right at home.
Install the library first:
pip install anthropic
Done. Now let’s make an API call:
import anthropic

client = anthropic.Anthropic(api_key="sk-ant-api03-...")

message = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain what Docker is in 3 short sentences"}
    ]
)
print(message.content[0].text)
Run it — you’ll see an answer appear within a few seconds. That’s everything you need to have a working AI application. The rest is about making it good.
Get your API key at console.anthropic.com → API Keys → Create Key. Never hardcode the key into your code — use environment variables:
# Add to ~/.bashrc or ~/.zshrc
export ANTHROPIC_API_KEY="sk-ant-api03-..."
# The SDK reads this variable automatically if api_key is not passed
client = anthropic.Anthropic() # No api_key needed!
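If the variable is missing, the failure message you get from the SDK can be confusing at first. A small fail-fast check at startup makes the problem obvious immediately; require_api_key is my own helper name, not part of the SDK:

```python
import os

def require_api_key():
    """Fail fast at startup if the key is missing from the environment."""
    key = os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it or load it from a .env file"
        )
    return key
```

Call it once when your application boots, before constructing the client.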
Understanding the API Structure to Avoid Common Mistakes
Messages API — Using It Correctly
Claude uses the Messages API, not the Completion API used by older models. A conversation is structured as a list of alternating user and assistant messages:
messages = [
    {"role": "user", "content": "I'm learning Python"},
    {"role": "assistant", "content": "Python is a great choice! Where would you like to start?"},
    {"role": "user", "content": "Can we start with list comprehensions?"}
]

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=messages
)
Rule: the first message must be user, and messages must alternate between user and assistant. Break this rule and the API returns an error right away. I hit it myself while building a chatbot, because my history-trimming code could cut the list in the middle of a user/assistant pair.
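A cheap guard before each request catches this early instead of deep inside an API call. validate_turns is a hypothetical helper that enforces the rule as stated above:

```python
def validate_turns(messages):
    """Return True if the history satisfies the Messages API turn rules:
    first role is 'user' and roles strictly alternate."""
    if not messages or messages[0]["role"] != "user":
        return False
    return all(prev["role"] != cur["role"]
               for prev, cur in zip(messages, messages[1:]))
```

Run it right after trimming or rebuilding the history, where bugs like mine tend to creep in.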
System Prompt — Your Main Tool for Controlling Output
The system prompt is not inside messages but is a separate parameter:
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    system="You are a senior DevOps engineer. Keep answers concise, use bullet points. Always mention security implications.",
    messages=[
        {"role": "user", "content": "How do I set up an nginx reverse proxy?"}
    ]
)
Many other APIs stuff the system message at the top of the messages list like a regular message. Claude separates it into an independent parameter — the model processes this instruction with higher priority and is less likely to “forget” your requirements as the conversation grows longer.
Streaming — Essential for Good UX
Nobody wants to stare at a blank screen for 10 seconds before text appears. Streaming solves this:
import anthropic
client = anthropic.Anthropic()
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a PostgreSQL database backup script"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
print()  # New line after completion
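In practice you usually also want the complete text afterwards, for logging or caching. A minimal sketch of the accumulate-while-printing pattern; collect_stream is my own helper, and any iterable of strings (such as stream.text_stream above) can be passed in:

```python
def collect_stream(chunks):
    """Accumulate streamed text chunks while printing them incrementally."""
    parts = []
    for text in chunks:
        print(text, end="", flush=True)  # show progress as chunks arrive
        parts.append(text)
    print()  # newline once the stream is done
    return "".join(parts)
```

The caller gets the full response back as a single string while the user still sees tokens appear live.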
With Flask or FastAPI, you can stream directly to the browser using Server-Sent Events:
from flask import Flask, Response, stream_with_context
import anthropic
import json

app = Flask(__name__)
client = anthropic.Anthropic()

@app.route("/chat")
def chat():
    def generate():
        with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": "Hello!"}]
        ) as stream:
            for text in stream.text_stream:
                yield f"data: {json.dumps({'text': text})}\n\n"
        yield "data: [DONE]\n\n"
    return Response(
        stream_with_context(generate()),
        mimetype="text/event-stream"
    )
Advanced Techniques for Production Applications
Prompt Caching — Cut Costs by 90%
Got a system prompt that’s several thousand tokens long and you’re calling it hundreds of times a day? You’re throwing money out the window. I enabled caching on a project with a ~3,000-token system prompt at around 120 requests/day, and the bill dropped from $2.10 to $0.40 per day:
# Enable prompt caching for long system prompts
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
system=[
{
"type": "text",
"text": """You are a technical assistant for an e-commerce system.
[... 5000 words of system documentation ...]""",
"cache_control": {"type": "ephemeral"} # Cache it!
}
],
messages=[{"role": "user", "content": query}]
)
The first request pays slightly more than the normal input rate to write the cache; from then on, cache hits on that portion are billed at roughly 10% of the normal rate. Note that the cache is short-lived and refreshed on each hit, so the savings depend on how frequently requests arrive. With a 3,000-token system prompt and 100 requests per day, that worked out to roughly $50 in savings per month for me.
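To estimate the effect for your own workload, the arithmetic is simple enough to script. A sketch with illustrative numbers only: monthly_prompt_cost is my own helper, price_per_mtok is an assumed input price per million tokens that you should replace with current pricing, and cache-write premiums and output tokens are ignored:

```python
def monthly_prompt_cost(prompt_tokens, requests_per_day, price_per_mtok,
                        cached_ratio=0.10, days=30):
    """Rough monthly cost of re-sending one prompt, without and with caching.

    Cached reads are billed at cached_ratio of the normal input price
    (10% per the text above). Treat the result as a ballpark figure.
    """
    total_tokens = prompt_tokens * requests_per_day * days
    full = total_tokens / 1_000_000 * price_per_mtok
    cached = full * cached_ratio
    return full, cached
```

For example, monthly_prompt_cost(3000, 100, 3.0) compares the monthly cost of a 3,000-token prompt sent 100 times a day under an assumed $3-per-million-token input price.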
Automatic Retry with Exponential Backoff
Rate limits and network errors are a fact of life. Don’t let your application crash because of transient failures:
import anthropic
import time

def call_claude_with_retry(messages, max_retries=3, **kwargs):
    client = anthropic.Anthropic()
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=messages,
                **kwargs
            )
        except anthropic.RateLimitError:
            if attempt == max_retries - 1:
                raise
            wait_time = 2 ** attempt  # 1s, 2s, 4s
            print(f"Rate limit hit, waiting {wait_time}s...")
            time.sleep(wait_time)
        except anthropic.APIConnectionError:
            if attempt == max_retries - 1:
                raise
            time.sleep(1)
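The same idea can be factored into a reusable helper, and adding jitter to the wait spreads out retries when many workers hit the limit at the same moment. with_backoff is my own utility, not an SDK feature:

```python
import random
import time

def with_backoff(fn, retryable, max_retries=3, base=1.0):
    """Call fn(), retrying the given exception types with jittered
    exponential backoff; re-raises after the final attempt."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise
            # 2 ** attempt growth, randomized to avoid synchronized retries
            time.sleep(base * (2 ** attempt) * random.uniform(0.5, 1.5))
```

Wrapping lambda: client.messages.create(...) with with_backoff(..., anthropic.RateLimitError) gives the behavior of the function above. It is also worth checking the SDK documentation for a built-in max_retries client option before layering your own retry logic on top.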
Structured Output with JSON Mode
When you need to parse a response into structured data, don’t reach for regex; instruct the model to return JSON and parse that instead:
import json
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
system="Always respond with valid JSON only, no text outside the JSON.",
messages=[{
"role": "user",
"content": """Analyze the following sentence and return JSON:
'PostgreSQL server version 14.5 crashed with OOM error at 3am on February 15th'
Format: {\"component\": str, \"version\": str, \"error_type\": str, \"time\": str}"""
}]
)
data = json.loads(response.content[0].text)
print(data["error_type"]) # "OOM"
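Models sometimes wrap the JSON in a Markdown code fence or add a stray sentence despite the system prompt, and json.loads then fails on the raw text. A defensive parser helps; parse_json_reply is my own helper and assumes the reply contains a single JSON object:

```python
import json

def parse_json_reply(text):
    """Parse a model reply as JSON, tolerating ```json fences and stray prose."""
    stripped = text.strip()
    if stripped.startswith("```"):
        # drop the opening fence line, then everything after the closing fence
        stripped = stripped.split("\n", 1)[1]
        stripped = stripped.rsplit("```", 1)[0]
    start = stripped.find("{")
    end = stripped.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in reply")
    return json.loads(stripped[start:end + 1])
```

Drop it in place of the bare json.loads call above when you need the parsing to survive imperfect replies.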
Practical Tips from Real-World Project Experience
Getting the API call working is just the first step. The more mentally taxing part — and the one easily ignored until the bill arrives — is managing costs and keeping output quality consistent. Here’s what I’ve learned, mostly the hard way:
- Choose the right model for the task: Haiku for simple tasks (classify, extract), Sonnet for medium tasks (summarize, translate, code review), Opus for complex tasks (analysis, reasoning). Using Opus for everything costs 15x more than Haiku.
- Set max_tokens close to your actual needs: if a response only needs 200 tokens, don't set 4096. You're only billed for the output actually generated, but a tight cap bounds worst-case latency and cost when the model runs long.
- Log token usage: the response object includes message.usage with input_tokens and output_tokens. Log these to track costs and spot prompts consuming abnormally high token counts.
- Test with claude-haiku-4-5-20251001 first: while developing, use Haiku to iterate quickly and cheaply. Only switch to Sonnet/Opus when you need to test real quality.
- Limit conversation history: keep only the most recent 10-20 messages instead of the entire history. The context window is large, but costs accumulate quickly.
# Efficient conversation history management pattern
def trim_history(messages, max_pairs=10):
    """Keep only the most recent max_pairs user/assistant pairs"""
    if len(messages) > max_pairs * 2:
        return messages[-(max_pairs * 2):]
    return messages
# Log usage to track costs
response = client.messages.create(...)
print(f"Input: {response.usage.input_tokens} tokens | Output: {response.usage.output_tokens} tokens")
One more mistake I often see beginners make: committing an API key to source code pushed to GitHub. Keys leaked in public repositories are typically caught by automated secret scanning and revoked, but don't rely on that safety net. Use python-dotenv with a .env file added to .gitignore.
# .env
ANTHROPIC_API_KEY=sk-ant-api03-...
# .gitignore
.env
*.env
# main.py
from dotenv import load_dotenv
load_dotenv()
client = anthropic.Anthropic() # Reads automatically from .env
The quick start above is enough to get a prototype running in a few hours. From there, expand in order of priority: streaming first (improves UX immediately), then caching when costs start to climb, and finally retry and error handling when deploying to production. Claude API is quite stable — the codebase I wrote 8 months ago still runs fine, with no breaking changes requiring major migrations.

