Real-World Problems When Starting with the OpenAI API
I once spent an entire afternoon debugging an AuthenticationError — turns out it was caused by an extra space when copying the API key. Another time, code worked fine locally but kept timing out on the server because I hadn’t implemented retry handling. If you’re just getting started with the OpenAI API, this guide will help you avoid the traps I fell into.
The OpenAI API is more than just Chat Completions — there’s also Embeddings, Image generation, Speech-to-text, and much more. But in practice, within a DevOps team, 80% of use cases revolve around Chat Completions for automation: code reviews, generating documentation, parsing error logs, or summarizing reports. That’s what I’ll focus on here.
Core Concepts You Need to Know First
Models and Pricing
OpenAI offers multiple models, each with its own trade-offs between speed, cost, and quality:
- gpt-4o — the most powerful, multimodal (handles images too), suitable for complex tasks
- gpt-4o-mini — approximately 15x cheaper than gpt-4o, sufficient for most simple tasks
- gpt-3.5-turbo — older and cheaper, but in my experience gpt-4o-mini has completely replaced it
Costs are calculated in tokens — the basic unit of text the model processes (roughly 1 token ≈ 0.75 English words; non-Latin scripts tend to consume more tokens per word). There are two types: input tokens (the prompt you send) and output tokens (the response returned) — priced differently, with output typically costing more.
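To make that ratio concrete, here is a quick back-of-the-envelope estimator. It only encodes the rule of thumb above, so treat it as a budgeting sanity check, not a real count (OpenAI's tiktoken library gives exact counts):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~0.75 words-per-token rule of thumb.

    For exact counts use OpenAI's tiktoken library; this heuristic is only
    good enough for quick budget checks on English text.
    """
    words = len(text.split())
    return max(1, round(words / 0.75))

# A 75-word English prompt is roughly 100 tokens
print(estimate_tokens("word " * 75))  # → 100
```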
Message Structure
The Chat Completions API uses a conversational format with three roles:
- system — sets overall behavior instructions for the model (e.g., “You are a senior DevOps engineer”)
- user — messages from the user / your code
- assistant — the model’s response (used when maintaining conversation history)
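Here is how the three roles fit together in a multi-turn exchange (the messages themselves are made up for illustration). Note that the API is stateless: you re-send the full history on every call.

```python
# Conversation history grows by appending each exchange;
# the whole list is re-sent on every request (the API keeps no state)
messages = [
    {"role": "system", "content": "You are a senior DevOps engineer."},
    {"role": "user", "content": "Why is my nginx container restarting?"},
    {"role": "assistant", "content": "Check `docker logs` first. Common causes are ..."},
    {"role": "user", "content": "The logs show exit code 137."},  # follow-up keeps context
]

roles = [m["role"] for m in messages]
print(roles)  # → ['system', 'user', 'assistant', 'user']
```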
Hands-On Walkthrough
Step 1: Installation and Getting Your API Key
Install the official OpenAI library:
pip install openai
# Or if using uv
uv add openai
Get your API key at platform.openai.com → API keys → Create new secret key. Important note: the key is only shown once at creation time and cannot be retrieved afterward.
Store the key in an environment variable — never hardcode it in your code:
# .env file (add to .gitignore)
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxx
# Or export directly in shell
export OPENAI_API_KEY=sk-proj-xxxxxxxxxxxx
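One habit that saves debugging time: validate the key at startup instead of waiting for a cryptic AuthenticationError. This helper is my own convention, not part of the SDK, and it also strips the stray whitespace that bit me in the intro:

```python
import os

def require_api_key() -> str:
    """Fail fast with a clear message if the API key is missing or malformed.

    Strips leading/trailing whitespace: a classic copy-paste trap.
    """
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key.startswith("sk-"):
        raise RuntimeError(
            "OPENAI_API_KEY is missing or malformed; "
            "export it or add it to your .env file."
        )
    return key

# Call once at startup, before creating the OpenAI client
```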
Step 2: Basic API Call
The minimal code to confirm the API is working:
from openai import OpenAI

# Client automatically reads OPENAI_API_KEY from environment
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a DevOps support assistant."},
        {"role": "user", "content": "Explain what the command 'docker system prune -af' does."}
    ]
)

print(response.choices[0].message.content)
Run it and check the output. If you get an AuthenticationError, verify your API key — use echo $OPENAI_API_KEY to confirm the environment variable is set.
Step 3: Building a Reusable Function with Error Handling
Hard-won lesson: when your codebase has 20+ scattered API calls, every time you want to add retry logic or logging you have to update each one individually. Write a wrapper once from the start — it saves a lot of headaches later:
import time
import logging

from openai import OpenAI, RateLimitError, APITimeoutError, APIConnectionError

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

client = OpenAI(timeout=30.0)  # 30-second timeout

def call_openai(
    prompt: str,
    system: str = "You are a helpful assistant.",
    model: str = "gpt-4o-mini",
    max_retries: int = 3
) -> str | None:
    """Call OpenAI API with retry logic."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": system},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.7,
                max_tokens=2000
            )
            return response.choices[0].message.content
        except RateLimitError:
            wait = 2 ** attempt  # exponential backoff: 1s, 2s, 4s
            log.warning(f"Rate limit hit, retry {attempt+1}/{max_retries} after {wait}s")
            time.sleep(wait)
        except (APITimeoutError, APIConnectionError) as e:
            log.error(f"Connection error: {e}, retry {attempt+1}/{max_retries}")
            time.sleep(1)
        except Exception as e:
            log.error(f"Unexpected error: {e}")
            return None
    log.error("Max retries exceeded, skipping.")
    return None

# Usage
result = call_openai(
    prompt="Review this SQL query and find potential N+1 queries: SELECT * FROM users WHERE id IN (...)",
    system="You are a senior backend developer. Review code concisely and point out specific issues."
)
if result:
    print(result)
Step 4: Using Streaming to Display Responses in Real Time
For tasks that generate long output (like writing reports or generating code), streaming greatly improves UX — the response appears incrementally rather than waiting until completion:
def call_openai_stream(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Stream response, print each chunk, return full text."""
    full_response = []
    with client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True
    ) as stream:
        for chunk in stream:
            if not chunk.choices:  # some chunks (e.g. a final usage chunk) carry no choices
                continue
            delta = chunk.choices[0].delta.content
            if delta:
                print(delta, end="", flush=True)
                full_response.append(delta)
    print()  # trailing newline
    return "".join(full_response)

result = call_openai_stream("Write a Python script to monitor disk usage and send alerts.")
Step 5: Handling Structured JSON Output
When you need to parse output into structured data (rather than plain text), use response_format:
import json
def analyze_log_entry(log_line: str) -> dict | None:
    """Parse a log line and return a structured dict."""
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": "Parse the log entry into JSON. Return: {severity, service, message, is_error}"
                },
                {"role": "user", "content": log_line}
            ],
            response_format={"type": "json_object"},
            temperature=0  # temperature 0 keeps the output format as consistent as possible
        )
        return json.loads(response.choices[0].message.content)
    except Exception as e:
        log.error(f"Parse failed: {e}")
        return None

# Test
result = analyze_log_entry(
    "2024-01-15 14:32:01 ERROR payment-service Connection timeout to database after 30s"
)
print(result)
# Output: {'severity': 'ERROR', 'service': 'payment-service', 'message': 'Connection timeout...', 'is_error': True}
Tracking Token Usage and Costs
Token tracking is often overlooked at first. But as you scale up, costs rise faster than you’d expect — a pipeline processing 10,000 requests/day with 500-token prompts can easily consume $20–50/month on input alone. I’ve been logging usage from day one to avoid surprises:
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}]
)

usage = response.usage
log.info(
    f"Tokens — input: {usage.prompt_tokens}, "
    f"output: {usage.completion_tokens}, "
    f"total: {usage.total_tokens}"
)

# gpt-4o-mini pricing: ~$0.15/1M input, ~$0.60/1M output (verify at platform.openai.com)
input_cost = usage.prompt_tokens * 0.15 / 1_000_000
output_cost = usage.completion_tokens * 0.60 / 1_000_000
log.info(f"Estimated cost: ${input_cost + output_cost:.6f}")
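If you call more than one model, a small lookup table keeps the arithmetic in one place instead of scattering magic numbers. The prices below are assumptions based on published rates at the time of writing; always verify current pricing on platform.openai.com:

```python
# Prices in $ per 1M tokens as (input, output). These are assumptions based on
# published rates at the time of writing; verify before relying on them.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated request cost in dollars for a model listed in PRICES."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Usage: feed it response.usage after each call
# estimate_cost("gpt-4o-mini", usage.prompt_tokens, usage.completion_tokens)
```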
Conclusion
The API itself isn’t technically difficult. What tends to get overlooked are four things: choosing the right model, writing clear system prompts, handling errors properly, and tracking costs from the start — don’t wait until you get a surprise bill at the end of the month.
My team runs ~95% of automation tasks on gpt-4o-mini — roughly 15x cheaper than gpt-4o and noticeably faster. We only step up to gpt-4o when we need image processing or tasks that demand complex reasoning.
Next steps if you want to go deeper: explore Function Calling (to let the model invoke your tools), the Embeddings API (to build semantic search), and the Assistants API (to create agents with memory). Each one opens up a whole new set of use cases.