AI API Cost Comparison: OpenAI vs Claude vs Gemini — 6 Months of Production Experience

Artificial Intelligence tutorial - IT technology blog

Three Completely Different Pricing Approaches

When I first integrated AI APIs, I thought comparing input/output token prices was enough. Wrong. After six months running all three platforms in production — an internal chatbot, a pipeline processing thousands of documents daily, an automated content generation tool — I realized each provider has its own distinct pricing strategy. Miss this, and your end-of-month invoice will catch you off guard.

OpenAI: Tiered Pricing, Plenty of Models to Choose From

OpenAI has the richest model catalog — from the cheap GPT-3.5-turbo to the expensive o1. The real differentiator is the Batch API: a straight 50% cost reduction for workloads that don’t need real-time responses. I use this regularly to process documents in bulk overnight. A pipeline of 50,000 requests running at 2 AM, results ready by morning — saving $30–40 compared to synchronous daytime calls.
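The overnight workflow boils down to writing one JSONL line per request, uploading the file, and creating a batch. Here is a minimal sketch of how such a file can be prepared; the file name, document IDs, and prompts are illustrative, and the upload/create calls are shown commented out since they need a live API key:

```python
import json

def build_batch_line(custom_id: str, prompt: str, model: str = "gpt-4o-mini") -> str:
    """One JSONL line in the format the OpenAI Batch API expects."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    })

# Write one request per document, then upload the file and create the batch:
docs = {"doc-001": "Summarize: ...", "doc-002": "Summarize: ..."}
with open("batch_input.jsonl", "w") as f:
    for doc_id, prompt in docs.items():
        f.write(build_batch_line(doc_id, prompt) + "\n")

# from openai import OpenAI
# client = OpenAI()
# batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
# batch = client.batches.create(input_file_id=batch_file.id,
#                               endpoint="/v1/chat/completions",
#                               completion_window="24h")
```

The `custom_id` is what lets you match results back to source documents once the batch completes, so make it something stable like a database key.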

Claude: Long Context Window + Prompt Caching

Anthropic targets the long document processing use case with a 200K token context. The Prompt Caching feature was something I initially overlooked — but it turns out to be more valuable than I expected. Cached system prompt tokens are charged at just 10% of the standard input rate. For a chatbot with a ~3,000-token system prompt called thousands of times per day, the savings add up quickly — especially when using Sonnet at $3/1M input tokens.
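A rough back-of-envelope helper makes the caching math concrete. This is a simplified sketch: it uses the 10% cached rate and Sonnet's $3/1M input price from the text above, and ignores the cache-write surcharge Anthropic applies when the prompt is first cached, so treat it as an upper bound on savings:

```python
def caching_savings(system_tokens: int, calls_per_day: int,
                    input_rate: float = 3.00, cached_fraction: float = 0.10) -> float:
    """Daily USD saved when a fixed system prompt is served from cache.

    input_rate is $/1M input tokens; cached tokens bill at
    cached_fraction of the normal rate. Ignores cache-write premiums.
    """
    full = system_tokens / 1_000_000 * input_rate * calls_per_day
    cached = full * cached_fraction
    return full - cached

# ~3,000-token system prompt, 5,000 calls/day on Sonnet
print(round(caching_savings(3000, 5000), 2))  # → 40.5
```

At that volume the cache pays for roughly $40/day, which is why a feature that looks like a footnote in the pricing page ends up mattering.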

Gemini: Free Tier + Ultra-Cheap Flash

Google competes with Gemini Flash’s very low pricing and a 1 million token context window. Many developers overlook this advantage without ever trying it. For simple tasks like classification, extraction, and summarization — Flash typically responds in under a second and is significantly cheaper than every comparable model.

Pricing Comparison and Real-World Usage

The table below summarizes USD/1M token pricing I collected during actual use. Note: AI API pricing changes frequently — always check the official pricing page before budgeting for a real project.

Model               Input ($/1M tok)   Output ($/1M tok)   Context Window
GPT-4o              $2.50              $10.00              128K
GPT-4o mini         $0.15              $0.60               128K
Claude 3.5 Sonnet   $3.00              $15.00              200K
Claude 3.5 Haiku    $0.80              $4.00               200K
Gemini 1.5 Pro      $3.50              $10.50              1M
Gemini 1.5 Flash    $0.075             $0.30               1M

Gemini Flash is roughly 2x cheaper than GPT-4o mini on both input and output. But price isn’t the only factor — output quality and fit for each specific task are what ultimately matter.

Quick Cost Estimation with Python

This script is what I use to estimate costs before choosing a model for a new pipeline:

def estimate_cost(input_tokens: int, output_tokens: int, model: str) -> float:
    """Estimate the cost of a single API call (USD)"""
    pricing = {
        "gpt-4o":           {"input": 2.50,  "output": 10.00},
        "gpt-4o-mini":      {"input": 0.15,  "output": 0.60},
        "claude-3-5-sonnet":{"input": 3.00,  "output": 15.00},
        "claude-3-5-haiku": {"input": 0.80,  "output": 4.00},
        "gemini-1.5-flash": {"input": 0.075, "output": 0.30},
        "gemini-1.5-pro":   {"input": 3.50,  "output": 10.50},
    }
    rates = pricing[model]
    return (input_tokens / 1_000_000 * rates["input"] +
            output_tokens / 1_000_000 * rates["output"])

# Example: 5,000-token input document, 1,000-token response
for model in ["gpt-4o-mini", "claude-3-5-haiku", "gemini-1.5-flash"]:
    cost = estimate_cost(5000, 1000, model)
    print(f"{model:25s}: ${cost:.6f}")

# Output:
# gpt-4o-mini              : $0.001350
# claude-3-5-haiku         : $0.008000
# gemini-1.5-flash         : $0.000675

For the same task, Gemini Flash is 12x cheaper than Claude Haiku and 2x cheaper than GPT-4o mini. Multiply that across 100,000 requests per month, and the cost difference reaches several hundred dollars.
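Extrapolating the per-call figures above to a monthly volume makes the gap tangible. The per-call costs are taken directly from the example output; the 100,000 requests/month figure is the illustrative volume from the text:

```python
# Per-call costs from the estimate above (5,000-token input, 1,000-token output)
per_call = {
    "gpt-4o-mini": 0.001350,
    "claude-3-5-haiku": 0.008000,
    "gemini-1.5-flash": 0.000675,
}

requests_per_month = 100_000
for model, cost in per_call.items():
    print(f"{model:20s}: ${cost * requests_per_month:,.2f}/month")

# gpt-4o-mini         : $135.00/month
# claude-3-5-haiku    : $800.00/month
# gemini-1.5-flash    : $67.50/month
```

The Flash-vs-Haiku gap alone is over $700/month at this volume, which is the "several hundred dollars" difference in concrete terms.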

Pros and Cons of Each Provider

OpenAI — Best Ecosystem, Reasonable Pricing

  • Strengths: Most mature Python/JS SDK, stable function calling, Batch API cuts 50% for async jobs, fine-tuning available
  • Weaknesses: No real free tier for production; GPT-4o isn’t the cheapest in the high-capability tier
  • Best for: Production chatbots requiring high stability, teams already familiar with the OpenAI SDK, or projects needing many pre-built integrations

Claude — High-Quality Output, Long Document Processing

  • Strengths: Most natural text output of the three providers in my experience; Prompt Caching yields significant savings with long system prompts; 200K context is ideal for code review and document analysis
  • Weaknesses: Sonnet’s output price ($15/1M) is the highest in the group; when I ran these tests there was no batch discount comparable to OpenAI’s 50% (worth re-checking, since provider pricing features change often)
  • Best for: Content generation, code review, document Q&A, and chatbots with complex system prompts that benefit from caching

Gemini — Cheapest, Massive Context Window

  • Strengths: Flash is the cheapest in the group; 1M token context lets you fit an entire codebase into one request; free tier is sufficient for dev/testing
  • Weaknesses: SDK isn’t as polished as OpenAI’s; complex tasks can sometimes produce less consistent output than GPT-4o or Claude Sonnet
  • Best for: High-volume classification, extraction, and summarization; long file/video processing; cost-sensitive projects

Which AI API Should You Choose?

After plenty of trial and error, I’ve settled on a simple framework: choose by task type first, budget second.

By Task Type

  • Simple, high-volume (classification, tagging, extraction): Gemini Flash
  • Complex, reasoning-heavy (code generation, complex Q&A): Claude 3.5 Sonnet or GPT-4o
  • Chatbot with long context, fixed system prompt: Claude with Prompt Caching
  • Processing extremely long files (100K+ tokens): Gemini 1.5 Pro (1M context window)
  • Async bulk processing without real-time requirements: OpenAI Batch API (50% discount)

By Budget

  • MVP / side project: Gemini Flash + free tier to test, then scale as needed
  • Mid-scale production: GPT-4o mini — the sweet spot between price and quality for 80% of use cases
  • Highest quality required: Claude 3.5 Sonnet or GPT-4o, but only for tasks that truly need it

My first month using Claude Sonnet for everything resulted in a $340 invoice — mostly simple classification tasks that didn’t need such an expensive model. After switching to model routing (simple tasks go to Flash/Haiku, complex tasks go to Sonnet), costs dropped to ~$90/month with the same throughput. A $250/month difference, just from routing tasks correctly.
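The routing that produced those savings can be sketched as a simple task-label lookup. The task names and model assignments below are my own conventions, mirroring the "by task type" list earlier, not anything the providers define:

```python
# Route each task label to the cheapest model that handles it well.
# Labels and assignments are my own conventions from production use.
ROUTES = {
    # cheap, high-volume tasks
    "classify": "gemini-1.5-flash",
    "extract": "gemini-1.5-flash",
    "summarize": "gemini-1.5-flash",
    # mid-tier chat
    "chat": "gpt-4o-mini",
    # reasoning-heavy work
    "code_review": "claude-3-5-sonnet",
    "complex_qa": "claude-3-5-sonnet",
}

def pick_model(task: str, default: str = "gpt-4o-mini") -> str:
    """Return the routed model for a task, falling back to a safe default."""
    return ROUTES.get(task, default)

print(pick_model("classify"))     # → gemini-1.5-flash
print(pick_model("code_review"))  # → claude-3-5-sonnet
```

A dict lookup is deliberately dumb: the point is that even static routing, with no classifier in front of it, captured most of the savings in my case.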

Implementation Guide: Tracking Real Costs

Logging the cost of every request is a non-negotiable step when deploying AI APIs to production. Fortunately, all providers return usage data in the response — you just need to read the right field.

Reading Usage Data from Each Provider

# --- OpenAI ---
from openai import OpenAI
client = OpenAI(api_key="YOUR_KEY")
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize: ..."}]
)
usage = response.usage
print(f"Input: {usage.prompt_tokens}, Output: {usage.completion_tokens}")

# --- Anthropic Claude ---
import anthropic
client = anthropic.Anthropic(api_key="YOUR_KEY")
response = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Analyze code: ..."}]
)
print(f"Input: {response.usage.input_tokens}, Output: {response.usage.output_tokens}")

# --- Google Gemini ---
import google.generativeai as genai
genai.configure(api_key="YOUR_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Classify: ...")
meta = response.usage_metadata
print(f"Input: {meta.prompt_token_count}, Output: {meta.candidates_token_count}")
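Since the three SDKs name these fields differently, a small normalizer keeps downstream logging code uniform. The field names are exactly the ones read in the snippets above; the helper itself is my own convenience wrapper, not part of any SDK:

```python
def normalize_usage(provider: str, response) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) regardless of provider SDK."""
    if provider == "openai":
        return response.usage.prompt_tokens, response.usage.completion_tokens
    if provider == "anthropic":
        return response.usage.input_tokens, response.usage.output_tokens
    if provider == "google":
        meta = response.usage_metadata
        return meta.prompt_token_count, meta.candidates_token_count
    raise ValueError(f"unknown provider: {provider}")
```

With this in place, the cost tracker below only ever sees one shape of usage data.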

A Simple Cost Tracker for Ongoing Monitoring

import json
from datetime import datetime
from pathlib import Path

COST_LOG = Path("api_costs.jsonl")

def log_cost(provider: str, model: str, task: str,
             input_tokens: int, output_tokens: int, cost_usd: float):
    entry = {
        "ts": datetime.now().isoformat(),
        "provider": provider, "model": model, "task": task,
        "input": input_tokens, "output": output_tokens, "cost": cost_usd
    }
    with open(COST_LOG, "a") as f:
        f.write(json.dumps(entry) + "\n")

def get_summary() -> dict:
    summary = {}
    if not COST_LOG.exists():
        return summary
    with open(COST_LOG) as f:
        for line in f:
            e = json.loads(line)
            summary[e["provider"]] = summary.get(e["provider"], 0) + e["cost"]
    return {k: round(v, 4) for k, v in summary.items()}

# View total cost by provider
print(get_summary())
# {'openai': 12.34, 'anthropic': 34.56, 'google': 5.12}

With this log in place, you can immediately spot which tasks are burning budget unnecessarily and easily switch to cheaper models without sacrificing quality.
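Spotting which tasks burn budget needs a by-task rollup rather than the by-provider one above. Since every log entry already carries a "task" field, the same JSONL file supports it directly; this is a companion sketch in the same style as get_summary:

```python
import json
from collections import defaultdict
from pathlib import Path

def get_task_summary(log_path: str = "api_costs.jsonl") -> dict:
    """Total cost per task label, most expensive first."""
    totals: dict[str, float] = defaultdict(float)
    path = Path(log_path)
    if not path.exists():
        return {}
    with open(path) as f:
        for line in f:
            entry = json.loads(line)
            totals[entry["task"]] += entry["cost"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

Sorting descending means the first key printed is always the task most worth moving to a cheaper model.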

Summary

No single provider is best for every situation. Here’s the setup I’m currently using:

  • Bulk classification/extraction → Gemini Flash
  • User-facing chatbot → GPT-4o mini or Claude Haiku
  • Complex code review, document analysis → Claude 3.5 Sonnet
  • Overnight async batch jobs (OpenAI Batch API) → GPT-4o or GPT-4o mini at 50% discount

If you’re just getting started: use the Gemini Flash free tier to test your logic, benchmark against GPT-4o mini, then commit to a primary provider. Scaling from 1,000 to 1,000,000 requests per month, the cost difference between models can easily reach thousands of dollars. Do the math upfront — don’t wait until you’re refactoring an entire pipeline to think about this.
