Using the OpenAI API with Python: From Setup to Production – ITFROMZERO


Real-world problems when you start using the OpenAI API

I once lost an entire afternoon debugging an AuthenticationError, and it turned out the API key I had copied had an extra space in it. Another time, code that ran fine on my local machine kept timing out on the server because it didn't handle retries. If you're just getting started with the OpenAI API, this post will help you avoid the traps I fell into.

The OpenAI API isn't just Chat Completions: there are also Embeddings, image generation, speech-to-text, and more. In practice, though, about 80% of my DevOps team's use cases revolve around Chat Completions for automation: reviewing code, generating documentation, parsing error logs, and summarizing reports. That's what I'll focus on here.

Core concepts to understand first

Models and pricing

OpenAI offers many models, each with its own trade-off between speed, cost, and quality:

gpt-4o: the most capable of these three, multimodal (it handles images as well as text), suited to complex tasks
gpt-4o-mini: about 15x cheaper than gpt-4o, also accepts image input, and good enough for most simpler tasks
gpt-3.5-turbo: older and cheap, but in my experience gpt-4o-mini has completely taken over this role

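Cost is billed per token, the smallest unit of text the model works with (roughly 1 token ≈ 0.75 English words, or about 4 characters; Vietnamese text usually needs more tokens because its accented characters get split into smaller pieces). There are two kinds: input tokens (the prompt you send) and output tokens (the response you get back). They are priced differently, and output is usually the more expensive of the two.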
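To get a feel for the numbers before calling the API, the ~4 characters per token rule of thumb is often enough. The sketch below wraps it in a hypothetical helper, `rough_token_estimate`; it is only an approximation, and for exact counts you would use the model's real tokenizer (OpenAI's `tiktoken` library) or read `response.usage` after a call.

```python
import math


def rough_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate token count using the ~4 characters/token rule of thumb for English.

    Heuristic only: real counts depend on the model's tokenizer, and
    Vietnamese or other accented text usually needs more tokens than this suggests.
    """
    if not text:
        return 0
    # Round up so any non-empty string counts as at least one token
    return math.ceil(len(text) / chars_per_token)


prompt = "Explain what the command 'docker system prune -af' does."
print(len(prompt), "chars, roughly", rough_token_estimate(prompt), "tokens")
```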

Message structure

The Chat Completions API uses a conversation format with three main roles:

system: instructions for the model's overall behavior (for example: “You are a senior DevOps engineer”)
user: a message from the user, or from your code
assistant: a response from the model (used when you need to keep the conversation history)
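Because the API itself is stateless, "keeping history" just means resending the earlier turns in the `messages` list on every call. A minimal sketch with hypothetical helper names (`new_conversation`, `add_turn`); the resulting list is what you would pass as `messages=` to `client.chat.completions.create`:

```python
from typing import Literal, TypedDict

Role = Literal["system", "user", "assistant"]


class Message(TypedDict):
    role: Role
    content: str


def new_conversation(system_prompt: str) -> list[Message]:
    """Start a conversation with a single system message."""
    return [{"role": "system", "content": system_prompt}]


def add_turn(history: list[Message], user_text: str, assistant_text: str) -> None:
    """Append one user/assistant exchange so the model sees it on the next call."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})


history = new_conversation("You are a senior DevOps engineer.")
add_turn(history, "What does 'kubectl get pods' show?", "It lists the pods in the current namespace.")
# The next question goes last; the whole list is sent as messages=history
history.append({"role": "user", "content": "How do I see pods in every namespace?"})
```

Every resent turn counts as input tokens again, so long histories get more expensive with each call.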

Hands-on walkthrough

Step 1: Install the library and get an API key

Install OpenAI's official library:

pip install openai
# Or, if you use uv
uv add openai

Get an API key at platform.openai.com → API keys → Create new secret key. Important: the key is shown only once, when you create it, and you can't view it again afterwards.

Store the key in an environment variable and never hardcode it in your code:

# .env file (add it to .gitignore)
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxx

# Or export it directly in your shell
export OPENAI_API_KEY=sk-proj-xxxxxxxxxxxx
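The SDK only looks at the process environment, so a `.env` file has to be loaded into it first. The usual choice is `load_dotenv()` from the `python-dotenv` package; if you'd rather avoid the dependency, here is a minimal stdlib-only sketch. The `load_env_file` helper is hypothetical and handles only simple `KEY=VALUE` lines, not the full dotenv syntax. It strips surrounding whitespace from values, which also guards against the stray-space copy/paste mistake:

```python
from __future__ import annotations

import os
from pathlib import Path


def load_env_file(path: str | os.PathLike = ".env", override: bool = False) -> dict[str, str]:
    """Load simple KEY=VALUE lines from a .env file into os.environ.

    Supports blank lines, '#' comments, an optional 'export ' prefix and
    surrounding quotes. Existing variables are kept unless override=True.
    Returns the parsed key/value pairs.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        # Strip whitespace first, then one layer of matching quotes
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        loaded[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return loaded


# Call this before creating OpenAI() so the client can see OPENAI_API_KEY
load_env_file(".env")
```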

Step 2: A basic API call

A minimal snippet to confirm the API is working:

from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment (a .env file is not loaded automatically)
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are an assistant that helps with DevOps work."},
        {"role": "user", "content": "Explain what the command 'docker system prune -af' does."}
    ]
)

print(response.choices[0].message.content)

Run it and check the output. If you get an AuthenticationError, check your API key again: echo $OPENAI_API_KEY confirms that the environment variable is set, and echo "[$OPENAI_API_KEY]" makes stray leading or trailing spaces visible.
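A trailing space is easy to miss even with `echo`. A small check like the hypothetical `diagnose_api_key` below flags the common copy/paste problems; the `sk-` prefix test is a heuristic based on how current keys look, not a documented guarantee:

```python
from __future__ import annotations

import os


def diagnose_api_key(value: str | None) -> list[str]:
    """Return likely problems with an API key string; an empty list means nothing obvious."""
    if not value:
        return ["key is missing or empty"]
    problems: list[str] = []
    stripped = value.strip()
    if value != stripped:
        problems.append("key has leading/trailing whitespace or newlines")
    if any(ch.isspace() for ch in stripped):
        problems.append("key contains whitespace in the middle")
    # Heuristic: current OpenAI secret keys start with "sk-" (e.g. "sk-proj-...")
    if not stripped.startswith("sk-"):
        problems.append("key does not start with 'sk-'")
    return problems


if __name__ == "__main__":
    issues = diagnose_api_key(os.environ.get("OPENAI_API_KEY"))
    print("No obvious problems with OPENAI_API_KEY" if not issues else "; ".join(issues))
```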

Step 3: Build a reusable function with error handling

A lesson learned the hard way: once a codebase has 20+ API calls scattered around, adding retry or logging means editing every single one of them. Write a wrapper once, right at the start, and you'll save yourself a lot of headaches later:

import time
import logging
from openai import OpenAI, RateLimitError, APITimeoutError, APIConnectionError

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

client = OpenAI(timeout=30.0, max_retries=0)  # 30 s timeout; SDK's own retries (default 2) off so they don't stack with the loop below

def call_openai(
    prompt: str,
    system: str = "You are a helpful assistant.",
    model: str = "gpt-4o-mini",
    max_retries: int = 3
) -> str | None:
    """Call the OpenAI API with retry logic."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": system},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.7,
                max_tokens=2000
            )
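            choice = response.choices[0]
            if choice.finish_reason == "length":
                log.warning("Response was cut off at max_tokens")
            return choice.message.content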

        except RateLimitError:
            wait = 2 ** attempt  # exponential backoff: 1s, 2s, 4s
            log.warning(f"Rate limit hit, retry {attempt+1}/{max_retries} in {wait}s")
            time.sleep(wait)

        except (APITimeoutError, APIConnectionError) as e:
            log.error(f"Connection error: {e}, retry {attempt+1}/{max_retries}")
            time.sleep(1)

        except Exception as e:
            log.error(f"Unexpected error: {e}")
            return None

    log.error("Out of retries, giving up.")
    return None


# Usage
result = call_openai(
    prompt="Review this SQL and look for a potential N+1 query: SELECT * FROM users WHERE id IN (...)",
    system="You are a senior backend developer. Keep the review short and point out specific problems."
)
if result:
    print(result)
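The loop above hard-codes its backoff schedule and still sleeps after the last failed attempt. If you want to reuse one retry policy across several wrappers, a generic helper is an option. The sketch below is hypothetical (`retry_with_backoff` is not part of the `openai` package) and swaps in "full jitter", a random delay between 0 and the exponential cap, so that many workers hitting a rate limit at once don't all retry at the same instant. Keep in mind that the SDK client also retries some errors on its own (its `max_retries` option, default 2), so disable one layer or the other:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff and full jitter.

    Re-raises the last exception after max_retries failed attempts;
    no sleep happens after the final attempt.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise
            # Full jitter: random delay in [0, min(max_delay, base_delay * 2**attempt)]
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise ValueError("max_retries must be >= 1")


# Hypothetical usage with the SDK (client and messages as in Step 3):
# from openai import RateLimitError, APITimeoutError, APIConnectionError
# text = retry_with_backoff(
#     lambda: client.chat.completions.create(model="gpt-4o-mini", messages=messages)
#     .choices[0].message.content,
#     retry_on=(RateLimitError, APITimeoutError, APIConnectionError),
# )
```

Passing `sleep` as a parameter keeps the helper testable without actually waiting.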

Step 4: Use streaming to display the response in real time

For tasks with long output (writing reports, generating code), streaming gives a much better UX: the response appears gradually instead of being printed only once it's complete:

def call_openai_stream(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Stream the response, print each chunk as it arrives, and return the full text."""
    full_response = []

    with client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True
    ) as stream:
        for chunk in stream:
            # Skip chunks without choices (e.g. a final usage-only chunk)
            if not chunk.choices:
                continue
            delta = chunk.choices[0].delta.content
            if delta:
                print(delta, end="", flush=True)
                full_response.append(delta)

    print()  # final newline
    return "".join(full_response)


result = call_openai_stream("Viết script Python để monitor disk usage và gửi alert.")
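The chunk-joining logic is easier to test once it is separated from the network call. The sketch below pulls it into a hypothetical `collect_stream` function and feeds it stand-in chunk objects built with `SimpleNamespace`, shaped like the SDK's streaming chunks (`chunk.choices[0].delta.content`). It skips chunks with no choices, which is what the final chunk looks like when you pass `stream_options={"include_usage": True}`:

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable


def collect_stream(chunks: Iterable[Any], on_delta: Callable[[str], None] = lambda s: None) -> str:
    """Join the text deltas of a Chat Completions stream, calling on_delta for each piece.

    Skips chunks with no choices and deltas with no content
    (e.g. the first chunk, which usually carries only the role).
    """
    parts: list[str] = []
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            on_delta(delta)
            parts.append(delta)
    return "".join(parts)


def fake_chunk(content):
    """Hypothetical test helper: an object shaped like one streaming chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=content))])


fake_stream = [fake_chunk(None), fake_chunk("Hello"), fake_chunk(", "), fake_chunk("world"),
               SimpleNamespace(choices=[])]
print(collect_stream(fake_stream))  # Hello, world
```

With a real stream you would call `collect_stream(stream, lambda s: print(s, end="", flush=True))` inside the `with` block.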

Step 5: Handling structured JSON output

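When you need the output parsed into structured data rather than plain text, use response_format. In JSON mode ({"type": "json_object"}), the API requires the word "JSON" to appear somewhere in your messages, which the system prompt below takes care of: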

import json

def analyze_log_entry(log_line: str) -> dict | None:
    """Analyze a log line and return a structured dict."""
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": "Parse the log entry into JSON. Return: {severity, service, message, is_error}"
                },
                {"role": "user", "content": log_line}
            ],
            response_format={"type": "json_object"},
            temperature=0  # reduce randomness for JSON output (not a strict determinism guarantee)
        )
        return json.loads(response.choices[0].message.content)
    except Exception as e:
        log.error(f"Parse failed: {e}")
        return None

# Test
result = analyze_log_entry(
    "2024-01-15 14:32:01 ERROR payment-service Connection timeout to database after 30s"
)
print(result)
# Possible output (model responses can vary): {'severity': 'ERROR', 'service': 'payment-service', 'message': 'Connection timeout...', 'is_error': True}
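JSON mode guarantees syntactically valid JSON, not that the model used the keys and types you asked for. A lightweight check before trusting the result might look like the sketch below; `EXPECTED_FIELDS` and `validate_log_result` are hypothetical names mirroring the system prompt above. On models that support it, Structured Outputs (`response_format={"type": "json_schema", ...}`) can enforce a schema on the API side instead.

```python
from typing import Any

# Expected keys and types, mirroring the system prompt in analyze_log_entry
EXPECTED_FIELDS: dict[str, type] = {
    "severity": str,
    "service": str,
    "message": str,
    "is_error": bool,
}


def validate_log_result(data: Any) -> list[str]:
    """Return a list of schema problems in a parsed log result (empty list = valid)."""
    if not isinstance(data, dict):
        return ["result is not a JSON object"]
    problems: list[str] = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected_type):
            problems.append(
                f"{name} should be {expected_type.__name__}, got {type(data[name]).__name__}"
            )
    return problems
```

Typical use: `problems = validate_log_result(result)`, then log and skip (or retry) the entry when `problems` is non-empty.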

Tracking token usage and cost

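Token tracking is often ignored at first. But as you scale, costs climb faster than you'd expect: a pipeline handling 10,000 requests a day with 500-token prompts sends about 150M input tokens a month (10,000 × 500 × 30), which is roughly $22.50 a month for input alone even on gpt-4o-mini at ~$0.15 per 1M tokens, and many times that on gpt-4o. I log usage from day one so there are no surprises: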

# Assumes client and log from Step 3, and prompt holding your prompt text
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}]
)

usage = response.usage
log.info(
    f"Tokens — input: {usage.prompt_tokens}, "
    f"output: {usage.completion_tokens}, "
    f"total: {usage.total_tokens}"
)

# gpt-4o-mini pricing: ~$0.15/1M input, ~$0.60/1M output (verify at platform.openai.com)
input_cost = usage.prompt_tokens * 0.15 / 1_000_000
output_cost = usage.completion_tokens * 0.60 / 1_000_000
log.info(f"Estimated cost: ${input_cost + output_cost:.6f}")
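Logging each call is a good start. For a daily or per-job total, you can accumulate usage across calls instead. A minimal sketch follows, assuming the prices in `PRICES_PER_MILLION` (the gpt-4o-mini figures come from the comment above; the gpt-4o figures are an assumption of the list price, so verify both). `UsageTracker` is a hypothetical class. Record usage under the model name you requested, because `response.model` usually returns a dated snapshot name such as `gpt-4o-mini-2024-07-18`, which would not match the table keys.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace

# Assumed USD prices per 1M tokens; check platform.openai.com before relying on them
PRICES_PER_MILLION = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}


@dataclass
class UsageTracker:
    """Accumulate token counts per model and estimate total cost."""
    prompt_tokens: dict[str, int] = field(default_factory=dict)
    completion_tokens: dict[str, int] = field(default_factory=dict)

    def record(self, model: str, usage) -> None:
        """Add one usage object (anything with prompt_tokens and completion_tokens)."""
        self.prompt_tokens[model] = self.prompt_tokens.get(model, 0) + usage.prompt_tokens
        self.completion_tokens[model] = self.completion_tokens.get(model, 0) + usage.completion_tokens

    def cost(self, model: str) -> float:
        """Estimated USD cost for one model; raises KeyError if it has no price entry."""
        price = PRICES_PER_MILLION[model]
        return (
            self.prompt_tokens.get(model, 0) * price["input"]
            + self.completion_tokens.get(model, 0) * price["output"]
        ) / 1_000_000

    def total_cost(self) -> float:
        """Estimated USD cost across every model recorded so far."""
        models = set(self.prompt_tokens) | set(self.completion_tokens)
        return sum(self.cost(m) for m in models)


# In real code: tracker.record("gpt-4o-mini", response.usage) after each call.
# SimpleNamespace stands in for response.usage here.
tracker = UsageTracker()
tracker.record("gpt-4o-mini", SimpleNamespace(prompt_tokens=1200, completion_tokens=300))
tracker.record("gpt-4o-mini", SimpleNamespace(prompt_tokens=800, completion_tokens=200))
print(f"Estimated cost so far: ${tracker.total_cost():.6f}")
```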

Conclusion

Technically, this API isn't hard. What people tend to miss are four things: choosing the right model, writing a clear system prompt, handling errors properly, and tracking cost from the very start. Don't wait for a surprise bill at the end of the month before you start paying attention.

My team runs about 95% of our automation tasks on gpt-4o-mini: it's roughly 15x cheaper than gpt-4o and noticeably faster. We only move up to gpt-4o when a task involves images or needs complex reasoning.
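That rule of thumb can be written down as a small routing helper. The sketch below is hypothetical: `pick_model` encodes the choice up front, while `run_with_escalation` is an optional extra pattern, not something described in this post, that tries gpt-4o-mini first and retries on gpt-4o only if the cheap answer is missing or fails a check you supply:

```python
from typing import Callable, Optional

CHEAP_MODEL = "gpt-4o-mini"
STRONG_MODEL = "gpt-4o"


def pick_model(involves_images: bool = False, complex_reasoning: bool = False) -> str:
    """Default to the cheap model; move up only for image or hard-reasoning tasks."""
    if involves_images or complex_reasoning:
        return STRONG_MODEL
    return CHEAP_MODEL


def run_with_escalation(
    call: Callable[[str], Optional[str]],
    is_good: Callable[[str], bool],
) -> tuple[str, Optional[str]]:
    """Try CHEAP_MODEL first; fall back to STRONG_MODEL once if the answer is missing or fails is_good.

    Returns (model_used, answer).
    """
    answer = call(CHEAP_MODEL)
    if answer is not None and is_good(answer):
        return CHEAP_MODEL, answer
    return STRONG_MODEL, call(STRONG_MODEL)


# Hypothetical usage with the call_openai wrapper from Step 3:
# model, text = run_with_escalation(
#     lambda m: call_openai(prompt, model=m),
#     is_good=lambda a: len(a.strip()) > 0,
# )
```

The escalation step only pays for gpt-4o when the cheap attempt actually fails, which keeps the default path on the cheaper model.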

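Next steps if you want to go deeper: look into Function Calling (letting the model call your tools), the Embeddings API (for building semantic search), and the Assistants API (for building agents with memory; OpenAI has since introduced the Responses API and announced plans to retire Assistants in its favor, so check the current docs first). Each of these opens up a whole new set of use cases.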