Building a Python Chatbot with LLMs: Managing Conversation History and Context Memory from Scratch

Artificial Intelligence tutorial - IT technology blog

I remember the first time I called an LLM API, thinking that was all I needed to have a chatbot — but it kept forgetting everything from the previous exchange. A user would type “my name is Nam,” then follow up with “so what’s my name?” — and the bot would respond as if it had never known. That’s when it hit me: getting the API to work is just the first step; the harder part is making a chatbot that actually remembers conversation context.

This article goes straight to that problem — building a Python chatbot with an LLM that genuinely has memory, from architecture to working code you can run immediately.

The Problem: Why Do Chatbots Keep ‘Losing Their Memory’?

Most LLM APIs are stateless — with each call, the model knows nothing about the previous conversation. You have to resend the entire conversation history with every request. It sounds cumbersome, but this is intentional design: you have full control over the context the model sees.

If you’re calling the API in a “one question — one answer” pattern without including history, you’re building a single-question answering machine, not a chatbot. That’s why automated code-writing bots, customer support assistants, and similar tools all have to solve this conversation management problem.
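The difference is easy to see without calling a real API. Below is a minimal sketch where `fake_llm` is a stand-in for a stateless model, not a real one — it can only use facts present in the messages it receives, which is exactly the constraint a stateless API imposes:

```python
# A stand-in "model": stateless, it only sees the messages sent in this call.
def fake_llm(messages: list) -> str:
    text = " ".join(m["content"] for m in messages)
    return "Your name is Nam." if "Nam" in text else "I don't know your name."

# Follow-up sent alone, without history -- the "model" has no context:
print(fake_llm([{"role": "user", "content": "So what's my name?"}]))
# -> I don't know your name.

# Same follow-up with the full history resent -- context travels with the request:
history = [
    {"role": "user", "content": "My name is Nam"},
    {"role": "assistant", "content": "Hi Nam! How can I help?"},
    {"role": "user", "content": "So what's my name?"},
]
print(fake_llm(history))
# -> Your name is Nam.
```

Swap `fake_llm` for a real API call and the mechanics stay identical: whoever builds the `messages` list controls what the model "remembers."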

Core Concepts to Understand

What Is Conversation History?

At its core, it’s just a Python list. Each element is a message with a role (user/assistant) and content. You send the entire list to the API — the model reads it all before responding.

conversation_history = [
    {"role": "user",      "content": "My name is Nam"},
    {"role": "assistant", "content": "Hi Nam! How can I help?"},
    {"role": "user",      "content": "So what's my name?"},
]
# Send the whole list → the model sees the context → answers correctly: "Your name is Nam"

System Prompt

The system prompt is a special instruction placed before the conversation. With OpenAI-style APIs it’s a message with role: "system" at the start of the messages list; the Anthropic SDK used in this article takes it as a separate system parameter instead. Either way, simply put: this is where you tell the model “who you are, what you do, and what’s allowed.” Whether you want the bot to respond briefly or in detail, in one language or another, strictly on-topic or flexibly — it’s all defined here.
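The two conventions look like this side by side (a sketch — the Anthropic call mirrors the code later in this article and is shown commented out since it needs a live API key):

```python
# OpenAI-style APIs: the system prompt is the first message in the list,
# carried in a message with role "system".
openai_style_messages = [
    {"role": "system", "content": "You are a friendly IT assistant. Answer briefly."},
    {"role": "user", "content": "What is Docker?"},
]

# Anthropic SDK: the system prompt is a separate `system` parameter;
# the messages list holds only user/assistant turns.
anthropic_messages = [
    {"role": "user", "content": "What is Docker?"},
]
# client.messages.create(
#     model="claude-haiku-4-5-20251001",
#     max_tokens=1024,
#     system="You are a friendly IT assistant. Answer briefly.",
#     messages=anthropic_messages,
# )
```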

Context Window

Each model has an input token limit — Claude Haiku at 200K, GPT-4o at 128K. In practice, 20 back-and-forth messages (~3,000–5,000 tokens) get resent with every single request: trivial for 1 user, but do the math for 1,000 users. Conversations that run too long will also eventually exceed the limit and fail the request outright (OpenAI surfaces this as a context_length_exceeded error) — so you need to trim the history yourself.
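You can estimate when you’re getting close with a rough rule of thumb — about 4 characters per token for English text. This heuristic is an assumption, not the model’s real tokenizer; use your provider’s token-counting endpoint when you need exact numbers:

```python
def estimate_tokens(history: list) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Real tokenizers differ; check with the API's token counter for exact figures.
    total_chars = sum(len(m["content"]) for m in history)
    return total_chars // 4

history = [
    {"role": "user", "content": "What is Docker?"},
    {"role": "assistant", "content": "Docker is a containerization platform... " * 50},
]
print(estimate_tokens(history))  # grows with every exchange you resend
```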

Hands-On: Building a Python Chatbot with Conversation Memory

Installing Dependencies

I’m using the Anthropic SDK because its API design is clean and the message architecture is easy to follow. If you’re using OpenAI or Gemini, the structure is similar — just different function names.

pip install anthropic python-dotenv

Create a .env file to store your API key:

ANTHROPIC_API_KEY=sk-ant-api03-your-key-here

Basic Chatbot with Conversation Memory

Create a chatbot.py file:

import os
from dotenv import load_dotenv
import anthropic

load_dotenv()

client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

SYSTEM_PROMPT = """Bạn là trợ lý IT thân thiện, chuyên giải thích kỹ thuật
bằng tiếng Việt dễ hiểu. Trả lời ngắn gọn, đúng trọng tâm."""

def chat(history: list, user_message: str) -> str:
    history.append({"role": "user", "content": user_message})

    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        messages=history
    )

    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply


def main():
    print("IT Chatbot — type 'quit' to exit\n")
    history = []

    while True:
        user_input = input("You: ").strip()
        if not user_input:
            continue
        if user_input.lower() in ("quit", "exit", "q"):
            print("Goodbye!")
            break
        print(f"Bot: {chat(history, user_input)}\n")


if __name__ == "__main__":
    main()

Run it:

python chatbot.py
IT Chatbot — type 'quit' to exit

You: What is Docker?
Bot: Docker is a platform for containerizing applications...

You: Give me a more concrete example
Bot: (Continues from the Docker question — no lost context)

Adding Conversation Length Limits

Long conversations consume a lot of tokens and cost money. The simple solution: only keep the N most recent message pairs.

MAX_HISTORY_PAIRS = 10  # Keep the 10 most recent question-answer pairs

def trim_history(history: list) -> list:
    max_messages = MAX_HISTORY_PAIRS * 2  # 1 pair = 1 user + 1 assistant message
    return history[-max_messages:] if len(history) > max_messages else history


def chat(history: list, user_message: str) -> str:
    history.append({"role": "user", "content": user_message})

    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        messages=trim_history(history)  # Send the trimmed history
    )

    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

Saving and Reloading Conversations Between Sessions

This is the most commonly overlooked part when building your own bot — close the terminal and all context is gone. A user spends a whole session debugging, builds up a long context, then reopens and has to start from scratch. Saving history to JSON solves this cleanly:

import json
from pathlib import Path

HISTORY_FILE = Path("chat_history.json")

def save_history(history: list):
    HISTORY_FILE.write_text(
        json.dumps(history, ensure_ascii=False, indent=2),
        encoding="utf-8"
    )

def load_history() -> list:
    if HISTORY_FILE.exists():
        return json.loads(HISTORY_FILE.read_text(encoding="utf-8"))
    return []

Complete Code (~60 Lines)

Putting it all together into a complete chatbot.py:

import os, json
from pathlib import Path
from dotenv import load_dotenv
import anthropic

load_dotenv()

client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
HISTORY_FILE = Path("chat_history.json")
MAX_HISTORY_PAIRS = 10

SYSTEM_PROMPT = """Bạn là trợ lý IT thân thiện, giải thích kỹ thuật bằng tiếng Việt
dễ hiểu. Trả lời ngắn gọn, đúng trọng tâm, dùng ví dụ thực tế khi cần."""


def trim_history(history: list) -> list:
    max_msg = MAX_HISTORY_PAIRS * 2
    return history[-max_msg:] if len(history) > max_msg else history


def chat(history: list, user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        messages=trim_history(history)
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply


def save_history(history: list):
    HISTORY_FILE.write_text(
        json.dumps(history, ensure_ascii=False, indent=2), encoding="utf-8"
    )


def load_history() -> list:
    return (
        json.loads(HISTORY_FILE.read_text(encoding="utf-8"))
        if HISTORY_FILE.exists() else []
    )


def main():
    print("IT Chatbot — 'quit' to exit | 'clear' to reset history\n")
    history = load_history()
    if history:
        print(f"(Loaded {len(history) // 2} previous conversation pairs)\n")

    while True:
        user_input = input("You: ").strip()
        if not user_input:
            continue
        if user_input.lower() in ("quit", "exit", "q"):
            save_history(history)
            print("History saved. Goodbye!")
            break
        if user_input.lower() == "clear":
            history.clear()
            print("Conversation history cleared.\n")
            continue
        print(f"Bot: {chat(history, user_input)}\n")


if __name__ == "__main__":
    main()

Testing the Result

Run it for the first time, have a conversation, then type quit. Run it again — the bot remembers the entire previous conversation:

python chatbot.py
# (Loaded 5 previous conversation pairs)
# You: What did I ask earlier?
# Bot: You asked about Docker and how to use volumes...

Conclusion

A lot of people stop at “got the API call working” — but what makes a chatbot actually useful is proper conversation history management. Just ~60 lines of Python is enough to build a bot with memory, session-persistent history, and no context window overflows.

From here, the natural next steps are clear: add tool use so the bot can actually call APIs or execute code, RAG to read your internal documents, or wrap the whole thing into a REST API with FastAPI. But before going further — truly understanding the fundamentals of stateless APIs and self-managed history will help you debug far faster when the bot starts misbehaving.
