When LLMs “Decide” to Return Unexpected Data
AI developers are likely all too familiar with the struggle of parsing results from GPT or Claude. In reality, the most frustrating issue isn’t that the AI lacks intelligence. On the contrary, it’s often too “creative” with the data it returns. Sometimes it adds unnecessary preamble, misses brackets, or worse, arbitrarily changes user_id to customer_id, causing downstream code to crash.
I once worked on a project to automate customer support tickets. The system would occasionally throw a 500 error simply because the bot returned inconsistent date formats. At times, the data parsing error rate reached 15-20%, forcing me to write dozens of messy regex functions and try-except blocks just to clean up the mess before saving it to the database.
The root cause is the mismatch between two worlds. LLMs operate on probability, while software runs on rigid logic. Without a protective layer in between, your AI Agent system will always remain in a state of “hit or miss”.
The Old Ways: Manual Parsing or Heavy Frameworks?
Before discovering PydanticAI, many developers, myself included, usually oscillated between two options:
- Manual Parsing: Using `json.loads()` and then manually checking every key. This is tedious, results in long and messy code, and is extremely hard to maintain as the schema grows.
- Using Heavy Frameworks: LangChain or LlamaIndex have built-in `OutputParser`s, but frankly, they can be overkill. Too many layers of abstraction make debugging a nightmare, especially when you need deep control over the data flow.
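To make the pain of the first option concrete, here is a minimal sketch of the kind of defensive parsing code it forces you to write. The helper name and the expected keys are illustrative, not from any library:

```python
import json

def parse_ticket_reply(raw: str) -> dict:
    # Strip the preamble chatter LLMs love to add before the JSON
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("No JSON object found in model output")
    try:
        payload = json.loads(raw[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"Malformed JSON from model: {exc}") from exc
    # Manually check every expected key -- this list grows with the schema
    for key in ("priority", "category", "summary"):
        if key not in payload:
            raise ValueError(f"Missing expected key: {key}")
    return payload

parse_ticket_reply(
    'Sure! Here is the result: '
    '{"priority": "High", "category": "Technical", "summary": "Login fails"}'
)
```

And this still does nothing about wrong value types or renamed fields — each of those needs yet another check.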
My hard-earned lesson: Don’t just try to “teach” the AI to return the right format through prompting alone. Force it to strictly adhere to a data structure right at the framework level.
PydanticAI – The Solution for Type-Safety
PydanticAI was born from the team behind Pydantic—the “go-to” data validation library for Python developers. Its philosophy is pragmatic: bring the rigor of type-hinting into the volatile world of LLMs.
The biggest selling point is that PydanticAI places result validation at the core of the Agent. If the AI returns an incorrect schema, the framework automatically identifies the error and asks the AI to fix it (retry). This ensures your system always receives 100% clean data, minimizing the risk of production bugs.
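The validation half of that loop is plain Pydantic, which you can see in isolation. In this sketch (field names are illustrative), the model "creatively" renames `user_id` to `customer_id` — exactly the failure mode mentioned earlier — and validation catches it instead of letting it reach your database:

```python
from pydantic import BaseModel, ValidationError

class TicketResult(BaseModel):
    user_id: int
    priority: str

# A well-formed response validates cleanly
good = TicketResult.model_validate({"user_id": 42, "priority": "High"})

try:
    # The model renamed user_id to customer_id, so user_id is missing
    TicketResult.model_validate({"customer_id": 42, "priority": "High"})
except ValidationError as exc:
    # PydanticAI feeds an error message like this back to the LLM and retries
    print(f"Caught {exc.error_count()} validation error(s)")
```

PydanticAI's contribution is wiring that `ValidationError` back into the conversation so the model gets a chance to correct itself.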
Real-world Implementation: Ticket Analysis Agent in 5 Minutes
First, quickly install the library via pip:
```bash
pip install pydantic-ai
```
Here is how I structure an Agent so it understands context while returning perfectly formatted data every time:
```python
from typing import List

from pydantic import BaseModel, Field
from pydantic_ai import Agent

# 1. Define the desired data structure
class TicketAnalysis(BaseModel):
    priority: str = Field(description="Priority level: Low, Medium, High")
    category: str = Field(description="Category: Technical, Billing, Account")
    tags: List[str] = Field(default_factory=list, description="Related keywords")
    summary_vi: str = Field(description="Summary of the content")

# 2. Initialize the Agent with Model and Result Schema
agent = Agent(
    'openai:gpt-4o',
    result_type=TicketAnalysis,
    system_prompt='You are a ticket analysis expert. Analyze and return structured data.',
)

# 3. Run the Agent
def analyze_customer_issue():
    user_input = "I cannot log in even though I changed my password. The app shows a 500 error."
    result = agent.run_sync(user_input)

    # The returned data is an object, with IntelliSense support
    data = result.data
    print(f"[{data.priority.upper()}] Category: {data.category}")
    print(f"Summary: {data.summary_vi}")

analyze_customer_issue()
```
Dependency Injection: A Feature Worth Its Weight in Gold
One thing I really love about PydanticAI is how it handles Dependencies. When an Agent needs to access a database or call an external API, you no longer have to pass tricky global variables.
You can define a Deps class and inject it directly into the Agent. This approach is extremely useful for writing Unit Tests or when you need to switch database configurations flexibly between Dev and Prod environments.
```python
from dataclasses import dataclass

from pydantic_ai import Agent, RunContext

@dataclass
class MyDeps:
    db_session: str  # Actual DB session (a string here for illustration)
    api_key: str

agent_with_deps = Agent('openai:gpt-4o', deps_type=MyDeps)

@agent_with_deps.tool
def get_user_info(ctx: RunContext[MyDeps], user_id: int) -> str:
    # Access deps cleanly via ctx.deps
    return f"User {user_id} from database {ctx.deps.db_session} is a VIP"
```
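Switching environments then becomes a matter of constructing different deps, with no changes to the agent code. A hedged sketch — the connection strings are made up, and the actual agent call is commented out because it would hit the LLM API:

```python
from dataclasses import dataclass

@dataclass
class MyDeps:
    db_session: str
    api_key: str

# Dev and Prod differ only in the dependencies you construct
dev_deps = MyDeps(db_session="sqlite:///dev.db", api_key="test-key")
prod_deps = MyDeps(db_session="postgresql://prod-host/app", api_key="real-key")

# The same agent runs against either environment:
# agent_with_deps.run_sync("Is user 42 a VIP?", deps=dev_deps)
```

The same trick makes unit tests trivial: pass a fake `db_session` and assert on what the tool returns, with no global state to patch.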
Why Is This the Top Choice Today?
After months of working with various frameworks, I’ve narrowed down three reasons why PydanticAI is so worth using:
- High Reliability: The automatic retry mechanism when validation fails eliminates 90% of runtime errors related to data formatting.
- Better Coding Experience: Because results are Pydantic models, VS Code or PyCharm will provide excellent code completion. You no longer have to remember whether a field is an `int` or a `str`, or exactly what its name is.
- Easy to Control: The framework doesn't hide logic behind mysterious layers of abstraction. It's still pure Python, making it easy to integrate into your existing CI/CD workflows.
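One cheap way to push both reliability and autocompletion further — my own habit, not something the earlier example requires — is to replace free-form strings with `Literal` types, so an out-of-range value fails validation instead of slipping through:

```python
from typing import Literal

from pydantic import BaseModel, ValidationError

class StrictTicket(BaseModel):
    # Only these exact values pass validation; anything else triggers a retry
    priority: Literal["Low", "Medium", "High"]
    category: Literal["Technical", "Billing", "Account"]

StrictTicket(priority="High", category="Billing")  # OK

try:
    StrictTicket(priority="URGENT!!", category="Billing")
except ValidationError:
    print("Rejected: 'URGENT!!' is not an allowed priority")
```

The allowed values also show up in your IDE's completion list, which beats reading them out of a `description` string.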
If you are starting a serious AI Agent project, skip the brittle string parsing and try this type-safe approach. Trust me, it will help you sleep better every time you hit that deploy button to Production.

