GenAI & LLM Tracing 🕵️
flowyml provides built-in observability for Large Language Models (LLMs), giving you X-ray vision into your GenAI applications.
Comprehensive Guide Available
For full integration docs covering LangGraph, LangChain, OpenAI, CrewAI, session tracing, auto-evaluations, and more, see the GenAI Observability Guide.
What you'll learn
How to track token usage, costs, and latency for every LLM call. LLMs are black boxes; tracing turns them into transparent, measurable components.
Architecture
```mermaid
flowchart TB
    subgraph Decorators["Decorator Layer"]
        TL["@trace_llm"]
        OB["@observe"]
        TG["trace_graph()"]
        TO["TracedOpenAI"]
    end
    TL -->|captures| E[Trace Event / Span]
    OB -->|captures| E
    TG -->|captures| E
    TO -->|captures| E
    E -->|stores| S[SQLite Metadata Store]
    S -->|serves| API[REST API /api/traces]
    API -->|renders| UI[Traces Dashboard]
    E -->|bridge| B[TraceBridge]
    B -->|converts| D[EvalDataset]
    D -->|scores| R[Evaluation Result]
```
Why Tracing Matters
Without tracing:

- Hidden costs: "Why is our OpenAI bill $500 this month?"
- Latency spikes: "Why is the chatbot taking 10 seconds?"
- Quality issues: "What exact prompt caused this hallucination?"

With flowyml tracing:

- Cost transparency: See cost per call, per user, or per pipeline
- Performance metrics: Pinpoint slow steps in your RAG chain
- Full context: See the exact prompt and completion for every interaction
🕵️ LLM Call Tracing
You can trace any function as an LLM call or a chain of calls using the @trace_llm decorator. flowyml automatically captures:
- ✅ Prompts & Completions: Full text of the inputs and outputs.
- ✅ Token Usage: Breakdown of prompt, completion, and total tokens.
- ✅ Cost Estimation: Automatic cost calculation based on the model tier.
- ✅ Latency: Precise timing of each call.
Basic Usage
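The original code sample for this section is not included on this page, so here is a self-contained sketch of what a `@trace_llm`-style decorator captures. It is not flowyml's implementation: the store is a plain list instead of SQLite, the LLM call is stubbed, and token counts are approximated by whitespace splitting (a real tracer would read exact counts from the provider's usage data).

```python
import functools
import time

# Toy stand-in for flowyml's trace store (the real one is SQLite-backed).
TRACES = []

def trace_llm(model="gpt-4o-mini", cost_per_1k_input=0.00015, cost_per_1k_output=0.0006):
    """Illustrative decorator capturing prompt, completion, tokens, cost, latency."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(prompt, **kwargs):
            start = time.perf_counter()
            completion = fn(prompt, **kwargs)
            latency = time.perf_counter() - start
            # Crude whitespace token count, purely for the sketch.
            in_tok, out_tok = len(prompt.split()), len(completion.split())
            TRACES.append({
                "model": model,
                "prompt": prompt,
                "completion": completion,
                "tokens": {"prompt": in_tok, "completion": out_tok, "total": in_tok + out_tok},
                "cost": in_tok / 1000 * cost_per_1k_input + out_tok / 1000 * cost_per_1k_output,
                "latency_s": latency,
            })
            return completion
        return wrapper
    return decorator

@trace_llm(model="gpt-4o-mini")
def summarize(prompt):
    # Stubbed LLM call so the sketch runs offline.
    return "A short summary of: " + prompt

summarize("Explain tracing in one line.")
print(TRACES[0]["tokens"], TRACES[0]["cost"])
```

The shape of the recorded event (model, prompt, completion, tokens, cost, latency) mirrors the capture list above; the decorator keyword arguments are assumptions.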
Advanced: Nesting Traces (Chains)
For complex workflows like RAG (Retrieval-Augmented Generation), you can nest traces to see exactly where time and money are spent.
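As an illustration of how nested spans form a parent/child tree (not flowyml's actual API), here is a minimal runnable sketch using a context-manager span and a parent stack; a chain span encloses a tool span (retrieval) and an llm span (generation):

```python
import time
from contextlib import contextmanager

SPANS, _STACK = [], []  # toy span tree; flowyml persists this to its store

@contextmanager
def span(name, event_type):
    """Open a span; nest it under the currently active span, if any."""
    s = {"name": name, "event_type": event_type, "children": [],
         "start": time.perf_counter()}
    (_STACK[-1]["children"] if _STACK else SPANS).append(s)
    _STACK.append(s)
    try:
        yield s
    finally:
        _STACK.pop()
        s["duration_s"] = time.perf_counter() - s["start"]

# Parent is a "chain"; children are "tool" (retrieval) and "llm" (generation).
with span("rag_pipeline", "chain"):
    with span("vector_search", "tool"):
        docs = ["doc-1", "doc-2"]  # stubbed retrieval
    with span("generate_answer", "llm"):
        answer = f"Answer grounded in {len(docs)} documents"

root = SPANS[0]
print(root["event_type"], [c["event_type"] for c in root["children"]])
# → chain ['tool', 'llm']
```

Because each child records its own duration inside the parent's, this tree is exactly what a waterfall view renders: the chain bar spans the whole call, with the tool and llm bars stacked beneath it.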
Pro Tip
Use event_type="chain" for the parent function and "llm" or "tool" for children. This creates a beautiful nested waterfall view in the UI.
Viewing Traces
Traces are automatically persisted to the metadata store and can be visualized in the flowyml UI.
In the UI
Navigate to the Traces tab in the flowyml Dashboard (http://localhost:8080/traces). You will see:
- Timeline View: A waterfall chart of your traces.
- Latency & Cost: Aggregated metrics for each trace.
- Inputs & Outputs: Full inspection of prompts and completions.
- Token Usage: Detailed breakdown of prompt vs. completion tokens.
Programmatic Access
You can also retrieve traces via the Python API for analysis.
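The exact client call and trace schema are not shown on this page, so the snippet below analyses hand-built records instead of fetching them; the field names (`model`, `cost`, `latency_s`) are assumptions about what a retrieved trace contains.

```python
from collections import defaultdict

# Stand-in for traces retrieved from the metadata store via the Python API.
traces = [
    {"model": "gpt-4o", "cost": 0.021, "latency_s": 1.8},
    {"model": "gpt-4o-mini", "cost": 0.0004, "latency_s": 0.4},
    {"model": "gpt-4o", "cost": 0.017, "latency_s": 2.1},
]

# Aggregate spend per model -- the kind of analysis that answers
# "Why is our OpenAI bill $500 this month?"
cost_by_model = defaultdict(float)
for t in traces:
    cost_by_model[t["model"]] += t["cost"]

print({m: round(c, 4) for m, c in cost_by_model.items()})
# → {'gpt-4o': 0.038, 'gpt-4o-mini': 0.0004}
```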
🏷️ Trace Attributes
You can add custom attributes to your traces for better filtering and analysis.
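The exact attachment mechanism (for example an `attributes=` argument on the decorator, or a set-attribute call on the current span) is not specified here, so this sketch only shows the idea: arbitrary key/value pairs ride along with each event and serve as filters later.

```python
# Hypothetical trace events, each carrying user-defined attributes.
events = [
    {"event_type": "llm", "attributes": {"user_id": "u-42", "tier": "pro"}},
    {"event_type": "llm", "attributes": {"user_id": "u-7", "tier": "free"}},
    {"event_type": "tool", "attributes": {"user_id": "u-42", "tier": "pro"}},
]

# Filter to LLM calls made by "pro"-tier users.
pro_llm_calls = [
    e for e in events
    if e["event_type"] == "llm" and e["attributes"]["tier"] == "pro"
]
print(len(pro_llm_calls))  # → 1
```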
Trace → Evaluation (TraceBridge)
Convert traced LLM interactions into evaluation datasets for automated quality auditing:
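As a hedged sketch of what a TraceBridge-style conversion involves (the function and field names below are assumptions, not flowyml's API): each traced LLM interaction becomes one evaluation example, with the prompt as the input and the completion as the output to be scored.

```python
def traces_to_eval_dataset(traces):
    """Convert raw trace records into evaluation examples."""
    return [
        {"input": t["prompt"], "output": t["completion"], "trace_id": t["id"]}
        for t in traces
        if t["event_type"] == "llm"  # skip chain/tool spans: only score LLM output
    ]

traces = [
    {"id": "t1", "event_type": "llm", "prompt": "Q1?", "completion": "A1"},
    {"id": "t2", "event_type": "tool", "prompt": "", "completion": ""},
]
dataset = traces_to_eval_dataset(traces)
print(dataset)  # → [{'input': 'Q1?', 'output': 'A1', 'trace_id': 't1'}]
```

Keeping the `trace_id` on each example lets evaluation scores be joined back to the original trace for root-cause analysis.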
Continuous Monitoring
Combine evaluate_traces() with EvalSchedule to automatically evaluate new traces every night.
💰 Cost Reference
| Model | Input (per 1K) | Output (per 1K) |
|---|---|---|
| `gpt-4o` | $0.005 | $0.015 |
| `gpt-4o-mini` | $0.00015 | $0.0006 |
| `gpt-4-turbo` | $0.01 | $0.03 |
| `claude-3-opus` | $0.015 | $0.075 |
| `claude-3-sonnet` | $0.003 | $0.015 |
Custom Cost Models
Override pricing with @trace_llm(cost_per_1k_input=..., cost_per_1k_output=...).
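The arithmetic behind the table above, and behind any per-call override, is plain per-1K-token pricing; a quick worked example (the helper name is ours, not flowyml's):

```python
def llm_cost(prompt_tokens, completion_tokens, cost_per_1k_input, cost_per_1k_output):
    """Per-1K-token pricing: each side billed at its own rate."""
    return (prompt_tokens / 1000) * cost_per_1k_input \
         + (completion_tokens / 1000) * cost_per_1k_output

# gpt-4o rates from the table: $0.005 in / $0.015 out per 1K tokens.
print(round(llm_cost(1200, 400, 0.005, 0.015), 6))  # → 0.012
```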
REST API
| Endpoint | Method | Description |
|---|---|---|
| `/api/traces/` | GET | List traces (filterable by project, type, trace_id) |
| `/api/traces/{trace_id}` | GET | Get a specific trace tree |
| `/api/traces/` | POST | Create or update a trace event |
Event Types Reference
| Event Type | Description | Use Case |
|---|---|---|
| `llm` | A direct LLM API call | `openai.chat.completions.create()` |
| `chat_model` | A chat model call | `ChatOpenAI.invoke()` |
| `chain` | A parent wrapper/chain | RAG pipeline, multi-step workflow |
| `tool` | A tool/function call | Vector DB search, API call, calc |
| `agent` | An autonomous agent loop | ReAct agent, planning loops |
| `agent_action` | A specific agent action | Tool selection within agent loop |
| `embedding` | Embedding generation | `openai.embeddings.create()` |
| `retriever` | RAG retriever | Vector DB search |
| `graph_node` | LangGraph node execution | State machine transitions |
| `session` | Session-level event | Multi-turn conversation |
| `custom` | User-defined event | Any custom span |
Next Steps
- GenAI Observability Guide: Full integration docs for LangGraph, LangChain, OpenAI, CrewAI, session tracing, auto-evals
- Evaluations: Built-in scorers for quality, toxicity, and relevance
- Notifications: Set up alerts when traces breach quality thresholds