AI Executive Assistant
Project Summary
Type: Portfolio / Demo Project
Focus: Production-Grade AI Pipeline
Key Features:
- End-to-end AI pipeline with email ingestion and classification (95%+ accuracy)
- Advanced retrieval: Contextual Retrieval, Self-Query, Reciprocal Rank Fusion
- Custom workflow graphs with self-correction mechanisms
- Multi-model orchestration with type-safe structured outputs
- Full observability pipeline (token usage, latency, quality tracking)
- Security layer with PII detection and prompt injection protection
This project demonstrates a production-ready AI system that ingests emails, classifies them with high accuracy, and retrieves context from a 10,000+ chunk knowledge base using advanced RAG techniques. The workflow orchestration layer is built from scratch, similar in design to LangGraph, to showcase enterprise-grade AI architecture.
The Problem
Most RAG implementations are "weekend demos" that work in a notebook but fall apart in production. Real enterprise deployments need:
- Robust retrieval that handles ambiguous queries
- Multiple retrieval strategies that complement each other
- Scalable architecture that doesn't break under load
- Clear separation of concerns for maintainability
Architecture
```mermaid
flowchart TB
    subgraph ingestion [Email Ingestion & Processing]
        emails[Email Input] --> classifier[Email Classifier<br/>95%+ Accuracy]
        classifier --> chunker[Smart Chunker]
        chunker --> enricher[Context Enricher]
        enricher --> embedder[Embedding Generator]
        embedder --> vectordb[(pgVector<br/>10,000+ Chunks)]
    end
    subgraph retrieval [Advanced Retrieval]
        query[User Query] --> contextual[Contextual Retrieval]
        query --> selfquery[Self-Query]
        query --> semantic[Semantic Search]
        contextual --> rrf[RRF Fusion]
        selfquery --> rrf
        semantic --> rrf
        vectordb --> contextual
        vectordb --> selfquery
        vectordb --> semantic
    end
    subgraph orchestration [Workflow Orchestration]
        rrf --> workflow[Custom Workflow Graph<br/>Self-Correction]
        workflow --> multi[Multi-Model<br/>Orchestration]
        multi --> structured[Type-Safe<br/>Structured Outputs]
    end
    subgraph observability [Observability Layer]
        structured --> langfuse[Langfuse<br/>Token Usage<br/>Latency<br/>Quality Tracking]
    end
    subgraph security [Security Layer]
        langfuse --> pii[PII Detection]
        langfuse --> injection[Prompt Injection<br/>Protection (LLM Guard)]
        pii --> response[Response]
        injection --> response
    end
```
Technical Approach
Email Classification Pipeline
The system ingests emails and classifies them with 95%+ accuracy using a multi-stage classification pipeline. This ensures that only relevant emails trigger the RAG retrieval process, reducing noise and improving response quality.
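A multi-stage pipeline can be sketched as a chain of classifiers ordered by cost: cheap rules run first, and an LLM stage is only consulted when earlier stages are not confident. The stage names, labels, and threshold below are illustrative, not the project's actual configuration, and the LLM stage is stubbed out:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    run: Callable[[str], tuple[str, float]]  # returns (label, confidence)

def classify_email(body: str, stages: list[Stage], threshold: float = 0.8) -> str:
    """Run stages in order; accept the first confident label, else flag for review."""
    for stage in stages:
        label, confidence = stage.run(body)
        if confidence >= threshold:
            return label
    return "needs_review"

# Toy stages: a cheap keyword pass first, then a stand-in for an LLM call.
keyword_stage = Stage(
    "keywords",
    lambda b: ("invoice", 0.95) if "invoice" in b.lower() else ("unknown", 0.0),
)
llm_stage = Stage("llm", lambda b: ("general", 0.85))  # stub for a model call

print(classify_email("Please find the attached invoice.", [keyword_stage, llm_stage]))  # invoice
print(classify_email("Hi, quick question about hours.", [keyword_stage, llm_stage]))    # general
```

Escalating only low-confidence emails to the LLM stage keeps per-email cost low while preserving accuracy on the easy cases.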
Advanced Retrieval Techniques
The system employs three complementary retrieval strategies:
- Contextual Retrieval: Enriches document chunks with context about their position in the broader document structure, dramatically improving retrieval accuracy for complex queries
- Self-Query: Allows the system to decompose complex queries into structured filters and semantic search components
- Reciprocal Rank Fusion (RRF): Combines results from multiple retrieval strategies, giving better results than any single method alone
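Of the three strategies, RRF is the most mechanical: each retriever contributes a score of 1/(k + rank) per document, so documents that rank well across several retrievers rise to the top. A minimal sketch (the constant k = 60 is the value commonly used in the RRF literature; the document IDs are made up):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs into one, best-first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

contextual = ["a", "b", "c"]   # results from contextual retrieval
selfquery  = ["b", "c", "d"]   # results from self-query
semantic   = ["b", "a", "d"]   # results from semantic search

fused = reciprocal_rank_fusion([contextual, selfquery, semantic])
print(fused[0])  # b  (ranked highly by all three retrievers)
```

Because RRF only needs ranks, not raw scores, it sidesteps the score-calibration problem of mixing cosine similarities with keyword-match scores.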
Custom Workflow Orchestration
Built custom workflow graphs from scratch (similar to LangGraph) with self-correction mechanisms. The system can detect when initial outputs don't meet quality thresholds and automatically retry with adjusted parameters or alternative strategies.
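The core of such a graph is small: nodes transform a shared state dict, and a router after each node decides whether to continue, loop back, or stop. The sketch below is a simplified illustration (node names and the simulated quality score are invented), not the project's actual implementation:

```python
from typing import Callable, Optional

class WorkflowGraph:
    """Tiny workflow graph with conditional edges, in the spirit of LangGraph."""

    def __init__(self) -> None:
        self.nodes: dict[str, Callable[[dict], dict]] = {}
        self.routers: dict[str, Callable[[dict], Optional[str]]] = {}

    def add_node(self, name: str, fn: Callable[[dict], dict]) -> None:
        self.nodes[name] = fn

    def add_router(self, name: str, router: Callable[[dict], Optional[str]]) -> None:
        self.routers[name] = router

    def run(self, start: str, state: dict, max_steps: int = 10) -> dict:
        node = start
        for _ in range(max_steps):
            state = self.nodes[node](state)
            node = self.routers[node](state)  # None means "stop"
            if node is None:
                return state
        raise RuntimeError("workflow did not converge within max_steps")

# Self-correction: re-run "draft" until a simulated quality score clears 0.8.
def draft(state: dict) -> dict:
    attempts = state.get("attempts", 0) + 1
    # Stand-in for an LLM call whose output improves with adjusted parameters.
    return {**state, "attempts": attempts, "quality": 0.5 + 0.2 * attempts}

graph = WorkflowGraph()
graph.add_node("draft", draft)
graph.add_router("draft", lambda s: None if s["quality"] >= 0.8 else "draft")

result = graph.run("draft", {})
print(result["attempts"])  # 2 -- the first draft failed the check and was retried
```

The `max_steps` cap is the important production detail: a self-correction loop without a step budget can retry forever on a query the model simply cannot answer well.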
Multi-Model Orchestration
The system orchestrates multiple LLM calls with type-safe structured outputs, ensuring consistent data formats and enabling complex multi-step reasoning workflows.
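"Type-safe structured outputs" in practice means the raw JSON a model returns is parsed into a typed object and rejected loudly when it drifts. A stdlib-only sketch of the idea (the `EmailAction` schema and category names are hypothetical; production code on this stack would more likely use Pydantic, which FastAPI already depends on):

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class EmailAction:
    category: str        # one of ALLOWED_CATEGORIES
    priority: int        # 1 (low) .. 5 (urgent)
    requires_reply: bool

ALLOWED_CATEGORIES = {"scheduling", "invoice", "general"}

def parse_model_output(raw: str) -> EmailAction:
    """Validate a model's JSON reply into a typed object, failing loudly on drift."""
    data = json.loads(raw)
    action = EmailAction(
        category=str(data["category"]),
        priority=int(data["priority"]),
        requires_reply=bool(data["requires_reply"]),
    )
    if action.category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unexpected category: {action.category!r}")
    if not 1 <= action.priority <= 5:
        raise ValueError(f"priority out of range: {action.priority}")
    return action

action = parse_model_output('{"category": "invoice", "priority": 4, "requires_reply": true}')
print(action)  # EmailAction(category='invoice', priority=4, requires_reply=True)
```

Validating at the boundary means downstream steps in a multi-step workflow can trust the shape of their inputs instead of re-checking them.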
Production Architecture
- Clean separation between ingestion, retrieval, orchestration, and generation
- Comprehensive logging for debugging and monitoring
- Graceful error handling at every layer with automatic retries
- Configuration-driven behavior for easy deployment
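The "automatic retries" point is commonly implemented as exponential backoff with jitter around any flaky external call (an LLM API, the database). A minimal sketch of that pattern, with invented names and delays:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5) -> T:
    """Call fn, retrying on any exception with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the real error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

calls = {"n": 0}
def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient upstream failure")
    return "ok"

print(with_retries(flaky, attempts=3, base_delay=0.01))  # ok
```

The jitter term matters under load: without it, many workers that fail together retry together, hammering the upstream service in lockstep.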
Results: Naive vs Advanced Approach
| Metric | Naive RAG | This Implementation |
|---|---|---|
| Email classification accuracy | ~70% | 95%+ |
| Knowledge base size | < 1,000 chunks | 10,000+ chunks |
| Retrieval accuracy on ambiguous queries | ~60% | ~85% |
| Retrieval strategies | Single method | 3 methods + RRF fusion |
| Error recovery | Crashes | Self-correction mechanisms |
| Observability | Basic logging | Full pipeline tracking |
| Security | None | PII detection + injection protection |
| Production readiness | Manual deployment | Docker + CI/CD ready |
Observability
Full observability pipeline built with Langfuse tracks:
- Token Usage: Monitor API costs and usage patterns across all LLM calls
- Latency: Track response times at each stage of the pipeline
- Quality Metrics: Measure retrieval relevance, classification accuracy, and response quality
- Error Tracking: Comprehensive error logging with context for debugging
This observability layer enables data-driven optimization and cost management, critical for production AI systems.
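The underlying idea, independent of Langfuse's specific SDK, is that every pipeline stage emits a span carrying its latency and token counts, which can then be aggregated for cost tracking. A simplified illustration (the span fields and stage names are invented, not the Langfuse API):

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str
    latency_ms: float = 0.0
    tokens_in: int = 0
    tokens_out: int = 0

@dataclass
class PipelineTracker:
    spans: list[Span] = field(default_factory=list)

    @contextmanager
    def span(self, name: str):
        s = Span(name)
        start = time.perf_counter()
        try:
            yield s  # caller records token counts on the span
        finally:
            s.latency_ms = (time.perf_counter() - start) * 1000
            self.spans.append(s)

    def total_tokens(self) -> int:
        return sum(s.tokens_in + s.tokens_out for s in self.spans)

tracker = PipelineTracker()
with tracker.span("classification") as s:
    s.tokens_in, s.tokens_out = 420, 12   # counts reported by the model API
with tracker.span("retrieval") as s:
    s.tokens_in = 96

print(len(tracker.spans), tracker.total_tokens())  # 2 528
```

The context-manager shape guarantees latency is recorded even when a stage raises, which is exactly when you most want the trace.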
Security
Enterprise-grade security layer protects against common AI vulnerabilities:
- PII Detection: Automatically detects and redacts personally identifiable information before processing
- Prompt Injection Protection: LLM Guard integration prevents malicious prompt injection attacks
- Input Validation: Strict validation at every pipeline stage
These security measures ensure the system can handle sensitive enterprise data safely.
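As a flavor of what the PII layer does before text reaches a model, here is a deliberately simple regex-based redactor. This is an illustration of the redaction step only, not LLM Guard's actual scanners, and the patterns are far from exhaustive:

```python
import re

# Ordered list: the strict SSN pattern must run before the looser phone pattern.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
    ("SSN",   re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]

def redact_pii(text: str) -> str:
    """Replace recognizable PII with typed placeholders before further processing."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"<{label}>", text)
    return text

msg = "Reach Jane at jane.doe@example.com or 555-123-4567 (SSN 123-45-6789)."
print(redact_pii(msg))  # Reach Jane at <EMAIL> or <PHONE> (SSN <SSN>).
```

Typed placeholders (rather than a generic `[REDACTED]`) preserve enough structure that the model can still reason about the message while never seeing the raw values.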
Tech Stack
Python, FastAPI, Celery, OpenAI API, PostgreSQL + pgVector, Docker, Langfuse, LLM Guard
Video Walkthrough
Coming Soon
A 5-10 minute video demo walking through the architecture and showing the system in action. Check back soon!
Key Learnings
This project reinforced that the gap between "working demo" and "production system" is where most AI projects fail. The techniques implemented here (contextual retrieval, RRF fusion, robust error handling) are exactly what enterprises need but rarely get from typical AI vendors.
---
Want something like this for your company?
I build production-ready RAG systems for scale-up companies. Let's discuss your AI challenges.