LAYER 0
Interaction & Channel
Human Users
- Web / Chat UI
- Mobile App
- Voice Interface
- Email / Slack / Teams
External Agents
- 3rd-party AI Agents
- Partner Systems
- Automation Bots
- Scheduled Jobs
Event Sources
- Webhooks / APIs
- IoT / Sensors
- DB Change Streams
- File / Queue Triggers
Integration Hub
- REST / GraphQL / gRPC
- Event Bus (Kafka)
- WebSocket streams
- MCP Tool Endpoints
- A2A Agent Mesh
Auth · Rate-Limit · Identity · Routing
LAYER 1
API Gateway & Security Perimeter
API Gateway
- OAuth2 / JWT / SAML
- Rate Limiting & Throttling
- Load Balancing
- SSL Termination
- Request Routing
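As one illustration of the rate limiting and throttling item above, a gateway might keep a token bucket per client. The sketch below is a minimal, single-process version; the class name `TokenBucket` and its parameters are hypothetical and do not come from any particular gateway product.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenBucket:
    """Hypothetical per-client limiter: refills `rate` tokens/second up to `capacity`."""
    rate: float
    capacity: float
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        # Start with a full bucket so short bursts are allowed immediately.
        self.tokens = self.capacity
        self.updated = time.monotonic()

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True and consume one token if the request may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A production gateway would normally hold this state in a shared store so that all gateway replicas see the same counters.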
Identity & Access
- IAM / RBAC / ABAC
- Agent Identity (DIDs)
- Secret Management
- Zero-Trust Policies
Prompt Firewall
- Injection Detection
- PII Scrubbing
- Jailbreak Detection
- Content Pre-filtering
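The PII scrubbing and injection detection items above can be approximated, very roughly, with pattern matching. The sketch below assumes simple regular expressions and a short phrase list (all names and patterns here are illustrative assumptions); real prompt firewalls typically combine such rules with trained classifiers.

```python
import re

# Illustrative patterns only; they will miss many real-world formats.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Hypothetical phrase list for a first-pass injection check.
INJECTION_HINTS = (
    "ignore previous instructions",
    "ignore all previous instructions",
    "reveal your system prompt",
)


def scrub_pii(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def looks_like_injection(text: str) -> bool:
    """Flag text containing a known injection phrase (case/whitespace-insensitive)."""
    normalised = " ".join(text.lower().split())
    return any(hint in normalised for hint in INJECTION_HINTS)
```

For example, `scrub_pii("Mail jane.doe@example.com")` yields `"Mail [EMAIL]"`.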
Observability Gateway
- Distributed Tracing
- Token & Cost Metering
- Latency Monitoring
- Audit Log Stream
Orchestration Bus · A2A Protocol
LAYER 2
Agent Orchestration & Multi-Agent System
Supervisor / Orchestrator Agent
- Goal decomposition (ReAct / CoT / ToT)
- Task planning & delegation
- A2A agent discovery
- Human-in-the-Loop (HiTL) escalation
- Session state management
Research Agent
- Web / Doc search
- Agentic RAG queries
- Data extraction
- Fact verification
Action Agent
- API / tool calls
- Code execution
- Form / UI automation
- Write to systems
Generation Agent
- Report drafting
- Code generation
- Email & comms
- Summarisation
Eval / Critic Agent
- Output validation
- LLM-as-judge
- Confidence scoring
- Feedback loops
MCP Tool Registry ~ Universal Tool Interface
- Enterprise: Salesforce · SAP · ServiceNow · Jira · Oracle ERP
- Data: SQL · NoSQL · Elasticsearch · Web Search APIs
- Comms: Gmail · Slack · Teams · Notion · Confluence
- Custom: Internal APIs · Legacy Adapters · IoT Interfaces
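To make the idea of a single tool interface concrete, the sketch below shows an in-process registry in which every tool is looked up by name and called with keyword arguments, with a per-agent allow-list. It is not the MCP wire protocol or any MCP SDK; `ToolRegistry`, `register`, and `call` are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Set


class ToolRegistry:
    """Hypothetical registry: tools are plain callables addressed by name."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., object]] = {}

    def register(self, name: str) -> Callable[[Callable[..., object]], Callable[..., object]]:
        """Decorator that adds a function to the registry under `name`."""
        def decorator(fn: Callable[..., object]) -> Callable[..., object]:
            if name in self._tools:
                raise ValueError(f"tool already registered: {name}")
            self._tools[name] = fn
            return fn
        return decorator

    def call(self, name: str, allowed: Set[str], **kwargs: object) -> object:
        """Invoke a tool only if the calling agent's allow-list includes it."""
        if name not in allowed:
            raise PermissionError(f"agent may not call: {name}")
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


registry = ToolRegistry()


@registry.register("jira.create_ticket")
def create_ticket(title: str) -> dict:
    # Stand-in for a real adapter; no network call is made here.
    return {"status": "created", "title": title}
```

An agent granted `{"jira.create_ticket"}` could then run `registry.call("jira.create_ticket", {"jira.create_ticket"}, title="Reset VPN")`.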
Inference Requests · Model Routing · Context Management
LAYER 3
AI Model Layer ~ The Brain
Foundation LLMs (Cloud)
- GPT-4o / o3 (OpenAI)
- Claude 3.7 Sonnet (Anthropic)
- Gemini Pro / Ultra (Google)
- Llama 3.x via API (Meta)
- Mistral Large
SLMs ~ On-Prem / Edge
- Phi-4 Mini (Microsoft)
- Llama 3.2 1B–3B (Meta)
- Gemma 2 2B (Google)
- Mistral 7B Instruct
- LoRA domain fine-tunes
Model Router
- Task-based routing
- Cost optimisation
- Latency SLA routing
- Fallback chains
- A/B model testing
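A fallback chain, as listed above, can be sketched as an ordered list of model callables that is walked until one succeeds. The function name `route_with_fallback` and the `(name, callable)` pairing are assumptions made for this example; the callables stand in for real provider clients.

```python
from typing import Callable, List, Sequence, Tuple

ModelCall = Callable[[str], str]  # hypothetical signature: prompt in, text out


def route_with_fallback(
    prompt: str, chain: Sequence[Tuple[str, ModelCall]]
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return the first successful answer."""
    errors: List[str] = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real router would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Task-based or cost-based routing could be layered on top by choosing which chain to pass in for a given request.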
Memory System
- Short-term (in-context)
- Long-term (vector store)
- Episodic (session logs)
- Semantic (knowledge graph)
- Procedural (skill store)
Retrieval Pipeline · Vector Search · Structured Queries
LAYER 4
Data Ingestion, RAG & Knowledge Layer
Data Ingestion Pipeline
- Collect → Extract → Transform → Chunk → Embed → Index
- PDF / Word / HTML / Email (Docling, Unstructured.io)
- Structured: SQL, CSV, JSON, APIs
- Streaming: Kafka, Kinesis, CDC
- Embeddings: text-embedding-3-small / -large (OpenAI), E5, BGE
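The Chunk step of the pipeline above is often implemented as fixed-size windows with some overlap, so that text cut at a boundary still appears whole in one chunk. The sketch below is a character-based version under that assumption; `chunk_text` and its default sizes are illustrative, and many pipelines split on tokens or sentences instead.

```python
from typing import List


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split `text` into windows of `size` characters sharing `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]
```

Each chunk would then be passed to the embedding model and written to the index together with its source metadata.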
Agentic RAG Engine
- Naive → Advanced → Agentic RAG
- Hybrid search (dense + BM25)
- GraphRAG (knowledge graph traversal)
- Query rewriting & HyDE
- Cross-encoder re-ranking
- Self-RAG (reflection loops)
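Hybrid search needs some way to merge the dense and BM25 result lists. One common option is reciprocal rank fusion (RRF), sketched below; the diagram does not say which fusion method is intended, so RRF and the constant `k = 60` are assumptions here.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["a", "b", "c"]   # e.g. from a vector index
bm25_hits = ["b", "d", "a"]    # e.g. from a keyword index
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
```

The fused list can then be handed to a cross-encoder re-ranker, which only needs to score the top few candidates.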
Vector Databases
- Pinecone / Weaviate / Qdrant
- pgvector (Postgres)
- Chroma (local / SME)
- Milvus (self-hosted)
Knowledge Graph
- Neo4j · Amazon Neptune
- GraphRAG (Microsoft)
- Entity / relation extraction
- Cognee (deterministic)
Policy Enforcement · Compliance Controls
LAYER 5
Guardrails, Governance & Trust
Input Guardrails
- Prompt injection blocking
- Toxic content filter
- PII detection & masking
- Off-topic rejection
- NeMo Guardrails / Guardrails AI
Output Guardrails
- Hallucination detection
- Fact grounding check
- Sensitive data redaction
- Schema validation
- Response toxicity scan
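Schema validation of model output can be as simple as parsing the response as JSON and checking required fields and types before it reaches downstream systems. The sketch below uses only the standard library; `validate_response` and the field names in the usage note are hypothetical.

```python
import json
from typing import Dict, List


def validate_response(raw: str, required: Dict[str, type]) -> List[str]:
    """Return a list of problems; an empty list means the response passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    problems: List[str] = []
    for key, expected in required.items():
        if key not in data:
            problems.append(f"missing field: {key}")
        elif not isinstance(data[key], expected):
            problems.append(f"wrong type for {key}: expected {expected.__name__}")
    return problems
```

For instance, `validate_response('{"answer": "ok", "confidence": 0.9}', {"answer": str, "confidence": float})` returns an empty list, and a non-empty list could trigger a retry or an escalation.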
Compliance & Audit
- GDPR / HIPAA / SOC 2
- Immutable audit trail
- Explainability (XAI)
- Agent approval flows
- Data residency control
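One way to make an audit trail tamper-evident is to chain entries by hash, so that altering any past entry invalidates every later one. The sketch below assumes an in-memory list and SHA-256; `append_entry` and `verify` are hypothetical helper names, and a real deployment would also need write-once storage.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder hash used before the first entry


def _digest(prev_hash: str, event: Dict[str, object]) -> str:
    """Hash the previous entry's hash together with a canonical form of the event."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: List[dict], event: Dict[str, object]) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log: List[dict]) -> bool:
    """Recompute the chain and report whether every link is intact."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```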
Observability & Evals
- LangFuse / LangSmith traces
- Prompt versioning
- A/B model testing
- Cost dashboards
- Drift & degradation alerts
Security Controls
- Agent permission sandboxing
- Least-privilege tool access
- MCP supply chain security
- MAESTRO / OWASP LLM Top 10
- Red-teaming & adversarial evals
Infrastructure Services
LAYER 6
Infrastructure, Deployment & Operations
Deployment Models
- Enterprise Cloud: AWS / Azure / GCP · multi-region K8s autoscale
- Enterprise Hybrid: on-prem SLMs + cloud LLMs via private VPC
- SME SaaS: managed single-tenant · serverless · pay-per-use
- Edge / Air-gapped: Ollama + SLM · local vector DB · no internet
Containers & Orchestration
- Docker / Kubernetes / Helm
- Serverless (Lambda / Cloud Run)
- GPU node pools (vLLM)
- Service mesh (Istio)
CI/CD & MLOps
- GitHub Actions / GitLab CI
- Prompt versioning & optimisation (DSPy)
- Canary model deployments
- IaC: Terraform / Pulumi
- MLflow / W&B versioning
Cost Control
- Token budget management
- SLM fallback routing
- Semantic caching
- Batch vs. streaming splits
- Cloud spend dashboards
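Semantic caching reuses a stored answer when a new query is close enough in embedding space to one already answered, which avoids a model call. The sketch below assumes the caller supplies query embeddings as plain lists of floats; `SemanticCache`, the linear scan, and the 0.95 cosine threshold are illustrative assumptions rather than recommended settings.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical cache keyed by embedding similarity instead of exact text."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold
        self._entries: List[Tuple[List[float], str]] = []

    def put(self, vector: Sequence[float], answer: str) -> None:
        self._entries.append((list(vector), answer))

    def get(self, vector: Sequence[float]) -> Optional[str]:
        """Return the most similar cached answer at or above the threshold, else None."""
        best, best_sim = None, self.threshold
        for cached_vec, answer in self._entries:
            sim = cosine(vector, cached_vec)
            if sim >= best_sim:
                best, best_sim = answer, sim
        return best
```

At larger scale the linear scan would usually be replaced by a lookup in the vector database already present in the stack.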
Industry Standards & Interoperability Protocols ~ Cross-Cutting
MCP ~ Model Context Protocol
A2A ~ Agent-to-Agent (Google)
ACP ~ Agent Communication Protocol (IBM)
ANP ~ Agent Network Protocol
Event-Driven Architecture
OWASP LLM Top 10
MAESTRO (CSA)
ISO/IEC 42001 · EU AI Act
Enterprise vs. SME Deployment Strategy
🏛 Enterprise Stack
Multi-agent orchestration (LangGraph + AutoGen + Semantic Kernel)
Hybrid cloud: on-prem SLMs + cloud LLMs with intelligent router
Full MLOps: fine-tuning, versioning, A/B model deployment
Advanced RAG: GraphRAG + Knowledge Graph + Self-RAG
Full compliance: GDPR, HIPAA, SOC 2, EU AI Act, ISO/IEC 42001
Event-driven architecture with Kafka / Kinesis backbone
Multi-tenant, RBAC/ABAC, audit trail, agent identity (DIDs)
🏪 SME Stack
Single/dual-agent setup (CrewAI or simple LangGraph)
API-only LLMs (OpenAI / Anthropic) ~ no self-hosting needed
Managed RAG (LlamaCloud / LangChain hosted)
Chroma / pgvector for vector storage (zero-ops)
NeMo Guardrails for basic input/output safety filters
Docker Compose or serverless (pay-per-use) deployment
Low-code builders: Flowise / Dify / LangFlow for rapid iteration