# AI System Overview
How the AI layer is structured, what models it uses, how it retrieves and cites, and where each component fits in the product surface.
## 1. Architecture at a glance

```mermaid
flowchart TD
    subgraph UserLayer[User-Facing Layer]
        BrokerUI[Broker Chat<br/>Web + WhatsApp]
        AnalytixUI[Analytix Dashboard]
        FracUI[Fractional AI Overlay]
    end
    subgraph AgentLayer[Agent Layer]
        BrokerAgent[Honest Broker Agent<br/>orchestrator]
        AnalytixAgent[Analytix Query Agent<br/>dashboard queries]
        FracAgent[Fractional Summary Agent<br/>asset cards]
    end
    subgraph ToolLayer[Tool Layer]
        SearchTool[Attribute Search<br/>canonical store queries]
        RAGTool[Document RAG<br/>cited retrieval]
        CompareTool[Comparable Engine<br/>similarity + ranking]
        SimTool[Simulation Engine<br/>Monte Carlo + IRR]
        ScoreTool[Score Lookup<br/>trust, title, risk, persona fit]
        CostCalc[Cost Calculator<br/>stamp, GST, hidden fees]
        GRTool[GR Intelligence<br/>policy feed]
        SentimentTool[Sentiment Aggregator]
    end
    subgraph DataLayer[Data Layer]
        CanonStore[Canonical Attribute Store<br/>~140 attrs + quality passport]
        VectorStore[Vector Store<br/>document chunks]
        DocStore[Document Store<br/>PDFs + metadata]
    end
    UserLayer --> AgentLayer
    AgentLayer --> ToolLayer
    ToolLayer --> DataLayer
```
## 2. Design principles

### 2.1 Tool-using agents, not monolithic prompts
Each product surface is powered by a thin agent that orchestrates deterministic tools. The agent handles conversation management, tool selection, and output formatting. The tools do the actual computation and retrieval.
Why:

- Auditability: every tool call is logged with inputs/outputs, enabling "why did the bot say X?" investigations
- Citability: tools return structured data with source metadata; the agent weaves citations into natural language
- Testability: tools can be unit-tested independently; agents are eval'd end-to-end
- Compliance: the agent layer enforces the Honest Broker rules (no recommendation language, probability framing)
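As a minimal sketch of this pattern, a hypothetical `Agent` wrapper (all names here are ours, not from the codebase) dispatches to plain deterministic tool functions and records every call for auditability:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ToolCall:
    tool: str
    inputs: dict
    outputs: Any = None

@dataclass
class Agent:
    # Tools are plain deterministic functions; the agent only selects,
    # invokes, and logs them.
    tools: dict
    audit_log: list = field(default_factory=list)

    def call_tool(self, name: str, inputs: dict) -> Any:
        result = self.tools[name](inputs)
        # Every call is recorded with inputs and outputs, so "why did the
        # bot say X?" can be answered from the log alone.
        self.audit_log.append(ToolCall(name, inputs, result))
        return result

# Toy tool standing in for the real SearchTool:
agent = Agent(tools={"SearchTool": lambda q: {"results": [], "quality_passport": {}}})
agent.call_tool("SearchTool", {"filters": {"micromarket": "hinjewadi"}})
```

The agent stays thin by design: swapping a tool implementation never touches conversation logic, and the audit log is populated as a side effect of every dispatch.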
### 2.2 Citation by construction, not post-hoc
The system does NOT generate text and then try to find sources. It retrieves sources FIRST and then generates text grounded in them. This is the critical architectural choice that makes "shows its work" possible.
Flow:
User query → Agent plans retrieval → Tools fetch data + docs with sources → Agent generates grounded response with inline citations → Post-generation verification → Serve to user
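The flow can be sketched as a single function. Here `retrieve`, `generate`, and `verify` are toy stand-ins for the real tool layer, and the chunk data is invented for illustration:

```python
def answer(query, retrieve, generate, verify):
    chunks = retrieve(query)           # sources are fetched FIRST
    draft = generate(query, chunks)    # generation sees only the retrieved chunks
    if not verify(draft, chunks):      # post-generation verification
        raise ValueError("unverifiable claim: regenerate or degrade")
    return draft

# Toy stand-ins; the real tools hit the vector store and canonical store.
chunks_db = [{"text": "carpet area 62 sqm", "doc_id": "form-b-v3", "page": 4}]
resp = answer(
    "carpet area?",
    retrieve=lambda q: chunks_db,
    generate=lambda q, c: f"{c[0]['text']} (per {c[0]['doc_id']}, page {c[0]['page']})",
    verify=lambda d, c: all(ch["doc_id"] in d for ch in c),
)
```

Because the generator only ever sees retrieved chunks, every sentence it emits can carry a citation back to a `doc_id` and page.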
### 2.3 Deterministic where possible, probabilistic where necessary
| Computation type | Approach | Example |
|---|---|---|
| Lookups | Deterministic tool | "What's the RERA number for project X?" |
| Scores | Deterministic formula | Developer Trust Score, Title Clarity |
| Aggregates | Deterministic with range | Median price ₹/sqft with sample size |
| Comparisons | Deterministic ranking | Top 10 comparables by similarity |
| Narratives | LLM generation (grounded) | "Why this, what could go wrong" |
| Projections | Stochastic simulation | Monte Carlo wealth trajectories |
The system uses LLMs only where deterministic computation is insufficient (narratives, explanations, entity resolution in ambiguous cases). Every other computation is formula-driven.
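The routing rule in the table above reduces to a small dispatch function; the category names below are illustrative, not the system's actual identifiers:

```python
# Computation types handled by deterministic tools (formula-driven,
# reproducible, auditable). Names are illustrative.
DETERMINISTIC = {"lookup", "score", "aggregate", "comparison"}

def route(computation_type: str) -> str:
    """Route each computation to the cheapest sufficient engine."""
    if computation_type in DETERMINISTIC:
        return "tool"        # deterministic formula or query
    if computation_type == "projection":
        return "simulation"  # stochastic, but still not an LLM
    return "llm"             # narratives, explanations, ambiguous entity resolution
```

The point of making this routing explicit is that an LLM is the fallback, never the default: anything that can be computed by formula stays out of the generation path entirely.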
## 3. Component deep-dive

### 3.1 Agent layer
Three agents, one per product surface, all sharing the tool layer.
| Agent | Input | Output | Tools used |
|---|---|---|---|
| Honest Broker Agent | Natural-language user query + persona context | Cited, Honest-Broker-compliant response | All tools |
| Analytix Query Agent | Structured dashboard query (micromarket, segment, time range) | Data payload for dashboard widgets | SearchTool, CompareTool, GRTool, SentimentTool |
| Fractional Summary Agent | Asset ID | AI summary card content | SearchTool, RAGTool, ScoreTool, SimTool |
Each agent has:
- A system prompt encoding the Honest Broker rules (from ../00-soul/SOUL.md)
- A tool manifest describing available tools
- A compliance filter applied post-generation (see section 5)
- A conversation memory (for Broker — multi-turn; for Analytix and Fractional — single-turn)
### 3.2 Tool layer

#### SearchTool — Canonical Attribute Store queries
Input: { entity_type, entity_id?, filters, attributes_requested }
Output: { results: [ { entity_id, attributes: { ... }, quality_passport: { ... } } ] }
Queries the canonical store. Always returns quality passport with every value. Supports filtering by micromarket, segment, date range, developer, etc.
#### RAGTool — Document retrieval with citation
Input: { query, entity_ids?, doc_types?, top_k }
Output: { chunks: [ { text, doc_id, page, source_url, relevance_score } ] }
Retrieves document chunks from the vector store. Each chunk carries its source metadata so the agent can cite "per Form B v3, page 4."
Implementation:

- Documents chunked at page level (for PDFs) or paragraph level (for text)
- Embeddings: text-embedding-3-large or equivalent
- Vector store: Vishal's choice (pgvector / Qdrant / Pinecone)
- Hybrid search: dense (semantic) + sparse (BM25) for entity-name recall
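A minimal sketch of the hybrid-search fusion step, assuming dense and sparse scores have already been computed per chunk (a real deployment would use pgvector/Qdrant plus a proper BM25 index; the fusion weight `alpha` is an assumption, not a tuned value):

```python
def hybrid_rank(dense_scores: dict, sparse_scores: dict,
                alpha: float = 0.7, top_k: int = 5) -> list:
    """Fuse dense (semantic) and sparse (BM25-style) scores per chunk id."""
    chunk_ids = set(dense_scores) | set(sparse_scores)
    fused = {
        cid: alpha * dense_scores.get(cid, 0.0)
             + (1 - alpha) * sparse_scores.get(cid, 0.0)
        for cid in chunk_ids
    }
    return sorted(fused, key=fused.get, reverse=True)[:top_k]

# An exact entity-name hit in the sparse index can outrank a vaguer
# semantic-only match — the reason for hybrid search in the first place:
ranked = hybrid_rank({"c1": 0.9, "c2": 0.5}, {"c2": 1.0}, alpha=0.5)
```

Weighted-sum fusion is one of several options; reciprocal rank fusion is a common alternative when the two score scales are hard to calibrate.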
#### CompareTool — Comparable engine
Input: { entity_id, k, filters? }
Output: { comparables: [ { entity_id, similarity_score, key_diffs: { ... } } ] }
Returns K most similar projects/transactions. Uses embedding similarity over feature vectors (sector, size, price band, micromarket, developer tier). See ai.comparable_set in ../20-data/derived-attributes-spec.md.
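A toy version of the ranking step, assuming projects have already been encoded as numeric feature vectors (the encoding itself — sector, size, price band, developer tier — is specified elsewhere and assumed here):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_comparables(target, candidates, k=10):
    """Rank candidate feature vectors by similarity to the target."""
    scored = sorted(candidates.items(),
                    key=lambda kv: cosine(target, kv[1]), reverse=True)
    return [(cid, round(cosine(target, vec), 3)) for cid, vec in scored[:k]]

# Invented feature vectors for illustration:
target = [1.0, 0.5, 0.2]
candidates = {"proj_a": [1.0, 0.5, 0.2], "proj_b": [0.1, 0.9, 0.9]}
ranked = top_k_comparables(target, candidates, k=2)
```

The real engine would also return `key_diffs` per comparable, which requires keeping the raw attribute values alongside the encoded vector.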
#### SimTool — Monte Carlo simulation
Input: { asset_attributes, assumptions, horizon_years, n_paths }
Output: { wealth_paths: { p20, p50, p80 }, irr: { p20, p50, p80 }, assumptions_used }
Runs Monte Carlo per the wealth trajectory spec. Returns distribution summaries, not raw paths. Always includes the assumptions used so the agent can display them.
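A toy sketch of the pattern — simulate many paths, return percentile summaries rather than raw paths, and echo back the assumptions. The drift/volatility model below is our simplification, not the wealth trajectory spec:

```python
import random
import statistics

def simulate_wealth(value, drift, vol, horizon_years, n_paths, seed=7):
    """Return p20/p50/p80 of simulated terminal wealth, plus the assumptions."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_paths):
        v = value
        for _ in range(horizon_years):
            v *= 1 + rng.gauss(drift, vol)   # one year of stochastic appreciation
        finals.append(v)
    q = statistics.quantiles(finals, n=100)  # cut points for percentiles 1..99
    return {
        "wealth": {"p20": q[19], "p50": q[49], "p80": q[79]},
        "assumptions_used": {"drift": drift, "vol": vol, "n_paths": n_paths},
    }

result = simulate_wealth(value=10_000_000, drift=0.07, vol=0.06,
                         horizon_years=10, n_paths=5_000)
```

Returning summaries instead of raw paths keeps the tool's output small and forces the agent to present ranges, which dovetails with the probability-framing rule in section 5.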
#### ScoreTool — Derived score lookup
Input: { entity_id, score_ids }
Output: { scores: { score_id: { value, inputs, version, confidence } } }
Looks up precomputed derived scores (Developer Trust, Title Clarity, Zone Risk, Persona Fit, etc.). Returns with full inputs for explainability.
#### CostCalc — Hidden cost calculator
Input: { property_value, property_type, city, buyer_type, loan_details? }
Output: { breakdown: { stamp_duty, registration, gst, legal, brokerage, society_transfer, maintenance_deposit, ... }, total, notes }
Deterministic calculator. Encodes Maharashtra stamp duty rules, GST rules, and typical fees. Updated when rates change.
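The shape of such a calculator is a pure function from inputs to an itemised breakdown. Every rate and cap below is a placeholder for illustration only, NOT the actual Maharashtra schedule — the real tool encodes the current rules and is updated when they change:

```python
def cost_breakdown(property_value: float, under_construction: bool) -> dict:
    """Itemised acquisition costs. All rates are illustrative placeholders."""
    breakdown = {
        "stamp_duty": property_value * 0.06,                 # placeholder rate
        "registration": min(property_value * 0.01, 30_000),  # placeholder cap
        "gst": property_value * 0.05 if under_construction else 0.0,  # placeholder
        "legal": 25_000.0,                                   # placeholder flat fee
    }
    return {"breakdown": breakdown,
            "total": property_value + sum(breakdown.values())}

result = cost_breakdown(10_000_000, under_construction=True)
```

Because the function is deterministic, identical inputs always produce identical outputs — the property that makes its results citable without verification.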
#### GRTool — Government Resolution intelligence
Input: { micromarket_id?, district?, department?, days, impact_direction? }
Output: { grs: [ { gr_id, title, department, date, impact, summary, source_url } ] }
Returns classified GRs for a geography/timeframe. Each GR has a pre-computed impact direction and summary from the NLP classifier.
#### SentimentTool — Aggregated sentiment
Input: { entity_type, entity_id, days }
Output: { sentiment_score, mention_count, top_topics: [ { topic, sentiment, count } ], sources }
Returns aggregated sentiment per ai.sentiment_score spec.
### 3.3 Data layer
| Store | Technology (Vishal's choice) | Contents |
|---|---|---|
| Canonical Attribute Store | Postgres / data warehouse | ~140 attributes with quality passport, per entity |
| Vector Store | pgvector / Qdrant / Pinecone | Document chunks with embeddings + metadata |
| Document Store | S3 / object storage | Raw PDFs + metadata |
| Conflict Log | Postgres | Source disagreements |
| Audit Log | Append-only store | Every tool call, every generation, every verification |
## 4. LLM usage patterns

### 4.1 Where LLMs are used
| Use case | Model tier | Latency target | Cost sensitivity |
|---|---|---|---|
| Broker conversation | Frontier (GPT-4o / Claude Sonnet) | < 3s first token | Medium — cached context helps |
| Narrative generation (alpha, risk, developer summary) | Frontier | < 5s | Medium |
| Document extraction (OCR + extraction from PDFs) | Frontier | Batch, offline | High — volume is large |
| GR classification | Fine-tuned mid-tier or frontier | < 2s | High — daily volume |
| Title chain explanation | Frontier | On-demand, < 5s | Low volume |
| Sentiment analysis | Mid-tier or fine-tuned | Batch | High |
| Translation (Marathi ↔ English) | Frontier or fine-tuned | Batch + on-demand | Medium |
| Compliance filter | Regex + small classifier | < 100ms | N/A |
### 4.2 LLM cost management
- Caching: cache responses for identical or near-identical queries (semantic dedup). MahaRERA project narratives change slowly — cache for 7 days.
- Batch vs real-time: precompute narratives and scores offline where possible; serve from cache.
- Model routing: use cheaper models for classification/extraction, frontier for user-facing generation.
- Context management: don't stuff entire documents into context — use RAG to retrieve relevant chunks only.
- Token budgets: set per-request token limits (input + output) with graceful degradation.
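The caching bullet above can be sketched as an exact-match cache after cheap query normalisation; true semantic dedup would compare query embeddings rather than strings, and the class below is a hypothetical illustration:

```python
import time

class ResponseCache:
    """TTL cache keyed on a normalised query string (semantic dedup stand-in)."""

    def __init__(self, ttl_seconds: float = 7 * 24 * 3600):  # 7-day TTL per the note
        self.ttl = ttl_seconds
        self._store = {}

    def _key(self, query: str) -> str:
        # Cheap normalisation: lowercase, collapse whitespace.
        return " ".join(query.lower().split())

    def get(self, query: str):
        hit = self._store.get(self._key(query))
        if hit and time.monotonic() - hit[0] < self.ttl:
            return hit[1]
        return None  # miss or expired

    def put(self, query: str, response: str):
        self._store[self._key(query)] = (time.monotonic(), response)

cache = ResponseCache()
cache.put("Tell me about Project X", "cached narrative")
```

Slowly-changing content like MahaRERA project narratives is the best cache target; anything derived from daily feeds (GRs, sentiment) needs a much shorter TTL.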
### 4.3 LLM provider strategy

- Primary: OpenAI or Anthropic (Vishal to decide — see ../90-memory/open-questions.md)
- Fallback: secondary provider for availability
- Abstraction: provider-agnostic abstraction layer so we can switch without product changes
- Self-hosted fine-tune: evaluate for high-volume tasks (GR classification, extraction) where a fine-tuned open-source model (Llama, Mistral) may be cheaper at volume
- Data residency: for PII-adjacent queries, ensure prompts don't leak PII; use India-region endpoints where available
## 5. Compliance enforcement in the AI layer

### 5.1 Pre-generation
- System prompt includes Honest Broker rules as hard constraints
- User persona data included only with explicit consent
- PII (PAN, Aadhaar, phone) never included in LLM prompts — pseudonymised
### 5.2 Post-generation verification

Every LLM output passes through:

1. Citation verification
    - Extract all factual claims
    - Verify each against canonical store
    - Reject if any claim can't be traced
2. Tone compliance
    - Regex check for banned phrases ("you should", "I recommend", "guaranteed", etc.)
    - Classifier for advisory language
    - Reject and regenerate if violated
3. Defamation guard
    - Check for developer/entity references
    - Verify percentile framing (not absolute judgments)
    - Verify source citations present
    - Flag any "fraudulent" / "scam" language
4. Probability framing
    - Check projections have ranges and confidence
    - No point estimates without bounds
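The tone-compliance step starts with a fast regex pass before the heavier classifier; the phrase list below is illustrative, the real list being maintained separately:

```python
import re

# Banned advisory phrases (illustrative subset; maintained list is elsewhere).
BANNED = re.compile(r"\b(you should|i recommend|guaranteed|sure bet)\b")

def tone_violations(text: str) -> list:
    """Return every banned phrase found in the (lowercased) text."""
    return BANNED.findall(text.lower())
```

A regex pass like this runs in well under the 100 ms budget from the table in section 4.1; only text that survives it needs the advisory-language classifier.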
### 5.3 Rejection handling

When post-generation verification fails:

1. Regenerate with stricter prompt (up to 2 retries)
2. If still fails: serve partial response with failed claims removed + "I couldn't verify some details — here's what I can confirm"
3. Log all failures for eval team review
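The retry-then-degrade policy can be sketched as a loop; `generate` and `verify` are stand-ins for the real agent and verification pipeline, and the claim-splitting is a toy simplification:

```python
def serve(query, generate, verify, max_retries=2):
    """Regenerate with increasing strictness; degrade gracefully if still failing."""
    draft, failed = "", []
    for strictness in range(max_retries + 1):
        draft = generate(query, strictness)   # stricter prompt on each retry
        ok, failed = verify(draft)
        if ok:
            return draft
    # Still failing after retries: strip unverified claims and serve the rest.
    kept = [c for c in draft.split(". ") if c not in failed]
    return ("I couldn't verify some details — here's what I can confirm: "
            + ". ".join(kept))

# Toy scenario: one of two claims can never be verified.
resp = serve(
    "q",
    generate=lambda q, s: "Claim A. Claim B",
    verify=lambda d: (False, ["Claim B"]),
)
```

In production the failure log feeds the eval team, so persistent verification failures become eval cases rather than silent degradations.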
## 6. Conversation management (Broker)

### 6.1 Multi-turn memory

The Broker maintains conversation context across turns:

- Short-term: current session context (entities discussed, comparisons in progress, user preferences expressed)
- Long-term: user persona (with consent), past queries, saved properties
- No PII in LLM context: persona stored separately; only persona features (goal, horizon, risk) injected
### 6.2 Conversation state machine

```mermaid
stateDiagram-v2
    [*] --> Welcome
    Welcome --> PersonaCapture: first time
    Welcome --> Ready: returning user
    PersonaCapture --> Ready: consent given
    PersonaCapture --> Ready: consent declined (no persona features)
    Ready --> PropertyLookup: "tell me about project X"
    Ready --> Comparison: "compare A and B"
    Ready --> Simulation: "what if I invest X"
    Ready --> DeveloperReview: "is this developer reliable"
    Ready --> PolicyQuery: "any new GRs affecting Hinjewadi"
    Ready --> CostBreakdown: "what are all the costs"
    Ready --> TitleWalkthrough: "walk me through the title"
    Ready --> GeneralQuestion: anything else
    PropertyLookup --> Comparison: "compare with alternatives"
    PropertyLookup --> Simulation: "run a scenario"
    PropertyLookup --> CostBreakdown: "show me costs"
    Comparison --> Simulation: "simulate the best option"
    PropertyLookup --> Ready: done
    Comparison --> Ready: done
    Simulation --> Ready: done
```
### 6.3 Handoff patterns

When the Broker can't help:

- Legal questions → "This needs a lawyer. Here's what I know, but get professional advice on [specific issue]."
- Tax structuring → "This needs a CA. Here are the inputs they'll want from you."
- Transaction execution → "I don't handle bookings. Contact [developer office / broker]. Here's what to verify before you commit."
## 7. Infrastructure considerations
| Concern | Approach |
|---|---|
| Latency | < 3s for simple queries (cached), < 8s for complex (simulation, multi-tool) |
| Availability | 99.5% uptime target for Broker; 99.9% for Analytix API |
| Scaling | Horizontal scaling of agent layer; tool layer scales independently |
| Cost | Budget ₹2-5 per complex Broker query; ₹0.1-0.5 for cached/simple; ₹0.01-0.05 per Analytix API call |
| Security | No PII in LLM prompts; encrypted at rest and in transit; audit logging |
| Observability | Every tool call logged with latency, inputs, outputs; LLM usage tracked per query |
## 8. Build sequence
| Phase | What | Timeline |
|---|---|---|
| Phase 0 | Tool layer stubs + canonical store integration + RAG pipeline | Months 1-2 |
| Phase 1 | Broker agent v0 (single-turn, Pune residential, 3 tools) | Month 3 |
| Phase 2 | Broker agent v1 (multi-turn, all tools, compliance filter) | Month 4-5 |
| Phase 3 | Analytix query agent + dashboard API | Month 4-6 |
| Phase 4 | Fractional summary agent | Month 5-6 |
| Phase 5 | Bilingual, voice, WhatsApp | Month 7+ |
See also:
- agent-design.md — deep-dive on the Honest Broker Agent
- evaluation-framework.md — how we measure honesty + accuracy
- ../20-data/pipeline-spec-for-vishal.md — data layer contract
- ../00-soul/SOUL.md — the identity the AI embodies