
AI System Overview

How the AI layer is structured, what models it uses, how it retrieves and cites, and where each component fits in the product surface.


1. Architecture at a glance

```mermaid
flowchart TD
    subgraph UserLayer[User-Facing Layer]
        BrokerUI[Broker Chat<br/>Web + WhatsApp]
        AnalytixUI[Analytix Dashboard]
        FracUI[Fractional AI Overlay]
    end

    subgraph AgentLayer[Agent Layer]
        BrokerAgent[Honest Broker Agent<br/>orchestrator]
        AnalytixAgent[Analytix Query Agent<br/>dashboard queries]
        FracAgent[Fractional Summary Agent<br/>asset cards]
    end

    subgraph ToolLayer[Tool Layer]
        SearchTool[Attribute Search<br/>canonical store queries]
        RAGTool[Document RAG<br/>cited retrieval]
        CompareTool[Comparable Engine<br/>similarity + ranking]
        SimTool[Simulation Engine<br/>Monte Carlo + IRR]
        ScoreTool[Score Lookup<br/>trust, title, risk, persona fit]
        CostCalc[Cost Calculator<br/>stamp, GST, hidden fees]
        GRTool[GR Intelligence<br/>policy feed]
        SentimentTool[Sentiment Aggregator]
    end

    subgraph DataLayer[Data Layer]
        CanonStore[Canonical Attribute Store<br/>~140 attrs + quality passport]
        VectorStore[Vector Store<br/>document chunks]
        DocStore[Document Store<br/>PDFs + metadata]
    end

    UserLayer --> AgentLayer
    AgentLayer --> ToolLayer
    ToolLayer --> DataLayer
```

2. Design principles

2.1 Tool-using agents, not monolithic prompts

Each product surface is powered by a thin agent that orchestrates deterministic tools. The agent handles conversation management, tool selection, and output formatting. The tools do the actual computation and retrieval.

Why:
- Auditability: every tool call is logged with inputs/outputs, enabling "why did the bot say X?" investigations
- Citability: tools return structured data with source metadata; the agent weaves citations into natural language
- Testability: tools can be unit-tested independently; agents are eval'd end-to-end
- Compliance: the agent layer enforces the Honest Broker rules (no recommendation language, probability framing)

2.2 Citation by construction, not post-hoc

The system does NOT generate text and then try to find sources. It retrieves sources FIRST and then generates text grounded in them. This is the critical architectural choice that makes "shows its work" possible.

Flow:

User query → Agent plans retrieval → Tools fetch data + docs with sources → Agent generates grounded response with inline citations → Post-generation verification → Serve to user
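
A minimal sketch of this retrieve-first loop, with illustrative helper names and stubbed tools (the real agent, tool, and verifier interfaces live elsewhere):

```python
# Sketch of the retrieve-first flow in section 2.2. Everything here is
# illustrative; real tools return structured results from the tool layer.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    doc_id: str
    page: int

def plan_retrieval(query: str) -> list[tuple[str, dict]]:
    # The agent decides which tools to call BEFORE any text is generated.
    return [("RAGTool", {"query": query, "top_k": 5})]

def run_tool(name: str, args: dict) -> list[Chunk]:
    # Stub: the real tool layer queries the vector/canonical stores.
    return [Chunk("Possession date per Form B v3 is ...", "form_b_v3", 4)]

def generate_grounded(query: str, evidence: list[Chunk]) -> str:
    # Stub: the real call passes ONLY retrieved chunks as context, so every
    # claim can cite a chunk, e.g. "per Form B v3, page 4".
    cite = evidence[0]
    return f"{cite.text} [source: {cite.doc_id}, p.{cite.page}]"

def answer(query: str) -> str:
    evidence: list[Chunk] = []
    for name, args in plan_retrieval(query):
        evidence.extend(run_tool(name, args))
    return generate_grounded(query, evidence)  # verification (section 5.2) runs after this
```

The key property: generation never sees text that isn't attached to a source, so citations come by construction rather than being patched in afterwards.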

2.3 Deterministic where possible, probabilistic where necessary

| Computation type | Approach | Example |
|---|---|---|
| Lookups | Deterministic tool | "What's the RERA number for project X?" |
| Scores | Deterministic formula | Developer Trust Score, Title Clarity |
| Aggregates | Deterministic with range | Median price ₹/sqft with sample size |
| Comparisons | Deterministic ranking | Top 10 comparables by similarity |
| Narratives | LLM generation (grounded) | "Why this, what could go wrong" |
| Projections | Stochastic simulation | Monte Carlo wealth trajectories |

The system uses LLMs only where deterministic computation is insufficient (narratives, explanations, entity resolution in ambiguous cases). Every other computation is formula-driven.


3. Component deep-dive

3.1 Agent layer

Three agents, one per product surface, all sharing the tool layer.

| Agent | Input | Output | Tools used |
|---|---|---|---|
| Honest Broker Agent | Natural-language user query + persona context | Cited, Honest-Broker-compliant response | All tools |
| Analytix Query Agent | Structured dashboard query (micromarket, segment, time range) | Data payload for dashboard widgets | SearchTool, CompareTool, GRTool, SentimentTool |
| Fractional Summary Agent | Asset ID | AI summary card content | SearchTool, RAGTool, ScoreTool, SimTool |

Each agent has:
- A system prompt encoding the Honest Broker rules (from ../00-soul/SOUL.md)
- A tool manifest describing available tools
- A compliance filter applied post-generation (see section 5)
- A conversation memory (for Broker — multi-turn; for Analytix and Fractional — single-turn)

3.2 Tool layer

SearchTool — Canonical Attribute Store queries

```
Input:  { entity_type, entity_id?, filters, attributes_requested }
Output: { results: [ { entity_id, attributes: { ... }, quality_passport: { ... } } ] }
```

Queries the canonical store. Always returns the quality passport with every value. Supports filtering by micromarket, segment, date range, developer, etc.
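
As an illustration, the contract could be typed like this; the passport fields (source, as_of, confidence) are assumptions, only the top-level shape comes from the spec above:

```python
# Illustrative typing of the SearchTool contract. Passport field names are
# assumptions; the canonical store's actual passport schema is authoritative.
from typing import Any, TypedDict

class QualityPassport(TypedDict):
    source: str          # where the value came from
    as_of: str           # freshness date
    confidence: float    # 0..1

class SearchResult(TypedDict):
    entity_id: str
    attributes: dict[str, Any]
    quality_passport: dict[str, QualityPassport]  # one passport per attribute

class SearchOutput(TypedDict):
    results: list[SearchResult]
```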

RAGTool — Document retrieval with citation

```
Input:  { query, entity_ids?, doc_types?, top_k }
Output: { chunks: [ { text, doc_id, page, source_url, relevance_score } ] }
```

Retrieves document chunks from the vector store. Each chunk carries its source metadata so the agent can cite "per Form B v3, page 4."

Implementation:
- Documents chunked at page level (for PDFs) or paragraph level (for text)
- Embeddings: text-embedding-3-large or equivalent
- Vector store: Vishal's choice (pgvector / Qdrant / Pinecone)
- Hybrid search: dense (semantic) + sparse (BM25) for entity-name recall (one fusion approach is sketched after this list)
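
One common way to fuse the dense and sparse rankings is reciprocal rank fusion; whether we use RRF or weighted score fusion is an open implementation choice, so treat this as a sketch:

```python
# Sketch of hybrid retrieval fusion for RAGTool: merge dense (semantic) and
# sparse (BM25) rankings with reciprocal rank fusion. Chunk IDs are illustrative.

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk IDs; k=60 is a common default."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["c3", "c1", "c7"]    # from embedding similarity
sparse = ["c1", "c9", "c3"]   # from BM25, which recalls exact entity names well
print(reciprocal_rank_fusion([dense, sparse])[:3])  # fused top-k
```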

CompareTool — Comparable engine

```
Input:  { entity_id, k, filters? }
Output: { comparables: [ { entity_id, similarity_score, key_diffs: { ... } } ] }
```

Returns the K most similar projects/transactions. Uses embedding similarity over feature vectors (sector, size, price band, micromarket, developer tier). See ai.comparable_set in ../20-data/derived-attributes-spec.md.
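
A minimal sketch of the ranking step, assuming candidates are already encoded as numeric feature vectors (the encoding itself is defined in the derived-attributes spec, not here):

```python
# Sketch of comparable ranking: cosine similarity over feature vectors,
# then top-K. The feature encoding (one-hot sector, scaled price band, etc.)
# is an assumption; the spec only fixes the inputs and the top-K output.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_comparables(target: list[float],
                      candidates: dict[str, list[float]],
                      k: int = 10) -> list[tuple[str, float]]:
    scored = [(eid, cosine(target, vec)) for eid, vec in candidates.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]
```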

SimTool — Monte Carlo simulation

```
Input:  { asset_attributes, assumptions, horizon_years, n_paths }
Output: { wealth_paths: { p20, p50, p80 }, irr: { p20, p50, p80 }, assumptions_used }
```

Runs a Monte Carlo simulation per the wealth trajectory spec. Returns distribution summaries, not raw paths. Always includes the assumptions used so the agent can display them.
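
A minimal sketch of the core loop, assuming a simple Gaussian annual-drift model; the real path model and parameters come from the wealth trajectory spec:

```python
# Sketch of the SimTool core: simulate n_paths of annual appreciation and
# report P20/P50/P80 summaries rather than raw paths. The Gaussian drift
# model and parameter names here are assumptions.
import random
import statistics

def simulate_wealth(value: float, mu: float, sigma: float,
                    horizon_years: int, n_paths: int = 10_000) -> dict[str, float]:
    finals = []
    for _ in range(n_paths):
        v = value
        for _ in range(horizon_years):
            v *= 1 + random.gauss(mu, sigma)   # one year of stochastic growth
        finals.append(v)
    q = statistics.quantiles(finals, n=100)     # cut points for percentiles 1..99
    return {"p20": q[19], "p50": q[49], "p80": q[79]}

# Example: ₹1 cr asset, 8% mean annual drift, 6% volatility, 10-year horizon
print(simulate_wealth(1e7, mu=0.08, sigma=0.06, horizon_years=10))
```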

ScoreTool — Derived score lookup

```
Input:  { entity_id, score_ids }
Output: { scores: { score_id: { value, inputs, version, confidence } } }
```

Looks up precomputed derived scores (Developer Trust, Title Clarity, Zone Risk, Persona Fit, etc.). Returns each score with its full inputs for explainability.

CostCalc — Hidden cost calculator

```
Input:  { property_value, property_type, city, buyer_type, loan_details? }
Output: { breakdown: { stamp_duty, registration, gst, legal, brokerage, society_transfer, maintenance_deposit, ... }, total, notes }
```

Deterministic calculator. Encodes Maharashtra stamp duty rules, GST rules, and typical fees. Updated when rates change.
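
A sketch of the calculator shape; the rates below are placeholders, not current Maharashtra rates:

```python
# Sketch of the deterministic cost breakdown. RATES are PLACEHOLDERS, not
# current Maharashtra rates; the production calculator encodes the real
# rules and is updated when they change.
RATES = {
    "stamp_duty": 0.06,              # placeholder
    "registration": 0.01,            # placeholder; often capped in practice
    "gst_under_construction": 0.05,  # placeholder; resale is typically GST-exempt
}

def cost_breakdown(property_value: float, under_construction: bool) -> dict[str, float]:
    breakdown = {
        "stamp_duty": property_value * RATES["stamp_duty"],
        "registration": property_value * RATES["registration"],
        "gst": property_value * RATES["gst_under_construction"] if under_construction else 0.0,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown

print(cost_breakdown(1e7, under_construction=True))
```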

GRTool — Government Resolution intelligence

```
Input:  { micromarket_id?, district?, department?, days, impact_direction? }
Output: { grs: [ { gr_id, title, department, date, impact, summary, source_url } ] }
```

Returns classified GRs for a geography/timeframe. Each GR has a pre-computed impact direction and summary from the NLP classifier.

SentimentTool — Aggregated sentiment

```
Input:  { entity_type, entity_id, days }
Output: { sentiment_score, mention_count, top_topics: [ { topic, sentiment, count } ], sources }
```

Returns aggregated sentiment per the ai.sentiment_score spec.

3.3 Data layer

| Store | Technology (Vishal's choice) | Contents |
|---|---|---|
| Canonical Attribute Store | Postgres / data warehouse | ~140 attributes with quality passport, per entity |
| Vector Store | pgvector / Qdrant / Pinecone | Document chunks with embeddings + metadata |
| Document Store | S3 / object storage | Raw PDFs + metadata |
| Conflict Log | Postgres | Source disagreements |
| Audit Log | Append-only store | Every tool call, every generation, every verification |

4. LLM usage patterns

4.1 Where LLMs are used

| Use case | Model tier | Latency target | Cost sensitivity |
|---|---|---|---|
| Broker conversation | Frontier (GPT-4o / Claude Sonnet) | < 3s first token | Medium — cached context helps |
| Narrative generation (alpha, risk, developer summary) | Frontier | < 5s | Medium |
| Document extraction (OCR + extraction from PDFs) | Frontier | Batch, offline | High — volume is large |
| GR classification | Fine-tuned mid-tier or frontier | < 2s | High — daily volume |
| Title chain explanation | Frontier | On-demand, < 5s | Low volume |
| Sentiment analysis | Mid-tier or fine-tuned | Batch | High |
| Translation (Marathi ↔ English) | Frontier or fine-tuned | Batch + on-demand | Medium |
| Compliance filter | Regex + small classifier | < 100ms | N/A |

4.2 LLM cost management

- Caching: cache responses for identical or near-identical queries (semantic dedup). MahaRERA project narratives change slowly — cache for 7 days (see the sketch after this list).
- Batch vs real-time: precompute narratives and scores offline where possible; serve from cache.
- Model routing: use cheaper models for classification/extraction, frontier for user-facing generation.
- Context management: don't stuff entire documents into context — use RAG to retrieve relevant chunks only.
- Token budgets: set per-request token limits (input + output) with graceful degradation.
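
A sketch of the exact-match response cache with a 7-day TTL; the semantic (near-identical) dedup layer would sit in front of this lookup and is omitted here:

```python
# Sketch of response caching for slow-changing narratives. Exact-match keys
# with a 7-day TTL; all names are illustrative, and production would use a
# shared store (e.g. Redis) rather than an in-process dict.
import hashlib
import time

_CACHE: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 7 * 24 * 3600  # narratives change slowly; cache for 7 days

def cache_key(model: str, prompt: str) -> str:
    return hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()

def cached_generate(model: str, prompt: str, generate) -> str:
    key = cache_key(model, prompt)
    hit = _CACHE.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]                     # cache hit: skip the LLM entirely
    text = generate(model, prompt)        # the expensive LLM call
    _CACHE[key] = (time.time(), text)
    return text
```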

4.3 LLM provider strategy

- Primary: OpenAI or Anthropic (Vishal to decide — see ../90-memory/open-questions.md)
- Fallback: secondary provider for availability
- Abstraction: provider-agnostic abstraction layer so we can switch without product changes (sketched after this list)
- Self-hosted fine-tune: evaluate for high-volume tasks (GR classification, extraction) where a fine-tuned open-source model (Llama, Mistral) may be cheaper at volume
- Data residency: for PII-adjacent queries, ensure prompts don't leak PII; use India-region endpoints where available
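
A sketch of that abstraction layer with fallback, using a Python Protocol; class and method names are illustrative, not any vendor's SDK:

```python
# Sketch of the provider-agnostic layer: product code depends on this
# Protocol, never on a vendor SDK directly. All names are illustrative.
from typing import Protocol

class LLMProvider(Protocol):
    def complete(self, system: str, user: str, max_tokens: int) -> str: ...

class PrimaryProvider:
    def complete(self, system: str, user: str, max_tokens: int) -> str:
        return "..."  # stub: vendor A's SDK call goes here

class FallbackProvider:
    def complete(self, system: str, user: str, max_tokens: int) -> str:
        return "..."  # stub: vendor B's SDK call goes here

def complete_with_fallback(providers: list[LLMProvider],
                           system: str, user: str, max_tokens: int = 1024) -> str:
    last_error = None
    for p in providers:
        try:
            return p.complete(system, user, max_tokens)
        except Exception as e:  # availability failure: try the next provider
            last_error = e
    raise RuntimeError("all providers failed") from last_error
```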

5. Compliance enforcement in the AI layer

5.1 Pre-generation

- System prompt includes Honest Broker rules as hard constraints
- User persona data included only with explicit consent
- PII (PAN, Aadhaar, phone) never included in LLM prompts — pseudonymised

5.2 Post-generation verification

Every LLM output passes through:

```
1. CITATION VERIFICATION
   - Extract all factual claims
   - Verify each against canonical store
   - Reject if any claim can't be traced

2. TONE COMPLIANCE
   - Regex check for banned phrases ("you should", "I recommend", "guaranteed", etc.)
   - Classifier for advisory language
   - Reject and regenerate if violated

3. DEFAMATION GUARD
   - Check for developer/entity references
   - Verify percentile framing (not absolute judgments)
   - Verify source citations present
   - Flag any "fraudulent" / "scam" language

4. PROBABILITY FRAMING
   - Check projections have ranges and confidence
   - No point estimates without bounds
```
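
As an illustration of the tone-compliance stage (step 2), the regex pass could look like this; the phrase list here is a small sample of the real banned list, which the classifier backs up:

```python
# Sketch of the regex pass in the tone-compliance check. The phrase list is
# illustrative; the production filter also runs a small classifier for
# advisory language that regex alone would miss.
import re

BANNED = [
    r"\byou should\b",
    r"\bI recommend\b",
    r"\bguaranteed\b",
]
_PATTERN = re.compile("|".join(BANNED), re.IGNORECASE)

def tone_violations(text: str) -> list[str]:
    """Return the banned phrases found in a generated response."""
    return [m.group(0) for m in _PATTERN.finditer(text)]

assert tone_violations("I recommend tower B") == ["I recommend"]
assert tone_violations("Historically, 60% of launches in this band...") == []
```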

5.3 Rejection handling

When post-generation verification fails (the retry loop is sketched below):
1. Regenerate with a stricter prompt (up to 2 retries)
2. If it still fails: serve a partial response with the failed claims removed + "I couldn't verify some details — here's what I can confirm"
3. Log all failures for eval team review
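
A sketch of that retry-then-degrade loop, with generation and verification passed in as callables since their real interfaces live in the agent layer:

```python
# Sketch of rejection handling: up to two stricter regenerations, then a
# partial response with unverified claims stripped. Helper names and the
# strictness mechanism are illustrative.

def respond_with_retries(query: str, generate, verify, strip_unverified,
                         max_retries: int = 2) -> str:
    strictness = 0
    draft = generate(query, strictness)
    for _ in range(max_retries):
        if verify(draft):
            return draft
        strictness += 1                    # tighten the system prompt and retry
        draft = generate(query, strictness)
    if verify(draft):
        return draft
    # Still failing: degrade gracefully rather than serve unverified claims.
    safe = strip_unverified(draft)         # failures also go to the eval log
    return "I couldn't verify some details — here's what I can confirm:\n" + safe
```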


6. Conversation management (Broker)

6.1 Multi-turn memory

The Broker maintains conversation context across turns:
- Short-term: current session context (entities discussed, comparisons in progress, user preferences expressed)
- Long-term: user persona (with consent), past queries, saved properties
- No PII in LLM context: persona stored separately; only persona features (goal, horizon, risk) injected (see the sketch after this list)
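
A sketch of that PII boundary: the prompt context is built from an allowlist of persona features, so identifiers never reach the LLM even with consent. Field names are illustrative.

```python
# Sketch of persona injection without PII. Only allowlisted, non-identifying
# features ever enter the LLM context; the allowlist and field names here
# are assumptions, not the persona store's actual schema.
PERSONA_FEATURE_ALLOWLIST = {"goal", "horizon_years", "risk_appetite"}

def persona_context(persona: dict, consented: bool) -> dict:
    """Project the stored persona down to consented, non-PII features."""
    if not consented:
        return {}
    return {k: v for k, v in persona.items() if k in PERSONA_FEATURE_ALLOWLIST}

stored = {"name": "…", "pan": "…", "goal": "rental income",
          "horizon_years": 10, "risk_appetite": "moderate"}
print(persona_context(stored, consented=True))
# {'goal': 'rental income', 'horizon_years': 10, 'risk_appetite': 'moderate'}
```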

6.2 Conversation state machine

```mermaid
stateDiagram-v2
    [*] --> Welcome
    Welcome --> PersonaCapture: first time
    Welcome --> Ready: returning user
    PersonaCapture --> Ready: consent given
    PersonaCapture --> Ready: consent declined (no persona features)

    Ready --> PropertyLookup: "tell me about project X"
    Ready --> Comparison: "compare A and B"
    Ready --> Simulation: "what if I invest X"
    Ready --> DeveloperReview: "is this developer reliable"
    Ready --> PolicyQuery: "any new GRs affecting Hinjewadi"
    Ready --> CostBreakdown: "what are all the costs"
    Ready --> TitleWalkthrough: "walk me through the title"
    Ready --> GeneralQuestion: anything else

    PropertyLookup --> Comparison: "compare with alternatives"
    PropertyLookup --> Simulation: "run a scenario"
    PropertyLookup --> CostBreakdown: "show me costs"
    Comparison --> Simulation: "simulate the best option"

    PropertyLookup --> Ready: done
    Comparison --> Ready: done
    Simulation --> Ready: done
```
6.3 Handoff patterns

When the Broker can't help:
- Legal questions → "This needs a lawyer. Here's what I know, but get professional advice on [specific issue]."
- Tax structuring → "This needs a CA. Here are the inputs they'll want from you."
- Transaction execution → "I don't handle bookings. Contact [developer office / broker]. Here's what to verify before you commit."


7. Infrastructure considerations

| Concern | Approach |
|---|---|
| Latency | < 3s for simple queries (cached); < 8s for complex (simulation, multi-tool) |
| Availability | 99.5% uptime target for Broker; 99.9% for Analytix API |
| Scaling | Horizontal scaling of the agent layer; tool layer scales independently |
| Cost | Budget ₹2-5 per complex Broker query; ₹0.1-0.5 for cached/simple; ₹0.01-0.05 per Analytix API call |
| Security | No PII in LLM prompts; encrypted at rest and in transit; audit logging |
| Observability | Every tool call logged with latency, inputs, outputs; LLM usage tracked per query |

8. Build sequence

| Phase | What | Timeline |
|---|---|---|
| Phase 0 | Tool layer stubs + canonical store integration + RAG pipeline | Months 1-2 |
| Phase 1 | Broker agent v0 (single-turn, Pune residential, 3 tools) | Month 3 |
| Phase 2 | Broker agent v1 (multi-turn, all tools, compliance filter) | Months 4-5 |
| Phase 3 | Analytix query agent + dashboard API | Months 4-6 |
| Phase 4 | Fractional summary agent | Months 5-6 |
| Phase 5 | Bilingual, voice, WhatsApp | Month 7+ |

See also:
- agent-design.md — deep-dive on the Honest Broker Agent
- evaluation-framework.md — how we measure honesty + accuracy
- ../20-data/pipeline-spec-for-vishal.md — data layer contract
- ../00-soul/SOUL.md — the identity the AI embodies