
AI System Overview

How the AI layer is structured, what models it uses, how it retrieves and cites, and where each component fits in the product surface.


1. Architecture at a glance

```mermaid
flowchart TD
    subgraph UserLayer[User-Facing Layer]
        BrokerUI[Broker Chat<br/>Web + WhatsApp]
        AnalytixUI[Analytix Dashboard]
        FracUI[Fractional AI Overlay]
    end

    subgraph AgentLayer[Agent Layer]
        BrokerAgent[Honest Broker Agent<br/>orchestrator]
        AnalytixAgent[Analytix Query Agent<br/>dashboard queries]
        FracAgent[Fractional Summary Agent<br/>asset cards]
    end

    subgraph ToolLayer[Tool Layer]
        SearchTool[Attribute Search<br/>canonical store queries]
        RAGTool[Document RAG<br/>cited retrieval]
        CompareTool[Comparable Engine<br/>similarity + ranking]
        SimTool[Simulation Engine<br/>Monte Carlo + IRR]
        ScoreTool[Score Lookup<br/>trust, title, risk, persona fit]
        CostCalc[Cost Calculator<br/>stamp, GST, hidden fees]
        GRTool[GR Intelligence<br/>policy feed]
        SentimentTool[Sentiment Aggregator]
    end

    subgraph DataLayer[Data Layer]
        CanonStore[Canonical Attribute Store<br/>~140 attrs + quality passport]
        VectorStore[Vector Store<br/>document chunks]
        DocStore[Document Store<br/>PDFs + metadata]
    end

    UserLayer --> AgentLayer
    AgentLayer --> ToolLayer
    ToolLayer --> DataLayer
```

2. Design principles

2.1 Tool-using agents, not monolithic prompts

Each product surface is powered by a thin agent that orchestrates deterministic tools. The agent handles conversation management, tool selection, and output formatting. The tools do the actual computation and retrieval.

Why:
- Auditability: every tool call is logged with inputs/outputs, enabling "why did the bot say X?" investigations
- Citability: tools return structured data with source metadata; the agent weaves citations into natural language
- Testability: tools can be unit-tested independently; agents are eval'd end-to-end
- Compliance: the agent layer enforces the Honest Broker rules (no recommendation language, probability framing)

2.2 Citation by construction, not post-hoc

The system does NOT generate text and then try to find sources. It retrieves sources FIRST and then generates text grounded in them. This is the critical architectural choice that makes "shows its work" possible.

Flow:

User query → Agent plans retrieval → Tools fetch data + docs with sources → Agent generates grounded response with inline citations → Post-generation verification → Serve to user
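
A minimal sketch of this retrieve-first loop, with illustrative helper names and stubbed tools (the real agent, tool, and verifier interfaces live elsewhere):

```python
# Sketch of the retrieve-first flow in section 2.2. Everything here is
# illustrative; real tools return structured results from the tool layer.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    doc_id: str
    page: int

def plan_retrieval(query: str) -> list[tuple[str, dict]]:
    # The agent decides which tools to call BEFORE any text is generated.
    return [("RAGTool", {"query": query, "top_k": 5})]

def run_tool(name: str, args: dict) -> list[Chunk]:
    # Stub: the real tool layer queries the vector/canonical stores.
    return [Chunk("Possession date per Form B v3 is ...", "form_b_v3", 4)]

def generate_grounded(query: str, evidence: list[Chunk]) -> str:
    # Stub: the real call passes ONLY retrieved chunks as context, so every
    # claim can cite a chunk, e.g. "per Form B v3, page 4".
    cite = evidence[0]
    return f"{cite.text} [source: {cite.doc_id}, p.{cite.page}]"

def answer(query: str) -> str:
    evidence: list[Chunk] = []
    for name, args in plan_retrieval(query):
        evidence.extend(run_tool(name, args))
    return generate_grounded(query, evidence)  # verification (section 5.2) runs after this
```

The key property: generation never sees text that isn't attached to a source, so citations come by construction rather than being patched in afterwards.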

2.3 Deterministic where possible, probabilistic where necessary

| Computation type | Approach | Example |
|---|---|---|
| Lookups | Deterministic tool | "What's the RERA number for project X?" |
| Scores | Deterministic formula | Developer Trust Score, Title Clarity |
| Aggregates | Deterministic with range | Median price ₹/sqft with sample size |
| Comparisons | Deterministic ranking | Top 10 comparables by similarity |
| Narratives | LLM generation (grounded) | "Why this, what could go wrong" |
| Projections | Stochastic simulation | Monte Carlo wealth trajectories |

The system uses LLMs only where deterministic computation is insufficient (narratives, explanations, entity resolution in ambiguous cases). Every other computation is formula-driven.


3. Component deep-dive

3.1 Agent layer

Three agents, one per product surface, all sharing the tool layer.

| Agent | Input | Output | Tools used |
|---|---|---|---|
| Honest Broker Agent | Natural-language user query + persona context | Cited, Honest-Broker-compliant response | All tools |
| Analytix Query Agent | Structured dashboard query (micromarket, segment, time range) | Data payload for dashboard widgets | SearchTool, CompareTool, GRTool, SentimentTool |
| Fractional Summary Agent | Asset ID | AI summary card content | SearchTool, RAGTool, ScoreTool, SimTool |

Each agent has:
- A system prompt encoding the Honest Broker rules (from ../00-soul/SOUL.md)
- A tool manifest describing available tools
- A compliance filter applied post-generation (see section 5)
- A conversation memory (for Broker — multi-turn; for Analytix and Fractional — single-turn)

3.2 Tool layer

SearchTool — Canonical Attribute Store queries

```
Input:  { entity_type, entity_id?, filters, attributes_requested }
Output: { results: [ { entity_id, attributes: { ... }, quality_passport: { ... } } ] }
```

Queries the canonical store. Always returns the quality passport with every value. Supports filtering by micromarket, segment, date range, developer, etc.
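
As an illustration, the contract could be typed like this; the passport fields (source, as_of, confidence) are assumptions, only the top-level shape comes from the spec above:

```python
# Illustrative typing of the SearchTool contract. Passport field names are
# assumptions; the canonical store's actual passport schema is authoritative.
from typing import Any, TypedDict

class QualityPassport(TypedDict):
    source: str          # where the value came from
    as_of: str           # freshness date
    confidence: float    # 0..1

class SearchResult(TypedDict):
    entity_id: str
    attributes: dict[str, Any]
    quality_passport: dict[str, QualityPassport]  # one passport per attribute

class SearchOutput(TypedDict):
    results: list[SearchResult]
```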

RAGTool — Document retrieval with citation

```
Input:  { query, entity_ids?, doc_types?, top_k }
Output: { chunks: [ { text, doc_id, page, source_url, relevance_score } ] }
```

Retrieves document chunks from the vector store. Each chunk carries its source metadata so the agent can cite "per Form B v3, page 4."

Implementation:
- Documents chunked at page level (for PDFs) or paragraph level (for text)
- Embeddings: text-embedding-3-large or equivalent
- Vector store: Vishal's choice (pgvector / Qdrant / Pinecone)
- Hybrid search: dense (semantic) + sparse (BM25) for entity-name recall (one fusion approach is sketched after this list)
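
One common way to fuse the dense and sparse rankings is reciprocal rank fusion; whether we use RRF or weighted score fusion is an open implementation choice, so treat this as a sketch:

```python
# Sketch of hybrid retrieval fusion for RAGTool: merge dense (semantic) and
# sparse (BM25) rankings with reciprocal rank fusion. Chunk IDs are illustrative.

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk IDs; k=60 is a common default."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["c3", "c1", "c7"]    # from embedding similarity
sparse = ["c1", "c9", "c3"]   # from BM25, which recalls exact entity names well
print(reciprocal_rank_fusion([dense, sparse])[:3])  # fused top-k
```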

CompareTool — Comparable engine

```
Input:  { entity_id, k, filters? }
Output: { comparables: [ { entity_id, similarity_score, key_diffs: { ... } } ] }
```

Returns the K most similar projects/transactions. Uses embedding similarity over feature vectors (sector, size, price band, micromarket, developer tier). See ai.comparable_set in ../20-data/derived-attributes-spec.md.
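
A minimal sketch of the ranking step, assuming candidates are already encoded as numeric feature vectors (the encoding itself is defined in the derived-attributes spec, not here):

```python
# Sketch of comparable ranking: cosine similarity over feature vectors,
# then top-K. The feature encoding (one-hot sector, scaled price band, etc.)
# is an assumption; the spec only fixes the inputs and the top-K output.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_comparables(target: list[float],
                      candidates: dict[str, list[float]],
                      k: int = 10) -> list[tuple[str, float]]:
    scored = [(eid, cosine(target, vec)) for eid, vec in candidates.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]
```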

SimTool — Monte Carlo simulation

```
Input:  { asset_attributes, assumptions, horizon_years, n_paths }
Output: { wealth_paths: { p20, p50, p80 }, irr: { p20, p50, p80 }, assumptions_used }
```

Runs a Monte Carlo simulation per the wealth trajectory spec. Returns distribution summaries, not raw paths. Always includes the assumptions used so the agent can display them.
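
A minimal sketch of the core loop, assuming a simple Gaussian annual-drift model; the real path model and parameters come from the wealth trajectory spec:

```python
# Sketch of the SimTool core: simulate n_paths of annual appreciation and
# report P20/P50/P80 summaries rather than raw paths. The Gaussian drift
# model and parameter names here are assumptions.
import random
import statistics

def simulate_wealth(value: float, mu: float, sigma: float,
                    horizon_years: int, n_paths: int = 10_000) -> dict[str, float]:
    finals = []
    for _ in range(n_paths):
        v = value
        for _ in range(horizon_years):
            v *= 1 + random.gauss(mu, sigma)   # one year of stochastic growth
        finals.append(v)
    q = statistics.quantiles(finals, n=100)     # cut points for percentiles 1..99
    return {"p20": q[19], "p50": q[49], "p80": q[79]}

# Example: ₹1 cr asset, 8% mean annual drift, 6% volatility, 10-year horizon
print(simulate_wealth(1e7, mu=0.08, sigma=0.06, horizon_years=10))
```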

ScoreTool — Derived score lookup

```
Input:  { entity_id, score_ids }
Output: { scores: { score_id: { value, inputs, version, confidence } } }
```

Looks up precomputed derived scores (Developer Trust, Title Clarity, Zone Risk, Persona Fit, etc.). Returns each score with its full inputs for explainability.

CostCalc — Hidden cost calculator

```
Input:  { property_value, property_type, city, buyer_type, loan_details? }
Output: { breakdown: { stamp_duty, registration, gst, legal, brokerage, society_transfer, maintenance_deposit, ... }, total, notes }
```

Deterministic calculator. Encodes Maharashtra stamp duty rules, GST rules, and typical fees. Updated when rates change.
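
A sketch of the calculator shape; the rates below are placeholders, not current Maharashtra rates:

```python
# Sketch of the deterministic cost breakdown. RATES are PLACEHOLDERS, not
# current Maharashtra rates; the production calculator encodes the real
# rules and is updated when they change.
RATES = {
    "stamp_duty": 0.06,              # placeholder
    "registration": 0.01,            # placeholder; often capped in practice
    "gst_under_construction": 0.05,  # placeholder; resale is typically GST-exempt
}

def cost_breakdown(property_value: float, under_construction: bool) -> dict[str, float]:
    breakdown = {
        "stamp_duty": property_value * RATES["stamp_duty"],
        "registration": property_value * RATES["registration"],
        "gst": property_value * RATES["gst_under_construction"] if under_construction else 0.0,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown

print(cost_breakdown(1e7, under_construction=True))
```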

GRTool — Government Resolution intelligence

```
Input:  { micromarket_id?, district?, department?, days, impact_direction? }
Output: { grs: [ { gr_id, title, department, date, impact, summary, source_url } ] }
```

Returns classified GRs for a geography/timeframe. Each GR has a pre-computed impact direction and summary from the NLP classifier.

SentimentTool — Aggregated sentiment

```
Input:  { entity_type, entity_id, days }
Output: { sentiment_score, mention_count, top_topics: [ { topic, sentiment, count } ], sources }
```

Returns aggregated sentiment per the ai.sentiment_score spec.

3.3 Data layer

| Store | Technology (Vishal's choice) | Contents |
|---|---|---|
| Canonical Attribute Store | Postgres / data warehouse | ~140 attributes with quality passport, per entity |
| Vector Store | pgvector / Qdrant / Pinecone | Document chunks with embeddings + metadata |
| Document Store | S3 / object storage | Raw PDFs + metadata |
| Conflict Log | Postgres | Source disagreements |
| Audit Log | Append-only store | Every tool call, every generation, every verification |

4. LLM usage patterns

4.1 Where LLMs are used

| Use case | Model tier | Latency target | Cost sensitivity |
|---|---|---|---|
| Broker conversation | Frontier (GPT-4o / Claude Sonnet) | < 3s first token | Medium — cached context helps |
| Narrative generation (alpha, risk, developer summary) | Frontier | < 5s | Medium |
| Document extraction (OCR + extraction from PDFs) | Frontier | Batch, offline | High — volume is large |
| GR classification | Fine-tuned mid-tier or frontier | < 2s | High — daily volume |
| Title chain explanation | Frontier | On-demand, < 5s | Low volume |
| Sentiment analysis | Mid-tier or fine-tuned | Batch | High |
| Translation (Marathi ↔ English) | Frontier or fine-tuned | Batch + on-demand | Medium |
| Compliance filter | Regex + small classifier | < 100ms | N/A |

4.2 LLM cost management

- Caching: cache responses for identical or near-identical queries (semantic dedup). MahaRERA project narratives change slowly — cache for 7 days (see the sketch after this list).
- Batch vs real-time: precompute narratives and scores offline where possible; serve from cache.
- Model routing: use cheaper models for classification/extraction, frontier for user-facing generation.
- Context management: don't stuff entire documents into context — use RAG to retrieve relevant chunks only.
- Token budgets: set per-request token limits (input + output) with graceful degradation.
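
A sketch of the exact-match response cache with a 7-day TTL; the semantic (near-identical) dedup layer would sit in front of this lookup and is omitted here:

```python
# Sketch of response caching for slow-changing narratives. Exact-match keys
# with a 7-day TTL; all names are illustrative, and production would use a
# shared store (e.g. Redis) rather than an in-process dict.
import hashlib
import time

_CACHE: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 7 * 24 * 3600  # narratives change slowly; cache for 7 days

def cache_key(model: str, prompt: str) -> str:
    return hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()

def cached_generate(model: str, prompt: str, generate) -> str:
    key = cache_key(model, prompt)
    hit = _CACHE.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]                     # cache hit: skip the LLM entirely
    text = generate(model, prompt)        # the expensive LLM call
    _CACHE[key] = (time.time(), text)
    return text
```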

4.3 LLM provider strategy

- Primary: OpenAI or Anthropic (Vishal to decide — see ../90-memory/open-questions.md)
- Fallback: secondary provider for availability
- Abstraction: provider-agnostic abstraction layer so we can switch without product changes (sketched after this list)
- Self-hosted fine-tune: evaluate for high-volume tasks (GR classification, extraction) where a fine-tuned open-source model (Llama, Mistral) may be cheaper at volume
- Data residency: for PII-adjacent queries, ensure prompts don't leak PII; use India-region endpoints where available
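
A sketch of that abstraction layer with fallback, using a Python Protocol; class and method names are illustrative, not any vendor's SDK:

```python
# Sketch of the provider-agnostic layer: product code depends on this
# Protocol, never on a vendor SDK directly. All names are illustrative.
from typing import Protocol

class LLMProvider(Protocol):
    def complete(self, system: str, user: str, max_tokens: int) -> str: ...

class PrimaryProvider:
    def complete(self, system: str, user: str, max_tokens: int) -> str:
        return "..."  # stub: vendor A's SDK call goes here

class FallbackProvider:
    def complete(self, system: str, user: str, max_tokens: int) -> str:
        return "..."  # stub: vendor B's SDK call goes here

def complete_with_fallback(providers: list[LLMProvider],
                           system: str, user: str, max_tokens: int = 1024) -> str:
    last_error = None
    for p in providers:
        try:
            return p.complete(system, user, max_tokens)
        except Exception as e:  # availability failure: try the next provider
            last_error = e
    raise RuntimeError("all providers failed") from last_error
```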

5. Compliance enforcement in the AI layer

5.1 Pre-generation

- System prompt includes Honest Broker rules as hard constraints
- User persona data included only with explicit consent
- PII (PAN, Aadhaar, phone) never included in LLM prompts — pseudonymised

5.2 Post-generation verification

Every LLM output passes through:

```
1. CITATION VERIFICATION
   - Extract all factual claims
   - Verify each against canonical store
   - Reject if any claim can't be traced

2. TONE COMPLIANCE
   - Regex check for banned phrases ("you should", "I recommend", "guaranteed", etc.)
   - Classifier for advisory language
   - Reject and regenerate if violated

3. DEFAMATION GUARD
   - Check for developer/entity references
   - Verify percentile framing (not absolute judgments)
   - Verify source citations present
   - Flag any "fraudulent" / "scam" language

4. PROBABILITY FRAMING
   - Check projections have ranges and confidence
   - No point estimates without bounds
```
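
As an illustration of the tone-compliance stage (step 2), the regex pass could look like this; the phrase list here is a small sample of the real banned list, which the classifier backs up:

```python
# Sketch of the regex pass in the tone-compliance check. The phrase list is
# illustrative; the production filter also runs a small classifier for
# advisory language that regex alone would miss.
import re

BANNED = [
    r"\byou should\b",
    r"\bI recommend\b",
    r"\bguaranteed\b",
]
_PATTERN = re.compile("|".join(BANNED), re.IGNORECASE)

def tone_violations(text: str) -> list[str]:
    """Return the banned phrases found in a generated response."""
    return [m.group(0) for m in _PATTERN.finditer(text)]

assert tone_violations("I recommend tower B") == ["I recommend"]
assert tone_violations("Historically, 60% of launches in this band...") == []
```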

5.3 Rejection handling

When post-generation verification fails (the retry loop is sketched below):
1. Regenerate with a stricter prompt (up to 2 retries)
2. If it still fails: serve a partial response with the failed claims removed + "I couldn't verify some details — here's what I can confirm"
3. Log all failures for eval team review
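
A sketch of that retry-then-degrade loop, with generation and verification passed in as callables since their real interfaces live in the agent layer:

```python
# Sketch of rejection handling: up to two stricter regenerations, then a
# partial response with unverified claims stripped. Helper names and the
# strictness mechanism are illustrative.

def respond_with_retries(query: str, generate, verify, strip_unverified,
                         max_retries: int = 2) -> str:
    strictness = 0
    draft = generate(query, strictness)
    for _ in range(max_retries):
        if verify(draft):
            return draft
        strictness += 1                    # tighten the system prompt and retry
        draft = generate(query, strictness)
    if verify(draft):
        return draft
    # Still failing: degrade gracefully rather than serve unverified claims.
    safe = strip_unverified(draft)         # failures also go to the eval log
    return "I couldn't verify some details — here's what I can confirm:\n" + safe
```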


6. Conversation management (Broker)

6.1 Multi-turn memory

The Broker maintains conversation context across turns:
- Short-term: current session context (entities discussed, comparisons in progress, user preferences expressed)
- Long-term: user persona (with consent), past queries, saved properties
- No PII in LLM context: persona stored separately; only persona features (goal, horizon, risk) injected (see the sketch after this list)
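
A sketch of that PII boundary: the prompt context is built from an allowlist of persona features, so identifiers never reach the LLM even with consent. Field names are illustrative.

```python
# Sketch of persona injection without PII. Only allowlisted, non-identifying
# features ever enter the LLM context; the allowlist and field names here
# are assumptions, not the persona store's actual schema.
PERSONA_FEATURE_ALLOWLIST = {"goal", "horizon_years", "risk_appetite"}

def persona_context(persona: dict, consented: bool) -> dict:
    """Project the stored persona down to consented, non-PII features."""
    if not consented:
        return {}
    return {k: v for k, v in persona.items() if k in PERSONA_FEATURE_ALLOWLIST}

stored = {"name": "…", "pan": "…", "goal": "rental income",
          "horizon_years": 10, "risk_appetite": "moderate"}
print(persona_context(stored, consented=True))
# {'goal': 'rental income', 'horizon_years': 10, 'risk_appetite': 'moderate'}
```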

6.2 Conversation state machine

```mermaid
stateDiagram-v2
    [*] --> Welcome
    Welcome --> PersonaCapture: first time
    Welcome --> Ready: returning user
    PersonaCapture --> Ready: consent given
    PersonaCapture --> Ready: consent declined (no persona features)

    Ready --> PropertyLookup: "tell me about project X"
    Ready --> Comparison: "compare A and B"
    Ready --> Simulation: "what if I invest X"
    Ready --> DeveloperReview: "is this developer reliable"
    Ready --> PolicyQuery: "any new GRs affecting Hinjewadi"
    Ready --> CostBreakdown: "what are all the costs"
    Ready --> TitleWalkthrough: "walk me through the title"
    Ready --> GeneralQuestion: anything else

    PropertyLookup --> Comparison: "compare with alternatives"
    PropertyLookup --> Simulation: "run a scenario"
    PropertyLookup --> CostBreakdown: "show me costs"
    Comparison --> Simulation: "simulate the best option"

    PropertyLookup --> Ready: done
    Comparison --> Ready: done
    Simulation --> Ready: done
```
6.3 Handoff patterns

When the Broker can't help:
- Legal questions → "This needs a lawyer. Here's what I know, but get professional advice on [specific issue]."
- Tax structuring → "This needs a CA. Here are the inputs they'll want from you."
- Transaction execution → "I don't handle bookings. Contact [developer office / broker]. Here's what to verify before you commit."


7. Infrastructure considerations

| Concern | Approach |
|---|---|
| Latency | < 3s for simple queries (cached); < 8s for complex (simulation, multi-tool) |
| Availability | 99.5% uptime target for Broker; 99.9% for Analytix API |
| Scaling | Horizontal scaling of the agent layer; tool layer scales independently |
| Cost | Budget ₹2-5 per complex Broker query; ₹0.1-0.5 for cached/simple; ₹0.01-0.05 per Analytix API call |
| Security | No PII in LLM prompts; encrypted at rest and in transit; audit logging |
| Observability | Every tool call logged with latency, inputs, outputs; LLM usage tracked per query |

8. Build sequence

| Phase | What | Timeline |
|---|---|---|
| Phase 0 | Tool layer stubs + canonical store integration + RAG pipeline | Months 1-2 |
| Phase 1 | Broker agent v0 (single-turn, Pune residential, 3 tools) | Month 3 |
| Phase 2 | Broker agent v1 (multi-turn, all tools, compliance filter) | Months 4-5 |
| Phase 3 | Analytix query agent + dashboard API | Months 4-6 |
| Phase 4 | Fractional summary agent | Months 5-6 |
| Phase 5 | Bilingual, voice, WhatsApp | Month 7+ |

See also:
- agent-design.md — deep-dive on the Honest Broker Agent
- evaluation-framework.md — how we measure honesty + accuracy
- ../20-data/pipeline-spec-for-vishal.md — data layer contract
- ../00-soul/SOUL.md — the identity the AI embodies