Source Field Mapping: What We Have vs What We Need
This page analyzes the data actually available from the MahaRERA IT APIs and the IGR Free Search portal, mapped against our 140-attribute schema. It is based on HAR captures for Marvel Zephyr, Kharadi (Survey 66/1).
Summary
| Source | Fields available | Maps to our schema | Extra fields (not in our schema) |
|---|---|---|---|
| MahaRERA IT APIs (46 endpoints) | ~120 fields | 52 attributes directly | ~68 fields unmapped (documented below) |
| IGR Free Search (search table + Index-II PDFs) | ~18 fields per deed | 14 attributes directly | ~4 fields unmapped |
| Combined coverage | 66 of 90 raw attributes (73%) | | |
| Derivable from these sources | ~30 additional derived | | |
| Remaining gaps | ~44 attributes need other sources | | |
MahaRERA → Our Schema (direct mapping)
| Our attribute ID | MahaRERA API endpoint | MahaRERA field | Confidence |
|---|---|---|---|
| `proj.maharera_number` | getProjectGeneralDetails | projectRegistartionNo | 0.99 |
| `proj.name` | getProjectGeneralDetails | projectName | 0.99 |
| `proj.type` | getProjectGeneralDetails | projectTypeName | 0.99 |
| `proj.promised_completion_dates` | getProjectGeneralDetails | projectProposeComplitionDate + getProjectPreviousExtensionDetails | 0.95 |
| `proj.actual_completion_date` | getProjectCurrentStatus | statusName (= "Completed") | 0.95 |
| `proj.towers_count` | getProjectGeneralPlanSummary | totalNoOfApprovedPlanProjectBuilding | 0.95 |
| `proj.building_or_wing_name` | getBuildingWingUnitSummary | buildingNameNumber | 0.95 |
| `loc.full_address` | getProjectLandAddressDetails | addressLine | 0.90 |
| `loc.pin_code` | getProjectLandAddressDetails | pinCode | 0.99 |
| `loc.survey_numbers` | getProjectLandHeaderDetails | finalPlotBearingNumber | 0.95 |
| `loc.boundaries` | getProjectLandAddressDetails | boundariesEast/West/North/South | 0.85 |
| `loc.lat_lng` | getProjectLegalGeoTaggingDetail | latitude, longitude | 0.95 |
| `loc.taluka` | getProjectLandAddressDetails | talukaId (needs MDM lookup) | 0.90 |
| `loc.district` | getProjectLandAddressDetails | districtId (needs MDM lookup) | 0.90 |
| `area.total_land_sqm` | getProjectLandHeaderDetails | landAreaSqmts | 0.95 |
| `area.permissible_builtup_sqm` | getProjectLandHeaderDetails | totalPermissiblePlotFsi | 0.90 |
| `area.sanctioned_builtup_sqm` | getProjectLandHeaderDetails | projectProposedNotSanctionedBuildUpArea | 0.90 |
| `area.fsi_consumed` | getBuildingWingUnitSummary | fsiWithSanctionS... | 0.85 |
| `area.open_space_sqm` | getProjectLandHeaderDetails | aggregateArea | 0.85 |
| `party.promoter_name` | fetchPromoterGeneralDetails | organizationName | 0.99 |
| `party.promoter_pan` | fetchPromoterGeneralDetails | panNumber (encrypted) | 0.80 |
| `party.land_owners` | getProjectLandOwnerDetails | landOwner + landOwnerType | 0.85 |
| `party.architects` | getProjectProfessionalByType | firstName (typeId=69) | 0.90 |
| `legal.title_search_report_url` | getMigratedDocuments | filter documentName contains "title" | 0.90 |
| `legal.encumbrances` | fetchProjectEncumbranceDeclaration | responseObject | 0.85 |
| `legal.litigation_records` | getProjectLitigationDetails | projectLitigationDtlsResponse | 0.90 |
| `legal.complaints_maharera` | getComplaintDetailsByProjectId | complaintDetails[] | 0.95 |
| `appr.commencement_certificate` | getProjectLandCCDetailsResponse | ccDocumentFileName, ccIssuedDate | 0.95 |
| `appr.building_plan_approval` | getMigratedDocuments / getUploadedDocuments | filter by doc type | 0.85 |
| `unit.total_count` | getBuildingWingUnitSummary | totalUnitCount | 0.90 |
| `unit.type_breakdown` | getBuildingWingUnitSummary | residentialUnitCount, nonResidentialUnitCount | 0.85 |
| `fin.escrow_account_details` | getProjectPromoterBankDetails | bankName, branchName, bankAddress | 0.90 |
| `fin.loan_disclosure` | (available in extended Form B API) | | 0.80 |

Total: 32 raw attributes directly mapped from MahaRERA APIs.
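
A mapping table like the one above can be held as data and applied to captured responses. The following is a minimal sketch, not the pipeline's actual implementation: it assumes each endpoint's response has already been reduced to a flat dict keyed by field name (the real payload shape, including any wrapper object, is not documented on this page), and `FieldMapping` and `apply_mappings` are hypothetical names. Only four rows are included for brevity.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldMapping:
    attribute_id: str   # our schema attribute
    endpoint: str       # MahaRERA endpoint name from the table
    source_field: str   # field name as spelled by the API
    confidence: float   # confidence value from the table


# A few rows copied from the mapping table; spellings are the API's own.
MAPPINGS = [
    FieldMapping("proj.maharera_number", "getProjectGeneralDetails", "projectRegistartionNo", 0.99),
    FieldMapping("proj.name", "getProjectGeneralDetails", "projectName", 0.99),
    FieldMapping("loc.pin_code", "getProjectLandAddressDetails", "pinCode", 0.99),
    FieldMapping("area.total_land_sqm", "getProjectLandHeaderDetails", "landAreaSqmts", 0.95),
]


def apply_mappings(
    responses: dict[str, dict[str, Any]],
    mappings: list[FieldMapping] = MAPPINGS,
) -> dict[str, dict[str, Any]]:
    """Build schema attributes from per-endpoint payloads (assumed flat dicts)."""
    out: dict[str, dict[str, Any]] = {}
    for m in mappings:
        payload = responses.get(m.endpoint, {})
        value = payload.get(m.source_field)
        if value is None:
            continue  # endpoint not captured, or field missing/null
        out[m.attribute_id] = {
            "value": value,
            "confidence": m.confidence,
            "source": f"{m.endpoint}.{m.source_field}",
        }
    return out
```

Keeping the source endpoint and confidence next to each value makes it possible to audit later which capture a given attribute came from.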
IGR → Our Schema (direct mapping)
Fields from Index-II PDFs and search table:
| Our attribute ID | IGR source field | IGR field (Marathi→English) | Confidence |
|---|---|---|---|
| `fin.sale_price` | Index-II field (2) | मोबदला / Consideration | 0.95 |
| `fin.market_value_asr` | Index-II field (3) | बाजारभाव / Market Value (ASR) | 0.95 |
| `fin.sale_date` | Index-II field (9)/(10) | Execution date / Registration date | 0.99 |
| `fin.sale_parties` (seller) | Index-II field (7) | Executant names + PAN + address | 0.90 |
| `fin.sale_parties` (buyer) | Index-II field (8) | Claimant names + PAN + address | 0.90 |
| `fin.stamp_duty_paid` | Index-II field (12) | मुद्रांक शुल्क / Stamp Duty | 0.99 |
| `fin.registration_fee_paid` | Index-II field (13) | नोंदणी शुल्क / Registration Fee | 0.99 |
| `loc.survey_numbers` | Index-II field (4) | Survey Number within property desc | 0.90 |
| `area.carpet_sqft` | Index-II field (4)/(5) | Area of Constructed Property + क्षेत्रफळ | 0.85 |
| `lease.rent_amount_monthly` | L&L Index-II | Rent amount field | 0.90 |
| `lease.tenure_months` | L&L Index-II | Tenure field | 0.90 |
| `lease.tenant_name` | L&L Index-II | Claimant names | 0.90 |

IGR search table adds: DocNo, DName (deed type), RDate, SROName, SROCode, Status.
Total: 12 raw attributes from IGR, plus 6 search-table metadata fields.
IGR deed types found (Kharadi Survey 66/1, 2021+): all 11 PDFs + 3 search images
| Deed type | Marathi | Count | Field (2) = | Key data for us |
|---|---|---|---|---|
| Sale Deed | सेल डीड | 1 | Sale price | Transaction price, carpet area, flat/floor/wing, parties+PAN |
| Agreement to Sale | अँग्रीमेंट टू सेल | 2 | Agreed price | Pre-registration price signal, same fields as sale |
| Assignment Deed | असाईनमेंट डीड | 2 | Transfer price | Built-up AND carpet area, terrace, parking slot IDs, prior deed reference |
| Transfer Deed | ट्रान्सफर डीड | 1 | Transfer price | ASR zone division number + rate per sqm (विभाग क्र. 55/669 दर ₹64990/-) |
| Leave & License | 36-अ-लिव्ह अॅड लायसन्सेस | 6 | Rent for period | Rental yield signal, tenant origin, area in sqft or sqm |
| Mortgage Deed | मॉरगेज डीड | 1 | Loan amount (₹40Cr) | Land-level encumbrance, multiple survey areas, developer mortgage |
| Confirmation Deed | कन्फर्मेशन डीड | 1 | ₹0 | Prior development agreement reference, landowner chain (7 parties) |
| Deposit of Title Deeds | (English in search table) | 1 | Loan amount (₹1.275Cr) | Bank name + MICR, flat-level mortgage, borrower PAN |
Pipeline implication: Each deed type needs a dedicated extraction template because field (2) and field (4) mean different things per type. The deed type in field (1) is the routing key.
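
The routing idea can be sketched as a lookup from deed type to the meaning of field (2), using the table above. This is an illustration under stated assumptions: the parsed Index-II is represented as a dict keyed by field number, the deed type in field (1) has already been normalized from Marathi to the English labels in the table, and `FIELD_2_MEANING`, `route_deed`, and the output key names are hypothetical.

```python
# Meaning of Index-II field (2) per deed type, taken from the deed-type table.
# Output key names are illustrative, not schema attribute IDs.
FIELD_2_MEANING = {
    "Sale Deed": "sale_price",
    "Agreement to Sale": "agreed_price",
    "Assignment Deed": "transfer_price",
    "Transfer Deed": "transfer_price",
    "Leave & License": "rent_for_period",
    "Mortgage Deed": "loan_amount",
    "Confirmation Deed": "nil_consideration",
    "Deposit of Title Deeds": "loan_amount",
}


def route_deed(index2: dict[int, object]) -> dict[str, object]:
    """Label field (2) according to the deed type found in field (1)."""
    deed_type = index2[1]
    meaning = FIELD_2_MEANING.get(deed_type)
    if meaning is None:
        # Unknown deed types should be surfaced, not silently treated as sales.
        raise ValueError(f"No extraction template for deed type: {deed_type!r}")
    return {"deed_type": deed_type, meaning: index2[2]}


# Usage: a mortgage deed's field (2) is a loan amount, not a sale price.
mortgage = route_deed({1: "Mortgage Deed", 2: 400_000_000})
```

Raising on an unrecognized deed type is a deliberate choice in this sketch: a loan amount misread as a sale price would distort every price aggregate downstream.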
Derived attributes achievable from MahaRERA + IGR alone
| Our attribute ID | Derivable? | Inputs from these sources | Notes |
|---|---|---|---|
| `fin.asr_gap_pct` | Yes | IGR: sale_price, market_value_asr | Direct computation |
| `fin.price_per_sqft_carpet` | Yes | IGR: sale_price; Index-II: area | Need area extraction from Marathi |
| `fin.cost_overrun_pct` | Partially | RERA: cost history (need Form B versioning) | Only if multiple versions captured |
| `mkt.transaction_velocity_90d` | Yes | IGR: count of deeds per micromarket | Aggregate |
| `mkt.median_price_per_sqft_90d` | Yes | IGR: sale_price + area, by micromarket | Aggregate |
| `dev.trust_score` | Partially | RERA: complaints, completion dates, extensions | Missing: credit, MCA data |
| `legal.title_clarity_score` | Partially | RERA: encumbrance, litigation, title docs | Missing: CERSAI, chain gaps |
| `unit.unsold_count` | Yes | RERA: total - sold | |
| `mkt.absorption_rate` | Yes | RERA: sold/launched + IGR velocity | |
| `ai.comparable_set` | Yes | IGR: similar transactions by area/price | |
| `ai.hidden_costs_breakdown` | Yes | IGR: stamp_duty, registration_fee + rules | Deterministic calculator |
| `loc.micromarket_id` | Yes | RERA: lat/lng + village/taluka | Clustering |
| `loc.infra_proximity_score` | Partially | RERA: lat/lng (have coords) | Need GIS layers for metro/SEZ |

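
The "direct computation" rows above can be sketched as follows. The formulas are assumptions, since this page does not define them: the ASR gap is taken as the sale price's percentage deviation from the ASR market value, and areas given in square metres are converted with 1 sqm ≈ 10.7639 sqft. Function names mirror the attribute IDs but are otherwise hypothetical.

```python
SQFT_PER_SQM = 10.7639  # standard conversion factor, rounded


def asr_gap_pct(sale_price: float, market_value_asr: float) -> float:
    """Percentage by which the sale price exceeds (or undercuts) the ASR value.

    Assumed definition: (sale_price - market_value_asr) / market_value_asr * 100.
    """
    if market_value_asr <= 0:
        raise ValueError("market_value_asr must be positive")
    return (sale_price - market_value_asr) * 100 / market_value_asr


def price_per_sqft_carpet(sale_price: float, carpet_area: float, unit: str = "sqft") -> float:
    """Sale price per square foot of carpet area; Index-II area may be sqft or sqm."""
    if unit == "sqm":
        area_sqft = carpet_area * SQFT_PER_SQM
    elif unit == "sqft":
        area_sqft = carpet_area
    else:
        raise ValueError(f"Unknown area unit: {unit!r}")
    if area_sqft <= 0:
        raise ValueError("carpet_area must be positive")
    return sale_price / area_sqft


def unsold_count(total_units: int, sold_units: int) -> int:
    """unit.unsold_count as total minus sold, per the table."""
    return total_units - sold_units
```

The explicit `unit` argument reflects that Leave & License deeds in this sample report area in either sqft or sqm, so the unit has to travel with the extracted number.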
Attributes NOT available from MahaRERA or IGR (need other sources)
| Category | Attributes | Required source |
|---|---|---|
| Tenant & Lease (enriched) | `lease.escalation_pct`, `lease.lock_in_months`, `lease.tenant_industry`, `lease.tenant_credit_score`, `lease.tenant_anchor_quality_score` | L&L deed text extraction + MCA + CIBIL |
| Market Signals (complex) | `mkt.sector_momentum_pct`, `mkt.cap_rate_median`, `mkt.yield_benchmark`, `mkt.micromarket_lifecycle_stage`, `mkt.days_on_market_median` | IGR aggregation at scale + listing portals |
| Policy & Infra | `policy.*`, `infra.*` (all 8) | GR portals + GIS |
| Risk (environmental) | `risk.flood_zone_flag`, `risk.forest_zone_flag`, `risk.crz_flag`, `risk.aqi_annual_avg`, `risk.heat_island_index`, `risk.rainfall_trend` | MRSAC + CPCB + IMD |
| Macro | All 6 (`macro.*`) | RBI, MoSPI, NASSCOM |
| Investor Persona | All 8 (`persona.*`) | User self-report |
| Fractional-specific | All 7 (`frac.*`) | Internal DB |
| AI narratives | All 12 (`ai.*`) | Computed from above |

Key finding: IGR Index-II is richer than the search table
The search table has 10 columns (DocNo, DName, RDate, SROName, Seller, Purchaser, PropertyDesc, SROCode, Status, IndexII-link).
The Index-II PDF adds:

- Consideration amount (exact)
- Market value / ASR value
- Property type (Flat/Plot/etc.)
- Exact area (sq meters)
- Survey number (structured)
- PAN of both parties
- Full address of both parties
- Stamp duty amount
- Registration fee amount
- Execution date (vs registration date)
- Loan amount (for mortgage deeds)
- Mortgagee bank name (for mortgage deeds)

Pipeline implication: always download the Index-II PDF, not just scrape the search table.
Entity resolution confidence (MahaRERA ↔ IGR)
For this sample (Marvel Zephyr, Kharadi, S.No. 66/1):
| Signal | Available? | Match quality |
|---|---|---|
| Survey number | Both have "66/1" | Exact |
| Village name | Both have "Kharadi" | Exact |
| Promoter name / seller | RERA: "MARVEL LANDMARKS PVT LTD"; IGR: in seller names | High (name matching) |
| Building name | RERA: "MARVEL ZEPHYR BUILDING U"; IGR: "माव्हल झेफर" in property desc | High (transliteration) |
| Lat/lng vs address | RERA has coords; IGR has only text address | One-directional |
Match confidence for this project: ~0.95. Survey number + village is a strong anchor.
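
One way to combine the signals in the table into a single score is a weighted sum, sketched below. The weights are illustrative assumptions, not the values behind the ~0.95 figure, and the record shapes (`survey_number`, `village`, `promoter_name`, `seller_names`) are hypothetical. Building-name matching is left out because it needs Marathi-to-English transliteration, which plain string similarity cannot handle.

```python
from difflib import SequenceMatcher

# Illustrative weights: survey number + village act as the anchor.
WEIGHTS = {"survey": 0.5, "village": 0.2, "promoter": 0.3}


def normalize(text: str) -> str:
    """Upper-case and collapse whitespace so cosmetic differences do not matter."""
    return " ".join(text.upper().split())


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_score(rera: dict, igr: dict) -> float:
    """Weighted match score between a MahaRERA project and an IGR deed."""
    survey = 1.0 if normalize(rera["survey_number"]) == normalize(igr["survey_number"]) else 0.0
    village = 1.0 if normalize(rera["village"]) == normalize(igr["village"]) else 0.0
    # The promoter may be one of several sellers; take the best-matching name.
    promoter = max(
        (name_similarity(rera["promoter_name"], s) for s in igr["seller_names"]),
        default=0.0,
    )
    return (
        WEIGHTS["survey"] * survey
        + WEIGHTS["village"] * village
        + WEIGHTS["promoter"] * promoter
    )


# Usage with the signals from this sample.
rera = {"survey_number": "66/1", "village": "Kharadi", "promoter_name": "MARVEL LANDMARKS PVT LTD"}
igr = {"survey_number": "66/1", "village": "kharadi", "seller_names": ["Marvel Landmarks Pvt Ltd"]}
score = match_score(rera, igr)
```

Giving survey number and village most of the weight follows the observation above that they form the strong anchor, while name similarity only refines the score.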
See also: source-raw-fields.md for complete field inventory from both sources.