Trade Data Sources Map — Replicating ImportGenius
Canonical reference catalog of data sources that can replicate ImportGenius’s capabilities. Durable cross-project knowledge — applies to Frank’s Renso engagement, Renso Hub vision, and diligence work across companies.
See also [[renso-trade-intelligence-stack]] (where this gets applied), [[renso-engagement-frank-cto]] (Phase 3 deliverables this powers), [[Financial Data Sources]] (complementary macro/finance sources), [[Canadian Government Data Indexing]] (StatCan and Canadian API patterns).
(source: open-web research synthesis, 2026-05-14)
Section 1: The Core Insight — What ImportGenius Actually Is
ImportGenius is not a proprietary data collection. It is primarily:
- US CBP AMS Vessel Cargo Manifests — legally public per 19 CFR §103.31 — the spine of the product
- ~12 Latin American customs systems publishing full Bill of Lading (B/L) data at the shipment level
- Asian customs data sourced through customs broker networks (not government portals — these governments do not publish B/L data publicly)
- Entity resolution + HS classification + UI layered on top of raw manifest data
Replication is feasible because the first two items in the list above (the US AMS manifests and the Latin American B/L systems) are statutorily public. The hard part is not access; it is the entity resolution and consistent HS normalization that ImportGenius has invested in over years. The $0 coverage gap (Section 11) is real but bounded.
(source: open-web research synthesis, 2026-05-14)
Section 2: Primary Manifest Data — Pricing Tier Table
Commercial aggregators that sit on top of the primary manifest data:
| Source | Coverage | URL | Cost |
|---|---|---|---|
| ImportYeti | US AMS Jan 2015+ | importyeti.com (beta API: data.importyeti.com) | Free OSINT tier / ~$10/mo |
| OEC Bulk BoL | US AMS Jan 2021–Feb 2026 | oec.world/en/resources/bulk-download/bill-of-lading | Premium (gated) |
| ImportInfo | US AMS Oct 2012+ | importinfo.com | $59–$149/mo |
| ImportGenius | US + 21 countries | importgenius.com | $229–$1,999+/mo |
| Descartes Datamyne | 230 markets daily BOL | datamyne.com | Enterprise (contact) |
| Panjiva / PIERS (S&P Global) | US + 30+ countries; 16M+ records (2023) | spglobal.com | $500–$2,000+/mo (est.) |
| Trademo | 2.3B+ shipments | trademo.com | $20K+ per purchase |
| Volza | 203 countries | volza.com | $3.50/record API |
| manifestDB | 65 countries (raw CSV) | manifestdb.com | Contact |
| ExportGenius (separate from ImportGenius) | India-focused | exportgenius.in | Custom |
| Seair Exim | 80+ countries, India-strong | seair.co.in | Subscription |
| Eximpedia | 130+ countries | eximpedia.app | Usage-based |
| Tendata | Global, APAC-strong | tendata.com | Subscription |
Decision rule: For the $0 architecture (Section 10), ImportYeti is the primary US AMS source. Peru SUNAT is the primary Latin America mirror source — already implemented in Frank’s Product_Intelligence_Engine.py.
(source: open-web research synthesis, 2026-05-14)
Section 3: Foreign Customs Systems by Country (B/L Level)
Open with Full Shipment Records (the Gold Tier)
These countries publish individual Bill of Lading records publicly or with minimal registration friction.
Peru — SUNAT/Aduanet at aduanet.gob.pe/cl-ad-itconsultadwh/ieITS01Alias
Full B/L: date, FOB value, weight, shipper, consignee, product description, HS code. Free. Already implemented in Frank’s Product_Intelligence_Engine.py. Moderate CAPTCHA barrier. Richest open Latin America source. Canonical mirror for Vietnam→Canada and China→Canada intelligence.
(source: /Users/franknguyen/renso/Product_Intelligence_Engine.py, open-web research synthesis, 2026-05-14)
Colombia — DIAN/MUISCA at dian.gov.co
Full B/L data. Free. CAPTCHA-gated.
(source: open-web research synthesis, 2026-05-14)
Ecuador — SENAE/ECUAPASS portal
Full B/L data. Free. Login + NIT registration may be required.
(source: open-web research synthesis, 2026-05-14)
Bolivia — Aduana Nacional
Confirmed by ImportGenius coverage. Limited English documentation.
(source: open-web research synthesis, 2026-05-14)
Costa Rica — DGA
Full B/L confirmed.
(source: open-web research synthesis, 2026-05-14)
Panama — ANA
Full B/L; Colon Free Zone is a high-volume port (~242K import shipments).
(source: open-web research synthesis, 2026-05-14)
Venezuela — SENIAT
Listed by ImportGenius; coverage unclear given country operational instability.
(source: open-web research synthesis, 2026-05-14)
Uruguay — DNA
Confirmed by aggregators; third-party access only.
(source: open-web research synthesis, 2026-05-14)
Paraguay — Partial
Third-party access only.
(source: open-web research synthesis, 2026-05-14)
Russia — FCS at eng.customs.gov.ru
Live B/L confirmed by ImportGenius (land/sea/air + prices). Sanctions complications make Western pipelines operationally risky.
(source: open-web research synthesis, 2026-05-14)
Ukraine — Customs Service
Land/sea/air + prices per ImportGenius.
(source: open-web research synthesis, 2026-05-14)
Turkey — GTIP codes via aggregators
Source mechanism unclear.
(source: open-web research synthesis, 2026-05-14)
Aggregate Only / No Public B/L (Use Mirror Data Instead)
These countries do not publish shipment-level customs records. Intelligence on them must be obtained through the mirror data technique (Section 9) or commercial subscriptions.
| Country | Portal / Authority | Status |
|---|---|---|
| Vietnam | GDVC — customs.gov.vn | Statistics only |
| China | GACC — english.customs.gov.cn | Monthly press briefings only |
| Pakistan | FBR/WeBOC | Being replaced; no public B/L |
| Bangladesh | NBR | Aggregates only |
| Sri Lanka | Customs | “Trade info not released to third parties” (official statement) |
| Philippines | Bureau of Customs | No public B/L |
| Indonesia | DJBC | Third-party claims; sources opaque |
| Thailand | Customs Department | Aggregates only |
| India | DGCIS FTDDP | Aggregates only; row-level via broker networks |
| Mexico | ANAM/SAT | Pedimento-level not public; only aggregate statistics; third parties source from customs brokers |
| Chile | SNA | Government gives aggregates; B/L via commercial only |
| Argentina | ARCA (formerly AFIP/DGA) | Partial; regime in flux — Decree 953/2024 replaced AFIP with ARCA |
| Brazil | Comex Stat — comexstat.mdic.gov.br | FREE API but aggregate HS-8 monthly only (not individual B/L); R package comexr; monthly since 1997; CIF/freight included |
Note on Brazil: Comex Stat is still valuable as a gap-fill for Brazil aggregate flows even though it lacks B/L-level data. Use it as a sanity check alongside the individual country B/L sources.
(source: open-web research synthesis, 2026-05-14)
Section 4: Vessel & Container Tracking
Free Tier
| Source | Notes | URL |
|---|---|---|
| aisstream.io | Free WebSocket; ~200km coastal; ~300 msg/sec worldwide | aisstream.io |
| AISHub | Free with AIS feed contribution (reciprocal); coastal | aishub.net |
| Maersk Developer | Free; Track and Trace Plus REST endpoint | developer.maersk.com |
| Hapag-Lloyd API | Free with developer registration | api-portal.hlag.com |
| CMA CGM API | Free with developer registration | api-portal.cma-cgm.com |
| Track-Trace.com | Free basic; 50+ carrier lookups | track-trace.com |
| EQUASIS | Free with registration; Port State Control records (not live tracking) | equasis.org |
Key limit: aisstream.io is coastal only (~200km). Real-time satellite AIS (Spire, Kpler) is enterprise tier.
Paid Tier
| Source | Notes |
|---|---|
| MarineTraffic / Kpler | Enterprise; removed credit system Jan 2025 |
| VesselFinder | €330 / 10K credits; archive since 2009 |
| Spire Maritime / Pole Star / exactEarth | Enterprise, satellite-class — global coverage |
| FleetMon | Subscription; acquired March 2023 |
| Shipsgo | Credit-based; 160+ carriers; 1 credit per upload then unlimited subsequent queries |
(source: open-web research synthesis, 2026-05-14)
Section 5: Customs Aggregates (Free APIs)
These are country/HS-level aggregate flows — not shipment B/L records. Use for volume baseline and sanity-checking micro sources. See also [[renso-trade-intelligence-stack]] Layer 1 (Macro).
| Source | What | Access | Rate Limit |
|---|---|---|---|
| UN Comtrade Plus | HS-6 bilateral, 200+ countries | comtradedeveloper.un.org + Python lib comtradeapicall | 500 calls/day, 100K records/call |
| US Census International Trade | US HS-level monthly | census.gov/data/developers | No key needed as of March 2025 |
| Eurostat Comext | EU bilateral HS-8 | ec.europa.eu/eurostat — SDMX 2.1+3.0 | None stated; CSV bulk back to 1988 |
| Statistics Canada WDS | Canada bilateral monthly | statcan.gc.ca — Table 12-10-0011-01 (monthly), 12-10-0143 (BEC5) | None |
| Brazil Comex Stat | Brazil bilateral HS-8 monthly since 1997 | comexstat.mdic.gov.br; R package comexr | None stated |
| USITC DataWeb | US trade + HTS classification | dataweb.usitc.gov (account required) | None stated |
| World Bank WITS / TRAINS | Trade + tariffs | wits.worldbank.org/API/V1; Python: world_trade_data | No full-DB queries |
| WTO Statistics | Policy + tariffs | stats.wto.org | Yes |
| IMF DOTS | Direction of Trade | imf.org/en/Data | Yes |
| OECD Trade Stats | OECD countries | stats.oecd.org — SDMX | Yes |
| Vietnam GSO | Vietnam aggregates | gso.gov.vn | No formal API |
| China GACC | Monthly press briefings | english.customs.gov.cn | No API |
| India DGCIS FTDDP | Annual statistical publications | ftddp.dgciskol.gov.in | No API |
(source: open-web research synthesis, 2026-05-14; source: /Users/franknguyen/renso/Master_Trade_Data_Directory.md)
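One way to use these aggregates as a sanity check on shipment-level sources is to roll the B/L records up to HS-6 and compare them with the aggregate total for the same lane and period. The sketch below is a minimal illustration, not part of Frank's existing code: the function name, the input shapes, and the sample figures are all hypothetical, and HS codes are assumed to be digit-only strings.

```python
from collections import defaultdict


def coverage_by_hs6(shipments, aggregates):
    """Share of each HS-6 aggregate value that the shipment records account for.

    shipments:  iterable of (hs_code, value_usd) from B/L-level sources.
    aggregates: {hs6: total_value_usd} from an aggregate source,
                for the same trade lane and period (assumed already aligned).
    """
    micro_totals = defaultdict(float)
    for hs_code, value_usd in shipments:
        # National HS-8/10 lines share their first six digits with HS-6.
        micro_totals[hs_code[:6]] += value_usd

    coverage = {}
    for hs6, macro_total in aggregates.items():
        if macro_total > 0:
            coverage[hs6] = micro_totals.get(hs6, 0.0) / macro_total
    return coverage


# Illustrative numbers only; not real trade data.
shipments = [("0901210000", 40_000.0), ("09012100", 10_000.0), ("640399", 5_000.0)]
aggregates = {"090121": 100_000.0, "640399": 4_000.0}
ratios = coverage_by_hs6(shipments, aggregates)
```

A ratio well below 1 suggests the scraper is missing shipments; a ratio above 1 suggests duplicate records or a mismatch in period or valuation basis (for example FOB versus CIF).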
Section 6: Entity Resolution & Enrichment
Entity resolution is the hard part of replicating ImportGenius — normalizing scraped shipper/consignee strings to canonical legal entities.
| Source | Coverage | Cost |
|---|---|---|
| GLEIF Golden Copy | 4M+ LEIs in 200+ jurisdictions; daily delta | Free bulk download |
| OpenCorporates | 200M companies, 140 jurisdictions; OpenRefine reconciliation API | Free non-commercial; £2,250–£12,000+/yr commercial |
| MRAS / Canadian Business Registries | All provinces, beneficial ownership since Jan 2024 | Free |
| OrgBook BC | BC verifiable credentials | Free API (already used in Frank’s stack) |
| Companies House UK | Full REST API; free API key required | Free |
| OFAC SDN + EU + UN sanctions | Bulk XML/CSV; fuzzy search UI | Free |
| USPTO PatentsView | Patent assignments + inventors | Free API at patentsview.org/api |
| FCC ID database | Electronics certifications | Free at fccid.io / fcc.report (already in PIE.py) |
| Wikidata SPARQL | Aliases, identifiers | Free at query.wikidata.org |
| D&B Direct / LSEG World-Check | Enterprise tier | Paid |
| Crunchbase / Dealroom | Startup/VC-backed entities | Free limited / paid |
Recommended resolution pipeline: GLEIF Golden Copy → RapidFuzz or Elasticsearch local index → OpenCorporates reconciliation API → MRAS/OrgBook for Canadian entities → OFAC/EU/UN sanctions screening on every entity resolved.
(source: open-web research synthesis, 2026-05-14; source: /Users/franknguyen/renso/Global_Trade_Aggregator_V1_Staff_Engineered.md)
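The first stages of that pipeline (normalize, fuzzy-match against a registry, then screen) can be sketched with the standard library alone. This is an illustration under stated assumptions: `difflib.SequenceMatcher` stands in for RapidFuzz or an Elasticsearch index, the suffix list and the 0.85 threshold are arbitrary starting points, the registry entries are invented placeholders rather than real LEIs, and the sanctions check is a simplified exact match on normalized names.

```python
import re
from difflib import SequenceMatcher

# Hypothetical suffix list; a production list would be far longer.
LEGAL_SUFFIXES = {"co", "company", "corp", "corporation", "inc",
                  "incorporated", "ltd", "limited", "llc", "jsc"}


def normalize_name(raw):
    """Lowercase, strip punctuation, and drop trailing legal-form tokens."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", raw.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def best_match(raw, registry, threshold=0.85):
    """Return (entity_id, score), or (None, score) if nothing clears the threshold.

    registry: {entity_id: legal_name}, e.g. loaded from a GLEIF bulk file.
    """
    query = normalize_name(raw)
    best_id, best_score = None, 0.0
    for entity_id, legal_name in registry.items():
        score = SequenceMatcher(None, query, normalize_name(legal_name)).ratio()
        if score > best_score:
            best_id, best_score = entity_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score


def is_sanctioned(raw, sanctioned_names):
    """Simplified screen: exact match on normalized names only."""
    return normalize_name(raw) in {normalize_name(n) for n in sanctioned_names}


# Placeholder registry; the IDs are not real LEIs.
REGISTRY = {
    "ID-EXAMPLE-001": "Saigon Example Footwear Co., Ltd.",
    "ID-EXAMPLE-002": "Hanoi Example Textiles JSC",
}
match_id, match_score = best_match("SAIGON EXAMPLE FOOTWEAR CO LTD", REGISTRY)
```

The linear scan is fine for a few thousand names; against the full GLEIF file an index (the Elasticsearch option above) or a blocking key would be needed. The threshold is the main tuning knob for the false-positive rate.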
Section 7: Classification (Free-Text → HS Code)
Matching scraped product descriptions to HS codes is the classification layer. The hierarchy is HS-2 (chapter) → HS-4 (heading) → HS-6 (subheading, internationally standard) → HS-8/10 (national tariff lines).
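Because each level is a prefix of the next, a code at any national precision can be rolled up by truncation. A minimal sketch (the function name and return shape are illustrative choices):

```python
def hs_levels(code):
    """Split an HS code into its hierarchy levels by digit prefix."""
    digits = "".join(ch for ch in code if ch.isdigit())  # tolerate "6403.99.6075"
    if len(digits) < 6:
        raise ValueError(f"need at least 6 digits for HS-6, got {code!r}")
    levels = {
        "chapter": digits[:2],     # HS-2
        "heading": digits[:4],     # HS-4
        "subheading": digits[:6],  # HS-6, the last internationally standard level
    }
    if len(digits) > 6:
        levels["national"] = digits  # HS-8/10 tariff line, country-specific
    return levels


example = hs_levels("6403.99.6075")
```

Joining sources from different countries should therefore happen at HS-6 or coarser, since the digits beyond the sixth are not comparable across national tariffs.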
| Source | Notes | URL |
|---|---|---|
| USITC HTS | Full text + CSV/JSON/Excel | hts.usitc.gov; data.gov dataset |
| WCO Trade Tools | HS 2022 legal text free; explanatory notes subscription | wcotradetools.org |
| GitHub: datasets/harmonized-system | Community datapackage | github.com/datasets/harmonized-system |
| GitHub: warrantgroup/WCO-HS-Codes | Raw CSV | github.com/warrantgroup/WCO-HS-Codes |
| WTO HS Tracker | Track HS code changes across versions | hstracker.wto.org |
| CBSA D-series memoranda | Canadian classification guidance | cbsa-asfc.gc.ca |
| LLM batch classifier | Claude Sonnet $3/MTok input; 85–92% accuracy at HS-4 with good prompting | — |
Practical note: For the $0 tier, the USITC HTS CSV as a lookup table plus a Claude Sonnet batch classifier for free-text descriptions is sufficient for HS-4 accuracy. HS-6 precision requires the WCO legal text.
(source: open-web research synthesis, 2026-05-14)
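For budgeting the LLM classifier, the input-side cost follows directly from the $3/MTok figure in the table. The sketch below covers input tokens only, since no output price is given here; the tokens-per-description value is an assumption that should be measured on real prompts.

```python
INPUT_PRICE_PER_MTOK_USD = 3.0  # from the table above


def estimate_input_cost(n_descriptions, tokens_per_description):
    """Input-token cost in USD for classifying n free-text descriptions.

    tokens_per_description is an assumed average that includes the
    per-request prompt overhead; output tokens are not counted.
    """
    total_tokens = n_descriptions * tokens_per_description
    return total_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK_USD


# 100,000 descriptions at an assumed 200 input tokens each.
cost_usd = estimate_input_cost(100_000, 200)
```

At those assumed volumes the input cost is 20 MTok × $3 = $60, so token spend is unlikely to be the binding constraint at this scale.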
Section 8: Adjacent Intelligence
Sources adjacent to manifest/customs data that enrich trade intelligence:
| Source | What | URL |
|---|---|---|
| FDA facility registration | Search by country/product/firm | accessdata.fda.gov |
| USDA APHIS/FSIS | Foreign establishment lists | fsis.usda.gov |
| CFIA licensed establishments | Canadian food inspection | inspection.canada.ca |
| China GACC Decree 248 / CIFER | China-approved foreign food exporters; USDA FAS publishes updates | fas.usda.gov |
| EU CBAM | Carbon Border Adjustment declarations (live since Oct 2023) | ec.europa.eu/taxation_customs/cbam |
| gCaptain / MarineLink / Maritime Executive | Free trade press | gcaptain.com / marinelink.com |
| EventRegistry | 200 articles/day free | eventregistry.org |
| Google News RSS | Search-based news feed | news.google.com/rss/search?q= |
(source: open-web research synthesis, 2026-05-14)
Section 9: The Mirror Data Technique (Frank’s Core Use Case)
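When both the origin and the destination country have closed customs systems, query a third country whose customs records are open and which trades with the party of interest. The shipments an exporter sends to that open country expose the same shipper names and product lines it uses elsewhere.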
The canonical problem: Vietnam → Canada. Canada suppresses vessel manifest data. Vietnam is aggregate-only. Both ends are closed.
For Vietnam (Origin-Closed)
Vietnam → US (via US AMS / ImportYeti) — PRIMARY MIRROR. US is Vietnam’s #1–2 export destination. Virtually any Vietnamese factory shipping to Canada also ships to the US. HIGH signal quality. This is the first query in any Vietnam factory fingerprinting workflow.
Vietnam → Peru (Peru SUNAT — Frank’s PIE.py) — SECONDARY MIRROR. Smaller trade lane; useful for food/consumer goods. Medium signal. Already implemented.
Vietnam → Colombia / Ecuador / Bolivia — TERTIARY MIRRORS. Open B/L systems comparable to Peru's SUNAT (Section 3). Medium signal. Useful for cross-validation.
For China (Origin-Closed)
China → US (US AMS) — PRIMARY MIRROR. Highest-volume lane in AMS database. HIGH signal quality.
China → Peru / Colombia — SECONDARY MIRRORS. Good for food/consumer goods cross-validation. Medium signal.
Recipe: Fingerprinting a Vietnamese Factory for Its Canadian Customer Base
- Query ImportYeti (US AMS) for factory name OR HS code + origin=Vietnam → get US consignees
- Cross-reference Peru SUNAT for the same shipper name → get Peruvian consignees
- Union the consignee set → factory’s global customer base (US + Latin America proxy)
- Resolve all names via GLEIF + OpenCorporates + OrgBook → canonical legal entities
- Canadian customers typically cluster near US customers → cross-reference MRAS / Canadian business registries / trade press to identify Canadian counterparts
(source: open-web research synthesis, 2026-05-14; source: /Users/franknguyen/renso/Product_Intelligence_Engine.py)
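The first three steps of the recipe amount to filtering each mirror source by shipper and taking the union of consignees while remembering which source each came from. A minimal sketch, assuming the rows have already been scraped into dicts with `shipper` and `consignee` keys (that row shape, the function names, and the sample data are hypothetical; the matching here is a plain case-and-whitespace comparison, not full entity resolution):

```python
def _key(name):
    """Crude comparison key: uppercase with collapsed whitespace."""
    return " ".join(name.upper().split())


def customer_base(shipper, sources):
    """Union of consignees for one shipper across mirror sources.

    sources: {source_label: [{"shipper": ..., "consignee": ...}, ...]}
    Returns {consignee_key: set of source labels where it was seen}.
    """
    target = _key(shipper)
    customers = {}
    for label, rows in sources.items():
        for row in rows:
            if _key(row["shipper"]) == target:
                customers.setdefault(_key(row["consignee"]), set()).add(label)
    return customers


# Invented sample rows, for illustration only.
sources = {
    "us_ams": [
        {"shipper": "Example Factory Co", "consignee": "Acme Imports"},
        {"shipper": "Other Factory", "consignee": "Zed Corp"},
    ],
    "peru_sunat": [
        {"shipper": "EXAMPLE  FACTORY CO", "consignee": "acme imports"},
        {"shipper": "Example Factory Co", "consignee": "Lima Retail SAC"},
    ],
}
customers = customer_base("Example Factory Co", sources)
```

Consignees seen in more than one source are the stronger leads to carry into the resolution step, since independent customs systems agree on the relationship.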
Section 10: The $0/Month Replication Architecture
Target: ~80% ImportGenius coverage on Canada–Vietnam, Canada–China, and Latin America trade lanes.
Ingestion Layer:
- ImportYeti scraper (US AMS since 2015; free OSINT tier)
- Peru SUNAT scraper (already in PIE.py — aduanet.gob.pe)
- Colombia DIAN scraper (CAPTCHA bypass needed)
- Brazil Comex Stat API (aggregate gap-fill — comexstat.mdic.gov.br)
- UN Comtrade, US Census, StatCan, Eurostat APIs (macro sanity-check)
- aisstream.io WebSocket (AIS vessel positions — coastal)
- Maersk / Hapag-Lloyd / CMA CGM developer APIs (container milestones — free)
- FCC ID scraper (fccid.io — already in PIE.py)
Resolution Layer:
- GLEIF Golden Copy bulk → indexed in RapidFuzz or Elasticsearch
- OpenCorporates API (non-commercial tier)
- OrgBook BC + MRAS for Canadian entities
- OFAC + EU + UN sanctions screening on every resolved entity
Classification Layer:
- USITC HTS CSV lookup table (hts.usitc.gov)
- Claude Sonnet batch classifier for free-text → HS-6 → HS-10 (the accuracy figure in Section 7 is for HS-4; finer levels need the WCO legal text)
Storage Layer:
- Postgres `shipments` canonical table
- Postgres `entities` LEI-keyed table
- S3/R2 for raw scrape archives
Query Layer:
- SQL + Metabase/Superset dashboards
- Optional: Typesense or Meilisearch for full-text search
(source: open-web research synthesis, 2026-05-14)
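The two storage-layer tables could look roughly like the following. This is a sketch only: it uses SQLite in memory so it runs without a server, whereas the architecture above calls for Postgres, and every column name is an assumption derived from the B/L fields listed for Peru in Section 3 (date, FOB value, weight, shipper, consignee, description, HS code). The identifier and rows are placeholders.

```python
import sqlite3

SCHEMA = """
CREATE TABLE entities (
    lei        TEXT PRIMARY KEY,   -- LEI-keyed, per the storage layer above
    legal_name TEXT NOT NULL,
    country    TEXT
);
CREATE TABLE shipments (
    id             INTEGER PRIMARY KEY,
    source         TEXT NOT NULL,  -- e.g. 'us_ams', 'peru_sunat'
    arrival_date   TEXT,
    shipper_raw    TEXT,           -- string exactly as scraped
    consignee_raw  TEXT,
    shipper_lei    TEXT REFERENCES entities(lei),
    consignee_lei  TEXT REFERENCES entities(lei),
    hs_code        TEXT,
    description    TEXT,
    weight_kg      REAL,
    fob_value_usd  REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Placeholder data; the identifier is not a real LEI.
conn.execute("INSERT INTO entities VALUES (?, ?, ?)",
             ("ID-EXAMPLE-001", "Example Footwear Co Ltd", "VN"))
conn.executemany(
    "INSERT INTO shipments (source, arrival_date, shipper_raw, shipper_lei, hs_code, weight_kg) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("us_ams", "2025-01-10", "EXAMPLE FOOTWEAR CO LTD", "ID-EXAMPLE-001", "640399", 1000.0),
        ("peru_sunat", "2025-02-03", "Example Footwear Co., Ltd", "ID-EXAMPLE-001", "640399", 500.0),
    ],
)

# Shipment count and total weight per resolved shipper, across sources.
rows = conn.execute(
    "SELECT e.legal_name, COUNT(*), SUM(s.weight_kg) "
    "FROM shipments s JOIN entities e ON e.lei = s.shipper_lei "
    "GROUP BY e.lei"
).fetchall()
```

Keeping the raw shipper and consignee strings next to the resolved keys means resolution can be re-run later without re-scraping.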
Section 11: Coverage Gaps and Limits (Honest)
The $0 architecture MISSES:
- Vietnam, China, Pakistan, Bangladesh, Indonesia, Thailand, Philippines — no shipment-level data without commercial subscriptions. These markets are sourced through customs broker networks, not government portals. Only Panjiva, Volza, Trademo, Datamyne, Tendata reach there.
- Real-time satellite AIS — aisstream.io is coastal only (~200km from shore). Spire Maritime / Kpler / exactEarth is enterprise tier. Vessels in open ocean are invisible.
- Historical data before 2015 — ImportYeti US AMS starts Jan 2015. OEC BoL starts Jan 2021. Earlier US manifest history requires ImportInfo ($59+/mo, back to Oct 2012) or PIERS.
- AI-powered entity resolution at scale — ImportGenius has invested years in their entity graph. The GLEIF + OpenCorporates + RapidFuzz approach works but requires operational tuning and will have higher false-positive rates on ambiguous company names.
(source: open-web research synthesis, 2026-05-14)
Section 12: Open-Source Prior Art
| Repository | What |
|---|---|
| frnsys/trade | Python AMS manifest analysis |
| marcdacosta/ambient-shipping | AIS + BoL correlation utilities |
Both are useful starting points for the ingestion layer. Check for upstream changes before building custom scrapers.
(source: open-web research synthesis, 2026-05-14)
Related Pages
- [[renso-trade-intelligence-stack]] — where this catalog gets applied: Frank’s Product Intelligence Engine, triangulation architecture, Peru SUNAT implementation, Meta Ad Library tooling
- [[renso-engagement-frank-cto]] — Phase 3 AI/data intelligence deliverables that this powers
- [[renso-strategic-vision]] — Renso Hub data-subscription product that these pipelines would feed
- [[Financial Data Sources]] — macro financial data sources (FRED, EDGAR, Bloomberg) in complementary context
- [[Canadian Government Data Indexing]] — StatCan API patterns and Canadian government data access