How We Built Sentinel
Sentinel maps 458,500+ surveillance cameras across 183 countries. Here’s the technical architecture behind it.
The data pipeline
Ingest: 12+ Python scripts fetch data from public APIs (Caltrans, TfL, DriveBC, 511 networks), the OpenStreetMap Overpass API, and processed EFF Atlas data. Each script outputs normalized JSON.
Normalization: A central script merges all sources, deduplicates by proximity (50m threshold), assigns source priority, and outputs a unified GeoJSON with 10 camera types.
Tile generation: tippecanoe converts GeoJSON to PMTiles vector tiles. We use split generation (common types + rare types) with --no-feature-limit to preserve sparse features (drones, facial recognition, cell-site simulators) at all zoom levels.
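The proximity deduplication in the normalization step can be sketched roughly as follows. This is a minimal illustration, not the actual Sentinel script: the record fields (lat, lon, source), the SOURCE_PRIORITY table, and the function names are all assumptions. Only the 50m threshold and the idea of source priority come from the description above.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

# Hypothetical priority table (lower number wins); the real ordering is not published here.
SOURCE_PRIORITY = {"official_api": 0, "eff_atlas": 1, "osm": 2}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def deduplicate(cameras, threshold_m=50.0):
    """Keep one camera per cluster of points closer than threshold_m.

    Cameras are visited in source-priority order, so the record from the
    most trusted source survives. Latitude rows slightly taller than the
    threshold limit the comparisons to three neighbouring rows.
    """
    row_deg = threshold_m / 111_000.0  # a bit more than threshold_m of latitude
    rows = {}
    kept = []
    ranked = sorted(cameras, key=lambda c: SOURCE_PRIORITY.get(c["source"], 99))
    for cam in ranked:
        row = math.floor(cam["lat"] / row_deg)
        nearby = (k for r in (row - 1, row, row + 1) for k in rows.get(r, ()))
        if any(
            haversine_m(cam["lat"], cam["lon"], k["lat"], k["lon"]) <= threshold_m
            for k in nearby
        ):
            continue  # a higher-priority record already covers this spot
        rows.setdefault(row, []).append(cam)
        kept.append(cam)
    return kept
```

Bucketing by latitude row only is a deliberately simple choice; a production pipeline at this scale would more likely use a proper spatial index.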
The web stack
Hosting: Cloudflare Pages (Astro static site) + Cloudflare Workers (API)
Map rendering: MapLibre GL JS with CARTO Dark Matter basemap
Vector tiles: PMTiles loaded as an ArrayBuffer client-side (bypasses Cloudflare Workers’ Range header limitation). The entire tileset (15MB compressed) is pre-fetched on page load.
Filtering: Multi-toggle filter UI (checkbox-style, not radio) for 10 camera types
Search: Nominatim geocoding with importance-based zoom levels
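Importance-based zoom can be illustrated with a small lookup. Nominatim search results include an importance score (roughly 0 to 1) and return lat/lon as strings; the thresholds, zoom values, and function names below are hypothetical, chosen only to show the idea. The site itself runs in the browser, so Python is used here purely for illustration.

```python
# Hypothetical (minimum importance, zoom) pairs, highest importance first.
# The real cut-offs used by Sentinel are not documented here.
ZOOM_BY_IMPORTANCE = [
    (0.8, 5),   # e.g. countries and very large regions
    (0.6, 9),   # e.g. major cities
    (0.4, 12),  # e.g. towns and districts
]
DEFAULT_ZOOM = 15  # streets, addresses, and other low-importance results


def zoom_for_importance(importance):
    """Map a Nominatim importance score to a map zoom level."""
    for min_importance, zoom in ZOOM_BY_IMPORTANCE:
        if importance >= min_importance:
            return zoom
    return DEFAULT_ZOOM


def fly_target(result):
    """Turn one Nominatim JSON result into (lat, lon, zoom)."""
    lat = float(result["lat"])  # Nominatim returns coordinates as strings
    lon = float(result["lon"])
    return lat, lon, zoom_for_importance(float(result.get("importance", 0.0)))
```

The effect is that a search for a country lands on a wide view, while a street address zooms in close.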
Design decisions
PMTiles over R2: PMTiles readers normally fetch individual tiles with HTTP Range requests, which ran into the Range header limitation on Workers. Loading the entire tileset into an ArrayBuffer client-side is unconventional but avoids Range requests altogether. For datasets under 50MB, it works well.
tippecanoe split generation: Running tippecanoe twice (common types with standard settings, rare types with --no-feature-limit) and merging the results ensures that sparse features like drones and facial recognition are visible at all zoom levels.
On-device processing: No server receives user location. The map runs in the browser with vector tiles. This is a deliberate architecture choice, not a limitation.
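The split generation described above could be driven by a script along these lines. It is a sketch under assumptions: the "type" property name, the RARE_TYPES values, the file names, and the "cameras" layer name are all hypothetical, and the "standard settings" for the common pass are not specified in this post, so that pass is shown with no extra options. The tippecanoe and tile-join flags used (-o, -l, --force, --no-feature-limit) are real, and tile-join is assumed to be the merge tool.

```python
# Hypothetical type identifiers for the sparse categories named in the post.
RARE_TYPES = {"drone", "facial_recognition", "cell_site_simulator"}


def split_features(collection, type_key="type"):
    """Split a GeoJSON FeatureCollection into (common, rare) collections."""
    common, rare = [], []
    for feature in collection["features"]:
        is_rare = feature.get("properties", {}).get(type_key) in RARE_TYPES
        (rare if is_rare else common).append(feature)

    def wrap(features):
        return {"type": "FeatureCollection", "features": features}

    return wrap(common), wrap(rare)


def build_commands(common_geojson, rare_geojson, merged_out, layer="cameras"):
    """Return the three command lines: two tippecanoe passes, then a merge."""
    return [
        # Pass 1: common camera types, default feature limits apply.
        ["tippecanoe", "-o", "common.pmtiles", "-l", layer, "--force", common_geojson],
        # Pass 2: rare types, with the per-tile feature limit lifted.
        ["tippecanoe", "-o", "rare.pmtiles", "-l", layer, "--force",
         "--no-feature-limit", rare_geojson],
        # Merge: both inputs share a layer name, so features land in one layer.
        ["tile-join", "-o", merged_out, "--force", "common.pmtiles", "rare.pmtiles"],
    ]
```

Each command list can be passed to subprocess.run(cmd, check=True) once the two GeoJSON files have been written to disk.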
The mobile architecture
Sentinel Mobile (Expo/React Native) downloads camera data as a compact JSON file (9.9MB) and processes everything on-device. Location-based scanning uses a bounding box pre-filter followed by haversine distance calculation for efficiency.
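A bounding box pre-filter followed by a haversine check might look like the sketch below. The app itself is Expo/React Native, so the real code would be JavaScript or TypeScript; Python is used here only for illustration, and the field names, function names, and the 1% padding are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cameras_within(cameras, lat, lon, radius_m):
    """Return (distance_m, camera) pairs within radius_m, nearest first.

    The cheap box test rejects most cameras with two subtractions each;
    the trigonometric haversine runs only on the survivors. The box is an
    approximation padded by 1% and ignores the antimeridian and the poles.
    """
    dlat = math.degrees(radius_m / EARTH_RADIUS_M) * 1.01
    dlon = dlat / max(math.cos(math.radians(lat)), 1e-6)
    hits = []
    for cam in cameras:
        if abs(cam["lat"] - lat) > dlat or abs(cam["lon"] - lon) > dlon:
            continue  # outside the bounding box, skip the expensive check
        distance = haversine_m(lat, lon, cam["lat"], cam["lon"])
        if distance <= radius_m:
            hits.append((distance, cam))
    hits.sort(key=lambda hit: hit[0])
    return hits
```

With a dataset in the hundreds of thousands of points, the pre-filter means only a small fraction of cameras ever reach the haversine call for a typical scan radius.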
Open data commitment
All camera data comes from public sources. The pipeline is designed to be reproducible: someone with the same scripts and API access can generate the same dataset.