How We Built Sentinel

Sentinel maps 458,500+ surveillance cameras across 183 countries. Here’s the technical architecture behind it.

The data pipeline

Ingest: 12+ Python scripts fetch data from public APIs (Caltrans, TfL, DriveBC, 511 networks), OpenStreetMap Overpass API, and processed EFF Atlas data. Each script outputs normalized JSON.

Normalization: A central script merges all sources, deduplicates by proximity (50m threshold), assigns source priority, and outputs a unified GeoJSON with 10 camera types.
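
The deduplication step can be sketched roughly as follows. This is a minimal illustration, not the actual merge script: the source names in SOURCE_PRIORITY, their ordering, and the lat/lon/source field names are assumptions made for the example. Only the 50m threshold comes from the pipeline description.

```python
import math

EARTH_RADIUS_M = 6_371_000
DEDUP_THRESHOLD_M = 50  # proximity threshold described above

# Hypothetical priority order: a lower number wins when two records collide.
SOURCE_PRIORITY = {"official_api": 0, "eff_atlas": 1, "osm": 2}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def deduplicate(cameras):
    """Keep one camera per 50 m neighbourhood, preferring higher-priority sources.

    Cameras are visited in priority order, so the first record to claim a
    location is the one that survives. The pairwise scan is quadratic; a real
    run over hundreds of thousands of points would need a spatial index.
    """
    ordered = sorted(cameras, key=lambda c: SOURCE_PRIORITY[c["source"]])
    kept = []
    for cam in ordered:
        is_duplicate = any(
            haversine_m(cam["lat"], cam["lon"], k["lat"], k["lon"])
            <= DEDUP_THRESHOLD_M
            for k in kept
        )
        if not is_duplicate:
            kept.append(cam)
    return kept
```

With this ordering, an OpenStreetMap record sitting about 11m from an official-API record is dropped, while one a kilometre away is kept.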

Tile generation: tippecanoe converts GeoJSON to PMTiles vector tiles. We use split generation (common types + rare types) with --no-feature-limit to preserve sparse features (drones, facial recognition, cell-site simulators) at all zoom levels.
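
One way the split run could be wired up from Python is sketched below. Several things here are assumptions rather than details of the real pipeline: the type names in RARE_TYPES, the "type" property, the file names, the "cameras" layer name, and the use of tile-join (which ships with tippecanoe) for the merge step. It also assumes a tippecanoe build with PMTiles output support. The functions only build the command lines; passing each list to subprocess.run would execute them.

```python
# Hypothetical type names; the real dataset defines 10 camera types.
RARE_TYPES = {"drone", "facial_recognition", "cell_site_simulator"}


def split_features(features):
    """Partition GeoJSON features into (common, rare) by their type property."""
    common, rare = [], []
    for feature in features:
        bucket = rare if feature["properties"]["type"] in RARE_TYPES else common
        bucket.append(feature)
    return common, rare


def build_commands(common_geojson, rare_geojson, out="cameras.pmtiles"):
    """Return the three command lines: common run, rare run, merge."""
    common_cmd = [
        "tippecanoe", "-o", "common.pmtiles", "-l", "cameras", common_geojson,
    ]
    rare_cmd = [
        "tippecanoe", "-o", "rare.pmtiles", "-l", "cameras",
        "--no-feature-limit",  # lift the per-tile feature cap for sparse types
        rare_geojson,
    ]
    # Assumption: tile-join is used to merge the two tilesets into one file.
    merge_cmd = ["tile-join", "-o", out, "common.pmtiles", "rare.pmtiles"]
    return [common_cmd, rare_cmd, merge_cmd]
```

Giving both runs the same layer name means the merged tileset exposes a single layer, so the client-side filter only has to look at one place.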

The web stack

Hosting: Cloudflare Pages (Astro static site) + Cloudflare Workers (API)

Map rendering: MapLibre GL JS with CARTO Dark Matter basemap

Vector tiles: PMTiles loaded as an ArrayBuffer client-side (bypasses Cloudflare Workers’ Range header limitation). The entire tileset (15MB compressed) is pre-fetched on page load.

Filtering: Multi-toggle filter UI (checkbox-style, not radio) for 10 camera types

Search: Nominatim geocoding with importance-based zoom levels
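
As an illustration of importance-based zoom, the sketch below maps a Nominatim result to a map view. Nominatim's JSON results carry an importance score and return lat/lon as strings. The thresholds and zoom values are invented for the example and are not the ones Sentinel uses. It is written in Python for consistency with the pipeline examples, although the site itself would do this in the browser.

```python
def zoom_for_importance(importance):
    """Map a Nominatim importance score (roughly 0 to 1) to a zoom level.

    More important places (countries, major cities) get a wider view.
    The cut-offs below are hypothetical.
    """
    if importance >= 0.8:
        return 5
    if importance >= 0.6:
        return 9
    if importance >= 0.4:
        return 12
    return 15


def view_for_result(result):
    """Turn one Nominatim search result into (lat, lon, zoom)."""
    lat = float(result["lat"])  # Nominatim returns coordinates as strings
    lon = float(result["lon"])
    zoom = zoom_for_importance(result.get("importance", 0.0))
    return lat, lon, zoom
```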

Design decisions

PMTiles over R2: Loading the entire tileset into an ArrayBuffer client-side is unconventional but solves the Range header problem on Workers. For datasets under 50MB, it works well.

tippecanoe split generation: Running tippecanoe twice (common types with standard settings, rare types with --no-feature-limit) and merging the two outputs ensures that sparse features like drones and facial recognition are visible at all zoom levels.

On-device processing: No server receives user location. The map runs in the browser with vector tiles. This is a deliberate architecture choice, not a limitation.

The mobile architecture

Sentinel Mobile (Expo/React Native) downloads camera data as a compact JSON file (9.9MB) and processes everything on-device. Location-based scanning uses a bounding box pre-filter followed by haversine distance calculation for efficiency.
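
The two-stage scan can be sketched as below. The app itself is React Native, so this Python version only illustrates the technique. The lat/lon field names and the function names are assumptions, and the sketch ignores the antimeridian wrap at ±180° longitude.

```python
import math

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cameras_within(cameras, lat, lon, radius_m):
    """Return (distance_m, camera) pairs within radius_m, nearest first."""
    # Stage 1: a bounding box that fully contains the search circle.
    delta = radius_m / EARTH_RADIUS_M  # angular radius in radians
    dlat = math.degrees(delta)
    ratio = math.sin(delta) / max(math.cos(math.radians(lat)), 1e-12)
    dlon = math.degrees(math.asin(ratio)) if ratio < 1 else 180.0

    hits = []
    for cam in cameras:
        # Two cheap comparisons reject most cameras before any trigonometry.
        if abs(cam["lat"] - lat) > dlat or abs(cam["lon"] - lon) > dlon:
            continue
        # Stage 2: exact distance for the few candidates left in the box.
        distance = haversine_m(lat, lon, cam["lat"], cam["lon"])
        if distance <= radius_m:
            hits.append((distance, cam))
    hits.sort(key=lambda pair: pair[0])
    return hits
```

The box is a superset of the circle, so the pre-filter never discards a camera that is in range. It only saves the haversine call for cameras that are obviously too far away.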

Open data commitment

All camera data comes from public sources. The pipeline is designed to be reproducible: someone with the same scripts and API access can generate the same dataset.

See the result on Sentinel.