Streetscape Analysis with AI | SAGAI by Urban Geo Analytics

What is SAGAI?

SAGAI is an open-source, zero-shot workflow for scoring and mapping street-level urban environments with vision-language models (VLMs). It requires no labeled datasets and no deep learning expertise: users analyze streetscapes through natural-language prompts, starting from nothing more than a geographic bounding box.

SAGAI v2.0 runs as a single Google Colab notebook that handles the entire pipeline, from street sampling and image retrieval to VLM-based scoring and thematic mapping. It integrates UVLM (Universal Vision-Language Model Loader) directly as its inference engine, giving users access to 11 VLM checkpoints from two model families, LLaVA-NeXT and Qwen2.5-VL, ranging from 3B to 110B parameters, all through a unified interface.

You can:

Classify environments (e.g., urban vs. rural)
Detect features (e.g., storefronts, sidewalks)
Estimate physical attributes (e.g., sidewalk width)
Benchmark multiple VLMs on the same task with identical prompts
Design any custom visual analysis task by adapting the prompt

All you need is a location—no pretraining, no fine-tuning, and no annotation required.

How It Works

The notebook runs the pipeline in six sequential stages (stages 3 to 5 are grouped under VLM scoring):

1. Street Sampling: Using OpenStreetMap, SAGAI automatically extracts the pedestrian street network within a user-defined bounding box and generates evenly spaced sampling points along it.

2. Image Retrieval: Google Street View images are downloaded at each sampling point in multiple compass directions using the Street View Static API.

3–5. VLM Scoring (powered by UVLM): Each image is scored using a vision-language model selected by the user. UVLM abstracts away all architecture-specific inference details and provides a single unified function that works identically across model families. Users choose a model, set a quantization level, and define their analysis tasks through a prompt builder.

6. Aggregation and Mapping: Scores are aggregated at point and street segment levels using GeoPandas and visualized as thematic maps, ready for publication or further spatial analysis.
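The non-VLM stages come down to small geometric and bookkeeping steps. The sketch below illustrates stages 1, 2 and 6 with the Python standard library only; it is not SAGAI's code, and every function name, the four headings (0, 90, 180, 270), the 640x640 image size and mean-based aggregation are assumptions. It expects street geometry already projected to metres (OpenStreetMap data arrives in longitude/latitude), and sampled points would need reprojecting back to latitude/longitude before building Street View requests. `size`, `location`, `heading`, `fov` and `key` are documented Street View Static API parameters.

```python
import math
from collections import defaultdict
from statistics import mean
from urllib.parse import urlencode

STREETVIEW_URL = "https://maps.googleapis.com/maps/api/streetview"

def sample_points(line, spacing):
    """Stage 1 (sketch): points every `spacing` units along a polyline.

    `line` is a list of (x, y) vertices in a projected CRS (e.g. metres).
    The first vertex is always kept; a leftover shorter than `spacing`
    at the end of the line gets no point.
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    points = [line[0]]
    carry = 0.0  # distance walked since the last emitted point
    for (x0, y0), (x1, y1) in zip(line, line[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        d = spacing - carry  # position of the next point on this segment
        while d <= seg_len:
            t = d / seg_len
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += spacing
        carry = seg_len - (d - spacing)  # distance left after the last point
    return points

def streetview_urls(lat, lon, api_key, headings=(0, 90, 180, 270),
                    size="640x640", fov=90):
    """Stage 2 (sketch): one Street View Static API URL per compass heading."""
    return [
        STREETVIEW_URL + "?" + urlencode({
            "size": size, "location": f"{lat},{lon}",
            "heading": h, "fov": fov, "key": api_key,
        })
        for h in headings
    ]

def aggregate(image_scores, point_to_segment):
    """Stage 6 (sketch): mean score per point, then mean of point means per segment.

    image_scores: (point_id, score) pairs, one per scored image.
    point_to_segment: dict mapping point_id -> segment_id.
    """
    by_point = defaultdict(list)
    for pid, score in image_scores:
        by_point[pid].append(score)
    point_means = {pid: mean(v) for pid, v in by_point.items()}

    by_segment = defaultdict(list)
    for pid, value in point_means.items():
        by_segment[point_to_segment[pid]].append(value)
    segment_means = {sid: mean(v) for sid, v in by_segment.items()}
    return point_means, segment_means
```

Averaging point means rather than raw image scores gives every sampling point equal weight in its segment, however many images it has. In a GeoPandas workflow the same two steps map onto a `groupby` on the score column followed by a join back to the point and segment geometries.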

Supported Models

Through UVLM, SAGAI v2.0 supports 11 VLM checkpoints across two model families:

LLaVA-NeXT: Mistral 7B, Vicuna 7B, Vicuna 13B, 34B, LLaMA3 8B, 72B, 110B
Qwen2.5-VL: 3B Instruct, 7B Instruct, 32B Instruct, 72B Instruct
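UVLM's real function names and signatures are not documented on this page, so the following is only a hypothetical sketch of the dispatch pattern a unified loader relies on: one entry point whose behaviour is selected by model family. The backends are stubs, `run_vlm` and the other names are invented for illustration, and the checkpoint labels are copied from the list above rather than being real model identifiers.

```python
from typing import Callable, Dict, List

# Checkpoint labels as listed above; loading them would need real model IDs.
CHECKPOINTS: Dict[str, List[str]] = {
    "LLaVA-NeXT": ["Mistral 7B", "Vicuna 7B", "Vicuna 13B", "34B",
                   "LLaMA3 8B", "72B", "110B"],
    "Qwen2.5-VL": ["3B Instruct", "7B Instruct", "32B Instruct", "72B Instruct"],
}

# Hypothetical stand-ins for family-specific inference code. A real backend
# would apply that family's chat template and image preprocessing.
def _llava_next_backend(checkpoint: str, image_path: str, prompt: str) -> str:
    return f"LLaVA-NeXT {checkpoint} on {image_path}: {prompt}"

def _qwen_vl_backend(checkpoint: str, image_path: str, prompt: str) -> str:
    return f"Qwen2.5-VL {checkpoint} on {image_path}: {prompt}"

BACKENDS: Dict[str, Callable[[str, str, str], str]] = {
    "LLaVA-NeXT": _llava_next_backend,
    "Qwen2.5-VL": _qwen_vl_backend,
}

def run_vlm(family: str, checkpoint: str, image_path: str, prompt: str) -> str:
    """Same call signature for every family; the family picks the backend."""
    if family not in BACKENDS:
        raise ValueError(f"unsupported model family: {family!r}")
    if checkpoint not in CHECKPOINTS[family]:
        raise ValueError(f"unknown {family} checkpoint: {checkpoint!r}")
    return BACKENDS[family](checkpoint, image_path, prompt)
```

Because callers only ever see `run_vlm`, benchmarking several models on identical prompts reduces to looping over (family, checkpoint) pairs.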

Models up to 34B can run on a single Colab GPU with 4-bit quantization: a T4 suffices for the smaller checkpoints, while the 32B and 34B models need an A100. Larger models require multi-GPU environments.
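A back-of-the-envelope estimate shows why quantization decides what fits on one GPU. Weight memory is roughly the parameter count times the bytes per parameter: 0.5 bytes at 4-bit, 1 byte at 8-bit, 2 bytes at FP16. The helper below is a hypothetical sketch that covers weights only; activations, the KV cache, image tokens and quantization overhead add to the real total.

```python
# Approximate bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"4bit": 0.5, "8bit": 1.0, "fp16": 2.0}

def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Rough GPU memory for the model weights alone, in GB (10**9 bytes).

    Activations, the KV cache, image tokens and quantization overhead are
    ignored, so actual usage during inference is higher.
    """
    return params_billion * BYTES_PER_PARAM[precision]

for size in (7, 13, 34, 72, 110):
    print(f"{size}B: 4-bit ~{weight_memory_gb(size, '4bit'):.1f} GB, "
          f"FP16 ~{weight_memory_gb(size, 'fp16'):.1f} GB")
```

By this estimate a 34B model needs about 17 GB for its 4-bit weights alone, more than the 16 GB on a T4, so the largest single-GPU checkpoints call for an A100 (40 GB or 80 GB), while 7B to 13B models at 4-bit (about 3.5 to 6.5 GB) leave headroom on a T4.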

Scoring Tasks

SAGAI ships with predefined scoring tasks and supports four response types through UVLM’s prompt builder:

Numeric — integer or float extraction (e.g., sidewalk width in meters)
Category — classification labels (e.g., vegetation type)
Boolean — yes/no answers (e.g., storefront presence)
Text — free-form responses
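Each response type needs its own post-processing before scores can be mapped. The parser below is a hypothetical sketch, not UVLM's prompt builder: it takes the first number for numeric tasks, the leading yes/no word for boolean tasks, the first listed label found in the answer for category tasks, and returns `None` when nothing matches so that unparseable answers can be flagged rather than silently scored.

```python
import re

def parse_response(text: str, response_type: str, categories=None):
    """Convert a raw VLM answer into a typed value, or None if unparseable."""
    answer = text.strip()
    if response_type == "numeric":
        # First integer or decimal number in the answer, e.g. "about 2.5 m".
        match = re.search(r"-?\d+(?:\.\d+)?", answer)
        return float(match.group()) if match else None
    if response_type == "boolean":
        # Only the leading word counts: "Yes, there is..." -> True.
        first = answer.lower().split()[0].strip(".,!:") if answer else ""
        return {"yes": True, "no": False}.get(first)
    if response_type == "category":
        lowered = answer.lower()
        for label in categories or []:
            if label.lower() in lowered:
                return label
        return None
    if response_type == "text":
        return answer
    raise ValueError(f"unknown response type: {response_type!r}")
```

Returning `None` instead of a default keeps a failed parse distinguishable from a genuine "no" or zero.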

All prompts are fully customizable. Any visual analysis task can be defined through prompt modification alone — building typology, cycling infrastructure, pedestrian comfort, accessibility, street furniture, and more.
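Since tasks are defined through prompts alone, a new task mostly amounts to a question plus an instruction that pins down the answer format. The function below is a hypothetical illustration of that idea with invented wording; it does not reproduce SAGAI's predefined prompts.

```python
def build_prompt(question: str, response_type: str, categories=None) -> str:
    """Append an answer-format instruction matching the expected response type."""
    formats = {
        "numeric": "Answer with a single number only.",
        "boolean": "Answer with 'yes' or 'no' only.",
        "text": "Answer in one or two sentences.",
    }
    if response_type == "category":
        if not categories:
            raise ValueError("category tasks need a list of labels")
        instruction = "Answer with exactly one of: " + ", ".join(categories) + "."
    elif response_type in formats:
        instruction = formats[response_type]
    else:
        raise ValueError(f"unknown response type: {response_type!r}")
    return f"{question.strip()} {instruction}"

# Example: a cycling-infrastructure task defined by the prompt alone.
print(build_prompt("Is there a dedicated cycle lane in this image?", "boolean"))
```

Constraining the answer format up front keeps the downstream parsing for each response type simple.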

Key Capabilities

Single notebook — the full pipeline from bounding box to thematic maps in one place
Multi-model inference — benchmark and compare VLMs on identical tasks using identical prompts
Consensus validation — majority voting across 2–5 repeated inferences per image for improved reliability
Chain-of-thought reasoning — adjustable token budget (up to 1,500 tokens) for complex visual reasoning tasks
Truncation detection — automatic flagging of responses that hit the generation limit, with per-task diagnostics
Resume-safe batch processing — checkpoint saving every 3 images, automatic resume on interruption
Quantization support — 4-bit, 8-bit, and FP16 precision via BitsAndBytes
Publication-ready outputs — CSV scores, GeoPackages, and thematic maps at point and street segment levels
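Consensus validation and truncation detection both act on the repeated responses for one image. The sketch below assumes the answers have already been parsed into plain values; the tie-breaking rule (the first answer seen wins), ignoring unparseable `None` answers and the token-count test for truncation are assumptions for illustration, not documented SAGAI behaviour.

```python
from collections import Counter

def majority_vote(answers):
    """Return (winner, agreement) for repeated answers about the same image.

    None answers (unparseable responses) are ignored. Ties go to the answer
    encountered first, because Counter.most_common keeps insertion order
    among equal counts.
    """
    valid = [a for a in answers if a is not None]
    if not valid:
        return None, 0.0
    winner, votes = Counter(valid).most_common(1)[0]
    return winner, votes / len(valid)

def is_truncated(n_generated_tokens: int, max_new_tokens: int) -> bool:
    """Flag a response that used the whole generation budget (e.g. 1,500 tokens)."""
    return n_generated_tokens >= max_new_tokens
```

For numeric tasks, exact-match voting can split near-identical estimates such as 2.0 and 2.5; a median over the repeats is one alternative.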

Why SAGAI?

100% open-source (Apache 2.0)
Zero-shot — no training data, no fine-tuning, no annotation
Multi-model — 11 VLMs through UVLM, not locked to a single architecture
Runs entirely in Google Colab, with no local installation and no local GPU required
Prompt-driven — adapt to any urban analysis task by changing the prompt
Publication-ready — generates maps, scores, and spatial datasets out of the box

Get Started

GitHub Repository: github.com/perezjoan/SAGAI
UVLM Repository: github.com/perezjoan/UVLM
Open SAGAI.ipynb in Google Colab and follow the steps.

Citations & Publications

If you use SAGAI in your work, please cite:

Perez, J. & Fusco, G. (2025). Streetscape Analysis with Generative AI (SAGAI): Vision-Language Assessment and Mapping of Urban Scenes. Geomatica, Volume 77, Issue 2, 100063.
DOI: 10.1016/j.geomat.2025.100063


Assessing urban scenes for the 15-minute city through SAGAI (Streetscape Analysis with Generative AI) – Presentation – 24th European Colloquium on Theoretical and Quantitative Geography
