
What is SAGAI?
SAGAI is an open-source, zero-shot workflow designed to score and map street-level urban environments using vision-language models. With no need for labeled datasets or deep learning expertise, SAGAI lets users analyze streetscapes through natural language prompts using nothing more than a geographic bounding box.
SAGAI v2.0 runs as a single Google Colab notebook that handles the entire pipeline, from street sampling and image retrieval to VLM-based scoring and thematic mapping. It integrates UVLM (Universal Vision-Language Model Loader) directly as its inference engine, giving users access to 11 VLM checkpoints from two model families, LLaVA-NeXT and Qwen2.5-VL, ranging from 3B to 110B parameters, all through a unified interface.
You can:
- Classify environments (e.g., urban vs. rural)
- Detect features (e.g., storefronts, sidewalks)
- Estimate physical attributes (e.g., sidewalk width)
- Benchmark multiple VLMs on the same task with identical prompts
- Design any custom visual analysis task by adapting the prompt
All you need is a location—no pretraining, no fine-tuning, and no annotation required.
How It Works
The notebook executes six stages in sequence, with stages 3 to 5 grouped under VLM scoring:

1. Street Sampling: Using OpenStreetMap, SAGAI automatically extracts the pedestrian street network within a user-defined bounding box and generates evenly spaced sampling points along it.
2. Image Retrieval: At each sampling point, Google Street View images are downloaded in multiple compass directions using the Street View Static API.
3-5. VLM Scoring (powered by UVLM): Each image is scored with a vision-language model selected by the user. UVLM abstracts away all architecture-specific inference details and provides a single unified function that works identically across model families. Users choose a model, set a quantization level, and define their analysis tasks through a prompt builder.
6. Aggregation and Mapping: Scores are aggregated at point and street-segment levels using GeoPandas and visualized as thematic maps ready for publication or further spatial analysis.
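Stage 1 amounts to placing points at a fixed spacing along each street line. The sketch below shows only that step, in pure Python, for one polyline whose vertices are already in a projected coordinate system measured in metres. The OpenStreetMap download is omitted, and the function name `sample_points` and the 20 m spacing in the example are illustrative assumptions rather than SAGAI's actual settings.

```python
import math

def sample_points(line, spacing):
    """Return points every `spacing` units along a polyline (hypothetical helper).

    `line` is a list of (x, y) vertices in a projected CRS (e.g. metres).
    The first vertex is always included; the last one only if it falls
    exactly on a multiple of `spacing`.
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    points = [line[0]]
    dist_to_next = spacing  # distance still to travel before the next sample
    for (x0, y0), (x1, y1) in zip(line, line[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        pos = 0.0  # distance already travelled along this segment
        while seg_len - pos >= dist_to_next:
            pos += dist_to_next
            t = pos / seg_len  # fraction of the segment covered
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            dist_to_next = spacing
        # Carry the leftover distance over to the next segment.
        dist_to_next -= seg_len - pos
    return points

# Example: an L-shaped street sampled every 20 m (assumed spacing).
line = [(0.0, 0.0), (50.0, 0.0), (50.0, 30.0)]
print(sample_points(line, 20))
```

Measuring the distance continuously along the whole line, instead of restarting at every vertex, keeps the spacing even across bends.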
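Stage 2 issues one Street View Static API request per sampling point and heading. A minimal sketch of the URL construction follows, using the API's `size`, `location`, `heading`, `fov`, `pitch` and `key` parameters. The four headings at 90° intervals, the 640x640 image size and the 90° field of view are assumptions for illustration, not necessarily what SAGAI uses, and `streetview_urls` is a hypothetical helper. Downloading the images is left out.

```python
from urllib.parse import urlencode

STREETVIEW_URL = "https://maps.googleapis.com/maps/api/streetview"

def streetview_urls(lat, lon, api_key, headings=(0, 90, 180, 270),
                    size="640x640", fov=90, pitch=0):
    """Build one Street View Static API request URL per compass heading.

    Default headings, size, fov and pitch are illustrative assumptions.
    """
    urls = []
    for heading in headings:
        params = {
            "size": size,
            "location": f"{lat},{lon}",  # "lat,lon" string expected by the API
            "heading": heading,          # compass direction in degrees
            "fov": fov,                  # horizontal field of view in degrees
            "pitch": pitch,              # camera tilt relative to the vehicle
            "key": api_key,
        }
        urls.append(f"{STREETVIEW_URL}?{urlencode(params)}")
    return urls

# Example with placeholder coordinates and key.
for url in streetview_urls(43.7, 7.26, "YOUR_API_KEY"):
    print(url)
```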
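SAGAI performs stage 6 with GeoPandas; the averaging logic behind it can be sketched without any spatial library. The sketch assumes each image record carries the IDs of its sampling point and street segment, that a point score is the mean of the images taken there, and that a segment score is the mean of its point scores. The function `aggregate` and this record layout are hypothetical, and the spatial joins and map rendering are not shown.

```python
from collections import defaultdict
from statistics import mean

def aggregate(records):
    """Aggregate per-image scores to point and street-segment levels.

    `records` is an iterable of (segment_id, point_id, score) tuples,
    one per image (hypothetical layout).
    """
    by_point = defaultdict(list)
    point_segment = {}
    for segment_id, point_id, score in records:
        by_point[point_id].append(score)
        point_segment[point_id] = segment_id
    # Point level: average over the images taken at each point.
    point_scores = {p: mean(s) for p, s in by_point.items()}

    # Segment level: average over the point scores on each segment.
    by_segment = defaultdict(list)
    for point_id, score in point_scores.items():
        by_segment[point_segment[point_id]].append(score)
    segment_scores = {seg: mean(s) for seg, s in by_segment.items()}
    return point_scores, segment_scores

# Example: boolean "storefront present" answers stored as 1/0.
records = [("s1", "p1", 1), ("s1", "p1", 0), ("s1", "p2", 1),
           ("s1", "p2", 1), ("s2", "p3", 0)]
print(aggregate(records))
```

Averaging point means gives every sampling point equal weight on its segment, even when some points returned fewer images.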

Supported Models
Through UVLM, SAGAI v2.0 supports 11 VLM checkpoints across two model families:
- LLaVA-NeXT: Mistral 7B, Vicuna 7B, Vicuna 13B, 34B, LLaMA3 8B, 72B, 110B
- Qwen2.5-VL: 3B Instruct, 7B Instruct, 32B Instruct, 72B Instruct
With 4-bit quantization, models up to 34B can run on a single Colab GPU: a T4 is enough for the smaller checkpoints, while the largest single-GPU models need an A100. Larger models require multi-GPU environments.
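A back-of-the-envelope estimate shows why the quantization level decides which checkpoints fit on one GPU: weight memory is roughly the parameter count times the bits per parameter, divided by 8. The helper below is a hypothetical sketch of that arithmetic only; it covers weights alone and ignores activations, the KV cache and quantization overhead, so actual usage is higher.

```python
def weight_memory_gb(params_billion, bits):
    """Approximate memory for model weights alone, in decimal gigabytes.

    Ignores activations, the KV cache and quantization overhead.
    """
    return params_billion * bits / 8

# Compare a few checkpoint sizes at the three supported precisions.
for size in (7, 34, 110):
    for bits, label in ((4, "4-bit"), (8, "8-bit"), (16, "FP16")):
        print(f"{size}B {label}: ~{weight_memory_gb(size, bits):.1f} GB")
```

The printed figures are lower bounds; comparing them with the GPU's memory (16 GB on a T4, 40 GB or 80 GB on an A100) gives a first indication of which checkpoints can fit.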
Scoring Tasks
SAGAI ships with predefined scoring tasks and supports four response types through UVLM’s prompt builder:
- Numeric — integer or float extraction (e.g., sidewalk width in meters)
- Category — classification labels (e.g., vegetation type)
- Boolean — yes/no answers (e.g., storefront presence)
- Text — free-form responses
All prompts are fully customizable. Any visual analysis task can be defined through prompt modification alone — building typology, cycling infrastructure, pedestrian comfort, accessibility, street furniture, and more.
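Each response type implies a different way of turning the model's raw text into a value. UVLM's prompt builder handles this internally; the sketch below is not its API but a hypothetical `parse_response` helper showing one simple approach: take the first number for numeric answers, the leading yes/no for boolean answers, and the first listed label found in the text for categories.

```python
import re

def parse_response(text, kind, categories=None):
    """Convert a raw VLM answer into a typed value, or None if unparseable.

    kind: "numeric", "category", "boolean" or "text" (hypothetical helper).
    """
    answer = text.strip()
    if kind == "numeric":
        # First integer or decimal number in the answer, e.g. "about 2.5 m".
        match = re.search(r"-?\d+(?:\.\d+)?", answer)
        return float(match.group()) if match else None
    if kind == "boolean":
        # Look only at the first word, stripped of trailing punctuation.
        first = answer.lower().split()[0].strip(".,!:;") if answer else ""
        if first in ("yes", "true"):
            return True
        if first in ("no", "false"):
            return False
        return None
    if kind == "category":
        # Naive substring match: the first listed label found in the text wins.
        lowered = answer.lower()
        for label in categories or []:
            if label.lower() in lowered:
                return label
        return None
    if kind == "text":
        return answer
    raise ValueError(f"unknown response type: {kind}")

print(parse_response("About 2.5 m wide", "numeric"))
print(parse_response("Yes, there is a storefront.", "boolean"))
```

Substring matching is fragile (for example, a label like "trees" also matches "no trees"), which is one reason constrained prompts and repeated inferences help.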
Key Capabilities
- Single notebook — the full pipeline from bounding box to thematic maps in one place
- Multi-model inference — benchmark and compare VLMs on identical tasks using identical prompts
- Consensus validation — majority voting across 2–5 repeated inferences per image for improved reliability
- Chain-of-thought reasoning — adjustable token budget (up to 1,500 tokens) for complex visual reasoning tasks
- Truncation detection — automatic flagging of responses that hit the generation limit, with per-task diagnostics
- Resume-safe batch processing — checkpoint saving every 3 images, automatic resume on interruption
- Quantization support — 4-bit, 8-bit, and FP16 precision via BitsAndBytes
- Publication-ready outputs — CSV scores, GeoPackages, and thematic maps at point and street segment levels
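Consensus validation and truncation detection fit together naturally: runs that exhausted the token budget can be set aside before voting. The sketch below assumes the number of generated tokens is available for each run and that a tie yields no answer; both are assumptions, since the exact tie rule and whether SAGAI excludes truncated runs are not specified here. `flag_truncated` and `majority_vote` are hypothetical names.

```python
from collections import Counter

def flag_truncated(num_generated_tokens, max_new_tokens):
    """True if a response used its whole token budget and may be cut off."""
    return num_generated_tokens >= max_new_tokens

def majority_vote(answers):
    """Return (winning answer, agreement ratio) over repeated inferences.

    `answers` holds the parsed answers from 2-5 runs on the same image.
    None marks a run that was unparseable or truncated and is ignored.
    A tie returns None as the answer so the image can be reviewed.
    """
    valid = [a for a in answers if a is not None]
    if not valid:
        return None, 0.0
    ranked = Counter(valid).most_common()
    top_votes = ranked[0][1]
    ratio = top_votes / len(valid)
    if len(ranked) > 1 and ranked[1][1] == top_votes:
        return None, ratio
    return ranked[0][0], ratio

# Example: three runs on one image with a 1,500-token budget.
runs = [(True, 212), (True, 305), (False, 1500)]  # (answer, tokens generated)
answers = [None if flag_truncated(n, 1500) else a for a, n in runs]
print(majority_vote(answers))  # prints (True, 1.0)
```

Returning the agreement ratio alongside the answer makes it possible to map how confident the consensus was, not just what it decided.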
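Resume-safe batch processing can be sketched as a loop that appends results to a CSV file every few images and skips IDs already written when it restarts. The function `process_images`, the two-column file layout and the string image IDs are hypothetical; only the checkpoint interval of 3 images comes from the description above. An interruption loses at most the images scored since the last checkpoint.

```python
import csv
import os

def process_images(image_ids, score_fn, out_path, every=3):
    """Score images, appending results to a CSV every `every` images.

    On restart, IDs already present in `out_path` are skipped, so an
    interrupted run resumes from the last checkpoint. IDs are compared
    as strings, since that is how csv reads them back.
    """
    done = set()
    if os.path.exists(out_path):
        with open(out_path, newline="") as f:
            done = {row["image_id"] for row in csv.DictReader(f)}

    pending = []

    def flush():
        # Write the header only when the file is new or empty.
        write_header = not os.path.exists(out_path) or os.path.getsize(out_path) == 0
        with open(out_path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["image_id", "score"])
            if write_header:
                writer.writeheader()
            writer.writerows(pending)
        pending.clear()

    for image_id in image_ids:
        if image_id in done:
            continue  # already scored in a previous run
        pending.append({"image_id": image_id, "score": score_fn(image_id)})
        if len(pending) >= every:
            flush()  # checkpoint
    if pending:
        flush()  # write the final partial batch
```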
Why SAGAI?
- 100% open-source (Apache 2.0)
- Zero-shot — no training data, no fine-tuning, no annotation
- Multi-model — 11 VLMs through UVLM, not locked to a single architecture
- Runs entirely in Google Colab — no local installation, no local GPU required
- Prompt-driven — adapt to any urban analysis task by changing the prompt
- Publication-ready — generates maps, scores, and spatial datasets out of the box
Get Started
GitHub Repository: github.com/perezjoan/SAGAI
UVLM Repository: github.com/perezjoan/UVLM
Open SAGAI.ipynb in Google Colab and follow the steps.
Citations & Publications
If you use SAGAI in your work, please cite:
Perez, J. & Fusco, G. (2025). Streetscape Analysis with Generative AI (SAGAI): Vision-Language Assessment and Mapping of Urban Scenes. Geomatica, Volume 77, Issue 2, 100063.
DOI: 10.1016/j.geomat.2025.100063
Assessing urban scenes for the 15-minute city through SAGAI (Streetscape Analysis with Generative AI). Presentation at the 24th European Colloquium on Theoretical and Quantitative Geography.

