Google Colab Archives - Urban Geo Analytics

UVLM v3.0.0: From Colab Notebook to Python Package — Run Vision-Language Models Anywhere

Joan Perez — Thu, 23 Apr 2026 07:25:41 +0000

Highlights

UVLM is now a pip-installable Python package — no longer tied to Google Colab
Run on your own GPU with a local Jupyter notebook, or keep using Colab for free
Same tool, more flexibility — three lines of Python to load a model and analyse images

When we released UVLM in March 2026, it was a Google Colab notebook. You opened it in your browser, picked a model, typed your prompts, and ran your images — all without installing anything. That simplicity was the point: a tool that anyone could use to load and compare Vision-Language Models, regardless of their technical setup.

But we kept hearing the same requests. Can I run this on my own machine? Can I call UVLM from a script? Can I integrate it into an existing pipeline? The answer was always the same: not easily. The entire tool lived inside a single notebook, with all the logic packed into three massive code cells. Moving it anywhere else meant copy-pasting thousands of lines and untangling global variables.

Version 3.0.0 changes that. UVLM is now a proper Python package.

What Changed

The core logic — model loading, dual-backend inference, response parsing, consensus validation, batch processing — has been extracted from the notebook into eight standalone Python modules. These modules have no dependency on Google Colab, no global variables, and no widget code. They are plain Python functions that accept arguments and return results.

The package is installed from GitHub in one line:

pip install git+https://github.com/perezjoan/UVLM.git

On Google Colab, this happens automatically in the first cell of the Colab notebook. On your local machine, you run it once in a terminal and you are done.

Nothing changed in how UVLM analyses images. The same 11 model checkpoints are supported (LLaVA-NeXT and Qwen2.5-VL, from 3B to 110B parameters). The same parsing logic, the same consensus validation, the same truncation detection. If you had a workflow built on v2.2.2, the outputs will be identical.

Three Ways to Use UVLM

Google Colab — Zero Install

This is the same experience as before. Open the Colab notebook, select a GPU runtime, and start working. The notebook installs the UVLM package automatically. Images are loaded from Google Drive. Nothing has changed for Colab users, except that the code running behind the widgets is now cleaner and easier to maintain.

Local Jupyter Notebook — Your GPU, Your Data

If you have an NVIDIA GPU on your workstation (or access to a GPU server), you can now run UVLM locally. The local Jupyter notebook provides the same widget-based interface — model selection dropdown, prompt builder form, batch execution button — but images are read from your local filesystem and results are saved locally. No Google account needed, no data leaves your machine.

This matters for researchers working with sensitive imagery (medical, security, proprietary datasets) or for anyone who wants faster and more reliable model loading than what Colab’s network provides.

Python Script — Full Programmatic Control

For integration into larger pipelines, UVLM now exposes a clean API. Three lines of code replace the entire notebook workflow:

from uvlm import load_model, run_inference, parse_response
ctx = load_model("[Qwen] Qwen2.5-VL 7B Instruct", precision="4bit")
raw, tokens = run_inference("photo.jpg", "Count the cars", ctx)
result = parse_response(raw, "numeric")

The `load_model()` function returns a context dictionary containing the model, processor, backend type, and device information. This dictionary is passed to every subsequent function — no global state, no hidden side effects. You can load multiple models in the same session and switch between them by passing different context objects.

For batch processing, `run_batch()` handles the full pipeline:

from uvlm import load_model
from uvlm.batch import run_batch

ctx = load_model("[Qwen] Qwen2.5-VL 7B Instruct", precision="4bit")
df = run_batch(
    model_ctx=ctx,
    task_specs=my_tasks,
    image_folder="./images",
    output_path="./results.csv",
)

Under the Hood: Package Structure

The monolithic notebook has been split into eight modules, each with a single responsibility:

registry.py holds the model dictionary — 11 checkpoints with their backend type and HuggingFace checkpoint ID. Adding a new model is one line in a dictionary.
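To illustrate the "one line in a dictionary" idea, a registry of this kind could look like the sketch below. The names `MODEL_REGISTRY` and `resolve_model`, the field names, and the LLaVA display name are assumptions made for this example, not the actual contents of registry.py. The two checkpoint IDs are public Hugging Face repositories used here as placeholders.

```python
# Hypothetical sketch of a model registry. Names and fields are assumptions,
# not the real contents of UVLM's registry.py.
MODEL_REGISTRY = {
    "[Qwen] Qwen2.5-VL 7B Instruct": {
        "backend": "qwen",
        "checkpoint": "Qwen/Qwen2.5-VL-7B-Instruct",
    },
    # Display name below is invented for illustration.
    "[LLaVA] LLaVA-NeXT Vicuna 7B": {
        "backend": "llava",
        "checkpoint": "llava-hf/llava-v1.6-vicuna-7b-hf",
    },
}


def resolve_model(display_name: str) -> dict:
    """Return the registry entry for a display name, with a readable error."""
    try:
        return MODEL_REGISTRY[display_name]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise KeyError(f"Unknown model {display_name!r}. Known models: {known}") from None
```

With a lookup like this, a display name must match a key exactly, including spacing, so an explicit error listing the known names saves debugging time.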

loader.py contains the `load_model()` function. It handles quantisation configuration (4-bit, 8-bit, FP16), device placement (single GPU, auto, CPU offload), and the LLaVA vs Qwen branching logic. It returns a dictionary — not a set of global variables.

inference.py contains `run_inference()`, the dual-backend forward pass. It accepts a model context dictionary and returns the raw response plus the exact token count as a tuple. The full LLaVA response cleaning logic and the full Qwen token-trimming pipeline are preserved exactly as they were.
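Returning the token count alongside the text is what makes a truncation check possible. One plausible heuristic, sketched here as an assumption rather than UVLM's actual rule, is to flag a response whose token count reaches the generation limit. `max_new_tokens` is the usual name of that limit in Hugging Face generation settings; the function name `looks_truncated` is hypothetical.

```python
def looks_truncated(token_count: int, max_new_tokens: int) -> bool:
    """Flag a response that used up the whole generation budget.

    A response that stops exactly at the limit was most likely cut off
    rather than finished naturally.
    """
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    return token_count >= max_new_tokens


print(looks_truncated(12, 64))  # False: the model stopped on its own
print(looks_truncated(64, 64))  # True: the budget was exhausted
```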

parsers.py holds the four response parsers (numeric, category, boolean, text) and the advanced reasoning parser. These are pure functions with zero dependencies beyond Python’s standard library.
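To make the idea of a pure, standard-library parser concrete, here is a minimal sketch of a numeric and a boolean parser. The function names and the rules (first number wins, yes/no keywords) are assumptions for illustration; the real parsers.py may handle more cases.

```python
import re
from typing import Optional


def parse_numeric(raw: str) -> Optional[float]:
    """Return the first number found in a model response, or None."""
    match = re.search(r"-?\d+(?:\.\d+)?", raw)
    return float(match.group()) if match else None


def parse_boolean(raw: str) -> Optional[bool]:
    """Map a free-text answer to True/False, or None when it is ambiguous."""
    words = set(re.findall(r"[a-z]+", raw.lower()))
    has_yes = bool(words & {"yes", "true"})
    has_no = bool(words & {"no", "false"})
    if has_yes == has_no:  # neither keyword, or both: ambiguous
        return None
    return has_yes
```

Because both functions depend only on their input string, they can be unit-tested without loading any model.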

consensus.py contains the majority voting logic. batch.py handles folder iteration, CSV writing, resume mode, and schema upgrading. prompts.py stores the task type definitions and the chain-of-thought templates. utils.py provides seed management, environment detection, and HuggingFace token retrieval.
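As a rough picture of what a resume mode involves, the sketch below lists the images that do not yet have a row in an existing results CSV. It is not the code in batch.py: the function name `pending_images`, the `image` column name, and the list of extensions are all assumptions for this example.

```python
import csv
from pathlib import Path


def pending_images(image_folder: str, output_path: str, id_column: str = "image") -> list:
    """List image file names in image_folder that have no row yet in the results CSV."""
    done = set()
    results = Path(output_path)
    if results.exists():
        # Collect the images already processed in a previous run
        with results.open(newline="", encoding="utf-8") as handle:
            done = {row[id_column] for row in csv.DictReader(handle)}
    extensions = {".jpg", ".jpeg", ".png"}
    return sorted(
        path.name
        for path in Path(image_folder).iterdir()
        if path.suffix.lower() in extensions and path.name not in done
    )
```

A batch loop that iterates over this list instead of the full folder can be interrupted and restarted without redoing finished images.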

Getting Started

On Colab: Open the notebook from GitHub and run the three blocks as before. The package installs itself.

Locally: First, install PyTorch with CUDA support matching your GPU driver (check with `nvidia-smi`). For example, with CUDA 12.8+:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install git+https://github.com/perezjoan/UVLM.git

Then open the local Jupyter notebook.

You get the same dropdown menus, the same prompt builder form, the same batch execution. The only difference is that you type a local path for your image folder instead of a Google Drive path.
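If model loading fails locally, a quick first check is whether PyTorch can see the GPU at all. The helper name `cuda_status` is made up for this sketch; the `torch.cuda` calls are standard PyTorch.

```python
def cuda_status() -> str:
    """Describe whether PyTorch is installed and can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA GPU is visible"
    return f"CUDA GPU detected: {torch.cuda.get_device_name(0)}"


print(cuda_status())
```

If the second message appears on a machine that has an NVIDIA GPU, the installed PyTorch build probably does not match the driver's CUDA version.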

For HuggingFace authentication (needed for some gated models like LLaMA3-based checkpoints), either set the `HF_TOKEN` environment variable or run `huggingface-cli login` once in your terminal.
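In a script, the environment-variable route can be checked before loading a gated model. This is a minimal sketch with a hypothetical helper name, `get_hf_token`; it only reads `HF_TOKEN` and does not look at the token stored by `huggingface-cli login`.

```python
import os
from typing import Optional


def get_hf_token() -> Optional[str]:
    """Return the Hugging Face token from HF_TOKEN, or None if unset or blank."""
    token = os.environ.get("HF_TOKEN", "").strip()
    return token or None


if get_hf_token() is None:
    print("HF_TOKEN is not set; gated models may fail to download.")
```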

What Is Next

The package architecture makes it much easier to add new VLM families. InternVL, BLIP-2, CogVLM, DeepSeek-VL, and Molmo are planned for future releases — each one requires implementing the backend-specific sections of the inference function and adding entries to the registry, without touching the rest of the codebase.

We are also working on multi-GPU batching for parallel inference across images, video frame analysis support, and integration with the SAGAI workflow for automated streetscape analysis.

Links

Source code: github.com/perezjoan/UVLM

Paper: arXiv preprint — Perez & Fusco (2026)

UVLM page on this site: urbangeoanalytics.com › Software & Algorithms › UVLM

Previous blog post: Introducing UVLM: A Free Tool to Compare AI Models That Understand Images

Citation

If you use UVLM in your work, please cite:

Perez, J. & Fusco, G. (2026). UVLM: A Universal Vision-Language Model Loader for Reproducible Multimodal Benchmarking. arXiv:2603.13893

The post UVLM v3.0.0: From Colab Notebook to Python Package — Run Vision-Language Models Anywhere appeared first on Urban Geo Analytics.

Introducing UVLM: A Free Tool to Compare AI Models That Understand Images

Joan Perez — Tue, 17 Mar 2026 14:23:58 +0000

Highlights

New open-source release: UVLM v2.2.2 — compare Vision-Language Models from a single notebook
11 AI models, 5 analysis tasks, 120 test images — all benchmarked with one tool
No coding, no installation — runs in Google Colab with a free account

Imagine you have thousands of street photographs and you need to answer the same questions about each one: how many cars are parked? Is there a sidewalk? How long is the building frontage? Hiring someone to go through every image manually would take weeks. Training a custom computer vision model would take months. But what if you could simply ask an AI model these questions in plain English — and get structured, usable answers back?

That is exactly what Vision-Language Models do. And today, we are releasing UVLM — an open-source tool that makes it easy to load, test, and compare these models, all from a single notebook in your browser.

What Are Vision-Language Models?

Vision-Language Models (VLMs) are AI systems that can look at an image and answer questions about it in natural language. Unlike traditional computer vision, which requires training a separate model for every task (one for counting cars, another for detecting sidewalks, a third for classifying buildings), a VLM handles all of these through text prompts. You write a question, attach a photo, and the model responds.

For example, you can ask a VLM: “Count all motor vehicles visible in this image” and it will answer “3”. You can ask the same model “Is there a sidewalk along the street frontage?” and it will answer “yes”. You can even ask it to estimate the length of a building facade in meters — a task that requires the model to identify reference objects (like parked cars), estimate their size, and reason about perspective. All of this from a single model, with no retraining and no labelled dataset.

The catch is that there are many VLM families available (LLaVA, Qwen, InternVL, BLIP-2, and more), and each one works differently under the hood. They use different image encoders, different tokenisation strategies, and different code to run. If you want to know which model is best for your specific task, you normally have to write separate code for each one — a tedious and error-prone process.

This Is the Problem UVLM Solves

UVLM (Universal Vision-Language Model Loader) is a free, open-source tool that lets you load, configure, and compare multiple VLM architectures using the same prompts and the same evaluation protocol — without writing any model-specific code. It runs entirely in Google Colab, which means you do not need to install anything on your computer or own a GPU. A free Google account is all you need.

The idea is simple: you pick a model from a dropdown menu, type your analysis questions into a form, point the tool at a folder of images, and hit run. UVLM handles all the technical details — the processor classes, the tokenisation, the generation settings, the output parsing — and delivers a clean CSV file with one row per image and one column per task. If you want to try a different model, you just switch the dropdown and run again. Same prompts, same images, same output format. Now you can compare.

The three-block structure of the UVLM Loader
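Because every model writes the same CSV layout, comparing two runs comes down to matching rows on the image name. The sketch below assumes an identifier column named `image` and a task column named `cars`; both names and the sample values are invented for illustration, and UVLM's real column names may differ.

```python
import csv
import io


def agreement_rate(csv_a: str, csv_b: str, task: str, id_column: str = "image") -> float:
    """Share of common images on which two result tables give the same answer."""
    rows_a = {row[id_column]: row[task] for row in csv.DictReader(io.StringIO(csv_a))}
    rows_b = {row[id_column]: row[task] for row in csv.DictReader(io.StringIO(csv_b))}
    common = rows_a.keys() & rows_b.keys()
    if not common:
        raise ValueError("The two tables share no images")
    return sum(rows_a[name] == rows_b[name] for name in common) / len(common)


# Made-up results for two models on the same three images
model_a = "image,cars\na.jpg,3\nb.jpg,0\nc.jpg,5\n"
model_b = "image,cars\na.jpg,3\nb.jpg,1\nc.jpg,5\n"
print(agreement_rate(model_a, model_b, "cars"))
```

The same function works on real files if their contents are read into strings first, for example with `Path(...).read_text()`.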

A Practical Example: Scoring 120 Street Photographs

To demonstrate what UVLM can do, we benchmarked 8 different models on 120 street-level photographs of French urban frontages. Each image was analysed on five tasks: counting vehicles, detecting sidewalks, counting pedestrian entrances, estimating the street frontage length in meters, and classifying the vegetation type. That is 16 model configurations (each model tested in standard and advanced reasoning modes), 120 images, and 5 tasks per image — all processed and compared through UVLM.

The results were revealing. The largest model (LLaVA 34B, with 34 billion parameters) actually ranked last overall. A much smaller model (LLaVA Vicuna 7B) outperformed it significantly and ran on a free Google Colab GPU. The best overall results came from Qwen 32B with chain-of-thought reasoning enabled, which achieved 88% proximity to human expert annotations across all five tasks. Without UVLM, discovering these differences would have required writing and debugging eight separate inference pipelines.

Who Is UVLM For?

UVLM was designed for anyone who works with images and wants to extract structured information from them at scale — without becoming a machine learning engineer. If you are an urban planner evaluating streetscape quality across a city, UVLM lets you score thousands of street photographs using natural language prompts. If you are an environmental researcher classifying vegetation from field photographs, UVLM lets you test which AI model gives the most reliable results for your specific classification scheme. If you are an infrastructure inspector processing damage assessment photographs, UVLM lets you set up automated counting and scoring tasks and run them across your entire image archive.

The tool is also valuable for AI researchers who need a controlled benchmarking environment. Because UVLM ensures that every model receives exactly the same prompt and is evaluated with the same metrics, it produces fair, reproducible comparisons. The consensus validation feature (running each task multiple times and taking a majority vote) addresses the inherent randomness of AI outputs, and the truncation detection feature flags when a model’s response was cut off before it could finish — a common but often invisible source of errors.
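The majority-vote idea behind consensus validation can be sketched in a few lines. This version requires a strict majority and returns `None` otherwise; that rule and the function name `majority_vote` are assumptions for this example, and UVLM's own tie-breaking may differ.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(answers: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the answer given in more than half of the runs, or None if there is none."""
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    return value if count > len(answers) / 2 else None


print(majority_vote(["yes", "yes", "no"]))  # yes
print(majority_vote([3, 4, 5]))             # None: no answer has a majority
```

Returning `None` instead of picking an arbitrary winner keeps unstable answers visible in the output, so they can be reviewed by hand.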

How to Get Started

Getting started takes about five minutes. Open the UVLM notebook from GitHub (the link is below), connect to a GPU runtime in Google Colab, and run the first block to load a model. The second block gives you a form where you type your analysis questions — no coding required. The third block processes your images and saves the results as a CSV file on your Google Drive.

The tool currently supports 11 model checkpoints from two major families (LLaVA-NeXT and Qwen2.5-VL), ranging from 3 billion to 110 billion parameters. Models up to 34B can run on a single free-tier Colab GPU with 4-bit quantisation. Advanced features include consensus validation (2–5 runs per task with majority voting), chain-of-thought reasoning for complex tasks, and automatic truncation detection.

UVLM is released under the Apache 2.0 open-source licence. You can use it, modify it, and build on it for any purpose — academic or commercial.

Links

Source code: github.com/perezjoan/UVLM

Paper: arXiv preprint — Perez & Fusco (2026)

UVLM page on this site: urbangeoanalytics.com › Softwares & Algorithms › UVLM

Benchmark dataset: Zenodo — 120 street-view images

Citation

If you use UVLM in your work, please cite:

Perez, J. & Fusco, G. (2026). UVLM: A Universal Vision-Language Model Loader for Reproducible Multimodal Benchmarking. arXiv:2603.13893

The post Introducing UVLM: A Free Tool to Compare AI Models That Understand Images appeared first on Urban Geo Analytics.

Processing Spatial Data in the Cloud with GeoPandas and Google Colab

Joan Perez — Fri, 07 Nov 2025 12:54:23 +0000

Highlights

Run GeoPandas entirely in the cloud using Google Drive and Google Colab — no local setup required.
Create and analyze a polygon around Paris with simple spatial operations like buffering.
Save results back to Google Drive, completing your first cloud-based geospatial workflow.

Working with geospatial data has never been easier thanks to GeoPandas on Google Colab. This powerful combination lets you run Python scripts entirely in the cloud — no installation or setup required. In this tutorial, you’ll learn how to create, manipulate, and save geographic data using GeoPandas and Google Drive, all within a Colab notebook. We’ll build a simple polygon around Paris, apply a spatial buffer, and save the results directly to your Drive. By the end, you’ll have a lightweight, fully cloud-based workflow for reproducible geospatial analysis.

1. Setting Up Your Cloud Workspace

Before starting, open your Google Drive and create a new folder, for example named geospatial_colab_project. This folder will serve as your project directory, where you’ll store your notebooks, datasets, and outputs.

Once the folder is ready, go to Google Colab, create a new notebook, and connect it to your Drive. Colab allows you to run Python code on Google’s servers while accessing your Drive files as if they were local. This integration makes it ideal for lightweight, cloud-based geospatial processing.

To connect your Drive, add the following code to a new code cell (+ Code) and run it by clicking the play button:

from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

# Set your working directory
import os
project_folder = '/content/drive/MyDrive/geospatial_colab_project'
os.chdir(project_folder)

print("Current working directory:", os.getcwd())

After executing the cell, Colab will prompt you to authorize access to your Google Drive. Once mounted, you’ll see a folder named MyDrive appear in the Colab file browser. All files you create or modify inside this folder will automatically sync to your Drive.

If everything went smoothly, the following line will be printed:

Current working directory: /content/drive/MyDrive/geospatial_colab_project
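If the folder name is misspelled or the Drive is not mounted, `os.chdir` fails with a bare `FileNotFoundError`. A small wrapper can give a more helpful message. The helper name `enter_project_folder` is hypothetical; it uses only the standard library.

```python
import os


def enter_project_folder(path: str) -> str:
    """Make path the working directory, or explain what is likely wrong."""
    if not os.path.isdir(path):
        raise FileNotFoundError(
            f"Folder not found: {path}. Check the folder name and that Drive is mounted."
        )
    os.chdir(path)
    return os.getcwd()
```

In the notebook, you would call `enter_project_folder(project_folder)` in place of the plain `os.chdir(project_folder)`.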

2. Installing and Importing GeoPandas

GeoPandas extends the popular Pandas library to handle geometric data such as points, lines, and polygons. For official documentation, visit GeoPandas.org. Install it directly in Colab:

!pip install geopandas shapely fiona pyproj

Then, import the necessary libraries:

import geopandas as gpd
from shapely.geometry import Polygon

3. Creating a Simple Polygon Around Paris

Let’s create a basic polygon — a simple rectangle surrounding Paris — directly from scratch using GeoPandas and Shapely.

# Define coordinates (longitude, latitude)
paris_bounds = [
    (2.20, 48.80),  # Southwest corner
    (2.20, 48.90),  # Northwest
    (2.45, 48.90),  # Northeast
    (2.45, 48.80),  # Southeast
    (2.20, 48.80)   # Close the polygon
]

# Create a Shapely Polygon
polygon = Polygon(paris_bounds)

# Create a GeoDataFrame containing the polygon (WGS84 coordinates)
gdf = gpd.GeoDataFrame(geometry=[polygon], crs="EPSG:4326")

# Display the GeoDataFrame
gdf

4. Performing a Simple Geospatial Operation (Buffer)

Now that you have your polygon, let’s perform a basic spatial operation — creating a 10 km buffer around Paris. This buffer will expand the polygon outward by 10,000 meters.

# Convert to a projected coordinate system for accurate distance (meters)
gdf_projected = gdf.to_crs(epsg=2154)  # Lambert-93 for France

# Create a 10 km buffer
gdf_buffer = gdf_projected.buffer(10000)

# Convert back to WGS84 for visualization
gdf_buffer = gpd.GeoDataFrame(geometry=gdf_buffer, crs="EPSG:2154").to_crs(epsg=4326)

# Plot both
ax = gdf.plot(color='blue', edgecolor='black', figsize=(6, 6))
gdf_buffer.plot(ax=ax, color='none', edgecolor='red', linewidth=2)

Map showing Paris polygon (blue) and 10 km buffer (red)
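The reprojection to Lambert-93 matters because EPSG:4326 measures distances in degrees, and a degree does not cover the same ground distance in every direction. The short calculation below uses a spherical Earth approximation (mean radius 6371 km, an assumption of this sketch) to compare one degree of latitude with one degree of longitude at the latitude of Paris.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def km_per_degree(latitude_deg: float) -> tuple:
    """Approximate ground length of one degree of latitude and of longitude, in km."""
    per_degree_lat = math.pi * EARTH_RADIUS_KM / 180.0
    per_degree_lon = per_degree_lat * math.cos(math.radians(latitude_deg))
    return per_degree_lat, per_degree_lon


lat_km, lon_km = km_per_degree(48.85)  # approximate latitude of Paris
print(f"1 degree of latitude  ~ {lat_km:.1f} km")
print(f"1 degree of longitude ~ {lon_km:.1f} km")
```

At this latitude a degree of longitude is roughly two thirds the length of a degree of latitude, so a buffer computed directly in degrees would be stretched rather than a uniform 10 km.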

5. Saving the File Back to Google Drive

Once your data is processed, saving it back to Google Drive is straightforward. GeoPandas supports many file formats such as GeoJSON, Shapefile, and GeoPackage.

# Save as GeoJSON
output_path = os.path.join(project_folder, 'paris_buffer.geojson')
gdf_buffer.to_file(output_path, driver='GeoJSON')
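To confirm that the file was written as expected, it can be inspected with Python's built-in `json` module, since GeoJSON is plain JSON. The helper name `summarize_geojson` is an assumption for this sketch.

```python
import json


def summarize_geojson(path: str) -> dict:
    """Return the feature count and the geometry types found in a GeoJSON file."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    features = data.get("features", [])
    types = sorted(
        {feature["geometry"]["type"] for feature in features if feature.get("geometry")}
    )
    return {"features": len(features), "geometry_types": types}
```

In the notebook, `summarize_geojson(output_path)` should report a single feature for the buffer saved above.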

You can now download the GeoJSON file and open it in a desktop GIS such as QGIS.

With just a few lines of Python, you’ve connected Google Drive to Colab, created and visualized a polygon around Paris, applied a spatial buffer, and saved your results back to the cloud. This simple workflow demonstrates the power and accessibility of cloud-based geospatial computing — ideal for collaboration, education, and rapid prototyping without the need for heavy local setups.

6. Alternative Cloud-Based Geospatial Combos

While Google Drive + Google Colab is a convenient and free solution for quick experiments, other combinations can be equally effective depending on your workflow:

Combo	Description
GitHub + Kaggle Notebooks	Store your data and notebooks on GitHub and run them on Kaggle’s cloud environment, which offers free GPUs and persistent datasets.
Dropbox + Colab	Similar to Drive integration, Dropbox can be mounted via API to provide additional storage flexibility.
AWS S3 + SageMaker Studio Lab	For more advanced workflows, S3 provides scalable data storage with SageMaker’s free-tier notebooks.
Google Earth Engine + Colab	The best option for satellite or raster data processing, with integrated access to massive Earth observation datasets.

Don’t hesitate to comment on this post and share your feedback.

The post Processing Spatial Data in the Cloud with GeoPandas and Google Colab appeared first on Urban Geo Analytics.