LLaVA

Introducing UVLM: A Free Tool to Compare AI Models That Understand Images

UVLM is a free, open-source tool for loading, testing, and comparing Vision-Language Models (VLMs) on custom image analysis tasks. It runs entirely in Google Colab, letting researchers and practitioners benchmark multiple models against the same prompts and images: no coding, no GPU of your own, no model-specific pipelines. This post explains what VLMs are, why comparing them matters, and how to get started in five minutes.
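To make the comparison idea concrete, here is a minimal sketch of the underlying pattern, not UVLM's actual code: run the same image and question through several Hugging Face checkpoints and print the answers side by side. The model IDs, image path, and question are illustrative placeholders, and a recent transformers release is assumed for `processor.apply_chat_template`.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

# Illustrative checkpoints; swap in any chat-capable VLMs you want to compare.
MODEL_IDS = [
    "llava-hf/llava-1.5-7b-hf",
    "llava-hf/llava-v1.6-mistral-7b-hf",
]
image = Image.open("street_scene.jpg")  # any local test image
question = "How many traffic lanes are visible in this photo?"

for model_id in MODEL_IDS:
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForVision2Seq.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    # Each checkpoint ships its own chat template, so the same question
    # gets formatted in whatever prompt style that model expects.
    conversation = [{
        "role": "user",
        "content": [{"type": "image"}, {"type": "text", "text": question}],
    }]
    prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(
        model.device, torch.float16
    )
    output = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    # Strip the prompt tokens so only the model's answer is printed.
    answer = processor.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    print(f"--- {model_id} ---\n{answer.strip()}\n")
```

The point of the loop is the controlled comparison: identical image, identical question, greedy decoding, with only the checkpoint varying between runs.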

A Stable and Reproducible Vision–Language Inference Engine for SAGAI v1.1

SAGAI v1.1 introduces Module 3 v2.0, a stable and reproducible vision–language inference engine for streetscape analysis. Built exclusively on Hugging Face LLaVA models, it enables robust multimodal processing of street-level images for large-scale urban and geospatial analysis.
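For readers curious what this looks like at the code level, the snippet below is a minimal sketch of reproducible LLaVA inference via the Hugging Face transformers API, not SAGAI's actual Module 3 code. The checkpoint, image path, and prompt are placeholders; greedy decoding and a pinned checkpoint are what keep repeated runs returning the same output.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "llava-hf/llava-1.5-7b-hf"  # illustrative; pin the exact checkpoint you evaluate
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# A street-level image, e.g. one frame from a street-view download.
image = Image.open("street_view.jpg")
prompt = "USER: <image>\nIs there a sidewalk on both sides of the street? Answer yes or no. ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
with torch.no_grad():
    # do_sample=False means greedy decoding: the same inputs always
    # yield the same tokens, which is what "reproducible" requires here.
    output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(processor.decode(output[0], skip_special_tokens=True))
```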

December 17, 2025 | Categories: Python, Urbanism, Vision Language Model