cpaua · May 20, 2026 at 03:18 PM

Marlin-2B Open Source: 2B Vision-Language Model for Video Search

The source code for Marlin-2B has been released.

This is a compact vision-language model for extracting structured information from video.

Marlin was fine-tuned for two key queries that developers most often need when working with video: what is happening and exactly when.

The model shows strong results for its size class: with only 2B parameters, it competes with Gemini 2.5 Flash.

Marlin was trained in two modes:

1. marlin.caption() returns structured JSON with the scene and events, with timecodes accurate to the second.

This can be used to generate subtitles for Reels videos, index a video library, or provide an agent with context about what happened and when in a video stream.

2. marlin.find() returns timecodes (start, end) for any natural-language query about the video.

It is fast enough to run directly in an agent loop and can be used to search for video segments with sub-second precision.
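To make the two modes more concrete, here is a minimal Python sketch of what could be done with their outputs. It does not call the model. The JSON layout (`scene`, `events`, `start`, `end`, `description`) and the sample values are assumptions made up for illustration, and the real output of marlin.caption() may be shaped differently. The helper names `to_srt_timestamp`, `events_to_srt` and `build_clip_command` are hypothetical as well. The first part turns caption-style events into SRT subtitles, the second turns a find-style (start, end) pair into an ffmpeg command that cuts the segment.

```python
import json
from typing import List, Tuple

# ASSUMPTION: a made-up example of caption-style output.
# The real schema returned by marlin.caption() may differ.
SAMPLE_CAPTION = json.loads("""
{
  "scene": "A kitchen, daytime",
  "events": [
    {"start": 0, "end": 4, "description": "A person fills a kettle"},
    {"start": 4, "end": 9, "description": "The kettle is placed on the stove"}
  ]
}
""")

# ASSUMPTION: a made-up find-style result, (start, end) in seconds.
SAMPLE_SEGMENT: Tuple[float, float] = (12.5, 20.0)


def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def events_to_srt(caption: dict) -> str:
    """Build SRT subtitle text from the (assumed) list of events."""
    blocks = []
    for index, event in enumerate(caption["events"], start=1):
        start = to_srt_timestamp(event["start"])
        end = to_srt_timestamp(event["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{event['description']}")
    return "\n\n".join(blocks) + "\n"


def build_clip_command(
    video_path: str,
    segment: Tuple[float, float],
    out_path: str,
    padding: float = 0.0,
) -> List[str]:
    """Return an ffmpeg argument list that cuts the segment out of the video.

    The command is only built here, not executed. With "-c copy" ffmpeg
    does not re-encode, so the cut snaps to nearby keyframes.
    """
    start, end = segment
    if end <= start:
        raise ValueError("segment end must be after its start")
    start = max(0.0, start - padding)  # never seek before the beginning
    end = end + padding
    return [
        "ffmpeg", "-i", video_path,
        "-ss", f"{start:.3f}", "-to", f"{end:.3f}",
        "-c", "copy", out_path,
    ]


if __name__ == "__main__":
    print(events_to_srt(SAMPLE_CAPTION))
    print(" ".join(build_clip_command("in.mp4", SAMPLE_SEGMENT, "clip.mp4")))
```

In an agent loop, the command list could be passed to `subprocess.run`. If frame-accurate cuts matter more than speed, dropping `-c copy` makes ffmpeg re-encode the clip instead of cutting on keyframes.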

model: NemoStation/Marlin-2B (https://huggingface.co/NemoStation/Marlin-2B)
demo: https://vlm.nemostation.com/

Author

VibeCode blog admin. Writing about vibe coding, AI and open source.
