cpaua · 4h ago · 1 min read

NVIDIA Open-Sources LocateAnything-3B for Faster Visual Localization

Nvidia, Computer Vision, Object Detection, Open Source, AI Models

NVIDIA has open-sourced the visual localization model LocateAnything-3B.

The model can find objects even in very dense scenes. For example, in an image with dozens of minions standing close together, it correctly highlights each one with a separate bounding box.

The main difference from most existing models is how bounding boxes are generated. Typically, the coordinates (x1, y1, x2, y2) are predicted sequentially, digit by digit. This is slow, and an error in an early coordinate can propagate to the ones that follow, especially when the scene contains many objects.

LocateAnything-3B uses parallel decoding: the model predicts complete boxes at once rather than constructing them step by step. This makes detection more stable, especially in scenes with many objects.
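To get a feel for why digit-by-digit output scales poorly with the number of objects, here is a toy Python sketch that only counts decoding steps. It is not NVIDIA's implementation: the `Box` class, the fixed four-digit coordinate width, the comma separator, and the assumption that a parallel decoder spends one step per box are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Hypothetical box in (x1, y1, x2, y2) pixel coordinates."""
    x1: int
    y1: int
    x2: int
    y2: int


def serialize_digits(boxes: List[Box], width: int = 4) -> List[str]:
    """Flatten boxes into a one-character-per-step stream.

    Assumption: each coordinate is zero-padded to `width` digits and
    followed by a "," separator. Real tokenizers will differ.
    """
    tokens: List[str] = []
    for box in boxes:
        for coord in (box.x1, box.y1, box.x2, box.y2):
            tokens.extend(str(coord).zfill(width))  # one token per digit
            tokens.append(",")
    return tokens


def sequential_steps(boxes: List[Box], width: int = 4) -> int:
    """Steps needed when every digit is a separate decoding step."""
    return len(serialize_digits(boxes, width))


def parallel_steps(boxes: List[Box]) -> int:
    """Steps needed under the assumed one-step-per-box scheme."""
    return len(boxes)


boxes = [Box(10, 20, 110, 220), Box(300, 40, 420, 260)]
print(sequential_steps(boxes), parallel_steps(boxes))
```

Under these assumptions each box costs 20 sequential steps (four coordinates, each with four digits and a separator), so the gap widens linearly as a scene gets denser. The sketch says nothing about accuracy; it only illustrates the step-count argument.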

Training used not only classic object recognition datasets but also data for UI recognition, OCR, and document structure analysis. As a result, the model can locate real-world objects as well as user interface elements and text regions.

The model has 3 billion parameters and is released as open source.

Author

VibeCode blog admin. Writing about vibe coding, AI and open source.
