cpaua

PixelRAG: Open-Source Visual Web Scraping & RAG via Page Screenshots

Web scraping will never be the same again.

PixelRAG has been released: an open-source retrieval framework that works from rendered images of pages instead of traditional HTML parsing.

According to the developers, traditional HTML-to-text pipelines can lose more than 40% of a page’s content, including tables, charts, and layout elements. PixelRAG works with a document in the form the user sees after rendering.
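To make the kind of loss concrete, here is a small sketch of a deliberately naive HTML-to-text step built on Python's standard `html.parser`. The sample markup, its values, and the `chart.png` file name are invented for illustration; real extraction pipelines differ, and this is not the baseline the developers measured.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects visible text nodes and ignores all tags (a naive extractor)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep only non-empty text; tag structure is discarded entirely.
        if data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Invented sample page: a small table followed by a chart image.
SAMPLE = """
<table>
  <tr><th>Model</th><th>Score</th></tr>
  <tr><td>A</td><td>71</td></tr>
  <tr><td>B</td><td>64</td></tr>
</table>
<img src="chart.png">
"""

if __name__ == "__main__":
    print(html_to_text(SAMPLE))  # Model Score A 71 B 64
```

The row and column structure collapses into one flat string, and the chart leaves no trace at all. A screenshot of the same page would keep both.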

How the pipeline works:

- Renders each document (web pages, PDFs, images) into a set of tiles.
- Builds embeddings using Qwen3-VL-Embedding, fine-tuned via LoRA on screenshots.
- Creates a FAISS index and provides an API for search.
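The three steps above can be sketched in a few lines. This is a toy illustration under stated assumptions, not PixelRAG's code: `tile_page`, `placeholder_pixels`, `toy_embed`, `FlatIndex`, and `build_index` are hypothetical names, placeholder bytes stand in for rendered pixels, a SHA-256 hash stands in for Qwen3-VL-Embedding, and a brute-force inner-product search stands in for FAISS.

```python
import hashlib
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    doc_id: str
    top: int     # pixel offset of the tile's top edge in the rendered page
    height: int  # tile height in pixels


def tile_page(doc_id: str, page_height: int, tile_height: int = 1024) -> list:
    """Step 1: split a rendered page into fixed-height vertical tiles."""
    return [
        Tile(doc_id, top, min(tile_height, page_height - top))
        for top in range(0, page_height, tile_height)
    ]


def placeholder_pixels(tile: Tile) -> bytes:
    """Assumption: fake bytes standing in for the tile's rendered pixels."""
    return f"{tile.doc_id}:{tile.top}:{tile.height}".encode()


def toy_embed(pixels: bytes, dim: int = 16) -> list:
    """Step 2 stand-in: a deterministic unit vector derived from a hash."""
    digest = hashlib.sha256(pixels).digest()  # 32 bytes, so dim must be <= 32
    vec = [digest[i] - 127.5 for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class FlatIndex:
    """Step 3 stand-in: exhaustive inner-product search over stored vectors."""

    def __init__(self):
        self.tiles = []
        self.vectors = []

    def add(self, tile: Tile, vector: list) -> None:
        self.tiles.append(tile)
        self.vectors.append(vector)

    def search(self, query: list, k: int = 3) -> list:
        scored = [
            (sum(q * v for q, v in zip(query, vec)), tile)
            for vec, tile in zip(self.vectors, self.tiles)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def build_index(pages: dict) -> FlatIndex:
    """pages maps a document id to its rendered height in pixels."""
    index = FlatIndex()
    for doc_id, height in pages.items():
        for tile in tile_page(doc_id, height):
            index.add(tile, toy_embed(placeholder_pixels(tile)))
    return index


if __name__ == "__main__":
    index = build_index({"page-a": 2500, "page-b": 1000})
    query = toy_embed(placeholder_pixels(index.tiles[2]))
    for score, tile in index.search(query, k=2):
        print(round(score, 3), tile)
```

In a real setup, the query would be embedded by the same visual model as the tiles, and the exhaustive scan would be replaced by an approximate nearest-neighbor index to cope with tens of millions of vectors.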

Swapping the reader model for a more powerful one should raise accuracy without reindexing, since the index stores only pixels.
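A minimal sketch of why the reader can be swapped independently: retrieval returns raw tiles, and the reader is just a callable applied to them afterwards. The names `TileStore`, `VisualRAG`, `small_reader`, and `large_reader` are hypothetical, the retrieval is a placeholder, and nothing here reflects PixelRAG's actual API.

```python
from typing import Callable, Sequence

# A reader takes a question plus retrieved tile images and returns an answer.
Reader = Callable[[str, Sequence[bytes]], str]


class TileStore:
    """Holds tile images; built once and never touched by reader changes."""

    def __init__(self, tiles: Sequence[bytes]):
        self.tiles = list(tiles)
        self.build_count = 1  # would increase only if the index were rebuilt

    def retrieve(self, question: str, k: int) -> list:
        # Placeholder retrieval: a real system would rank tiles by similarity.
        return self.tiles[:k]


class VisualRAG:
    def __init__(self, store: TileStore, reader: Reader):
        self.store = store
        self.reader = reader

    def answer(self, question: str, k: int = 2) -> str:
        tiles = self.store.retrieve(question, k)
        return self.reader(question, tiles)


def small_reader(question: str, tiles: Sequence[bytes]) -> str:
    return f"small reader: {len(tiles)} tiles for {question!r}"


def large_reader(question: str, tiles: Sequence[bytes]) -> str:
    return f"large reader: {len(tiles)} tiles for {question!r}"


if __name__ == "__main__":
    store = TileStore([b"tile-1", b"tile-2", b"tile-3"])
    rag = VisualRAG(store, small_reader)
    print(rag.answer("What is shown in the table?"))
    rag.reader = large_reader  # swap the reader; the store is left as-is
    print(rag.answer("What is shown in the table?"))
    print("index builds:", store.build_count)
```

Because the stored artifacts are images rather than text produced by one specific model, any reader that accepts images can consume the same retrieval results.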

For experiments, the project team built a visual index of all of Wikipedia, more than 30 million screenshots. Even in this format, the system reportedly outperforms the best text RAG baseline by 18.1% on text-only question answering tasks.

A plugin for Claude Code was also introduced, enabling analysis of rendered pages via screenshots without working with the DOM.

The entire project is published openly in the StarTrail-org/PixelRAG repository under the Apache-2.0 license, and the paper (StarTrail-org/PixelRAG/blob/main/assets/pixelrag-paper.pdf) contains detailed error analyses, ablation studies, and comparisons with more than 25 VLMs.
