PixelRAG: Open-Source Visual Web Scraping & RAG via Page Screenshots

Web scraping will never be the same again.

PixelRAG has been released: an open-source retrieval framework that works from rendered images of pages instead of traditional HTML parsing.

According to the developers, traditional HTML-to-text pipelines can lose more than 40% of a page’s content, including tables, charts, and layout elements. PixelRAG works with a document in the form the user sees after rendering.

How the pipeline works:

- Renders each document (web pages, PDFs, images) into a set of tiles.
- Builds embeddings using Qwen3-VL-Embedding, fine-tuned via LoRA on screenshots.
- Creates a FAISS index and provides an API for search.
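The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's code: `tile_boxes`, `FlatIndex`, the 1024-pixel tile height, and the toy 2-dimensional vectors are all hypothetical. In the real pipeline the vectors would come from Qwen3-VL-Embedding and the search from FAISS; a brute-force cosine search stands in here so the example runs on the standard library alone.

```python
from math import sqrt


def tile_boxes(page_width, page_height, tile_height, overlap=0):
    """Split a rendered page into full-width horizontal tiles.

    Returns (left, top, right, bottom) pixel boxes, ordered top to bottom.
    The tile size and overlap are illustrative assumptions.
    """
    if tile_height <= 0 or not 0 <= overlap < tile_height:
        raise ValueError("need tile_height > 0 and 0 <= overlap < tile_height")
    boxes = []
    top = 0
    step = tile_height - overlap
    while top < page_height:
        bottom = min(top + tile_height, page_height)
        boxes.append((0, top, page_width, bottom))
        if bottom == page_height:
            break  # the last tile reached the end of the page
        top += step
    return boxes


def normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class FlatIndex:
    """Exact nearest-neighbour search over tile embeddings.

    A hypothetical stand-in for a FAISS index: same idea, no dependency.
    """

    def __init__(self):
        self.ids = []
        self.vectors = []

    def add(self, tile_id, vec):
        self.ids.append(tile_id)
        self.vectors.append(normalize(vec))

    def search(self, query, k=3):
        q = normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), tid)
            for tid, v in zip(self.ids, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(tid, score) for score, tid in scored[:k]]


# 1. Render -> tiles: a 1280x2500 page becomes three tiles.
boxes = tile_boxes(1280, 2500, tile_height=1024)

# 2. Embed each tile (toy vectors here; a vision embedding model in practice).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# 3. Index the tiles and search with a query embedding.
index = FlatIndex()
for i, vec in enumerate(toy_embeddings):
    index.add(f"page#{i}", vec)

hits = index.search([1.0, 0.1], k=2)  # page#0 ranks first, then page#2
```

The index only maps tile identifiers to vectors, so the search result is a list of tiles to hand to a reader model as images.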

If you replace the reader model with a more powerful one, accuracy can improve without reindexing, since the index stores only pixels and does not depend on any particular reader.
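A minimal sketch of why the reader can be swapped freely, assuming the retrieval side returns tile identifiers and the stored tiles are raw image bytes. The names `answer`, `small_reader`, and `large_reader` are hypothetical stubs, not PixelRAG's API; a real reader would be a vision-language model that reads the tile images.

```python
from typing import Callable, Dict, List

# A reader takes a question plus the retrieved tile images and returns an answer.
Reader = Callable[[str, List[bytes]], str]


def answer(question: str, tile_ids: List[str],
           tiles: Dict[str, bytes], reader: Reader) -> str:
    """Fetch the retrieved tiles' pixels and pass them to the plugged-in reader."""
    return reader(question, [tiles[tid] for tid in tile_ids])


def small_reader(question: str, images: List[bytes]) -> str:
    # Stub standing in for a smaller vision-language model.
    return f"small reader saw {len(images)} tile(s)"


def large_reader(question: str, images: List[bytes]) -> str:
    # Stub standing in for a more powerful vision-language model.
    return f"large reader saw {len(images)} tile(s)"


# The stored tiles hold only image bytes (placeholders here), never extracted text.
tiles = {"wiki/Paris#0": b"<png bytes>", "wiki/Paris#1": b"<png bytes>"}
retrieved = ["wiki/Paris#0", "wiki/Paris#1"]
snapshot = dict(tiles)

first = answer("What is the capital of France?", retrieved, tiles, small_reader)
second = answer("What is the capital of France?", retrieved, tiles, large_reader)

# Swapping the reader changed the answer path but left the stored tiles untouched.
unchanged = tiles == snapshot
```

Because nothing reader-specific is baked into the stored tiles, upgrading the reader is a change to one function argument rather than a rebuild of the index.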

For experiments, the project team built a visual index of all of Wikipedia: over 30 million screenshots. Even on text-only question answering tasks, the system outperforms the best text RAG baseline by 18.1%.

A plugin for Claude Code was also introduced, enabling analysis of rendered pages via screenshots without working with the DOM.

The entire project is published at StarTrail-org/PixelRAG under the Apache-2.0 license, and the paper (StarTrail-org/PixelRAG/blob/main/assets/pixelrag-paper.pdf) contains detailed error analyses, ablation studies, and comparisons with more than 25 VLMs.

