PixelRAG: Open-Source Visual Web Scraping & RAG via Page Screenshots
Web scraping will never be the same again.
PixelRAG has been released: an open-source retrieval framework that works from rendered images of pages instead of traditional HTML parsing.
According to the developers, traditional HTML-to-text pipelines can lose more than 40% of a page’s content, including tables, charts, and layout elements. PixelRAG instead works with the document as the user sees it after rendering.
How the pipeline works:
- Renders each document (web pages, PDFs, images) into a set of tiles.
- Builds embeddings using Qwen3-VL-Embedding, fine-tuned via LoRA on screenshots.
- Creates a FAISS index and provides an API for search.
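The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not PixelRAG's actual code: `tile_boxes`, `FlatIndex`, and the tile IDs are hypothetical names, the vectors are hand-written stand-ins for what an embedding model would produce, and the exact brute-force search is a plain-Python substitute for a FAISS index.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def tile_boxes(page_width: int, page_height: int,
               tile_height: int, overlap: int = 0) -> List[Box]:
    """Split a rendered page into vertically stacked tile boxes (hypothetical helper)."""
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")
    if tile_height <= 0 or not 0 <= overlap < tile_height:
        raise ValueError("need tile_height > 0 and 0 <= overlap < tile_height")
    boxes: List[Box] = []
    top = 0
    step = tile_height - overlap
    while True:
        bottom = min(top + tile_height, page_height)
        boxes.append((0, top, page_width, bottom))
        if bottom >= page_height:
            break
        top += step
    return boxes


def normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


class FlatIndex:
    """Exact cosine search over tile embeddings (stand-in for a FAISS index)."""

    def __init__(self) -> None:
        self._ids: List[str] = []
        self._vectors: List[List[float]] = []

    def add(self, tile_id: str, vector: Sequence[float]) -> None:
        self._ids.append(tile_id)
        self._vectors.append(normalize(vector))

    def search(self, query: Sequence[float], k: int) -> List[Tuple[str, float]]:
        q = normalize(query)
        scored = [(sum(a * b for a, b in zip(q, v)), tile_id)
                  for tile_id, v in zip(self._ids, self._vectors)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(tile_id, score) for score, tile_id in scored[:k]]


# A 1280x2500 px page cut into tiles of at most 1000 px height.
boxes = tile_boxes(page_width=1280, page_height=2500, tile_height=1000)

# Toy vectors; a real pipeline would embed each tile image with a model.
index = FlatIndex()
toy_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for i, vec in enumerate(toy_vectors):
    index.add(f"page0/tile{i}", vec)

hits = index.search([0.1, 0.9, 0.0], k=2)
print(boxes)
print([tile_id for tile_id, _ in hits])
```

The search returns tile IDs rather than extracted text, which is the point of the design: whatever consumes the results receives the original screenshot tiles.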
Replacing the reader model with a more powerful one should improve accuracy without reindexing, since the index stores only pixels.
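A small sketch of why the reader can be swapped independently: retrieval hands back stored screenshot tiles, and the reader is just a function applied to them afterwards. Everything here is hypothetical (the `answer` helper, the two toy readers, the placeholder image bytes); a real reader would be a vision-language model call.

```python
from typing import Callable, Dict, List

# A reader takes a question plus the retrieved tile images and returns an answer.
Reader = Callable[[str, List[bytes]], str]


def answer(question: str, tile_ids: List[str],
           tile_store: Dict[str, bytes], reader: Reader) -> str:
    """Look up the retrieved tiles and pass them to whichever reader is supplied."""
    images = [tile_store[tile_id] for tile_id in tile_ids]
    return reader(question, images)


def small_reader(question: str, images: List[bytes]) -> str:
    # Placeholder for a smaller model.
    return f"small reader saw {len(images)} tile(s)"


def large_reader(question: str, images: List[bytes]) -> str:
    # Placeholder for a stronger model; it consumes the same stored tiles.
    return f"large reader saw {len(images)} tile(s)"


# Placeholder bytes standing in for stored screenshot tiles.
tile_store = {"page0/tile0": b"fake-image-0", "page0/tile1": b"fake-image-1"}
retrieved = ["page0/tile1", "page0/tile0"]  # e.g. the output of an index search

first = answer("What does the table show?", retrieved, tile_store, small_reader)
second = answer("What does the table show?", retrieved, tile_store, large_reader)
print(first)
print(second)
```

Neither call modifies `tile_store` or the retrieved IDs, so upgrading the reader in this sketch requires no change to what was indexed.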
For experiments, the project team built a visual index of all of Wikipedia, comprising over 30 million screenshots. Even on text-only question answering tasks, the system reportedly outperforms the best text RAG baseline by 18.1%.
A plugin for Claude Code was also introduced, enabling analysis of rendered pages via screenshots without working with the DOM.
The entire project is published openly under the Apache-2.0 license, and the release includes detailed error analyses, ablation studies, and comparisons with more than 25 VLMs.