rga: Search Text In PDF, Ebooks, Office Documents, Archives And More (ripgrep Wrapper)
rga (or ripgrep-all) is a command line tool for Linux, macOS and Windows that recursively searches all files in a directory for a regex pattern. It's a wrapper for ripgrep, the line-oriented recursive search program, and extends it to search a multitude of file types like PDF, DOCX, ODT, EPUB, SQLite databases, movie subtitles embedded in MKV or MP4 files, archives like ZIP or GZ, and more.
rga is great when you want to find some text in a folder full of documents of various file types, even if some of those documents are inside archives.
And it's fast too, even on the first run, thanks to multithreading. Subsequent runs are even faster (as if it were searching plain text files) thanks to caching. The cache can be disabled if you wish, though, using --rga-no-cache.
rga uses ripgrep (rg) to do the searching, with some options set. For some file types, external programs do the actual work: ffmpeg reads subtitles from mkv or mp4 files, and pandoc converts documents like EPUB, ODT, DOCX, FB2 or IPYNB to plain markdown-like text. Archives such as zip and tar are read as streams by rga's own zip and tar adapters.
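If you want to see what the cache does for your own files, a minimal sketch like the one below runs the same search with and without it. It assumes rga is on your PATH; the function names are hypothetical, and the RGA variable is only there so the commands can be dry-run (for example with RGA=echo).

```shell
# Hypothetical helpers (not part of rga).
# "${RGA:-rga}" runs rga unless RGA is set, e.g. RGA=echo for a dry run.

# Normal search: extracted text is cached, so repeated runs are faster.
rga_search() {
    "${RGA:-rga}" "$@"
}

# The same search, bypassing the cache with --rga-no-cache.
rga_search_nocache() {
    "${RGA:-rga}" --rga-no-cache "$@"
}
```

For example, run time rga_search "text to find" ~/Documents twice, then time rga_search_nocache "text to find" ~/Documents, and compare the timings. How big the difference is depends on your files.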
Besides searching text in documents, archives and subtitles embedded in mkv or mp4 files, rga can also search for text in JPG or PNG images, or in scanned PDF files, using OCR (with tesseract). This feature is disabled by default because it's slow and not useful most of the time, but it can be enabled using --rga-adapters=+pdfpages,tesseract.
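A hypothetical wrapper for OCR searches, assuming rga and tesseract are installed. It combines --rga-adapters=+pdfpages,tesseract with ripgrep's -j1 option (a single search thread), which the tesseract adapter description suggests to avoid overloading the system. The function name and the RGA dry-run variable are illustrative, not part of rga.

```shell
# Hypothetical wrapper: search scanned PDFs and images with OCR enabled.
#   --rga-adapters=+pdfpages,tesseract  adds the two disabled-by-default adapters
#   -j1                                 ripgrep option: use one thread, so only
#                                       one tesseract job runs at a time
# RGA can be set (e.g. RGA=echo) to print the command instead of running it.
rga_ocr() {
    "${RGA:-rga}" --rga-adapters=+pdfpages,tesseract -j1 "$@"
}
```

Usage example, where ~/Scans is just a placeholder folder: rga_ocr "invoice number" ~/Scans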
This is a list of rga (ripgrep-all) adapters and supported file types:
- ffmpeg:
- Uses ffmpeg to extract video metadata/chapters and subtitles
- Extensions: .mkv, .mp4, .avi
- pandoc:
- Uses pandoc to convert binary/unreadable text documents to plain markdown-like text
- Extensions: .epub, .odt, .docx, .fb2, .ipynb
- poppler:
- Uses pdftotext (from poppler-utils) to extract plain text from PDF files
- Extensions: .pdf
- zip:
- Reads a zip file as a stream and recurses down into its contents
- Extensions: .zip
- Mime Types: application/zip
- decompress:
- Reads a compressed file as a stream and runs a different extractor on the contents.
- Extensions: .tgz, .tbz, .tbz2, .gz, .bz2, .xz, .zst
- Mime Types: application/gzip, application/x-bzip, application/x-xz, application/zstd
- tar:
- Reads a tar file as a stream and recurses down into its contents
- Extensions: .tar
- sqlite:
- Uses sqlite bindings to convert sqlite databases into a simple plain text format
- Extensions: .db, .db3, .sqlite, .sqlite3
- Mime Types: application/x-sqlite3
- pdfpages (disabled by default):
- Converts a PDF to its individual pages as PNG files. Only useful in combination with tesseract
- Extensions: .pdf
- tesseract (disabled by default):
- Uses tesseract to run OCR on images to make them searchable. May need -j1 to prevent overloading the system. Make sure you have tesseract installed.
- Extensions: .jpg, .png
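The list above is essentially a mapping from file extensions to adapters. The shell function below mirrors it, purely as an illustration: rga's real adapter selection also uses MIME types and can be changed with --rga-adapters, and the function name adapter_for is hypothetical. pdfpages and tesseract are left out because they're disabled by default.

```shell
# Illustrative only: print the adapter that the list above associates
# with a file name's extension (this is not rga's actual selection code).
adapter_for() {
    case "$1" in
        *.mkv|*.mp4|*.avi)                        echo ffmpeg ;;
        *.epub|*.odt|*.docx|*.fb2|*.ipynb)        echo pandoc ;;
        *.pdf)                                    echo poppler ;;
        *.zip)                                    echo zip ;;
        # Compressed files are decompressed first, then another
        # extractor handles the contents.
        *.tgz|*.tbz|*.tbz2|*.gz|*.bz2|*.xz|*.zst) echo decompress ;;
        *.tar)                                    echo tar ;;
        *.db|*.db3|*.sqlite|*.sqlite3)            echo sqlite ;;
        *)                                        echo none ;;
    esac
}
```

For example, adapter_for backup.tar.gz prints decompress; per the list, rga then runs a different extractor on the decompressed contents (for a .tar.gz, that should be the tar adapter). To see which adapters your installed version actually has, run rga --rga-list-adapters.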
Download rga (ripgrep-all)
The rga GitHub project page has instructions for installing the tool on Linux, Windows or macOS.
To search all the file types rga supports, remember to install ripgrep itself along with the programs used by the rga adapters: pandoc, poppler (the package is poppler-utils on Debian/Ubuntu; its name depends on the Linux distribution you're using) and ffmpeg. Cargo is only needed if you build or install rga through cargo.
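Before searching, you can check which of these programs are on your PATH. This is a small hypothetical helper, not part of rga; rg is the ripgrep binary and pdftotext is the poppler command used by the PDF adapter.

```shell
# Hypothetical helper: report which of the given commands are on PATH.
# With no arguments it checks rg, pandoc, pdftotext (poppler) and ffmpeg.
# Returns non-zero if any command is missing.
check_deps() {
    local cmd missing=0
    if [ "$#" -eq 0 ]; then
        set -- rg pandoc pdftotext ffmpeg
    fi
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "ok: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

Run check_deps to check the defaults, or check_deps tesseract if you plan to use OCR.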
On Linux, you can install rga by downloading the x86_64 .tar.gz archive, extracting it, and installing the rga and rga-preproc binaries to /usr/local/bin with the command below (run it in the folder where these two binaries were extracted):
sudo install rga rga-preproc /usr/local/bin/
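The same steps can be scripted. This is a sketch under a few assumptions: the archive is a .tar.gz containing the rga and rga-preproc binaries somewhere inside it (the top-level folder name varies between releases, so the script uses find), and install_rga is a hypothetical function name, not something shipped with rga.

```shell
# Hypothetical helper: extract an rga .tar.gz archive and install the
# rga and rga-preproc binaries into a directory (default /usr/local/bin).
install_rga() {
    local archive="$1" prefix="${2:-/usr/local/bin}" tmp bin path
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    tmp=$(mktemp -d) || return 1
    # Extract into a temporary directory first.
    if ! tar -xzf "$archive" -C "$tmp"; then
        rm -rf "$tmp"
        return 1
    fi
    for bin in rga rga-preproc; do
        # Locate the binary wherever it sits in the extracted tree.
        path=$(find "$tmp" -type f -name "$bin" | head -n 1)
        # Copy it into $prefix with mode 755 (rwxr-xr-x).
        if [ -z "$path" ] || ! install -m 755 "$path" "$prefix/"; then
            echo "could not install $bin from $archive" >&2
            rm -rf "$tmp"
            return 1
        fi
    done
    rm -rf "$tmp"
}
```

Writing to /usr/local/bin needs root, so one way to use it is to save the function in a file such as install-rga.sh and run sudo bash -c '. ./install-rga.sh && install_rga ./rga.tar.gz', replacing ./rga.tar.gz with the archive you downloaded.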
After installation, use it by typing rga followed by your search query and the folder to search. For example:
rga "text to find" ~/Documents
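Since rga wraps ripgrep, it also accepts ripgrep's own options. A hypothetical wrapper that lists only the PDF files matching a case-insensitive pattern could look like this (-i ignores case, -l prints only the names of matching files, -g '*.pdf' limits the search to PDFs); as before, RGA can be set to echo for a dry run.

```shell
# Hypothetical wrapper: list PDFs whose text matches a pattern, ignoring case.
#   -i          case-insensitive match (ripgrep option)
#   -l          print only names of files with matches (ripgrep option)
#   -g '*.pdf'  only search files matching this glob (ripgrep option)
# The folder defaults to the current directory.
rga_pdf_files() {
    local pattern="$1" dir="${2:-.}"
    "${RGA:-rga}" -i -l -g '*.pdf' "$pattern" "$dir"
}
```

Usage example: rga_pdf_files "text to find" ~/Documents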
Also check out the available rga flags and its help information (rga --help).