PDF Search Mastery: rga, fzf, and Sioyek for Peak Efficiency

by Kenji Nakamura

Hey guys! Ever felt like searching through PDFs is like navigating a maze? You know, you're hunting for that one piece of information, but it's buried somewhere in a mountain of text. Well, I've got a solution that's going to blow your mind: combining rga (ripgrep-all), fzf (fuzzy finder), and Sioyek. Trust me, this is a game-changer for anyone who works with PDFs regularly.

What are rga, fzf, and Sioyek?

Before we dive into the nitty-gritty, let's quickly break down what these awesome tools are:

  • rga (ripgrep-all): Think of rga as the superhero version of grep. It's a command-line tool that searches files for specific patterns, but it's not just for plain text. rga can handle PDFs, DOCX, EPUBs, and a whole bunch of other formats. It's fast, it's smart, and it's going to be your new best friend for finding stuff in your documents.
  • fzf (fuzzy finder): fzf is like autocomplete on steroids. It's a command-line fuzzy finder that lets you interactively search through lists of anything: files, directories, command history, you name it. You start typing, and fzf narrows down the results in real time. It's super intuitive and makes navigating the command line a breeze.
  • Sioyek: Sioyek is a free PDF viewer designed for quickly navigating and reading PDF documents, especially research papers and technical documentation. The feature that matters for this setup is that it can be launched from the command line at a specific page, so a search result can take you straight to the right spot.

Why Use This Combo?

Okay, so why bother using all three of these tools together? Here's the deal:

  • Speed: rga is incredibly fast at searching through files, and fzf lets you quickly narrow down your results. This combo is way faster than manually searching through PDFs.
  • Precision: rga can handle different file formats, so you don't have to worry about converting PDFs to text before searching. Plus, fzf's fuzzy search means you can find what you're looking for even if you don't remember the exact wording.
  • Efficiency: Sioyek's ability to open PDFs directly to the page containing your search term is a massive time-saver. No more flipping through pages trying to find the right spot!
  • Productivity: By combining these tools, you'll drastically reduce the time you spend searching for information, freeing you up to focus on the actual content.

Setting Up the Dream Team: Installation

Alright, let's get these tools installed. I'm going to assume you're on a Unix-like system (Linux, macOS), but if you're on Windows, don't worry, there are ways to get these tools working there too (you might need to use WSL or a similar environment).

Installing rga (ripgrep-all)

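rga is a wrapper around ripgrep, so you'll need to install that first. To read PDFs it also calls pdftotext, which ships with Poppler (the package is usually named poppler-utils on Debian/Ubuntu and Fedora, and poppler on Homebrew), and the script later in this post uses pdftotext for its preview too. The easiest way to get ripgrep is usually through your system's package manager.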

  • Debian/Ubuntu:
    sudo apt update
    sudo apt install ripgrep
    
  • Fedora/CentOS/RHEL:
    sudo dnf install ripgrep
    
  • macOS (using Homebrew):
    brew install ripgrep
    

Once ripgrep is installed, you can install rga. There are a few ways to do this, but the easiest is usually using cargo, the Rust package manager (if you don't have Rust installed, you can get it from https://www.rust-lang.org/):

cargo install --locked ripgrep_all

Installing fzf (fuzzy finder)

fzf also has installation packages for most systems:

  • Debian/Ubuntu:
    sudo apt install fzf
    
  • Fedora/CentOS/RHEL:
    sudo dnf install fzf
    
  • macOS (using Homebrew):
    brew install fzf
    # Optional: Install shell completion and key bindings
    $(brew --prefix fzf)/install
    

Follow the instructions that fzf's installer provides to set up shell completion and keybindings – it'll make your life a lot easier!

Installing Sioyek

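Sioyek is packaged for some systems, although availability varies by distribution and release. Try your package manager first; if it can't find the package, grab a prebuilt release from the Sioyek project page instead: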

  • Debian/Ubuntu:
    sudo apt install sioyek
    
  • Fedora:
    sudo dnf install sioyek
    
  • Arch Linux:
    sudo pacman -S sioyek
    
  • macOS (using Homebrew):
    brew install --cask sioyek
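
Before moving on, it can help to confirm that everything actually landed on your PATH. Here's a minimal sketch of such a check; the function name missing_tools is just something I made up for illustration, and pdftotext is included because rga needs it for PDFs:

```shell
# Print the name of every command from the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools this workflow relies on.
missing=$(missing_tools rga rg fzf sioyek pdftotext)
if [ -n "$missing" ]; then
  printf 'Still missing:\n%s\n' "$missing"
else
  echo "All tools found."
fi
```

If anything shows up under "Still missing", go back to the matching install step before trying the script.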
    

The Magic Script: Putting It All Together

Okay, now for the fun part! We're going to create a script that ties rga, fzf, and Sioyek together. This script will:

  1. Use rga to search for a term in your PDFs.
  2. Pipe the results to fzf for fuzzy finding.
  3. Extract the filename and page number from the selected result.
  4. Open the PDF in Sioyek, jumping directly to the page where the search term was found.

Here's the script (you can save it as something like pdfsearch.sh and make it executable with chmod +x pdfsearch.sh):

#!/usr/bin/env bash

# Check that every tool the script relies on is installed.
# rg and pdftotext (from Poppler) are used by the fzf preview.
for tool in rga rg fzf sioyek pdftotext; do
  if ! command -v "$tool" &> /dev/null; then
    echo "$tool is not installed. Please install it." >&2
    exit 1
  fi
done

# Get the search term from the command line
if [ -z "$1" ]; then
  echo "Usage: $0 <search_term>" >&2
  exit 1
fi
search_term="$1"
export search_term  # the fzf preview runs in a child shell and reads this

# Search with rga and let fzf pick one hit. For PDFs, rga prefixes each
# extracted line with "Page N: ", so a hit looks like:
#   ./paper.pdf:Page 12: the matching line
selection=$(
  rga --no-heading --with-filename --no-line-number --color=never \
      "$search_term" ./*.pdf |
  fzf --delimiter ':' \
      --preview 'p={2}; p=${p#Page }; pdftotext -f "$p" -l "$p" {1} - | rg --color=always --context 10 -- "$search_term"'
)

# Nothing selected (Esc or Ctrl-C in fzf), or no matches at all
if [ -z "$selection" ]; then
  exit 0
fi

# Extract the filename and page number from the selected result
filename=${selection%%":Page "[0-9]*}
rest=${selection#"$filename:Page "}
pagenumber=${rest%%:*}

# Open the PDF in Sioyek, jumping to the page
echo "Opening $filename at page $pagenumber"
sioyek --page "$pagenumber" "$filename"

Let's break down what this script does:

  • #!/usr/bin/env bash: This shebang line tells the system to use bash to execute the script.
  • for tool in rga rg fzf sioyek pdftotext; do ...: This loop checks that every tool is installed. rga, fzf, and sioyek do the main work; rg and pdftotext are needed by the preview. If any of them is missing, the script prints an error message and exits.
  • if [ -z "$1" ]; then ...: This checks if a search term was provided as a command-line argument. If not, it prints a usage message and exits.
  • search_term="$1" and export search_term: This stores the search term in a variable and exports it, so the preview command (which fzf runs in a child shell) can read it.
  • selection=$( rga ... | fzf ... ): This is the core of the script. The $( ... ) captures whatever line you pick in fzf. Let's break it down further:
    • rga --no-heading --with-filename --no-line-number --color=never "$search_term" ./*.pdf: This runs rga to search for the search term in all PDF files in the current directory. The options do the following:
      • --no-heading: Print the filename on every result line instead of once as a heading.
      • --with-filename: Always print the filename, even when only one PDF matches the glob.
      • --no-line-number: Leave out line numbers, so the second field is always rga's Page N marker.
      • --color=never: Don't emit color codes, which keeps the line easy to parse afterwards.
    • Don't add --files-with-matches here: it prints only filenames and throws away the page information we need.
    • |: This pipes the output of rga to the next command.
    • fzf --delimiter ':' --preview '...': This hands the results to fzf for fuzzy finding. The options do the following:
      • --delimiter ':': Splits each line on colons, so {1} is the filename and {2} is the Page N marker.
      • --preview '...': This is the magic that shows a preview of the search result in fzf. The preview command has three parts:
      • p={2}; p=${p#Page }: Takes the second field (for example Page 12) and strips the "Page " prefix, leaving just the number.
      • pdftotext -f "$p" -l "$p" {1} -: Uses pdftotext to extract the text of that single page (-f is the first page, -l is the last page) and writes it to standard output.
      • rg --color=always --context 10 -- "$search_term": Finally, the extracted text is piped to rg (ripgrep) to highlight the search term with 10 lines of context around it.
  • if [ -z "$selection" ]; then ...: This checks if the user selected anything in fzf. If not (i.e., they pressed Esc or Ctrl-C, or there were no matches), the script exits.
  • filename=${selection%%":Page "[0-9]*}: This cuts the selected line at the first ":Page N" marker, leaving the filename.
  • rest=... and pagenumber=${rest%%:*}: These drop the filename and the "Page " prefix, then keep everything up to the next colon, which is the page number.
  • echo "Opening $filename at page $pagenumber": This prints a confirmation message.
  • sioyek --page "$pagenumber" "$filename": This opens the PDF in Sioyek, using its --page option to jump to the page. If your Sioyek version lands one page off, add or subtract 1 before passing the number.

One caveat: because the preview splits on colons, a PDF with a colon in its filename will still open correctly, but its preview won't work.
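
If you want to see the filename and page extraction in isolation, here's a small sketch you can paste into a terminal. It assumes a result line of the form file.pdf:Page N: text, which is how rga typically reports PDF hits when headings and line numbers are turned off; the helper name parse_hit is my own invention, not part of any of the tools:

```shell
# Split one hit of the assumed form "FILE:Page N: TEXT" into file and page.
# Prints "FILE<TAB>N", or returns 1 if the line has no page marker.
parse_hit() {
  hit=$1
  case $hit in
    *":Page "[0-9]*) ;;
    *) return 1 ;;
  esac
  file=${hit%%":Page "[0-9]*}    # everything before the first ":Page N"
  rest=${hit#"$file:Page "}      # drop the filename and the "Page " prefix
  page=${rest%%:*}               # keep the digits before the next colon
  printf '%s\t%s\n' "$file" "$page"
}

parse_hit 'notes/paper.pdf:Page 12: quantum mechanics is weird'
```

Cutting at the first ":Page N" marker rather than at the first colon means a filename that contains a colon still comes out in one piece.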

Using the Script: Let the Searching Begin!

To use the script, just run it from the command line, providing your search term as an argument:

./pdfsearch.sh your_search_term

For example, if you wanted to search for "quantum mechanics" in your PDFs, you'd run:

./pdfsearch.sh "quantum mechanics"

fzf will pop up with a list of matching results. You can use the arrow keys to navigate, type to filter the results, and press Enter to select a result. Sioyek will then open the PDF to the page containing your search term.

Level Up Your PDF Kung Fu

This setup is a powerful way to search through PDFs, but there are a few things you can do to make it even better:

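  • Create an alias: You can create an alias in your .bashrc or .zshrc file to make it easier to run the script. Use the script's full path rather than ./pdfsearch.sh, which only resolves in the directory that holds the script. For example, if you saved it in ~/bin (an example location, adjust to your own), you could add the following line:
    alias pdfsearch="$HOME/bin/pdfsearch.sh"
    
    Then you can just run pdfsearch your_search_term from the folder that contains your PDFs, since the script searches *.pdf in the current directory.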
  • Customize fzf: fzf has tons of options for customization. You can change the colors, the layout, the keybindings, and more. Check out the fzf documentation for details.
  • Integrate with your editor: If you use a text editor like Neovim or VS Code, you can create a similar script to open PDFs directly from your editor.
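
An alias still runs the script in whatever directory you happen to be in. If your PDFs live in one place, a shell function may suit you better. Here's a rough sketch for your .bashrc or .zshrc; the variable names PDFSEARCH_SCRIPT and PDFSEARCH_DIR and both default paths are hypothetical, so point them at your own setup. The last line shows one way to set fzf defaults using options from the fzf documentation:

```shell
# Hypothetical locations: point these at your own script and PDF folder.
PDFSEARCH_SCRIPT="${PDFSEARCH_SCRIPT:-$HOME/bin/pdfsearch.sh}"
PDFSEARCH_DIR="${PDFSEARCH_DIR:-$HOME/Documents/papers}"

# Run the search inside the PDF folder. The parentheses start a subshell,
# so the cd does not change the directory of your interactive shell.
pdfsearch() {
  ( cd "$PDFSEARCH_DIR" && "$PDFSEARCH_SCRIPT" "$@" )
}

# Example fzf defaults: use 60% of the terminal height, put the prompt
# at the top, and draw a border around the finder.
export FZF_DEFAULT_OPTS='--height 60% --layout=reverse --border'
```

With this in place, pdfsearch "quantum mechanics" works from any directory and leaves you where you started once Sioyek opens.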

Conclusion: Conquer Your PDFs

So there you have it! By combining rga, fzf, and Sioyek, you can create a super-efficient PDF searching workflow. No more getting lost in endless pages – you'll be able to find exactly what you need in seconds. Go forth and conquer your PDFs!