Tuning qgrep Config to Index Everything (But Skip the Junk)

Yesterday I walked through mapping a Synology share and pointing qgrep at breach data so the index lives next to the dataset instead of cluttering my home directory. That’s great for portability, but there’s one more step worth calling out: making sure the index actually covers all the file types you care about.

Out of the box, qgrep is tuned for source code search. Its default .cfg file includes a ton of language extensions (.cpp, .java, .cs, etc.), but in the breach-analysis world we don’t just see clean .json and .csv. We also see:

  • .sql dumps
  • .tsv, .bak, .log
  • files with no extension at all (shadow, passwd, dump)
  • random one-off names that would never match the defaults

If you don’t tweak the config, qgrep skips every file that doesn’t match the default include pattern, and a large share of what matters never reaches the index.
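
Before touching the config, it can help to see which extensions actually show up in the dataset. Here is a minimal Python sketch for that survey. It is not part of qgrep, and count_extensions is just a hypothetical helper name; the path is the one used for the config in this post.

```python
import os
from collections import Counter


def count_extensions(root):
    """Count files under root by lowercase extension ('' means no extension)."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # splitext("passwd") gives ("passwd", ""), so extensionless
            # files are grouped under the empty string.
            ext = os.path.splitext(name)[1].lower()
            counts[ext] += 1
    return counts


if __name__ == "__main__":
    # Assumed dataset location; adjust to wherever your share is mapped.
    for ext, n in count_extensions("Z:/breach_data/china").most_common(20):
        print(f"{ext or '(none)':10} {n}")
```

Anything near the top of that list that isn't in the default include pattern is content a stock config would skip.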


My Config

Here’s the version I landed on for my china.cfg project, which points at Z:/breach_data/china:

path Z:/breach_data/china

# index all files
include .*

# exclude obvious binaries
exclude \.(exe|dll|so|o|a|class|jar|pdb|jpg|jpeg|png|gif|bmp|ico|tif|tiff|mp3|mp4|avi|mov|wmv|zip|rar|7z|gz|tar|xz)$
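
That exclude line gets hard to read as it grows. One option, sketched here in Python, is to keep the extensions in a list and generate the config text from it. This is only a convenience script under my own assumptions: render_config and BINARY_EXTS are hypothetical names, not anything qgrep ships, and the output is meant to reproduce the config shown above.

```python
import re

# Extensions to skip, copied from the exclude line above.
BINARY_EXTS = [
    "exe", "dll", "so", "o", "a", "class", "jar", "pdb",
    "jpg", "jpeg", "png", "gif", "bmp", "ico", "tif", "tiff",
    "mp3", "mp4", "avi", "mov", "wmv",
    "zip", "rar", "7z", "gz", "tar", "xz",
]


def render_config(data_path, exts):
    """Return config text that includes everything and excludes the given extensions."""
    # Escape each extension so a stray regex metacharacter stays literal.
    alternation = "|".join(re.escape(e) for e in exts)
    return "\n".join([
        f"path {data_path}",
        "",
        "# index all files",
        "include .*",
        "",
        "# exclude obvious binaries",
        rf"exclude \.({alternation})$",
        "",
    ])


if __name__ == "__main__":
    print(render_config("Z:/breach_data/china", BINARY_EXTS))
```

Redirect the output into the .cfg file, or just paste the generated exclude line over the old one.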

Why This Works

  • include .* → blunt hammer. It matches literally everything, regardless of extension. No more worrying about weird .dump files slipping through.
  • Exclude list → keeps your index lean. No need to burn cycles indexing images, video, archives, or executables. Those add zero value in a text-search workflow and can blow up index size fast.

This strikes a nice balance: you grab all the text-like content you care about without indexing 12GB of .mp4 memes someone dropped into the dump.
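
To sanity-check the include/exclude pair before committing to a long indexing run, you can dry-run the same patterns against a few sample names. The sketch below uses Python's re module as a stand-in for qgrep's own regex engine and assumes a file is kept when it matches include and does not match exclude. Case handling and other engine details may differ, so treat it as an approximation. would_index and the sample names are made up for illustration.

```python
import re

# Same patterns as the config above.
INCLUDE = re.compile(r".*")
EXCLUDE = re.compile(
    r"\.(exe|dll|so|o|a|class|jar|pdb|jpg|jpeg|png|gif|bmp|ico|tif|tiff"
    r"|mp3|mp4|avi|mov|wmv|zip|rar|7z|gz|tar|xz)$"
)


def would_index(path):
    """Approximate the filter: keep a path that matches include but not exclude."""
    return bool(INCLUDE.search(path)) and not EXCLUDE.search(path)


# Hypothetical file names, chosen to cover the cases from this post.
samples = ["users.sql", "passwd", "export.tsv", "site.bak",
           "clip.mp4", "photos.zip", "setup.exe"]
kept = [p for p in samples if would_index(p)]
print(kept)  # ['users.sql', 'passwd', 'export.tsv', 'site.bak']
```

The extensionless passwd survives because the exclude pattern only fires on a dot followed by one of the listed extensions at the end of the name.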


Updating the Index

After editing the config, just re-run:

qgrep update "Z:\qgrep_china_index\china.cfg"

That re-scans the data path with the new include/exclude filters and brings the index up to date. Once it finishes, you can go right back to searching:

qgrep search "Z:\qgrep_china_index\china.cfg" i "password"
qgrep search "Z:\qgrep_china_index\china.cfg" il "example@domain.com"

Wrap-Up

The takeaway: don’t settle for the defaults. qgrep is lightning-fast, but it can only find what it has indexed. By flipping your config to include everything and then exclude just the obvious junk, you get coverage across messy, real-world breach data without ballooning the index with useless files.