Tripwire

If you run a web server, your logs are full of probes and scanners looking for misconfigured systems. This site is running on Next.js. There’s no PHP, no WordPress, no Spring Boot, no Java to be exploited. And yet I’m constantly inundated with requests for /wp-login.php, /.env, /actuator/env, and a few hundred other paths that have nothing to do with anything I run.

192.0.2.41    "GET /wp-login.php HTTP/1.1"                  404  "Mozilla/5.0"
192.0.2.41    "GET /wp-config.php HTTP/1.1"                 404  "Mozilla/5.0"
198.51.100.3  "GET /.env HTTP/1.1"                          404  "curl/8.4.0"
203.0.113.7   "GET /actuator/env HTTP/1.1"                  404  "Nuclei/2.9"
203.0.113.7   "GET /actuator/heapdump HTTP/1.1"             404  "Nuclei/2.9"
192.0.2.66    "POST /xmlrpc.php HTTP/1.1"                   404  "Go-http-client/2.0"
198.51.100.91 "GET /phpunit/.../eval-stdin.php HTTP/1.1"    404  "-"
203.0.113.2   "GET /phpmyadmin/index.php HTTP/1.1"          404  "Mozilla/5.0"

These are obviously bad actors who don’t play by the rules. The right thing to do is set up a firewall rule and block them.

I had a different (far dumber?) idea.

The idea

The idea is simple. Instead of 404ing on these endpoints, what if I served back a zip bomb that looks like a valid payload?

I knew about zip bombs already. Small file on the wire, large file once decompressed. The trick relies on the decompressor doing the work after the bytes have already arrived. So could I exploit Content-Encoding: gzip on these 404 endpoints? Pick the paths scanners actually probe, prebuild a gzip body for each MIME the scanner expected, and turn every boring 404 into a 200 that inflates straight into the crawler’s process memory.

It would bring me joy to crash some script kiddies’ dumb crawler code with an OOM error. Or at least slow them down for a little bit.

The trap

What if I made a carefully crafted robots.txt that lists every one of these paths under Disallow? It does two things at once. It tells real crawlers like Googlebot, Bingbot, and archive.org to stay out of these paths, so they never get zip-bombed by accident. And it turns those same paths into a honeypot: anyone who shows up anyway has identified themselves as a scanner by ignoring a rule they were given. Instead of a 404, they get a gzip bomb.

The numbers above are this site’s slice of that traffic since the trap went live. Real scanners. Real hits. Their resources burn. Mine don’t.

The bomb

There’s one bomb per MIME the scanner expected: HTML for admin panels, JSON for REST probes, YAML for config files, plain text for credential probes. The point is to keep their parser engaged after the inflate, not just their network stack. Credit to Ache’s HTML Zip Bomb for refining the HTML variant.

The production bombs inflate to about 2 GB. The buttons below serve a 2 MB version so you can sample the trick without your tab dying.

A few notes on how this works

The bait list is one TypeScript module that two things import. The proxy imports it to match incoming requests. The robots.txt route imports it to emit Disallow lines. One source of truth, no drift.

There’s no User-Agent allowlist. Real crawlers already respect robots.txt, so an allowlist would protect against a problem that doesn’t exist. Meanwhile, scanners spoof crawler UAs constantly. An allowlist would mostly be a free bypass. Doing this right means reverse-DNS verification, which is real work for a hypothetical benefit.

Generic paths are excluded: /admin, /login, /dashboard, anything reasonable a future page might want. Only the specific scanner-authored variants (/wp-admin/, /administrator/, /phpmyadmin/) are bait. The full list lives in research/tripwire/, along with where each pattern came from.

tripwire

The idea

The trap

The bomb

try one

The numbers

daily activity

what they were looking for

ua families

most-probed paths

where they came from / origin networks (top 10)

A few notes on how this works

References