If you run a web server, your logs are full of probes and scanners looking for misconfigured systems. This site is running on Next.js. There’s no PHP, no WordPress, no Spring Boot, no Java to be exploited. And yet I’m constantly inundated with requests for /wp-login.php, /.env, /actuator/env, and a few hundred other paths that have nothing to do with anything I run.
```
192.0.2.41    "GET /wp-login.php HTTP/1.1"               404 "Mozilla/5.0"
192.0.2.41    "GET /wp-config.php HTTP/1.1"              404 "Mozilla/5.0"
198.51.100.3  "GET /.env HTTP/1.1"                       404 "curl/8.4.0"
203.0.113.7   "GET /actuator/env HTTP/1.1"               404 "Nuclei/2.9"
203.0.113.7   "GET /actuator/heapdump HTTP/1.1"          404 "Nuclei/2.9"
192.0.2.66    "POST /xmlrpc.php HTTP/1.1"                404 "Go-http-client/2.0"
198.51.100.91 "GET /phpunit/.../eval-stdin.php HTTP/1.1" 404 "-"
203.0.113.2   "GET /phpmyadmin/index.php HTTP/1.1"       404 "Mozilla/5.0"
```
These are obviously bad actors who don’t play by the rules. The right thing to do is set up a firewall rule and block them.
I had a different (far dumber?) idea.
The idea
The idea is simple. Instead of 404ing on these endpoints, what if I served back a zip bomb that looks like a valid payload?
I knew about zip bombs already. Small file on the wire, large file once decompressed. The trick relies on the decompressor doing the work after the bytes have already arrived. So could I exploit Content-Encoding: gzip on these 404 endpoints? Pick the paths scanners actually probe, prebuild a gzip body for each MIME type the scanner expects, and turn every boring 404 into a 200 that inflates straight into the crawler's process memory.
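The bomb itself can be prebuilt offline. A minimal sketch in Node, assuming a body of repeated bytes (deflate's best case, compressing at roughly 1000:1); sizes here are shrunk for illustration:

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// A long run of identical bytes is deflate's best case: a 10 MB body
// compresses to roughly 10 KB on the wire. Scale the buffer up for a
// real bomb; the ratio stays about the same.
const MB = 1024 * 1024;
const body = Buffer.alloc(10 * MB); // 10 MB of zero bytes
const bomb = gzipSync(body, { level: 9 });

console.log(`wire: ${bomb.length} bytes, inflated: ${body.length} bytes`);
```

Serve the precompressed buffer with a 200 status and `Content-Encoding: gzip`; any client that honors the header inflates the full body before its HTTP library even returns. One caveat: a single gzip stream tops out near a 1030:1 ratio, so a payload that inflates to ~2 GB still costs about 2 MB on the wire.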
It would bring me joy to crash some script kiddies’ dumb crawler code with an OOM error. Or at least slow them down for a little bit.
The trap
What if I made a carefully crafted robots.txt that lists every one of these paths under Disallow? It does two things at once. It tells real crawlers like Googlebot, Bingbot, and archive.org to stay out of these paths, so they never get zip-bombed by accident. And it turns those same paths into a honeypot: anyone who shows up anyway has identified themselves as a scanner by ignoring a rule they were given. Instead of a 404, they get a gzip bomb.
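Such a robots.txt might look like this; the paths below are the ones from the logs above, and the real list is whatever your own scanners keep probing:

```
User-agent: *
Disallow: /wp-login.php
Disallow: /wp-config.php
Disallow: /xmlrpc.php
Disallow: /.env
Disallow: /actuator/
Disallow: /phpmyadmin/
```

Well-behaved crawlers fetch this file first and skip every listed path; anything that requests them anyway has opted into the trap.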
The numbers above are this site’s slice of that traffic since the trap went live. Real scanners. Real hits. Their resources burn. Mine don’t.
The bomb
There’s one bomb per MIME type the scanner expects: HTML for admin panels, JSON for REST probes, YAML for config files, plain text for credential probes. The point is to keep their parser engaged after the inflate, not just their network stack. Credit to Ache’s HTML Zip Bomb for refining the HTML variant.
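A sketch of the per-MIME idea: wrap a huge run of filler in a syntactically valid document of each type, so a scanner that parses the response keeps working after the inflate. The mapping and sizes below are illustrative, not the production config:

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// One bomb per expected MIME type. Each body is syntactically valid
// for its type, so the scanner's parser stays busy after decompression.
const filler = "A".repeat(1024 * 1024); // 1 MB of filler; scale up for real use

const bodies: Record<string, string> = {
  "text/html": `<!doctype html><html><body><p>${filler}</p></body></html>`,
  "application/json": `{"data":"${filler}"}`,
  "text/plain": `PASSWORD=${filler}`,
};

// Precompress each body once at build time; runs of one byte compress
// at roughly 1000:1, so the wire cost stays tiny.
const bombs = Object.fromEntries(
  Object.entries(bodies).map(([mime, body]) => [
    mime,
    gzipSync(Buffer.from(body), { level: 9 }),
  ]),
);
```

Each response then pairs the precompressed buffer for the probed path with the matching Content-Type and a `Content-Encoding: gzip` header.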
The production bombs inflate to about 2 GB. The buttons below serve a 2 MB version so you can sample the trick without your tab dying.