The one-liner:

dd if=/dev/zero bs=1G count=10 | gzip -c > 10GB.gz

This is brilliant.

  • aesthelete@lemmy.world · 3 hours ago

    This reminds me of shitty FTP sites with ratios when I was on dial-up. I used to push them files full of null characters with filenames that looked like actual content. The modem would compress the upload as it transmitted it which allowed me to upload the junk files at several times the rate of a normal file.

      • deaddigger@lemm.ee · 33 minutes ago

        I mean, I am not a lawyer.

        In Germany we have § 303b StGB. In short, it says that if you hinder someone else's data processing through physical means or malicious data, you can go to jail for up to 3 years. If it is a major process for someone, you can get up to 5 years, and in especially severe cases up to 10.

        So if you have a zip bomb on your system and a crawler reads and unpacks it, you have committed two crimes: 1. You hindered that crawler's data processing. 2. Some ISP nodes inspect traffic and can crash too. If the ISP is pissed off enough, you can go to jail for 5 years. This applies even if you didn't crash them, due to them having protection against it, because attempting it is also against the law.

        Having a zip bomb is a gray area. Because trying to disrupt data processing is illegal, having a zip bomb can be considered an attempt; however, I am not aware of any judgement in this regard.

        Edit: btw, if you password-protect your zip bomb, everything is fine.

        • barsoap@lemm.ee · 8 minutes ago

          Severely disrupting other people's data processing that is of significant importance to them. Disruption by submitting malicious data requires intent to cause harm; physical destruction, deletion, etc. doesn't. This is about crashing people's payroll systems, DDoSing, etc., not burning some CPU cycles and having a crawler subprocess crash with an OOM.

          Why the hell would an ISP have a look at this? And even if they did, they're professional enough to detect zip bombs. Which btw is why this whole thing is pointless anyway: if you classify requests as malicious, just don't serve them. If that's not enough, it's much more sensible to go the Anubis route and demand proof of work, as that catches crawlers which come from a gazillion IPs with different user agents etc.

  • 👍Maximum Derek👍@discuss.tchncs.de · 6 hours ago

    When I was serving high-volume sites (that were targeted by scrapers) I had a collection of files in a CDN that contained nothing but the word “no” over and over. Scrapers who barely hit our detection thresholds saw all their requests go to the 50M version. Super aggressive scrapers got the 10G version. And the scripts that just wouldn’t stop got the 50G version.

    It didn’t move the needle on budget, but hopefully it cost them.
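
    If anyone wants to reproduce that kind of decoy, a rough sketch with yes/tr/head will do; the sizes, filenames and exact filler text here are my guesses, not the original setup:

    # Decoy files that are nothing but "no " repeated; adjust sizes to taste.
    yes "no" | tr '\n' ' ' | head -c 50M > no-50M.txt
    yes "no" | tr '\n' ' ' | head -c 10G > no-10G.txt
    yes "no" | tr '\n' ' ' | head -c 50G > no-50G.txt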

      • 👍Maximum Derek👍@discuss.tchncs.de · 3 hours ago

        Most often because they don’t download any of the CSS or external JS files from the pages they scrape. But there are a lot of other patterns you can detect once you have their traffic logs loaded in a time series database. I used an ELK stack back in the day.
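
        If you only have plain access logs to start from, a quick-and-dirty version of that check might look like this (the combined log format and the log path are assumptions):

        # List client IPs that requested pages but never fetched any .css/.js assets.
        awk '{ ip = $1; url = $7 }
             url ~ /\.(css|js)(\?|$)/  { assets[ip]++ }
             url !~ /\.(css|js)(\?|$)/ { pages[ip]++ }
             END { for (ip in pages) if (!(ip in assets)) print pages[ip], ip }' \
            /var/log/nginx/access.log | sort -rn | head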

        • sugar_in_your_tea@sh.itjust.works · 3 hours ago

          That sounds like a lot of effort. Are there any tools that get like 80% of the way there? Like something I could plug into Caddy, nginx, or haproxy?

          • 👍Maximum Derek👍@discuss.tchncs.de · 3 hours ago

            My experience is with systems that handle nearly 1000 pageviews per second. We did use a spread of haproxy servers to handle routing and SNI, but they were being fed offender lists by external analysis tools (built in-house).
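
            For a home-lab version of the same idea, haproxy's runtime API can take live updates to a map file that an http-request rule checks; the socket path, map path and IP below are made-up examples, not what we ran:

            # Flag an offending client without reloading haproxy.
            echo "add map /etc/haproxy/abusers.map 203.0.113.7 1" \
                | socat stdio /var/run/haproxy.sock
            # The frontend would then deny on something like:
            #   http-request deny if { src,map_ip(/etc/haproxy/abusers.map) -m found }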

            • sugar_in_your_tea@sh.itjust.works · 2 hours ago

              Dang, I was hoping for a FOSS project that would do most of the heavy lifting for me. Maybe such a thing exists, idk, but it would be pretty cool to have a pluggable system that analyzes activity and tags connections w/ some kind of identifier so I could configure a web server to either send it nonsense (i.e. poison AI scrapers), zip bombs (i.e. bots that aren’t respectful of resources), or redirect to a honey pot (i.e. malicious actors).

              A quick search didn’t yield anything immediately, but I wasn’t that thorough. I’d be interested if anyone knows of such a project that’s pretty easy to play with.

              • A Basil Plant@lemmy.world · 1 hour ago

                Not exactly what you asked, but do you know about ufw-blocklist?

                I’ve been using this on my multiple VPSes for some time now and the number of fail2ban failed/banned has gone down like crazy. Previously, I had 20k failed attempts after a few months and 30-50 currently-banned IPs at all times; now it’s less than 1k failed after a year and maybe 3-ish banned at any time.

                There was also that paid service where users share their spammy IP address attempts with a centralized network, which does some dynamic intelligence monitoring. I forgot the name and search these days isn’t great. Something to do with “Sense”? It was paid, but well recommended as far as I remember.

                Edit: seems like the keyword is "threat intelligence platform".

  • tal@lemmy.today · 6 hours ago

    Anyone who writes a spider that’s going to inspect all the content out there has already had to deal with this, along with about a bazillion other kinds of oddball or bad data.

    • lennivelkant@discuss.tchncs.de · 46 minutes ago

      That’s the usual case with arms races: unless you are yourself a major power, odds are you’ll never be able to fully stand up to one (at least not on your own, but let’s not stretch the metaphor too far). Often, the best you can do is deter other, minor powers and hope the major ones never have a serious intent to bring you down.

      In this specific case, the number of potential minor “attackers” and the low hurdle to “attack” make it attractive to at least try to overwhelm the amateurs. You’ll never get the pros; you just hope they don’t bother you too much.

    • catloaf@lemm.ee · 4 hours ago

      Competent ones, yes. Most developers aren’t competent, scraper writers even less so.

  • palordrolap@fedia.io · 7 hours ago

    The article writer kind of complains that they’re having to serve a 10MB file, which is the result of the gzip compression. If that’s a problem, they could switch to bzip2. It’s available pretty much everywhere that gzip is available and it packs the 10GB down to 7506 bytes.

    That’s not a typo. bzip2 is way better with highly redundant data.
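
    For reference, that’s just the same one-liner with bzip2 swapped in (the exact byte count may differ slightly between versions):

    dd if=/dev/zero bs=1G count=10 | bzip2 -c > 10GB.bz2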

    • just_another_person@lemmy.world · 1 hour ago

      I believe he’s returning a gzip HTTP response stream, not just a file payload that the requester then downloads and decompresses.

      Bzip isn’t used in HTTP compression.
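
      You can see what the two sides actually negotiate with a quick header check (a HEAD request; some servers only set the header on real GETs, and the URL is a placeholder):

      # Offer gzip and brotli and see which content-coding, if any, the server picks; bzip2 isn't an option.
      curl -sI -H 'Accept-Encoding: gzip, br' https://example.com/ | grep -i '^content-encoding'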

      • sugar_in_your_tea@sh.itjust.works · 3 hours ago

        Brotli is an option, and it’s comparable to bzip2. Brotli works in most browsers, so hopefully these bots would support it.

        I just tested it, and a 10G file full of zeroes is only 8.3K compressed. That’s pretty good, though a little bigger than bzip2.
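
        Roughly what that test might look like with the brotli CLI, if anyone wants to repeat it (the resulting size will shift a bit with the version and quality level):

        dd if=/dev/zero bs=1G count=10 | brotli -c > 10GB.br
        ls -lh 10GB.br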

  • mbirth@lemmy.ml · 7 hours ago

    And if you want some customisation, e.g. some repeating string over and over, you can use something like this:

    yes "b0M" | tr -d '\n' | head -c 10G | gzip -c > 10GB.gz
    

    yes repeats the given string (followed by a line feed) indefinitely - originally meant to type “yes” + ENTER into prompts. tr then removes the line breaks again and head makes sure to only take 10GB and not have it run indefinitely.

    If you want to be really fancy, you can even add an HTML header and footer by putting them into files (e.g. header and footer) and then running it like this:

    yes "b0M" | tr -d '\n' | head -c 10G | cat header - footer | gzip -c > 10GB.gz
    
  • lemmylommy@lemmy.world · 7 hours ago

    Before I tell you how to create a zip bomb, I do have to warn you that you can potentially crash and destroy your own device.

    LOL. Destroy your device, kill the cat, what else?

    • archonet@lemy.lol · 7 hours ago

      destroy your device by… having to reboot it. the horror! The pain! The financial loss of downtime!

      • CrazyLikeGollum@lemmy.world · 32 minutes ago

        Ah yes, the infamous “stinky cheese” email virus. Who knew zip bombs could be so destructive. It erased all of the easter eggs off of my DVDs.

  • comador@lemmy.world · 7 hours ago

    Funny part is many of us crusty old sysadmins were using derivatives of this decades ago to test RAID-5/6 sequential read and write speeds.
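
    Something along these lines, if memory serves; the mount point and sizes are just examples, and oflag=direct keeps the page cache from flattering the numbers:

    # Sequential write to the array, then read it back.
    dd if=/dev/zero of=/mnt/raid/ddtest.bin bs=1M count=10240 oflag=direct status=progress
    dd if=/mnt/raid/ddtest.bin of=/dev/null bs=1M iflag=direct status=progress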

    • melroy@kbin.melroy.org · 7 hours ago

      Looks fine to me. Only one CPU core was at 100%, I think.

      10+0 records in
      10+0 records out
      10737418240 bytes (11 GB, 10 GiB) copied, 28,0695 s, 383 MB/s
      
      • melroy@kbin.melroy.org · 7 hours ago

        Oh… now the idea is to unzip it, right?

        Nice idea:

        if (ipIsBlackListed() || isMalicious()) {
            header("Content-Encoding: gzip");
            header("Content-Length: " . filesize(ZIP_BOMB_FILE_10G)); // size of the compressed file (~10 MB)
            readfile(ZIP_BOMB_FILE_10G);
            exit;
        }
        
        • mbirth@lemmy.ml · 7 hours ago

          Might need some

          if (ob_get_level()) ob_end_clean();
          

          before the readfile. 😉

  • UnbrokenTaco@lemm.ee · 7 hours ago

    Interesting. I wonder how long it takes until most bots adapt to this type of “reverse DoS”.

    • ivn@jlai.lu · 7 hours ago

      Linux and Windows have compressed memory too, for 10 years or more. And that’s not how you avoid zip bombs anyway: just limit how much you decompress and abort if it goes over that limit.
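
      On the client side that can be as crude as capping the decompressed stream; the URL and the 100M limit here are arbitrary:

      # Transparent decompression, but never keep more than ~100 MB of output.
      curl -s --compressed https://example.com/page | head -c 100M > page.html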

      • Aatube@kbin.melroy.org · 7 hours ago

        All I know is it compresses memory. The mechanism mentioned here for ZIP bombs to crash bots is to fill up memory fast with repeating zeroes.

    • DreamButt@lemmy.world · 7 hours ago

      No, but that’s an interesting question. Ultimately it probably comes down to hardware specs, or, depending on the particular bot and its environment, the specs of the container it’s running in.

      Even with macOS’s style of compressing inactive memory pages, you’ll still have a hard cap that can be reached with the same technique (just with a larger uncompressed file).

      • 4am@lemm.ee · 3 hours ago

        How long would it take to be considered an inactive memory page? Do OOM conditions immediately trigger compression, or would the process die first?