• daq
    7 hours ago

    I’m not sure how they actually implemented it, but you can easily block ML crawlers via Cloudflare. Isn’t just about every small site/service behind CF anyway?

    • grysbok
      6 hours ago

      Last I checked, Cloudflare requires the user to have JavaScript and cookies enabled. My institution doesn’t want to require those because it would likely impact legitimate users as well as bots.

      • daq
        6 hours ago

        Huh? I can reach my site via curl, which has neither. How did you come up with this random set of requirements?

        • grysbok
          4 hours ago

          Odd. I just tried

          curl https://www.scrapingcourse.com/cloudflare-challenge

          and got

          Enable JavaScript and cookies to continue

          I’m clearly not on the same setup as you are, but my off-the-cuff guess is that your curl command was issued from a system that Cloudflare already recognized (IP whitelist, cookies, I dunno).
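
For anyone who wants to tell these two outcomes apart in a script, here is a minimal sketch of a check for "did I get the challenge page or the real site". It only looks for the message quoted above, plus a `cf-mitigated: challenge` response header, which is an assumption about how Cloudflare marks challenge responses and not something established in this thread. The function name `looks_like_challenge` is made up for illustration.

```python
from typing import Mapping

# The text curl printed in the comment above.
CHALLENGE_MARKER = "Enable JavaScript and cookies to continue"


def looks_like_challenge(headers: Mapping[str, str], body: str) -> bool:
    """Guess whether a response is a challenge page rather than real content.

    Assumption: challenge responses may carry a `cf-mitigated: challenge`
    header. If that header is absent, fall back to the marker text.
    """
    # Header names are case-insensitive, so normalize before comparing.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    if lowered.get("cf-mitigated") == "challenge":
        return True
    return CHALLENGE_MARKER in body
```

Feed it the headers and body from whatever HTTP client you use. A plain page served through Cloudflare should come back `False`, which would match what daq sees with curl on a site that never challenges.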

          Anyways, I’m reading through this blog post on using cURL with Cloudflare-protected sites and I’m finding it interesting.

          • daq
            2 hours ago

            Of course their challenge requires those things. How else could they implement it? Most users will never be presented with a challenge, though, and it is trivial to disable if you don’t want to ever challenge anyone. I was just saying CF blocks ML crawlers.