Google Built Its Empire Scraping The Web. Now It’s Suing To Stop Others From Scraping Google

Sunshine (she/her)@piefed.ca · 1 day ago


0_o7@lemmy.dbzer0.com · 13 hours ago

Archive: https://web.archive.org/web/20251224194440/https://www.techdirt.com/2025/12/24/google-built-its-empire-scraping-the-web-now-its-suing-to-stop-others-from-scraping-google/

Couldn’t archive it on archive.today: they put up a captcha, a Google one at that, and it doesn’t let me through at all.

scytale@piefed.zip · 19 hours ago

mesa@piefed.social · 1 day ago

Google and OpenAI suck:

Google’s legal theory has another significant problem: the requirement that a TPM must “effectively control” access. Just last week, a court rejected Ziff Davis’s attempt to turn robots.txt into a 1201 violation when OpenAI allegedly ignored its crawling restrictions. The court’s reasoning is directly applicable here:

OpenAI slammed my small server into the ground until I put fail2ban on top. It was really bad, like thousands-of-requests-per-second bad.
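The quoted point about robots.txt not “effectively controlling” access is easiest to see from the crawler’s side. robots.txt is a plain-text request that a cooperating crawler fetches and checks against itself, for example with Python’s standard `urllib.robotparser`. Nothing forces a crawler to run that check, which is why blocking had to happen on the server in the case above. This is a minimal sketch: the robots.txt contents, the URLs and the `polite_fetch_allowed` helper are made up for illustration, and GPTBot is used only as an example user-agent name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is asked to stay out entirely,
# everyone else only out of /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def polite_fetch_allowed(user_agent: str, url: str) -> bool:
    """The check a *cooperating* crawler runs on itself before fetching `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(polite_fetch_allowed("GPTBot", "https://example.com/article"))     # False
    print(polite_fetch_allowed("SomeBot", "https://example.com/article"))    # True
    print(polite_fetch_allowed("SomeBot", "https://example.com/private/x"))  # False
    # A scraper that never calls can_fetch() gets the same HTTP responses as
    # before: the web server only serves robots.txt, it does not enforce it.
```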

apftwb@lemmy.world · 11 hours ago

How does fail2ban prevent scraping? My understanding was that fail2ban works on failed login attempts.

mesa@piefed.social · 9 hours ago

There are some premade scripts out there that make it do more. I have it hooked up to nginx and other such logs. It’s commonly used against login attempts on online login portals, not just SSH, and it can work with any grep-able log file.

I just took two scripts other people made, verified they worked on my mini PC, and set it loose. Within about 10 minutes it had caught most of the scrapers and banned their IPs.
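The mechanism described above can be sketched in a few lines of Python: fail2ban reads a grep-able log, matches lines with a regex, and bans an IP that matches too often within a time window. This is not fail2ban’s code or its config format. It is a minimal sketch that assumes nginx’s default “combined” access-log format and log lines in time order. `find_offenders` and the thresholds are hypothetical, while `maxretry` and `findtime` are named after fail2ban’s jail options of the same name. The actual ban (a firewall rule, in fail2ban’s case) is left out.

```python
import re
from collections import defaultdict, deque
from datetime import datetime

# Plays the role of a fail2ban "failregex": pull the client IP and timestamp out of
# one line of nginx's default "combined" access-log format, e.g.
# 203.0.113.7 - - [24/Dec/2025:19:44:40 +0000] "GET / HTTP/1.1" 200 512 "-" "UA"
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\]')
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def find_offenders(lines, maxretry=100, findtime=60):
    """Return IPs that logged at least `maxretry` matching lines within
    `findtime` seconds. Illustrative thresholds; no firewall ban is applied."""
    hits = defaultdict(deque)  # ip -> timestamps (seconds) still inside the window
    offenders = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # line doesn't match the pattern; ignore it
        ip = m.group("ip")
        t = datetime.strptime(m.group("ts"), TS_FORMAT).timestamp()
        window = hits[ip]
        window.append(t)
        # Forget hits more than `findtime` seconds older than this one.
        while t - window[0] > findtime:
            window.popleft()
        if len(window) >= maxretry:
            offenders.add(ip)
    return offenders


if __name__ == "__main__":
    # Hypothetical log: one client fires 5 requests in 4 seconds, another 2 over 90 s.
    sample = [
        f'198.51.100.1 - - [24/Dec/2025:19:44:{s:02d} +0000] "GET /p{s} HTTP/1.1" 200 512 "-" "bot"'
        for s in range(5)
    ] + [
        '192.0.2.2 - - [24/Dec/2025:19:44:00 +0000] "GET / HTTP/1.1" 200 512 "-" "browser"',
        '192.0.2.2 - - [24/Dec/2025:19:45:30 +0000] "GET / HTTP/1.1" 200 512 "-" "browser"',
    ]
    print(find_offenders(sample, maxretry=5, findtime=10))  # {'198.51.100.1'}
```

In fail2ban itself, the equivalent pieces are a filter’s `failregex` and a jail’s `maxretry`, `findtime` and `bantime`, with the ban carried out by a firewall action. A scraper hammering a server shows up the same way a password-guessing bot does: as one IP producing many matching log lines in a short window.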

watson@sopuli.xyz · 1 day ago

Fuck Google