The linked page has two “PDF” docs, but they are not really PDFs. If you wget them, you get HTML with embedded JavaScript.

So we can no longer simply download a PDF. Apparently we must run a JavaScript application to get the PDF in a browser tab, then use pdf.js to save it. WTF? This breaks my script (which stores the URL as metadata on every PDF I fetch).
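
For anyone whose fetch script hits the same thing, here is a minimal sketch of a check that could run before the URL gets stored as metadata. It only looks at the leading bytes of whatever was downloaded. The function name `sniff_download` is made up for illustration and is not part of my actual script. It also assumes the file starts directly with the `%PDF-` header, so a PDF with leading junk would come back as "unknown".

```python
def sniff_download(data: bytes) -> str:
    """Classify fetched bytes as 'pdf', 'html', or 'unknown' by their leading bytes.

    Hypothetical helper: a rough heuristic, not a full file-type detector.
    """
    head = data[:1024].lstrip()

    # A real PDF begins with the "%PDF-" header (e.g. "%PDF-1.7").
    if head.startswith(b"%PDF-"):
        return "pdf"

    # The fake "PDFs" are HTML pages carrying a script, so look for the usual markers.
    lowered = head.lower()
    if lowered.startswith((b"<!doctype html", b"<html")) or b"<script" in lowered:
        return "html"

    return "unknown"
```

A script could call this on the response body and refuse to tag or save anything that does not come back as "pdf", rather than silently storing an HTML page under a .pdf name.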

Other sites do this too. I’ve seen websites for restaurants pull this shit with their menus.

What’s the point?

  • MoonMelon@lemmy.ml · 3 upvotes · 12 days ago

    I just searched for parts of the JS, and apparently this is some kind of anti-scraping JavaScript detection courtesy of F5 Networks.

    Here is someone complaining about it on some forums.

    • autonomousPunk@belgae.social (OP) · 3 upvotes · 12 days ago

      Thanks for the insight. Apparently Mozilla is okay with this.

      I suspect it violates open data law to impose JS execution as a precondition to reaching public documents.