• DaGeek247
      27 points · 2 months ago

      My robots.txt has been respected by every bot that visited it in the past three months. I know this because I wrote a page that IP-bans anything that visits it, and I also listed it as a disallowed path in the robots.txt file.

      I’ve only gotten, like, 20 visits in the past three months, though, so it’s a very small sample size.
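A minimal sketch of the trap described above. It assumes a hypothetical trap path `/honeypot` that is also listed as `Disallow: /honeypot` in robots.txt; the `HoneypotBanner` class and its in-memory `banned` set are illustrative only. The post doesn't say how the ban was actually implemented, and a real setup would more likely push the IP into a firewall or fail2ban-style blocklist.

```python
from http import HTTPStatus

# Hypothetical trap path. It must also be listed as "Disallow: /honeypot" in
# robots.txt, so crawlers that respect robots.txt never request it.
TRAP_PATH = "/honeypot"


class HoneypotBanner:
    """Ban any client IP that requests the trap path (in-memory sketch)."""

    def __init__(self, trap_path: str = TRAP_PATH) -> None:
        self.trap_path = trap_path
        self.banned: set[str] = set()

    def is_trap(self, path: str) -> bool:
        # Match the trap path itself and anything below it.
        return path == self.trap_path or path.startswith(self.trap_path + "/")

    def handle(self, ip: str, path: str) -> int:
        """Return the HTTP status code for a request (path without query string)."""
        if ip in self.banned:
            return int(HTTPStatus.FORBIDDEN)
        if self.is_trap(path):
            # Only a client that ignored robots.txt should ever land here.
            self.banned.add(ip)
            return int(HTTPStatus.FORBIDDEN)
        return int(HTTPStatus.OK)


if __name__ == "__main__":
    banner = HoneypotBanner()
    print(banner.handle("203.0.113.5", "/"))          # 200: normal page
    print(banner.handle("203.0.113.5", "/honeypot"))  # 403: trap hit, IP banned
    print(banner.handle("203.0.113.5", "/"))          # 403: banned from now on
```

Once an IP is banned, every later request from it gets 403, not just requests to the trap, which is what makes the "every bot respected it" observation checkable: any IP in `banned` is a bot that ignored the rule.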

      • mozz
        14 points · 2 months ago

        I know this because I wrote a page that IP-bans anything that visits it, and I also listed it as a disallowed path in the robots.txt file.

        This is fuckin GENIUS

        • @Moonrise2473@feddit.it
          7 points · 2 months ago

          Only if you don’t want any visits except from yourself, because this removes your site from every search engine.

          You should write a “Disallow: /juicy-content” rule and then block anything that tries to access that page (only bad bots would follow that path).
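The point of a single `Disallow` line is that it hides only the trap path, so the rest of the site stays crawlable and indexable. Here is a quick check using Python's standard `urllib.robotparser` as a stand-in for a well-behaved crawler. Real search engines use their own parsers, which support extras such as wildcards. The `allowed` helper and the `example.com` host are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the whole site is crawlable except the trap path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /juicy-content
"""


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """Would a crawler that obeys robots.txt fetch this path?"""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # example.com is a placeholder host; only the path matters for the rule check.
    return parser.can_fetch(agent, "https://example.com" + path)


if __name__ == "__main__":
    for path in ["/", "/blog/some-post", "/juicy-content", "/juicy-content/page"]:
        print(path, allowed(path))
```

Only `/juicy-content` and paths under it come back as disallowed. Everything else is fair game for search engines, so the trap costs nothing in visibility.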

            • @Moonrise2473@feddit.it
              3 points · 2 months ago

              Oops. As a non-native English speaker, I misunderstood what he meant. I wrongly understood that he had set the server to ban everything that requested robots.txt.

              • @Zoop@beehaw.org
                2 points · 2 months ago

                Just in case it makes you feel any better: I’m a native English speaker who always aced the reading comprehension tests back in school, and I read it the exact same way. Lol! I’m glad I wasn’t the only one. :)

          • mozz
            5 points · 2 months ago

            You need to reread what was described, more carefully. Imagine, for example, that by “a page,” the person means a page called /juicy-content or something.

      • @thingsiplay@beehaw.org
        2 points · 2 months ago (edited)

        Interesting way of testing this. Another would be to query the search engines with site:your.domain added (Edit: typo corrected. Of course, without the - in -site:, otherwise you will exclude your site instead of limiting the results to it.) to show results from your site only. Not an exhaustive check, but another tool to test this behavior.

    • @Moonrise2473@feddit.it
      10 points · 2 months ago

      For common people’s sites, they respect it, and they even warn a webmaster who submits a sitemap that includes paths disallowed in robots.txt.
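A rough sketch of that kind of check: list the sitemap URLs that robots.txt disallows. It uses Python's stdlib `xml.etree.ElementTree` and `urllib.robotparser` with the standard sitemaps.org namespace. The function name, the `Googlebot` agent string, and the sample data are hypothetical, and this is not how any particular search engine implements its warning.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Standard sitemap namespace from sitemaps.org.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_sitemap_urls(sitemap_xml: str, robots_txt: str, agent: str = "Googlebot") -> list[str]:
    """Return sitemap URLs that robots.txt tells `agent` not to crawl."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]
    # A URL that is both submitted in the sitemap and disallowed is contradictory: flag it.
    return [url for url in urls if not rules.can_fetch(agent, url)]


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /juicy-content\n"
    sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/juicy-content</loc></url>
</urlset>"""
    for url in blocked_sitemap_urls(sitemap, robots):
        print("warning: sitemap lists a URL disallowed by robots.txt:", url)
```

With the sample data, only `https://example.com/juicy-content` is flagged. That is the situation the warning exists for: a sitemap asking for a page to be indexed while robots.txt forbids crawling it.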