• mozz
      link
      fedilink
      46 months ago

      Simple and just works

          def fetch_html(self, url):
              domain = urllib.parse.urlparse(url).netloc
              if domain not in self.robot_parsers:
                  rp = urllib.robotparser.RobotFileParser()
                  rp.set_url(f'https://{domain}/robots.txt')
                  rp.read()
                  self.robot_parsers[domain] = rp
      
              rp = self.robot_parsers[domain]
              if not rp.can_fetch(self.user_agent, url):
                  print(f"Fetching not allowed by robots.txt: {url}")
                  return None
      
              if self.last_fetch_time:
                  time_since_last_fetch = time.time() - self.last_fetch_time
                  if time_since_last_fetch < self.delay:
                      time.sleep(self.delay - time_since_last_fetch)
      
              headers = {'User-Agent': self.user_agent}
              response = requests.get(url, headers=headers)
              self.last_fetch_time = time.time()
      
              if response.status_code == 200:
                  return response.text
              else:
                  print(f"Failed to fetch {url}: {response.status_code}")
                  return None
      

      Randomly selected something from a project I’m working on that’s simple and just works. Show me less than 300 lines of .NET to do the same, and I would be somewhat surprised.

        • mozz
          link
          fedilink
          06 months ago

          I am familiar.

          Not saying don’t pay your bills with it; that part sounds great. I was just confused by this guy’s enthusiasm for it, that’s all.

          • @AdamBomb
            link
            English
            16 months ago

            Well, for starters, it’s a greentext, so who knows how genuine it is, right? Most of the points listed are either subjective or citation needed fodder. However, maybe there’s one fact I can bring to the table:

            ASP.NET’s benchmark performance ranked 16th in Round 22 of the TechEmpower Web Framework Benchmarks, ranking below solutions written in Rust, C, Java, and JS. C# has advantages over each of those languages and frameworks in exchange for the relative loss in performance. Rust and C are much lower level. Java’s syntax is generally considered to lag behind C#'s at this point. JS’s disadvantages could fill a whole post of their own. C# and .NET have their own disadvantages (such as relatively fewer libraries available) as you’ve pointed out in this thread and another in this post, but when you take into consideration the relatively high performance while being a strongly-typed higher-level language with plenty of nice QoL features, you might be able to see why it could be attractive to a specific slice of professionals.

      • @ShortFuse@lemmy.world
        link
        fedilink
        16 months ago

        You left out the hundred of lines from the library you’re importing. Where’s all the code for robotparser?

        You can import libraries with C# too. That says nothing about the differences between languages.

        • mozz
          link
          fedilink
          26 months ago

          That’s exactly my point though - part of my assertion of a big weakness in C# would be that more mainstream languages (python or node) have massive libraries you can draw on with existing code for simple stuff like parsing robots.txt, whereas C# has one that probably seems pretty luxurious if you’re comparing it to nothing, but is well short of what OSS programmers are accustomed to.

          So yeah it’s not a purely fair language-design comparison but it’s a perfectly fair “how easy is it to get stuff done in this language” comparison. And then at a certain point it starts to become not just a convenience but a whole new area of computation (something like numpy or pytorch) that’s simply impossible in C# without a whole research project devoted to it to implement. That said, I’m sure there are areas (esp in heavily business-oriented fields like airline or medical backend or whatnot) where it’s the other way around, of course, and you have C#-specific stuff for that domain that would be real difficult to replicate in some other environment. I’m not trying to say that side doesn’t exist, just saying what’s generally applicable to my experience.

          So I’m not like being critical of C# because of language features (it seems perfectly fine and functional; I get what the people are saying who say they get work done every day in it and it seems fine.) But also, I think it’s relevant that it’s missing some big advantages if you’re trying to go beyond the “it doesn’t actively punish you for using it” stage.