• ebu@awful.systems
      English
      arrow-up
      4
      ·
      2 days ago

      dividing by commits is some nasty sleight of hand given the commit rate has gone through the roof. “only 3 bugs per 10 commits!!” doesn’t really mean much when there’s an order of magnitude more slop commits than not
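To make the arithmetic behind that point concrete, here is a minimal sketch. All the numbers are hypothetical and chosen only for illustration (only the "3 bugs per 10 commits" figure echoes the comment above); nothing here is measured from the actual repository.

```python
def total_bugs(commits: int, bugs_per_10_commits: int) -> int:
    """Absolute bug count implied by a per-commit rate (integer arithmetic)."""
    return commits * bugs_per_10_commits // 10

# Hypothetical "before" period: fewer commits, worse per-commit rate.
before = total_bugs(commits=100, bugs_per_10_commits=5)

# Hypothetical "after" period: 10x the commits, better per-commit rate.
after = total_bugs(commits=1000, bugs_per_10_commits=3)

print(f"before: {before} bugs, after: {after} bugs")
print(f"per-commit rate fell, absolute bugs grew {after / before:.0f}x")
```

Under these assumed numbers the per-commit rate improves from 5 to 3 bugs per 10 commits while the absolute number of bugs landing in the project grows sixfold, which is why the choice of denominator matters.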

    • flere-imsaho@awful.systems
      English
      arrow-up
      5
      ·
      2 days ago

      a person whose last two years’ worth of github contributions are related to confabulation machinery, surely this is an unbiased analysis.

  • WhoIzDisIz@lemmy.today
    English
    arrow-up
    33
    ·
    4 days ago

    But the answer to finding yourself being load-bearing is not to start using AI code with AI tests.

    The Great Man theory of open source development, where it all hinges on one heroic individual, has always been a fatal weakness. It happens because the companies benefiting from the software just will not pay the individual guys who let their company work. So the companies try to make the guys feel obligated to do work for them for free.

    Those guys have to start saying “no.” Go sailing. Declare the project closed and see if the beneficiaries will finally contribute. Maybe they will, maybe they won’t. But no company will put in the developers or money for this stuff to be done until you say “no”.

    You heard it from the Ray-Guns first, but apparently you need to hear it again: “Just say no!”

  • degenerate_neutron_matter@fedia.io
    arrow-up
    14
    arrow-down
    1
    ·
    3 days ago

    So that’s why my backup script, which has worked perfectly for months, failed completely the last time I tried to run it. Guess I’ll be downgrading to the last non-slop version.

  • MoonMelon@lemmy.ml
    English
    arrow-up
    16
    arrow-down
    5
    ·
    4 days ago

    Reading his response, I think calling it “slop” isn’t being totally fair, but it does sound like he should hand it off again or close the project. Not having test coverage for something is bad, but it happens. It sounds like the alternatives have this issue also. But the sailing comment is kind of tragic. Just go sailing, dude. Unless you have a phylactery under your desk the project will outlive you anyway, and honestly that’s the best compliment a developer can get.

      • MoonMelon@lemmy.ml
        English
        arrow-up
        22
        arrow-down
        5
        ·
        4 days ago

        I rewrote the rsync test suite in python from the old shell script design. I did the design for that myself (and I’m really quite pleased with it), but used claude with cross-checks from codex and gemini to do the grunt work. I did not just vibe-code “convert test suite to python”… I used AI tools to do the grunt work because they are good at that. I reviewed every part of it myself and ran through a huge amount of CI time getting it right

        If what he claims is true then he’s using LLMs for test coverage with significant editing by hand. I hate LLMs, but even I have to admit this seems like one of the few valid use cases of LLM-assisted coding. Unless “slop” has become one of those words that’s just lost all meaning.

        • AnarchistArtificer@slrpnk.net
          English
          arrow-up
          9
          ·
          3 days ago

          On one of the BlueSky threads going over the test code, one of the things they uncovered was some stuff running as root, which in no world should be necessary. He may not have just prompted Claude to “convert test suite to python”, but there’s a lot there that seems like clear red flags in terms of AI slop code.

          Which is no surprise, really, given that properly proof-reading AI code is often much more labour-intensive than just writing the code oneself. It’s easy for things like this to slip through the cracks, even if you are trying to check the AI output.

        • diz@awful.systems
          English
          arrow-up
          9
          ·
          edit-2
          3 days ago

          It’s a perfect example of how “using LLMs for test coverage” can also be harmful. He expected the tests to prevent the introduction of said regressions, probably based on a combination of the quantity of tests and their style (they look like what decent human-written tests look like). But the tests are AI slop, and so they give a lot less value per line of code than he expects, hence a significant regression.

          It is literally useful to call these tests AI slop, and the problem is in part caused by not calling them AI slop, and having consequent inflated expectations.

          • MoonMelon@lemmy.ml
            English
            arrow-up
            11
            ·
            4 days ago

            I only know rsync as a user, but I am pretty experienced with Python and I admit those tests look really bizarre. If he did “slot machine” code it (a term I wasn’t familiar with) then yeah, I agree that’s slop. If he didn’t, I don’t understand why he made these changes. OK yeah, that’s a bad sign.