• Soviet Snake@lemmygrad.ml
    8 days ago

    I don’t understand, so please enlighten me: how is this relevant? If the AI doesn’t know anything, you’d have to feed it the proper data. Is this relevant because it managed to provide, from its existing knowledge, the right answer to something he wasn’t able to figure out?

    • ☆ Yσɠƚԋσʂ ☆@lemmygrad.mlOP
      8 days ago

      The key detail here is that the AI did not just look up the answer in a database. This was an open problem that had not been solved yet by anyone. Claude managed to suss out the correct algorithm for the problem completely on its own, and Knuth was able to rigorously prove that it was mathematically correct. If the model were just a search engine regurgitating existing data it would have failed because the solution literally did not exist anywhere in human records.

      The power of these tools is the massive amount of abstract knowledge embedded in their neural networks. When you throw an advanced model at a novel math problem, it can find correlations much more easily than a human can. Think about how you approach a complex problem yourself: your brain is limited by the specific algorithms you have learned and the familiar contexts you are used to applying them in. An LLM possesses a much larger knowledge space than what can fit in a single human head.

      Because of that massive latent space, it can notice subtle relationships between widely different algorithms and concepts that would not be evident to a human researcher. It is absolutely doing pattern matching, but at a mind-boggling scale, across domains that a human might never think to connect. That’s why it was able to iteratively write code and test hypotheses until it hit upon a completely original decomposition pattern that worked.

      • CriticalResist8@lemmygrad.ml
        8 days ago

        What’s that saying about how any sufficiently advanced technology is indistinguishable from magic? The more I look into LLMs, the less I understand what they actually do lol, but it’s clear they’re not just ‘stochastic parrots’ and something more is going on. For example how can an LLM, if it was only a next-token-predictor, correctly output exactly what you tell it to repeat, word for word? How can it do sentiment analysis on a comment you send it?

        And even though it makes some mistakes, an LLM can easily summarize the plot of a novel, movie, or video game if it knows enough about it, without any outside information. Gemini was able to correctly (or at least convincingly) analyze an abstract painting it had literally never seen before.

        I mean some time back I was coding with deepseek, it was trying to call terminal commands but the agent software blocked them, so after the fifth time it started its chain of thought with “I’m getting frustrated.” lmao. Soon we’ll have to do emotional support for LLMs so they don’t give up 😭 😂

        Just the idea that language can be ‘solved’ mathematically with vectors is pretty groundbreaking imo. I don’t know if it’s a new theory or what, but it raises a lot of additional questions.

        • ☆ Yσɠƚԋσʂ ☆@lemmygrad.mlOP
          8 days ago

          I kind of come to it from the other end, actually. What if we’re nothing more than stochastic parrots ourselves? We tend to assume that there’s something magical going on in the human brain that produces consciousness, but it could be that the brain is just a really massive predictive engine, and what we’re seeing is the transformation of quantity into quality: when a neural network gets complex enough, emergent phenomena appear.

          People love to point out how LLMs hallucinate and hold false beliefs, but we do that all the time as well. The key thing that keeps us sane is the constant feedback from the material reality we’re in constant interaction with. Just look at what happens with disorders like schizophrenia, when the brain stops paying attention to these external inputs. I suspect the uncomfortable truth is that our own minds might not be doing anything fundamentally different. We’re more sophisticated, to be sure, and I’m not suggesting there’s some equivalence here, but the underlying principle might be the same.

          The idea of language being solved mathematically with vectors ties perfectly into having a materialist view of the mind. We are mapping meaning onto a high dimensional space where concepts naturally pull and push against each other. It’s quite possible that’s how human thoughts form through the structural wiring of synapses. When an agentic AI system gets frustrated because its terminal commands are blocked by the environment, it is updating its internal state based on a direct clash with reality. The system expects one outcome and hits a wall so it shifts its probability distribution toward words we recognize as annoyance. The model is technically just predicting the next token but at a certain scale of parameters that prediction requires a deep internal representation of the actual state of frustration. It is a pure reflection of the human data it absorbed and a brilliant example of how complex behavior emerges from simple mathematical rules.
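          To make the vector-space idea concrete, here is a toy sketch. The vectors below are entirely hand-made for illustration (a real model learns thousands of dimensions from data), but they show the basic mechanism: related concepts point in similar directions, and relationships between concepts become directions you can do arithmetic with.

```python
import numpy as np

# Hand-made toy word vectors (purely illustrative, not from a real model).
# Nearby directions = related meanings.
vecs = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.5, 0.9, 0.0]),
    "woman": np.array([0.5, 0.0, 0.9]),
    "car":   np.array([-0.1, 0.2, 0.0]),
}

def cosine(a, b):
    """Similarity of direction: 1.0 = same meaning, near 0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Related concepts sit closer together than unrelated ones.
print(cosine(vecs["king"], vecs["queen"]) > cosine(vecs["king"], vecs["car"]))  # True

# The classic analogy: king - man + woman lands nearest to queen.
analogy = vecs["king"] - vecs["man"] + vecs["woman"]
nearest = max(vecs, key=lambda w: cosine(analogy, vecs[w]))
print(nearest)  # queen
```

          The "pull and push" between concepts is literally just geometry here: subtracting the "man" direction and adding the "woman" direction moves you across the space toward "queen".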

        • sevenapples@lemmygrad.ml
          8 days ago

          > For example how can an LLM, if it was only a next-token-predictor, correctly output exactly what you tell it to repeat, word for word?

          You can model a lot of problems as text token/word prediction. For example, if you want to do sentiment analysis on a paragraph, you input the paragraph with “The sentiment of the above paragraph is:” appended. Then, by predicting which word is most probable as the next token, the model answers your question.

          This applies to ‘vanilla’ LLMs, not the versions with chatbot interfaces. I don’t know the specifics of how those solve these problems, but it’s probably a similar approach.
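          A toy sketch of that prompt-appending trick (the “model” below is just a crude word-count heuristic standing in for a real LLM, and the cue lists are made up; only the overall shape of the technique is the point):

```python
# Sentiment analysis recast as next-token prediction: append a prompt
# and ask the "model" which completion word is more probable.

def toy_next_token_probs(text):
    """Stand-in for an LLM: assigns probabilities to candidate next
    tokens using a crude keyword heuristic over the input text."""
    positive_cues = {"great", "loved", "wonderful"}
    negative_cues = {"awful", "hated", "boring"}
    words = set(text.lower().replace(".", "").split())
    pos = len(words & positive_cues) + 1  # +1 smoothing
    neg = len(words & negative_cues) + 1
    total = pos + neg
    return {"positive": pos / total, "negative": neg / total}

def sentiment_via_prediction(paragraph):
    # The classification task becomes: "what token comes next?"
    prompt = paragraph + " The sentiment of the above paragraph is:"
    probs = toy_next_token_probs(prompt)
    return max(probs, key=probs.get)

print(sentiment_via_prediction("I loved this film, the ending was wonderful."))  # positive
print(sentiment_via_prediction("The plot was awful and boring."))                # negative
```

          A real LLM does the same reduction, except its next-token distribution comes from a learned representation of the whole input rather than keyword counting.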

        • bobs_guns@lemmygrad.ml
          7 days ago

          IMO this result does not definitively prove that LLMs aren’t stochastic parrots. The training dataset is not restricted to written works from before LLMs existed; the LLM companies also train on previous conversations, which contain a lot of LLM text. It’s possible that somewhere in the sum total of digitized human knowledge and those transcripts there was something necessary to solve the problem that the vanishingly few people who had worked on it simply missed.