• @abhibeckert@beehaw.org
    12
    8 months ago

    ChatGPT 4 is estimated to use 700GB of “High Bandwidth Memory”.

    … which will set you back about half a million dollars at current prices (which are high, because the manufacturers can’t keep up with demand). Or, you could just pay 20 bucks a month.
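To make the comparison above concrete, here is a rough back-of-the-envelope sketch in Python. The 700GB and half-million-dollar figures are the estimates quoted in the comment, not verified numbers, and the function names are made up for illustration.

```python
# Back-of-the-envelope comparison using the figures quoted above.
# All inputs are the comment's estimates, not verified prices.

HBM_GB = 700                # estimated memory footprint (from the comment)
HBM_COST_USD = 500_000      # "about half a million dollars" (from the comment)
SUBSCRIPTION_USD = 20       # monthly subscription price (from the comment)


def cost_per_gb(total_cost: float, gigabytes: float) -> float:
    """Implied price per gigabyte of memory."""
    return total_cost / gigabytes


def break_even_months(hardware_cost: float, monthly_fee: float) -> float:
    """Months of subscription that add up to the hardware cost."""
    return hardware_cost / monthly_fee


if __name__ == "__main__":
    months = break_even_months(HBM_COST_USD, SUBSCRIPTION_USD)
    print(f"Implied price: ${cost_per_gb(HBM_COST_USD, HBM_GB):,.0f} per GB")
    print(f"Break-even: {months:,.0f} months (~{months / 12:,.0f} years)")
```

This only counts the memory; it ignores the rest of the hardware, power, and the fact that the model weights are not available to buy.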

      • conciselyverbose
        9
        8 months ago

        If it’s actually High Bandwidth Memory, it’s the VRAM they use for some video cards/SoCs.

        It might be mostly the same components, but the high bandwidth part is important and harder to do. They get the much higher throughput by physically stacking the memory dies on top of each other, right next to the processor in the same package. The much shorter distance signals have to travel (combined with a lot of pins to send signals through) lets it do more than you can do with traditional RAM.

        • @GiveMemes@jlai.lu
          3
          8 months ago

          There’s a company making analog chips that do the matrix calculations at a (15 or) 60x (I forget which) more efficient rate than modern chips (by multiplying voltages I believe). Even though one is only about 1/3 the processing power of a modern GPU, stack enough together and you’re cooking. The matrix multiplication aspect is what we’re using the VRAM for, right?

          • conciselyverbose
            3
            8 months ago

            The actual models telling them what to multiply are what sits in VRAM, to my knowledge.

            VRAM isn’t the low level “working” memory. You still have to pull structures from memory and into actual use. If you’re working on pen and paper, a bookshelf might be system storage and your desk might be RAM/VRAM, but you still need to copy the numbers from your desk onto the piece of paper you’re working on. That’s lower level cache, registers, the tensor cores, etc.

            If the chip you’re discussing is a better calculator, that’s useful, but you still need the big desk to hold the huge amount of information you need to reference at any given time.

            My brain is mush for some reason today, so that might not make sense, but better matrix operations shouldn’t remove the need to have access to a huge model.

            • @GiveMemes@jlai.lu
              1
              8 months ago

              Thanks for the informative reply! Looks like I need to brush up on my hardware knowledge lol

        • lol3droflxp
          1
          8 months ago

          I get that this is expensive. However, it should also work with RAM if you accept slower speeds I guess. The question is of course if it’s still usable then.

          • @averyminya@beehaw.org
            4
            8 months ago

            Most current locally hosted software has some option to offload to RAM, CPU, and disk. VRAM is fastest, but RAM and CPU offloading lets you cut down to less than 4GB VRAM for certain applications, at plenty reasonable speed.
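The offloading idea can be sketched roughly as follows: fill VRAM with as many model layers as fit, and leave the rest in system RAM. This is a simplified illustration, not how any particular tool does it; the layer count, per-layer size, and reserved overhead are hypothetical numbers.

```python
def split_layers(n_layers: int, layer_gb: float, vram_gb: float,
                 reserved_gb: float = 1.0) -> tuple[int, int]:
    """Return (layers_on_gpu, layers_on_cpu).

    reserved_gb is a hypothetical allowance for activations, cache and
    other overhead that must also live in VRAM.
    """
    usable = max(vram_gb - reserved_gb, 0.0)
    on_gpu = min(n_layers, int(usable // layer_gb))
    return on_gpu, n_layers - on_gpu


if __name__ == "__main__":
    # Hypothetical model: 32 layers of 0.25 GB each (8 GB in total).
    for vram in (4, 8, 24):
        gpu, cpu = split_layers(32, 0.25, vram)
        print(f"{vram:>2} GB VRAM: {gpu} layers on GPU, {cpu} in system RAM")
```

The more layers end up in system RAM, the slower each response gets, which is the trade-off described above.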

          • @abhibeckert@beehaw.org
            1
            8 months ago

            GPT-4 is already kinda slow - it works best as a “conversational” tool where you ask follow up questions and clarify things that have already been said. That’s painful when you have to wait 10 seconds for a response. I couldn’t imagine it being useful if it was minutes.

      • @abhibeckert@beehaw.org
        2
        8 months ago

        To put some numbers on it - RAM runs at tens of gigabytes per second (bytes, not bits). High Bandwidth Memory runs at several hundred gigabytes or sometimes terabytes per second (OpenAI is likely using the latter, and that memory isn’t just expensive, it’s also supply constrained, so the prices are astronomically high right now).

        You can buy HBM, and you can use it as your main system RAM, but it’s painfully expensive. The actual amount of bandwidth also scales linearly with the amount of memory you buy. So 500GB is 10x faster than 50GB - because it writes to all of the chips simultaneously (and then reads from all of them when you access the data back).
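A rough way to see why the bandwidth numbers matter: if generating each token requires reading the whole model from memory once (a common simplification that ignores batching, caching and compute time), the time per token is about model size divided by bandwidth. The sketch below uses the 700GB estimate from earlier in the thread; the bandwidth values and the per-stack figure are illustrative assumptions, not measurements.

```python
def seconds_per_token(model_gb: float, bandwidth_gb_s: float) -> float:
    """Lower bound on time per token if every weight is read once per token."""
    return model_gb / bandwidth_gb_s


def aggregate_bandwidth(stacks: int, per_stack_gb_s: float) -> float:
    """Total bandwidth when all stacks are read in parallel (linear scaling)."""
    return stacks * per_stack_gb_s


if __name__ == "__main__":
    MODEL_GB = 700  # estimate quoted earlier in the thread

    # Illustrative bandwidths in GB/s; real hardware varies widely.
    for name, bw in [("system RAM", 50), ("HBM", 2000)]:
        t = seconds_per_token(MODEL_GB, bw)
        print(f"{name:>10}: {t:.2f} s per token")

    # Ten times as many stacks gives ten times the bandwidth, assuming a
    # hypothetical 100 GB/s per stack.
    print(aggregate_bandwidth(10, 100) / aggregate_bandwidth(1, 100))
```

Under these assumptions ordinary RAM would need many seconds per token, which is why a response could stretch from seconds into minutes.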

        It’s pretty standard on high end GPUs these days. Apple takes a similar approach on all their Apple Silicon computers (if you buy a Mac with 64GB of RAM, it can run at up to 800GB/s - which isn’t quite as fast as a high end GPU but it’s close, although strictly speaking it’s wide on-package LPDDR rather than HBM). It’s part of why Macs are so expensive (and also why the cheaper ones have very little RAM).

    • @DavidGarcia@feddit.nl
      3
      8 months ago

      I highly doubt that, there are comparable models that are way smaller than that. No way they would waste that much money.

      • @abhibeckert@beehaw.org
        3
        8 months ago

        There are comparable models to GPT 3.5 “Turbo”, which is faster and 30x cheaper than GPT 4 (if you pay OpenAI’s regular API prices).

        I suspect that’s because GPT-4 needs 30x more memory than 3.5.

        I’m not aware of any other model that performs as well as GPT-4. In fact I suspect even 3.5 Turbo is the second best model.

  • JackGreenEarth
    12
    8 months ago

    I’d like this offline. Why are all the good chatbots proprietary online-only software?

    • @Amaltheamannen@lemmy.ml
      17
      8 months ago

      Check out /r/localllama. Preferably you need an Nvidia GPU with >= 24 GB VRAM, but it also works with a CPU and loads of normal RAM, if you can wait a minute or two for a lengthy answer. Loads of models to choose from, many with no censorship at all. Won’t be as good as ChatGPT 4, but many are close to GPT-3.
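As a rough guide to what fits in 24 GB, weight memory is approximately parameter count times bits per weight. The sketch below ignores activation and context-cache overhead, and the model sizes are just illustrative examples rather than specific recommendations.

```python
def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: int, vram_gb: float) -> bool:
    """True if the weights alone fit; real usage needs extra headroom."""
    return weight_gb(params_billion, bits_per_weight) <= vram_gb


if __name__ == "__main__":
    for params in (7, 13, 70):
        for bits in (16, 8, 4):
            size = weight_gb(params, bits)
            verdict = "fits" if fits(params, bits, 24) else "too big"
            print(f"{params}B @ {bits}-bit: {size:.1f} GB ({verdict} in 24 GB)")
```

Lower-precision (quantized) weights are what make the larger models practical on a single consumer card, at some cost in quality.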

    • @CanadaPlus
      3
      8 months ago

      By design, because they don’t want some basement guy launching skynet.

      I have to agree, I trust a handful of big shops, some of which could actually be killed by ethics people against the wishes of investors, far more than the entire internet. It still might not be enough, but there is no applying brakes whatsoever if anyone can take the next step.

      • The Doctor
        English
        3
        8 months ago

        They don’t want somebody toppling an oligarch, you mean.

        • @CanadaPlus
          3
          8 months ago

          Which oligarch? I mean, yes there’s definitely a degree of trusting “the right sort” there, but capitalism isn’t a team sport and they’re not a team. Honestly one of them might launch skynet anyway, if that’s how the technology grows, but a few people are theoretically able to agree not to do something, while legions never can.

          So do you think it should all be open sourced, then? And if so, are you a skeptic of “AI alignment”, or even “AI safety”?

          • The Doctor
            English
            1
            8 months ago

            Any of them. They don’t necessarily like each other or team up, but they are smart enough to understand that an upstart toppling one is a potential threat to all of them. All things being equal, keep the game board the way it is, without any unwelcome surprises coming in to kick things over.

            I do think it should be open sourced, just so that those of us who aren’t oligarchs have a chance to at least tread water a little longer. Those of us who aren’t wealthy need all the help we can get during a time where our inherent disposability has been writ large as a warning.

            Am I a skeptic of AI alignment? No. What I’ve observed is that AI systems tend to reflect their creators’ goals and ethics quite well. Problem is, their goals and ethics are pretty much the same as the human race’s for the last few centuries. Built in racism? No shit, it would have been strange if the construct hadn’t acted that way.

            Am I a skeptic of AI safety? Yes, I think the idea is complete bullshit. AI reflects the goals, prejudices, and ethics of its creators quite well, which if you look at human history is anything but safe and sound. To put it another way, if you’ve got the money and the chops to build an AI system, you’re going to build it to make sure you don’t lose what you have already and see if you can get hold of more of what you have (at first to recoup the cost, then just to get hold of more wealth). If you’re the military you’re going to want to make sure you’re on equal footing with your enemies, both explicit and implicit at the very least (probably half of ‘warfighting superiority’ is propaganda; if you look at the breakdowns it’s closer to equal footing with the usual margin of error).

            • @CanadaPlus
              1
              8 months ago

              I should say that I worry a lot about some powerful person getting an obedient AI. I’d say it’s been an animating force in my life, even, although the exact way the situation has gone in this decade makes me a bit less worried. A paperclip optimiser seems like the most likely outcome right now if AI takes off, which is somewhat better.

              “They don’t necessarily like each other or team up, but they are smart enough to understand that an upstart toppling one is a potential threat to all of them.”

              Most of them were upstarts, though. They all come from privileged backgrounds of some kind, but didn’t start as billionaires. Rather, they invested in the right thing at the right time and were carried to the top. If we were talking about a more feudal-esque system like they have in Russia, you’d be right, and that’s why Russia sucks economically and militarily, but (for now) competition meaningfully exists in the West.

              “I do think it should be open sourced, just so that those of us who aren’t oligarchs have a chance to at least tread water a little longer. Those of us who aren’t wealthy need all the help we can get during a time where our inherent disposability has been writ large as a warning.”

              How would that work? I have trouble seeing a way the average worker would benefit from having the ability to run an LLM offline a few years ahead of schedule. Hackers like me and probably you would a bit, but then again I’m not going to personally compete with OpenAI for reasons that have nothing to do with the software.

              “What I’ve observed is that AI systems tend to reflect their creators’ goals and ethics quite well.”

              That surprises me. Most of “data science”, as far as I can tell, is struggling to get a neural net to learn what you want it to, either by trial-and-error or by inventing new training schemes. Even getting ChatGPT to only answer the questions it’s supposed to has proven elusive.

              They turn out racist, because we’re racist and so the training data is racist. Creators rarely want that, because it can bring legal trouble and certainly bad press. Sometimes they fail in ways that have nothing to do with us, like the trick of getting ChatGPT to pretend to be an evil version of itself, which it will do because that’s a likely sequence of tokens and doesn’t look enough like something bad to a less capable system.

              I’d actually agree that AI safety is on shaky foundations sometimes, but more because we don’t know what we do want our machines to do, and more than anything I’d like the two camps to stop undercutting each other.

    • DarkThoughts
      3
      8 months ago

      I think KoboldAI runs locally, but like many current AI tools it’s a pain in the ass to install, especially if you’re on Linux, especially if you’re using AMD GPUs. I wonder if we’ll see some specialized AI related cards to slot into our PCI slots or something. Not a whole lot of necessary options to fill them nowadays anyway. I’d also be interested in local AI voice changers. Maybe even packaged like a Roland VT-4 voice transformer that sits between your mic & whatever other audio interface you might be using, where you just throw the trained voice models onto the device and it does all the real time computing for you.

      I’m sure things will get more refined over the next few years though.

    • @DavidGarcia@feddit.nl
      1
      8 months ago

      It won’t take long until cheap special purpose chips hit the market. Then you’ll have your offline model. There are already models that run on consumer hardware, but it’s for enthusiasts at the moment and not the same quality (but almost). But if you want to spend thousands on a PC that can handle the largest models, go ahead.

  • AutoTL;DR
    English
    3
    8 months ago

    🤖 I’m a bot that provides automatic summaries for articles:

    According to The Decoder, leaked screenshots and videos show a custom chatbot creator with many of the same features already available in ChatGPT using GPT-4, like web browsing and data analysis.

    This morning, SEO tools developer Tibor Blaho shared a video of the UI for the feature in action, showing a GPT Builder option that lets users enter a prompt — an example reads “make a creative who helps generate visuals for new products.” — to create a chatbot.

    Users can also upload files for a bespoke knowledgebase and toggle capabilities like web browsing and image generation.

    Choi shared a screenshot that breaks down the Team plan’s features, like unlimited high-speed GPT-4 and four times longer context.

    Recent ChatGPT beta features include live web results, image generation, and voice chat.

    OpenAI says it will preview new tools at the developer conference on Monday, so we probably won’t have to wait long to find out if these rumors are accurate.


    Saved 55% of original text.