Very large amounts of gaming gpus vs AI gpus

TheMightyCat@ani.social · 18 days ago

And I officially ended my subscription a long time ago

TheMightyCat@ani.social · edit-2 20 days ago

I currently have a PinePhonePro and I’m happy that it’s the real Linux experience in a phone form factor. I’ve tried searching but couldn’t find a conclusive answer.

Is Sailfish Linux in the sense that I can just change the compile target and any Linux program will work, like on the PPP, or is it “Linux” in the sense that Android is Linux?

TheMightyCat@ani.social · 1 month ago

I think everyone has at some point thought of taking an SBC and a touchscreen and making a Linux phone, cool to actually see one!

TheMightyCat@ani.social · 1 month ago

It’s almost like client-side anti-cheat doesn’t work, and if proper server-side anti-cheat were made it wouldn’t matter what platform the client is on.

TheMightyCat@ani.social · 2 months ago

Birth rates are overrated anyway

TheMightyCat@ani.social · edit-2 2 months ago

I found the reason: somehow setting --max_num_seqs 1 makes vLLM way more memory efficient.

Not sure exactly what it does, but I think it’s because vLLM batches requests and the API I was using with exllamav3 doesn’t.

Now I’m doing 100k context with vLLM too:

(Worker_TP0_EP0 pid=99695) INFO 11-03 17:34:00 [gpu_worker.py:298] Available KV cache memory: 4.73 GiB
(Worker_TP1_EP1 pid=99696) INFO 11-03 17:34:00 [gpu_worker.py:298] Available KV cache memory: 4.73 GiB
(EngineCore_DP0 pid=99577) INFO 11-03 17:34:00 [kv_cache_utils.py:1087] GPU KV cache size: 103,264 tokens
(EngineCore_DP0 pid=99577) INFO 11-03 17:34:00 [kv_cache_utils.py:1091] Maximum concurrency for 100,000 tokens per request: 1.03x
(EngineCore_DP0 pid=99577) INFO 11-03 17:34:00 [kv_cache_utils.py:1087] GPU KV cache size: 103,328 tokens
(EngineCore_DP0 pid=99577) INFO 11-03 17:34:00 [kv_cache_utils.py:1091] Maximum concurrency for 100,000 tokens per request: 1.03x
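
The numbers in that log can be roughly reproduced by hand. A minimal sketch, assuming the attention shape usually listed for Qwen3-Coder-30B-A3B (48 layers, 4 KV heads, head dimension 128), a 16-bit KV cache, tensor parallel size 2 and a 16-token block size. None of these values are read from the log itself, so treat them as assumptions:

```python
# Rough sanity check of the KV cache figures reported by vLLM above.
GIB = 1024 ** 3

# Assumed model/engine parameters (not taken from the log).
NUM_LAYERS = 48
NUM_KV_HEADS = 4
HEAD_DIM = 128
KV_DTYPE_BYTES = 2   # assumes a 16-bit KV cache
TENSOR_PARALLEL = 2  # the log shows two workers, TP0 and TP1
BLOCK_SIZE = 16      # assumed block size in tokens


def kv_bytes_per_token_per_gpu() -> int:
    """Bytes of KV cache one token occupies on a single GPU."""
    heads_per_gpu = NUM_KV_HEADS // TENSOR_PARALLEL
    # Factor 2: one key vector and one value vector per head per layer.
    return 2 * NUM_LAYERS * heads_per_gpu * HEAD_DIM * KV_DTYPE_BYTES


def cache_tokens(available_gib: float) -> int:
    """Tokens that fit in the given memory, rounded down to whole blocks."""
    raw = int(available_gib * GIB) // kv_bytes_per_token_per_gpu()
    return raw // BLOCK_SIZE * BLOCK_SIZE


def max_concurrency(cache_size_tokens: int, max_model_len: int) -> float:
    """How many full-length requests fit in the cache at once."""
    return cache_size_tokens / max_model_len


if __name__ == "__main__":
    print(kv_bytes_per_token_per_gpu())                  # bytes per token per GPU
    print(cache_tokens(4.73))                            # tokens for 4.73 GiB
    print(f"{max_concurrency(103_264, 100_000):.2f}x")   # concurrency at 100k
```

Under those assumptions one token costs 48 KiB per GPU, so 4.73 GiB holds about 103k tokens, which is just enough for a single 100,000-token request.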

I would say exllamav3 is still slightly more efficient, but this explains the huge discrepancy. exllamav3 also allows setting GB per GPU, which gets me a few more GB than vLLM, which spreads it evenly, because a bunch of memory on GPU 0 is used for other stuff.
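
To illustrate why a per-GPU limit can win when one card is partly occupied, here is a small sketch. The free-memory numbers are hypothetical, and the "even split" model is only a simplification of the behaviour described above, not how either engine actually allocates memory:

```python
from typing import List


def even_split_budget(free_gib: List[float]) -> float:
    """Usable total when every GPU must be given the same share.

    The GPU with the least free memory caps the share for all of them.
    """
    return min(free_gib) * len(free_gib)


def per_gpu_budget(free_gib: List[float]) -> float:
    """Usable total when each GPU can be given its own limit."""
    return sum(free_gib)


if __name__ == "__main__":
    # Hypothetical: GPU 0 has less free memory because other stuff runs on it.
    free = [20.0, 23.0]
    print(even_split_budget(free))  # limited by GPU 0
    print(per_gpu_budget(free))     # uses everything that is free
```

With these made-up numbers the per-GPU setting recovers the 3 GiB that an even split leaves unused on the second card.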

As for the T/s, it’s about the same, in the 80-100 range. This is what I’m getting with vLLM:

(APIServer pid=99454) INFO 11-03 17:36:31 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:32 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:32 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:34 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:34 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:34 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:34 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:34 [loggers.py:127] Engine 000: Avg prompt throughput: 461.4 tokens/s, Avg generation throughput: 17.6 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.6%, Prefix cache hit rate: 66.9%
(APIServer pid=99454) INFO 11-03 17:36:35 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:35 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:35 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:35 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:36 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:36 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:36 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:36 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:43 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=99454) INFO:     127.0.0.1:32968 - "POST /tokenize HTTP/1.1" 200 OK
(APIServer pid=99454) INFO 11-03 17:36:44 [qwen3coder_tool_parser.py:76] vLLM Successfully import tool parser Qwen3CoderToolParser !
(APIServer pid=99454) INFO 11-03 17:36:44 [loggers.py:127] Engine 000: Avg prompt throughput: 1684.4 tokens/s, Avg generation throughput: 96.7 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 4.4%, Prefix cache hit rate: 83.4%
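
The throughput lines are buried between the tool parser messages, so a small filter helps when comparing runs. A minimal sketch, assuming the log line format stays exactly as pasted above (the regex and the function name `parse_throughput` are my own, not part of vLLM):

```python
import re
from typing import Optional, Tuple

# Matches the periodic stats line emitted by loggers.py in the log above.
THROUGHPUT_RE = re.compile(
    r"Avg prompt throughput: (?P<prompt>[\d.]+) tokens/s, "
    r"Avg generation throughput: (?P<gen>[\d.]+) tokens/s"
)


def parse_throughput(line: str) -> Optional[Tuple[float, float]]:
    """Return (prompt_tps, generation_tps), or None for other log lines."""
    match = THROUGHPUT_RE.search(line)
    if match is None:
        return None
    return float(match.group("prompt")), float(match.group("gen"))


SAMPLE = (
    "(APIServer pid=99454) INFO 11-03 17:36:44 [loggers.py:127] Engine 000: "
    "Avg prompt throughput: 1684.4 tokens/s, Avg generation throughput: "
    "96.7 tokens/s, Running: 1 reqs, Waiting: 0 reqs, "
    "GPU KV cache usage: 4.4%, Prefix cache hit rate: 83.4%"
)

if __name__ == "__main__":
    print(parse_throughput(SAMPLE))
```

As far as I can tell these values are averages over the logging interval, so a window that includes idle time (like the 17.6 tokens/s line) reads lower than the actual generation speed.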

Now that I have found this out I’ve switched back to vLLM, because the API I’m using with exllamav3 doesn’t support Qwen 3 tools yet :(

TheMightyCat@ani.social · 2 months ago

I really don’t want to shill for Nvidia, but isn’t the kernel driver open source now?

TheMightyCat@ani.social · 2 months ago

I would suggest trying exllamav3 once. I have no idea what kind of black magic they use, but it’s very memory efficient.

I can’t load Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8 with 16K context using vLLM,

but using exllamav3 I can SOMEHOW load ArtusDev/Qwen_Qwen3-Coder-30B-A3B-Instruct-EXL3:8.0bpw_H8 at its full context of 262,144 tokens with still 2 GiB to spare.

I really feel like this is too good to be true and I’m doing something wrong, but it just works so I don’t know.

TheMightyCat@ani.social · edit-2 2 months ago

If I were to guess, the biggest performance impact is if your driver doesn’t support newer types like FP4 and FP8 while your hardware does, but I am not sure.

TheMightyCat@ani.social · edit-2 3 months ago

Forget that, the real evil is that the first character of Int, or Ganz in this case, is capitalized

TheMightyCat@ani.social · 3 months ago

Why does it matter what is replaced?

I would say if it frees up a human from having to do it that it’s good news.

TheMightyCat@ani.social · 3 months ago

“must benefit everyone, not just a handful of billionaires”

It’s hard to disagree with that statement but all this upheaval about “killing jobs” seems really pointless to me.

A manual farm has far more employees than one that uses modern machinery. Is the economy in decline if the farmer replaces human labor with machines?

TheMightyCat@ani.social · edit-2 3 months ago

The hoodied master hacker hacking the mainframe using HTML…

I really doubt the authenticity of reporting if they use such “tech looking” images

TheMightyCat@ani.social · 3 months ago

Good stuff, Ukraine has proven again that having a nuclear arsenal is really the only way to ensure a country’s security when faced with a hostile nuclear-armed neighbor.

TheMightyCat@ani.social · edit-2 3 months ago

Yeah I tried taking a screenshot first but as you can see the screenshot doesn’t include the panel.

This is a PinePhonePro. The experience is surprisingly usable except for one major thing: the battery life is terrible.

I don’t know whether that is specific to the PPP or postmarketOS, or whether my phone is defective, but after about an hour of use the battery is already drained, and even in standby it won’t last more than 6 hours or so.

There are some minor things here and there, but that is the main reason I can’t use it as a daily driver.

TheMightyCat@ani.social · edit-2 3 months ago

Fun thing about Linux phones is that they can all look different, without vendor lock-in

(Yes, I apologize for the scuffed picture, but pressing the screenshot button closes the menu)

TheMightyCat@ani.social · 3 months ago

Haven’t used Shotcut or OpenShot but can recommend Kdenlive. It has a lot of features, but I find it a bit clunky to work with sometimes.

For me when I was working on a large project it crashed once and the autorecover worked, although I still manually save since I’m paranoid.

TheMightyCat@ani.social · 3 months ago

I’ve always wondered how they ever plan to enforce it. Is a police team going to bust down my door:

PUT THE UNBACKDOORED LIBSSL DOWN!

Sure, companies like Meta will comply instantly, but everyone that doesn’t use big tech (including criminals) can just continue doing whatever? Why would criminals, the ones the law claims to stop, use backdoored clients?

TheMightyCat@ani.social · 3 months ago

I’m self-hosting Forgejo and I don’t really see the benefit of migrating to a container. I can easily install and update it via the package manager, so what benefit does containerization give?

TheMightyCat@ani.social · 3 months ago

Russian MiG-31 aircraft.

Do they not know how to do a simple Google search? That’s a MiG-29

This is a MiG-31 when viewed from below

TheMightyCat@ani.social · 5 months ago

Very large amounts of gaming gpus vs AI gpus
