DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI

misk ( @misk@sopuli.xyz ) · 6 days ago

DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI

IrritableOcelot ( @IrritableOcelot@beehaw.org ) · 6 days ago

Not somebody who knows a lot about this stuff, as I’m a bit of an AI Luddite, but I know just enough to answer this!

“Tokens” are essentially just a unit of work – instead of interacting directly with the user’s input, the model first “tokenizes” the user’s input, simplifying it down into a unit which the actual ML model can process more efficiently. The model then spits out a token or series of tokens as a response, which are then expanded back into text or whatever the output of the model is.

I think tokens are used because most models use them, and use them in a similar way, so they’re the lowest-level common unit of work where you can compare across devices and models.