DeepSeek's free 685B-parameter AI model runs at 20 tokens/second on Apple's Mac Studio, outperforming Claude Sonnet while using just 200 watts, challenging OpenAI's cloud-dependent business model.
Not somebody who knows a lot about this stuff, as I’m a bit of an AI Luddite, but I know just enough to answer this!
“Tokens” are essentially just a unit of work – instead of interacting directly with the user’s input, the model first “tokenizes” the user’s input, simplifying it down into a unit which the actual ML model can process more efficiently. The model then spits out a token or series of tokens as a response, which are then expanded back into text or whatever the output of the model is.
I think tokens are used because most models use them, and use them in a similar way, so they’re the lowest-level common unit of work where you can compare across devices and models.
Not somebody who knows a lot about this stuff, as I’m a bit of an AI Luddite, but I know just enough to answer this!
“Tokens” are essentially just a unit of work – instead of interacting directly with the user’s input, the model first “tokenizes” the user’s input, simplifying it down into a unit which the actual ML model can process more efficiently. The model then spits out a token or series of tokens as a response, which are then expanded back into text or whatever the output of the model is.
I think tokens are used because most models use them, and use them in a similar way, so they’re the lowest-level common unit of work where you can compare across devices and models.