• Not somebody who knows a lot about this stuff, as I’m a bit of an AI Luddite, but I know just enough to answer this!

    “Tokens” are essentially the model’s unit of text – instead of operating on the user’s raw input directly, the model first “tokenizes” it, splitting it into chunks (whole words, pieces of words, punctuation) that each map to an integer ID the actual ML model can process efficiently. The model then spits out a token or series of tokens as a response, which are decoded back into text or whatever the output of the model is.
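
    As a rough illustration, here’s a minimal sketch using OpenAI’s tiktoken library (the encoding name and example string are just for demonstration, not tied to any particular model in the question):

        import tiktoken

        # Load a tokenizer ("cl100k_base" is the encoding used by several OpenAI models)
        enc = tiktoken.get_encoding("cl100k_base")

        # Text -> list of integer token IDs, which is what the model actually sees
        token_ids = enc.encode("Tokens are just chunks of text.")
        print(token_ids)

        # Token IDs -> text, which is how the model's output becomes a response
        print(enc.decode(token_ids))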

    I think tokens are used as the benchmark unit because nearly all of these models work this way, and in a broadly similar way, so tokens per second is the lowest-level common unit of work where you can compare across devices and models. One caveat: different models use different tokenizers, so a “token” isn’t exactly the same amount of text everywhere.
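
    And the throughput number you see in benchmarks is usually just output tokens divided by wall-clock time – a toy sketch, where `fake_generate` is a made-up stand-in for a real model call:

        import time

        def fake_generate(n_tokens: int) -> list[int]:
            # Stand-in for a real model call; just pretends to emit n_tokens IDs.
            time.sleep(0.5)
            return list(range(n_tokens))

        start = time.perf_counter()
        output_ids = fake_generate(100)
        elapsed = time.perf_counter() - start
        print(f"{len(output_ids) / elapsed:.1f} tokens/sec")  # ~200 tokens/sec for this stub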