• One thing I’d push back on in the article is:

    That cost-per-user doesn’t decrease as you add more customers. You need more servers. More GPUs.

    This is assuming constant use, which is not the case. If I have a server handling LLM prompt requests, and for illustrative purposes each request uses 100% of the single discrete GPU in it, and I only have 1 customer, but that one customer only uses it 5% of the day (which would actually be pretty high in real terms), I can still add additional customers without needing to buy additional servers. The question is whether the given revenue of a single server outweighs its cost to run.

    And when it comes to training, that is an upfront cost, that you could (if you get a model to where you want it) stop having to pay whenever you want. I’m pretty surprised they haven’t been really leaning into training models for medical diagnoses, because once you have a model that can e.g. spot a type of tumor with n% accuracy beyond a human, you don’t really have to refine it further if you don’t want to (after all, it’s not like the humans can choose to do it better themselves at that point, like they can with writing prompts).