• Wow it got worse for me. Maybe through last update? Is this probably related to he application? Now I get 12 t/s on my CPU and switching to GPU it’s only 1.5 t/s. Something is fishy. With Nous hermes 2 Mistral 7B DPO with q4 I get 33 t/s (I believe it was up to 44 before).

    Now I’m curious if this will happen with a different application too, but I have nothing else than GPT4All installed.

    • Unfortunately I can’t even test Llama 3.1 in Alpaca because it refuses to download, showing some error message with the important bits cut off.

      That said, the Alpaca download interface seems much more robust, allowing me to select a model and then select any version of it for download, not just apparently picking whatever version it thinks I should use. That’s an improvement for sure. On GPT4All I basically have to download the model manually if I want one that’s not the default, and when I do that there’s a decent chance it doesn’t run on GPU.

      However, GPT4All allows me to plainly see how I can edit the system prompt and many other parameters the model is run with, and even configure multiple sets of parameters for the same model. That allows me to effectively pre-configure a model in much more creative ways, such as programming it to be a specific character with a specific background and mindset. I can get the Mistral model from earlier to act like anything from a very curt and emotionally neutral virtual intelligence named Jarvis to a grumpy fantasy monster whose behavior is transcribed by a narrator. GPT4All can even present an API endpoint to localhost for other programs to use.

      Alpaca seems to have some degree of model customization, but I can’t tell how well it compares, probably because I’m not familiar with using ollama and I don’t feel like tinkering with it since it doesn’t want to use my GPU. The one thing I can see that’s better in it is the use of multiple models at the same time; right now GPT4All will unload one model before it loads another.

      • That’s quite unfortunate. Alpaca needs to support those explicitly to work with the new 3.1 128k models; GPT4All was not compatible with it before update either. There was a bug in some library they was using and needed a patch. So maybe that’s why you can’t use the new Llama 3.1 in Alpaca. (Edit: Never mind. On the webpage they advertise and talk about 3.1 being working, so a wrong guess by me probably.)

        Actually that sounds very useful and I missed that option, to be able to select from a set of related models. One thing that GPT4All can also do is, analyzing text files and then using the data to ask questions about it. It will also output the exact lines of the file in relation to the answer. I only experimented a little bit with this, but sounds useful too. The team also experiments and works on a web search using, but no idea how that would work with a local model if ever.