How to run GPT4ALL in a private cloud
TL;DR: check out https://github.com/drbh/gmessage for the latest code
Background
Large language models like ChatGPT and LLaMA are amazing technologies that act kind of like calculators for simple knowledge tasks like writing text or code.
I won't get into the weeds, but at their core, these technologies use statistical modeling to generate the text that is most likely to occur next.
The models can do this because they have been trained on a huge amount of text (far more than any human could read), and they optimize their statistical guesses using that text as the source of truth.
LLMs have already proven very helpful for a number of tasks, but because of their technical complexity there are very few providers of these models.
Fortunately, some really smart people have taken the open source models shared by large organizations and drastically reduced their size and hardware requirements, making it possible to run these models on everyday machines.
I especially want to point out the work done by ggerganov's llama.cpp, which enables much of the low-level mathematical work, and Nomic AI's GPT4ALL, which provides a comprehensive layer for interacting with many LLM models. I encourage readers to check out these awesome projects!
In addition to running models locally, I’ve been seeking a way to run my own LLM in a personal private network and interact with it in my browser similar to ChatGPT.
Since GPT4ALL had just released their Golang bindings I thought it might be a fun project to build a small server and web app to serve this use case.
Later that day gmessage was born: https://github.com/drbh/gmessage
gmessage is yet another web interface for gpt4all, with a couple of features that I found useful like search history, a model manager, themes, and a topbar app.
While the application is still in its early days, it's reaching a point where it might be fun and useful to others, and maybe inspire some Golang or Svelte devs to come hack on open source AI tech!
What’s the end result
Before scrolling down and dedicating 5 minutes to setting this up, what is the end result? Whadda I get?
🔍 Search
🎨 Themes
⚙️ Model Download Manager
There are other features currently implemented, like text-to-speech and both a desktop and a web version, as well as many more features being worked on now.
How to run right now
docker run -p 10999:10999 drbh/gmessage:v0.0.0
or if you just want to see the UI and test it out you can test it here
How to deploy in your own cloud
First clone the repo
git clone https://github.com/drbh/gmessage.git
Next, since we want to deploy this remotely we’ll need to use a cloud provider.
For simplicity and low cost we’ll use Fly.io to deploy our Dockerfile with only a few commands
fly launch
Great! That was easy. Now we just wait for the process to complete, and Fly will share a custom URL with you.
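`fly launch` generates a `fly.toml` for you, so you shouldn't need to write one by hand. As a rough sketch of what matters here (the app name is hypothetical), the key detail is exposing gmessage's port 10999:

```toml
# Hypothetical fly.toml sketch -- `fly launch` generates the real one.
app = "my-gmessage"

[build]
  dockerfile = "Dockerfile"

[[services]]
  internal_port = 10999  # the port gmessage listens on inside the container
  protocol = "tcp"

  [[services.ports]]
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443
```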
Note: your server is not secured by any authorization or authentication, so anyone who has that link can use your LLM.
Additionally if you want to run it via docker you can use the following commands.
docker build -t gmessage .
docker run -p 10999:10999 gmessage
In production it's important to secure your resources behind an auth service; currently I simply run my LLM within a personal VPN so only my devices can access it.
If you're interested in, or currently building, large distributed networks with open source LLMs, please reach out!
Conclusion
It’s early days for open source LLMs and privacy preserving AIs. While the models are not perfect now, progress is happening fast and every day new advancements are made!
It's also early days for gmessage, and we can use all the help we can get to build a fun and helpful chat interface. Please think about contributing or trying out the project!