I got really distracted over the last week and didn't get around to writing up the post I was planning on, so instead you get this one! Turns out, if you have a reasonably powerful computer, you can run those fancy LLMs on your own hardware. LLMs are all the rage right now, what with ChatGPT, Bard, Claude and all the others that seem to be getting released each week. So why not experiment a little with getting a large language model running offline, shall we?
First things first, I should mention that this is all highly experimental and prone to breakage, but it does work, and I think that is still pretty cool in its own right.
Before going any further, I want to bring up an important point that might change how exactly you set this up. As of the time of writing, Llama does not build correctly on a Windows system, but it builds under WSL without any issues. If you plan on running the Llama models instead of or alongside the Alpaca models, you really should start with a Linux-based setup here to save yourself having to double back like I did. That said, it really is much easier to get this running under Linux (WSL counts here) since you can very easily install the dependencies through your package manager. I'll be using Ubuntu since it is one of the more common distros (and the default for WSL, so if you haven't adjusted anything this should work out of the box, provided you already have WSL enabled; if not, you can check out my other post to get that set up).
So now, without further ado, let's begin, shall we? Go ahead and open up a terminal window so we can get the dependencies installed.
```bash
sudo apt install python3 python3-venv curl build-essential
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/master/install.sh | bash
. ~/.bashrc
nvm install --lts
```
Breaking this down, we just ran the following:
- Install python3 and venv support for it, plus curl and the required packages for building things from source
- Download the installer script for NVM and run it
- Source our .bashrc file to pick up the changes caused by the NVM installer
- Install the current LTS release of NodeJS
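If you want to double-check that everything landed where it should, a quick round of version checks is an easy sanity test (the exact version numbers you see will depend on when you run this):

```bash
# Quick sanity check that the toolchain is in place; exact version numbers will vary
python3 --version
node --version    # should report the current LTS release installed by nvm
npm --version
gcc --version     # provided by build-essential
```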
Now that we have the required environments for both the LLM and its web frontend, we need to install the package that actually runs the LLMs on our machine: `npm i -g dalai` will get the job done here. Once that finishes installing, you can use it to both download models and run the frontend. Before loading the frontend though, we need to download at least one LLM. This takes a little time, and it also requires you to make a choice based on your own hardware configuration. The actual command to run is `npx dalai XXX install YYY`, where `XXX` is replaced by either `llama` or `alpaca` and `YYY` is replaced by one or more specific variants of the model that you want to use. I've included a table below to help you choose which models to download and run.
| Model | Memory | Disk Space (Quantized) | Disk Space (Full) |
|---|---|---|---|
| 7B | 4GB | 4.21GB | 31.17GB |
| 13B | 8GB | 8.14GB | 60.21GB |
| 30B | 16GB | 20.36GB | 150.48GB |
| 65B | 32GB | 40.88GB | 432.64GB |
The memory requirement listed is for system RAM; none of this runs on the GPU yet, unfortunately.
So, for example, if you wanted to use Alpaca and try out the first three models, you would run `npx dalai alpaca install 7B 13B 30B`. Go ahead and grab whichever models you want now and wait for the lengthy download and build process to complete. It takes a few minutes, but once it finishes it will just dump you back at the terminal (hopefully without any error messages). After it is done, we can load the interface up by simply running `npx dalai serve` and visiting http://localhost:3000 to get started with your brand new chatbot. For real. That's all there is to it; nothing more is needed to get the baseline chatbot working. You can of course train the models further, or use the dalai npm package to write a custom frontend for it.
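Putting the whole thing together, the sequence looks something like this (I'm using the quantized 7B Alpaca model here purely as an example; pick whatever fits your hardware):

```bash
# Install the dalai CLI, download a model, then start the web frontend
npm i -g dalai
npx dalai alpaca install 7B   # or llama, and/or larger variants if you have the RAM
npx dalai serve               # then browse to http://localhost:3000
```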
You can also add your own prompt starters to the config folder located at `~/dalai/config/prompts`. Just create a plain old .txt file and fill it with anything, then restart dalai by pressing `ctrl+c` to stop the server and running `npx dalai serve` again; your prompt will be available in the dropdown menu for templates.
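As a rough sketch, adding a template might look something like this (the filename and the prompt text here are just made-up examples; fill the file with whatever you actually want):

```bash
# Any .txt file dropped into the prompts folder becomes a selectable template
cat > ~/dalai/config/prompts/my-example.txt << 'EOF'
Below is an instruction that describes a task. Write a response that appropriately completes the request.
EOF
```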