How to Run ChatGPT-like LLMs Locally on Your Computer in 3 Easy Steps

Running Large Language Models (LLMs) similar to ChatGPT locally on your computer, without an Internet connection, is now more straightforward thanks to llamafile, a tool developed by Justine Tunney of the Mozilla Internet Ecosystem (MIECO) program and Mozilla's innovation group. Llamafile is a game-changer in the world of LLMs, letting you run these models locally with ease.

In this post, I’ll show you how to use llamafile to run two models locally on your Mac: LLaVA 1.5, an open-source multimodal LLM capable of handling both text and image inputs, or Mistral 7B, an open-source LLM known for its advanced natural language processing and efficient text generation.

Llamafile turns LLM weights into executable binaries: it packages a model's weights and all the code needed to run it into a single, multi-gigabyte file. In the server variants used here, that file even includes a local web server with a web UI for interacting with the model. Because it is compiled with Cosmopolitan Libc, the same file runs across multiple operating systems and hardware architectures, which makes distributing and running LLMs locally far more accessible for a wider range of users.

LLaVA 1.5 is an open-source large multimodal model that supports text and image inputs, similar to GPT-4 Vision. It is trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.

Mistral 7B is an open-source large language model with 7.3 billion parameters developed by Mistral AI. It excels in generating coherent text and performing various NLP tasks. Its unique sliding window attention mechanism allows for faster inference and handling of longer text sequences. Notable for its fine-tuning capabilities, Mistral 7B can be adapted to specific tasks, and it has shown impressive performance in benchmarks, outperforming many similar models.

Here’s how to start using LLaVA 1.5 or Mistral 7B on your own computer with llamafile. Don’t be intimidated: the setup process is very straightforward!

  1. Open Terminal: Before beginning, you need to open the Terminal application on your computer. On a Mac, you can find it in the Utilities folder within the Applications folder, or you can use Spotlight (Cmd + Space) to search for "Terminal."

  2. Download the LLaVA 1.5 llamafile: Pick your preferred option to download the llamafile for LLaVA 1.5 (around 4.26GB):

    1. Go to Justine's repository of LLaVA 1.5 on Hugging Face and download the file llava-v1.5-7b-q4-server.llamafile.

    2. Use this command in the Terminal:

      curl -LO https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/main/llava-v1.5-7b-q4-server.llamafile
  3. Make the Binary Executable: Once downloaded, use the Terminal to navigate to the folder where the file was downloaded, e.g. Downloads, and make the binary executable:

    cd ~/Downloads
    chmod 755 llava-v1.5-7b-q4-server.llamafile

    On Windows, instead of running chmod, simply rename the file to add .exe at the end.

Every time you want to use LLaVA on your computer, follow these steps:

  1. Run the Executable: Start the web server by executing the binary:

    ./llava-v1.5-7b-q4-server.llamafile

    This command will launch a web server on port 8080.

  2. Access the Web UI: To start using the model, open your web browser and navigate to http://127.0.0.1:8080/.

Once you're done using the LLaVA 1.5 model, you can terminate the process. To do this, return to the Terminal where the server is running. Simply press Ctrl + C. This key combination sends an interrupt signal to the running server, effectively stopping it.
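The web UI isn’t the only way in: the bundled server also exposes the llama.cpp-style HTTP API, so you can query the running model from the command line. Here is a minimal sketch, assuming the server from the steps above is still listening on port 8080 (the /completion endpoint and its fields follow the llama.cpp server API that llamafile embeds):

```shell
# Ask the local LLaVA server for a completion over HTTP.
# POST a JSON body to /completion; "n_predict" caps the number of
# generated tokens. The response is a JSON object whose "content"
# field holds the generated text.
curl -s http://127.0.0.1:8080/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Describe llamafile in one sentence.", "n_predict": 64}'
```

This is handy for quick experiments or for wiring the model into your own scripts without touching the browser at all.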

To set up Mistral 7B instead, the steps are almost identical:

  1. Open Terminal: open the Terminal application as described above.

  2. Download the Mistral 7B llamafile: Pick your preferred option to download the llamafile for Mistral 7B (around 4.37 GB):

    1. Go to Justine's repository of Mistral 7B on Hugging Face and download the file mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile.

    2. Use this command in the Terminal:

      curl -LO https://huggingface.co/jartine/mistral-7b.llamafile/resolve/main/mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile
  3. Make the Binary Executable: Once downloaded, use the Terminal to navigate to the folder where the file was downloaded, e.g. Downloads, and make the binary executable:

    cd ~/Downloads
    chmod 755 mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile

    On Windows, instead of running chmod, simply rename the file to add .exe at the end.

Every time you want to use Mistral 7B on your computer, follow these steps:

  1. Run the Executable: Start the web server by executing the binary:

    ./mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile

    This command will launch a web server on port 8080.

  2. Access the Web UI: To start using the model, open your web browser and navigate to http://127.0.0.1:8080/.

Once you're done using the Mistral 7B model, you can terminate the process. To do this, return to the Terminal where the server is running. Simply press Ctrl + C. This key combination sends an interrupt signal to the running server, effectively stopping it.
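As with LLaVA, the local server makes Mistral 7B scriptable over HTTP. Here is a minimal sketch, assuming the server from the steps above is running on port 8080; the /completion endpoint and its "content" response field come from the llama.cpp server that llamafile embeds, and the [INST]…[/INST] markers are Mistral's instruction format:

```shell
# Send a prompt to the local Mistral server and print only the
# generated text. The server replies with JSON; the small Python
# one-liner pulls out the "content" field.
curl -s http://127.0.0.1:8080/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "[INST] Write a haiku about local LLMs. [/INST]", "n_predict": 128}' \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["content"])'
```

Piping the response through a tiny JSON filter like this keeps the output clean enough to use directly in shell pipelines.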

The introduction of llamafile significantly simplifies the deployment and use of advanced LLMs like LLaVA 1.5 or Mistral 7B for personal, development, or research purposes. This tool opens up new possibilities in the realm of AI and machine learning, making it more accessible for a wider range of users.
