Introduction

Want to experiment with Large Language Models (LLMs) without relying on cloud services? With Ollama you can run powerful open-source language models directly on your own computer. This not only keeps your data private, but also gives you full control over the models you use. In this article we explain step by step how to install Ollama, run an LLM locally, and integrate it with popular developer tools such as LangChain and Visual Studio Code.


What is Ollama?

Ollama is a tool that greatly simplifies the process of downloading, setting up, and running LLMs, such as Llama 3. It packages model weights and configurations into a single file, similar to how Docker works for applications. This makes it easy for both developers and enthusiasts to get started with LLMs.


System Requirements

Before you begin, it's important to check that your system meets the minimum requirements. To run smaller models (around 7 billion parameters), the following is recommended:

  • RAM: At least 8 GB, but 16 GB is recommended for better performance.
  • Storage: Sufficient free disk space for the models, which can be several gigabytes in size.

For larger models, you'll need significantly more RAM and possibly a powerful graphics card (GPU).


Installation

Ollama is available for Windows, macOS and Linux.

  1. Download Ollama: Go to the official Ollama website at https://ollama.com/ and download the installer for your operating system.
  2. Install Ollama: Run the downloaded file and follow the installation instructions. After installation, Ollama runs in the background.
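
If you want to confirm that the background service is actually reachable, you can query its local HTTP API. The snippet below is a minimal check in Python, assuming the default port 11434 and the requests package (pip install requests); the /api/tags endpoint returns the models you have downloaded, which will be an empty list right after a fresh installation.

import requests

# Ollama listens on http://localhost:11434 by default.
# /api/tags lists the locally available models as JSON.
response = requests.get("http://localhost:11434/api/tags")
response.raise_for_status()
print(response.json())  # e.g. {'models': []} on a fresh installation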

Downloading and Running a Model

  1. Open the terminal:
    • Windows: Open the Start menu, type cmd or Terminal, and press Enter.
    • macOS: Open the Terminal app from the Utilities folder.
    • Linux: Open your preferred terminal emulator.
  2. Download a model: Choose a model from the Ollama library (found at https://ollama.com/library). A popular and powerful model to start with is Llama 3.1. Download it with the following command:
ollama pull llama3.1

This may take a while, depending on the size of the model and your internet speed.

  3. Run the model: Once the download is complete, you can use the model directly in your terminal with the following command:
ollama run llama3.1

Interaction and Useful Commands

After running the run command, you can immediately start asking questions or giving instructions to the model. You essentially chat with the LLM in your terminal. A few useful commands:

  • ollama list: Shows a list of all models you have downloaded locally.
  • ollama rm <model-name>: Removes a specific model to free up disk space.
  • /bye: Closes the current chat session (type this at the model's interactive prompt).
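
The terminal is not the only way to talk to a model: the same background service also exposes a local HTTP API (on http://localhost:11434 by default), which is what the integrations in the following sections build on. As a minimal sketch, assuming Python with the requests package, you can send a single prompt to the /api/generate endpoint; the model name and prompt are just examples.

import requests

# Ask the local Ollama server for a single, non-streaming completion.
payload = {
    "model": "llama3.1",
    "prompt": "What is the capital of the Netherlands?",
    "stream": False,  # return the complete answer as one JSON object
}
response = requests.post("http://localhost:11434/api/generate", json=payload)
response.raise_for_status()
print(response.json()["response"])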

Integration with LangChain

LangChain is a popular framework for building applications with LLMs. You can easily integrate your locally running Ollama model into both Python and JavaScript/TypeScript projects.

Python

  1. Install the Python package:
pip install langchain-ollama
  2. Use in your code:
from langchain_ollama import ChatOllama

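# ChatOllama connects to the local Ollama server (http://localhost:11434 by default)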
llm = ChatOllama(model="llama3.1")
response = llm.invoke("What is the capital of the Netherlands?")
print(response.content)
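
ChatOllama follows the standard LangChain chat model interface, so besides invoke you can also stream a response as it is generated. A small sketch of what that could look like; the prompt is only an example.

from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.1")

# stream() yields the reply in chunks instead of waiting for the full answer
for chunk in llm.stream("Explain in one sentence what Ollama does."):
    print(chunk.content, end="", flush=True)
print()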

Node.js (JavaScript/TypeScript)

LangChain is also available for JavaScript/TypeScript, ideal for back-ends (Node.js) or front-end frameworks (such as Vue.js, React or Svelte).

  1. Install via npm or yarn:
# For npm
npm install @langchain/ollama

# For yarn
yarn add @langchain/ollama
  2. Use in your code:
import { ChatOllama } from "@langchain/ollama";

async function main() {
  const llm = new ChatOllama({ model: "llama3.1" });
  const response = await llm.invoke("What is the capital of the Netherlands?");
  console.log(response.content);
}

main();

This way, you can seamlessly switch between cloud providers and your own local Ollama instance in both your Python backend and your JavaScript stack.


Integration with Visual Studio Code

You can also use your locally running LLM as a coding assistant inside Visual Studio Code. This gives you the ability to generate code and ask questions using your own, locally hosted model.

  1. Ensure Ollama is running: The Ollama process must be active in the background.
  2. Install a compatible extension: Search the VS Code Marketplace for an extension that offers Ollama integration, such as Continue.
  3. Configure the extension: Follow the extension's instructions to set Ollama as the provider and select the model you want to use (for example llama3.1).

Now you can call your local model for code suggestions and other programming tasks in the extension's chat interface, entirely within your own environment.


Conclusion

Ollama makes running LLMs locally accessible to a wide audience. Whether you're a developer looking to build an AI application with LangChain, or want to improve your programming workflow in VS Code, Ollama lets you get started quickly and easily. With an active community and ongoing development, Ollama is an excellent choice for anyone looking to explore the world of local AI.