How to Run AI Locally with Ollama in 5 Minutes
No API keys. No monthly fees. No data leaving your computer. This step-by-step guide shows you how to install Ollama and run ChatGPT-class AI models on your own machine, in under 5 minutes.
What is Ollama?
Ollama is a free, open-source tool that lets you run large language models (LLMs) like Llama 3, Mistral, and Gemma directly on your computer. Think of it as "ChatGPT on your laptop": free, private, and working offline.
No OpenAI account required. No internet needed after download. Your conversations never leave your machine.
What You'll Need
- RAM: 8GB minimum (16GB recommended for larger models)
- Storage: 5GB free space (models range from 2-40GB)
- OS: macOS 11+, Windows 10+, or Linux
- GPU: Optional but speeds things up (NVIDIA, AMD, or Apple Silicon)
If you have an M1/M2/M3 Mac, you're in luck: Ollama runs extremely fast on Apple Silicon, even without a dedicated GPU.
Install Ollama (1 minute)
Go to ollama.com and download the installer for your OS:
# macOS/Windows: download the installer from ollama.com
# and double-click it. That's it.
# Linux: one-line install:
curl -fsSL https://ollama.com/install.sh | sh
After installation, Ollama runs in the background automatically. No configuration needed.
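To confirm the background service really is up, note that Ollama answers plain HTTP on localhost port 11434 (its default). A quick sanity check from the terminal:

```shell
# Ollama's background server listens on localhost:11434 by default.
OLLAMA_URL="http://localhost:11434"

# A plain GET to the root endpoint confirms the service is up;
# it replies with the text "Ollama is running".
curl -s "$OLLAMA_URL"
```

If you get "connection refused" here, see the Troubleshooting section below.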
Run Your First AI Model (2 minutes)
Open your terminal (or Command Prompt on Windows) and type:
ollama run llama3
That's it. Ollama will download Llama 3 (4.7GB) and start a chat. You'll see a prompt where you can type questions, just like ChatGPT.
>>> What is the capital of France?
The capital of France is Paris. It's the largest city
in France and serves as the country's political,
economic, and cultural center...
Type /bye to exit the chat.
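The interactive chat isn't the only way in: Ollama also exposes a local REST API on the same port, which is handy for scripting. A minimal sketch with curl, assuming you've already pulled llama3:

```shell
# Ask a question through Ollama's local REST API instead of the chat.
# "stream": false returns one JSON object instead of token-by-token chunks.
PAYLOAD='{"model": "llama3", "prompt": "What is the capital of France?", "stream": false}'

curl -s http://localhost:11434/api/generate -d "$PAYLOAD"
# The reply is JSON with the answer in its "response" field.
```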
Try Different Models (1 minute each)
Ollama supports 50+ models. Here are the best ones to try:
# Best for coding:
ollama run codellama
# Best for fast responses:
ollama run mistral
# Highest quality (needs 48GB+ RAM):
ollama run llama3:70b
# Google's lightweight model:
ollama run gemma2
Which Model Should You Use?
| Model | Size | RAM Needed | Best For |
|---|---|---|---|
| Llama 3 8B | 4.7 GB | 8 GB | General chat, writing, Q&A |
| Mistral 7B | 4.1 GB | 8 GB | Fast responses, summarization |
| CodeLlama 13B | 7.4 GB | 16 GB | Code generation, debugging |
| Gemma 2 9B | 5.4 GB | 8 GB | Reasoning, analysis |
| Llama 3 70B | 40 GB | 48 GB | GPT-4 level quality |
Start with Llama 3 8B β it runs on any modern machine and handles 90% of what ChatGPT does. Upgrade to 70B only if you have 48GB+ RAM.
Add a ChatGPT-Style Web Interface (Optional, 1 minute)
Want a beautiful interface instead of the terminal? Install Open WebUI:
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main
Then open http://localhost:3000 in your browser. You'll get a polished ChatGPT-like interface that connects to your local Ollama models.
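If the page doesn't come up, a couple of standard Docker commands show whether the container (named open-webui by the run command above) is actually running:

```shell
NAME="open-webui"

# The container should be listed with a STATUS of "Up ...":
docker ps --filter "name=$NAME"

# Recent startup logs help diagnose a page that won't load:
docker logs "$NAME" 2>&1 | tail -n 20
```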
Use Ollama with VS Code (For Developers)
Install the Continue extension in VS Code to get AI code completion powered by your local Ollama:
- Open VS Code β Extensions β Search "Continue"
- Install it and open the Continue panel
- Select your Ollama model (CodeLlama recommended)
- Start coding with AI suggestions β completely local and private
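Under the hood, Continue stores its model list in a JSON config file (typically ~/.continue/config.json). An entry pointing at your local Ollama looks roughly like this; the exact schema can differ between Continue versions, so treat it as a sketch:

```json
{
  "models": [
    {
      "title": "CodeLlama (local)",
      "provider": "ollama",
      "model": "codellama"
    }
  ]
}
```

The "model" field must match a model you've already pulled with Ollama.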
This replaces GitHub Copilot ($100/yr) with a free, private alternative.
Useful Ollama Commands
# List installed models:
ollama list
# Delete a model to free space:
ollama rm llama3
# Pull a model without running it:
ollama pull mistral
# Show model info:
ollama show llama3
# Run a one-off prompt without entering the chat:
ollama run llama3 "Explain Python decorators in one paragraph"
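These commands compose with ordinary shell pipes, which is useful for quick automation. A sketch, using a throwaway notes.txt as a stand-in for any text file you want summarized:

```shell
# Create a small example file (placeholder; use any text file you like):
echo "Ollama runs large language models locally." > notes.txt

# Pipe the file in on stdin, capture the model's answer in a variable:
SUMMARY=$(cat notes.txt | ollama run llama3 "Summarize the following text:" 2>/dev/null)
echo "$SUMMARY"
```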
Ollama vs ChatGPT: Quick Comparison
| Feature | Ollama (Free) | ChatGPT Plus ($20/mo) |
|---|---|---|
| Cost | $0 | $240/yr |
| Privacy | 100% local | Data sent to OpenAI |
| Offline | Yes | No |
| Rate limits | None | 40 messages/3 hours |
| Web search | No | Yes |
| Image generation | No | Yes (DALL-E) |
| Quality (GPT-4 level) | 70B model only | Yes |
For the full deep-dive comparison, read our Ollama vs ChatGPT 2026 review.
Ollama doesn't have web search, image generation, or plugins. If you need real-time information or DALL-E, keep a free ChatGPT account as a backup. But for 90% of daily tasks (coding, writing, brainstorming, analysis), Ollama is more than enough.
Troubleshooting
"Model is too slow"
Try a smaller model: ollama run mistral (4.1GB) instead of Llama 3 70B. Close other apps to free up RAM.
"Out of memory"
Your RAM is full. Switch to a smaller model or add more RAM; as a rough rule, you need the model's file size plus a few GB of headroom (see the table above).
"Connection refused"
Make sure Ollama is running: ollama serve in a separate terminal window.
What's Next?
Now that you have local AI running, here's how to level up:
- Try different models β each has strengths for different tasks
- Set up Open WebUI β for a polished chat interface with conversation history
- Add AI to VS Code β free Copilot alternative for coding
- Explore RAG β chat with your own documents using LangChain or LlamaIndex
You've just saved yourself $240/yr in ChatGPT fees, and gained complete privacy over your AI conversations. Welcome to local AI.
Build Your Free AI Stack
Ollama is just the beginning. Discover more free AI tools in our complete guide.
See All Free AI Tools →