How to Run AI Locally with Ollama in 5 Minutes
No API keys. No monthly fees. No data leaving your computer. This step-by-step guide shows you how to install Ollama and run ChatGPT-class AI models on your own machine, in under 5 minutes.
What is Ollama?
Ollama is a free, open-source tool that lets you run large language models (LLMs) like Llama 3, Mistral, and Gemma directly on your computer. Think of it as "ChatGPT on your laptop": free, private, and working offline.
No OpenAI account required. No internet needed after download. Your conversations never leave your machine.
What You'll Need
- RAM: 8GB minimum (16GB recommended for larger models)
- Storage: 5GB free space (models range from 2-40GB)
- OS: macOS 11+, Windows 10+, or Linux
- GPU: Optional but speeds things up (NVIDIA, AMD, or Apple Silicon)
If you have an M1/M2/M3 Mac, you're in luck: Ollama runs extremely fast on Apple Silicon, even without a dedicated GPU.
Install Ollama (1 minute)
Go to ollama.com and download the installer for your OS:
# macOS/Windows: download the installer from ollama.com
# and double-click it. That's it.
# Linux: one-line install:
curl -fsSL https://ollama.com/install.sh | sh
After installation, Ollama runs in the background automatically. No configuration needed.
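To confirm the background service really is up, note that Ollama answers plain HTTP on localhost port 11434 (its default). A quick sanity check from the terminal:

```shell
# Ollama's background server listens on localhost:11434 by default.
OLLAMA_URL="http://localhost:11434"

# A plain GET to the root endpoint confirms the service is up;
# it replies with the text "Ollama is running".
curl -s "$OLLAMA_URL"
```

If you get "connection refused" here, see the Troubleshooting section below.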
Run Your First AI Model (2 minutes)
Open your terminal (or Command Prompt on Windows) and type:
ollama run llama3
That's it. Ollama will download Llama 3 (4.7GB) and start a chat. You'll see a prompt where you can type questions, just like ChatGPT.
>>> What is the capital of France?
The capital of France is Paris. It's the largest city
in France and serves as the country's political,
economic, and cultural center...
Type /bye to exit the chat.
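The interactive chat isn't the only way in: Ollama also exposes a local REST API on the same port, which is handy for scripting. A minimal sketch with curl, assuming you've already pulled llama3:

```shell
# Ask a question through Ollama's local REST API instead of the chat.
# "stream": false returns one JSON object instead of token-by-token chunks.
PAYLOAD='{"model": "llama3", "prompt": "What is the capital of France?", "stream": false}'

curl -s http://localhost:11434/api/generate -d "$PAYLOAD"
# The reply is JSON with the answer in its "response" field.
```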
Try Different Models (1 minute each)
Ollama supports 50+ models. Here are the best ones to try:
# Best for coding:
ollama run codellama
# Best for fast responses:
ollama run mistral
# Highest quality (needs 48GB+ RAM):
ollama run llama3:70b
# Google's lightweight model:
ollama run gemma2
Which Model Should You Use?
| Model | Size | RAM Needed | Best For |
|---|---|---|---|
| Llama 3 8B | 4.7 GB | 8 GB | General chat, writing, Q&A |
| Mistral 7B | 4.1 GB | 8 GB | Fast responses, summarization |
| CodeLlama 13B | 7.4 GB | 16 GB | Code generation, debugging |
| Gemma 2 9B | 5.4 GB | 8 GB | Reasoning, analysis |
| Llama 3 70B | 40 GB | 48 GB | GPT-4 level quality |
Start with Llama 3 8B β it runs on any modern machine and handles 90% of what ChatGPT does. Upgrade to 70B only if you have 48GB+ RAM.
Add a ChatGPT-Style Web Interface (Optional, 1 minute)
Want a beautiful interface instead of the terminal? Install Open WebUI:
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main
Then open http://localhost:3000 in your browser. You'll get a polished ChatGPT-like interface that connects to your local Ollama models.
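If the page doesn't come up, a couple of standard Docker commands show whether the container (named open-webui by the run command above) is actually running:

```shell
NAME="open-webui"

# The container should be listed with a STATUS of "Up ...":
docker ps --filter "name=$NAME"

# Recent startup logs help diagnose a page that won't load:
docker logs "$NAME" 2>&1 | tail -n 20
```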
Use Ollama with VS Code (For Developers)
Install the Continue extension in VS Code to get AI code completion powered by your local Ollama:
- Open VS Code β Extensions β Search "Continue"
- Install it and open the Continue panel
- Select your Ollama model (CodeLlama recommended)
- Start coding with AI suggestions β completely local and private
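Under the hood, Continue stores its model list in a JSON config file (typically ~/.continue/config.json). An entry pointing at your local Ollama looks roughly like this; the exact schema can differ between Continue versions, so treat it as a sketch:

```json
{
  "models": [
    {
      "title": "CodeLlama (local)",
      "provider": "ollama",
      "model": "codellama"
    }
  ]
}
```

The "model" field must match a model you've already pulled with Ollama.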
This replaces GitHub Copilot ($100/yr) with a free, private alternative.
Useful Ollama Commands
# List installed models:
ollama list
# Delete a model to free space:
ollama rm llama3
# Pull a model without running it:
ollama pull mistral
# Show model info:
ollama show llama3
# Run a one-off prompt without entering the chat:
ollama run llama3 "Explain Python decorators in one paragraph"
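These commands compose with ordinary shell pipes, which is useful for quick automation. A sketch, using a throwaway notes.txt as a stand-in for any text file you want summarized:

```shell
# Create a small example file (placeholder; use any text file you like):
echo "Ollama runs large language models locally." > notes.txt

# Pipe the file in on stdin, capture the model's answer in a variable:
SUMMARY=$(cat notes.txt | ollama run llama3 "Summarize the following text:" 2>/dev/null)
echo "$SUMMARY"
```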
Ollama vs ChatGPT: Quick Comparison
| Feature | Ollama (Free) | ChatGPT Plus ($20/mo) |
|---|---|---|
| Cost | $0 | $240/yr |
| Privacy | 100% local | Data sent to OpenAI |
| Offline | Yes | No |
| Rate limits | None | 40 messages/3 hours |
| Web search | No | Yes |
| Image generation | No | Yes (DALL-E) |
| Quality (GPT-4 level) | 70B model only | Yes |
For the full deep-dive comparison, read our Ollama vs ChatGPT 2026 review.
Ollama doesn't have web search, image generation, or plugins. If you need real-time information or DALL-E, keep a free ChatGPT account as a backup. But for 90% of daily tasks (coding, writing, brainstorming, analysis), Ollama is more than enough.
Troubleshooting
"Model is too slow"
Try a smaller model: ollama run mistral (4.1GB) instead of Llama 3 70B. Close other apps to free up RAM.
"Out of memory"
Your RAM is full. Switch to a smaller model or add more RAM; as a rough rule, you need the model's file size plus a few GB of headroom (see the table above).
"Connection refused"
Make sure Ollama is running: ollama serve in a separate terminal window.
What's Next?
Now that you have local AI running, here's how to level up:
- Try different models β each has strengths for different tasks
- Set up Open WebUI β for a polished chat interface with conversation history
- Add AI to VS Code β free Copilot alternative for coding
- Explore RAG β chat with your own documents using LangChain or LlamaIndex
You've just saved yourself $240/yr in ChatGPT fees, and gained complete privacy over your AI conversations. Welcome to local AI.
Build Your Free AI Stack
Ollama is just the beginning. Discover more free AI tools in our complete guide.
See All Free AI Tools →