
Ollama + Skales: The Complete Offline AI Stack

Mario Simic


This guide covers the complete stack for running Skales entirely offline: no internet required, no API costs, no data leaving your machine. By the end you will have a fully functioning local AI agent that can handle email, calendar, file management, and automation without any external dependencies.

Step 1: Install Ollama

Download Ollama from ollama.ai. On macOS, open the .dmg, drag to Applications, and launch; a small icon appears in your menu bar. On Windows, run the .exe installer. Ollama installs as a background service that starts automatically at login and listens on http://localhost:11434.

Verify it is running: open your browser and navigate to http://localhost:11434. You should see: Ollama is running.
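If you would rather script the check than open a browser, a minimal Python sketch works too. It assumes only what the step above states: that a running Ollama answers plain GET requests on its root URL. The helper name is illustrative, not part of any Ollama tooling.

```python
import urllib.request
import urllib.error

def is_ollama_up(base_url: str = "http://localhost:11434",
                 timeout: float = 2.0) -> bool:
    """Return True if the Ollama server answers on its root URL."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            # A running server returns the plain text "Ollama is running".
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused / timed out: the service is not up.
        return False
```

Call `is_ollama_up()` after launching the app; it should flip to `True` once the menu-bar service is running.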

Step 2: Download a Model

Open a terminal and run: ollama pull llama3.1. This downloads Llama 3.1 8B (4.7 GB) from Meta's public weights. Choose your model based on your hardware; see the benchmark section below.

Test immediately with ollama run llama3.1. Type anything and press Enter. You are now running AI with zero cloud dependency.
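The same model is also reachable over HTTP, which is what any local client (including Skales) uses. A stdlib-only Python sketch against Ollama's /api/generate endpoint; the endpoint and response shape follow Ollama's documented REST API, but treat the exact field names as something to verify against your installed version.

```python
import json
import urllib.request

def build_generate_payload(model: str, prompt: str) -> dict:
    # stream=False asks Ollama for one complete JSON response
    # instead of a stream of partial tokens.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "llama3.1",
             base_url: str = "http://localhost:11434") -> str:
    """POST to Ollama's /api/generate and return the completion text."""
    payload = json.dumps(build_generate_payload(model, prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("Say hello in one word.")` returns the model's reply as a string, with no network traffic beyond localhost.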

Step 3: Connect Skales to Ollama

Install Skales from getskales.app/download. On first launch, when prompted for an AI provider, choose Ollama and enter the endpoint: http://localhost:11434. Select your downloaded model from the dropdown. Click Save.

If you are adding Ollama to an existing Skales install: Settings → Providers → Add Provider → Ollama → enter endpoint → select model → Save. You can set Ollama as your default provider or use it alongside cloud providers for specific conversations.
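Before saving, you can confirm the endpoint actually serves the model you pulled. Ollama's /api/tags endpoint lists locally installed models; the helper names below are illustrative, not part of Skales or Ollama.

```python
import json
import urllib.request

def extract_model_names(tags_response: dict) -> list[str]:
    """Pull model names out of Ollama's /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]

def list_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask an Ollama endpoint which models it can serve."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return extract_model_names(json.loads(resp.read()))
```

If the model you selected in the dropdown appears in `list_models()`, the endpoint is configured correctly.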

Step 4: Verify Offline Operation

Disconnect your internet (enable airplane mode, or disable WiFi). Open Skales and ask it a question. If it responds, you are fully offline. Navigate to Settings → Network Monitor, which shows all network activity Skales has initiated. With Ollama as your provider and no integrations that require cloud access (Gmail sync, calendar sync), you should see zero outbound connections.
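The same "zero outbound connections" check can be expressed in a few lines: given a list of (host, port) pairs (for example, gathered with a tool like `psutil.net_connections()`), anything not bound to localhost is traffic leaving the machine. This is a hypothetical illustration of the idea, not how Skales' Network Monitor is implemented.

```python
def outbound_connections(connections: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Filter (host, port) pairs down to non-local destinations.

    With Ollama as the only provider and no cloud integrations,
    this list should come back empty.
    """
    local = {"127.0.0.1", "::1", "localhost"}
    return [(host, port) for host, port in connections if host not in local]
```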

Performance Benchmarks by Model and Hardware

These are real-world measurements on common hardware configurations, measuring time-to-first-token for a typical 200-token prompt:

MacBook Air M2 (8GB RAM): Llama 3.2 3B → 0.4s | Phi-3 Mini → 0.5s | Mistral 7B Q4 → 1.8s

MacBook Pro M3 (16GB RAM): Llama 3.1 8B → 0.6s | Qwen 2.5 14B Q4 → 1.4s | Mistral 7B full → 0.9s

Windows laptop, Intel i7, 16GB, no GPU: Llama 3.2 3B → 2.1s | Mistral 7B Q4 → 8.4s | Llama 3.1 8B → 12s

Windows desktop, RTX 3080 (10GB VRAM): Mistral 7B → 0.3s | Llama 3.1 8B → 0.5s | Qwen 2.5 14B → 1.1s

Apple Silicon is disproportionately well-suited to local inference because of its unified memory architecture: the GPU and CPU share RAM, so there is no bottleneck moving model weights between them. An M-series MacBook is currently the best consumer hardware for local AI by performance-per-watt.
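A back-of-the-envelope way to check whether a model fits your hardware: weight memory scales with parameter count times bits per weight, plus some allowance for the KV cache and runtime buffers. This is a rough heuristic with an assumed overhead figure, not how Ollama actually allocates memory.

```python
def est_model_ram_gb(params_billion: float, quant_bits: int = 4,
                     overhead_gb: float = 1.0) -> float:
    """Rough RAM estimate: quantized weights plus a fixed overhead.

    weights_gb ~= params (billions) * bits_per_weight / 8
    overhead_gb is an assumed allowance for KV cache and buffers.
    """
    weights_gb = params_billion * quant_bits / 8
    return round(weights_gb + overhead_gb, 1)
```

The weights term lines up with the download sizes above: an 8B model at 4-bit quantization is about 4 GB of weights, matching the 4.7 GB llama3.1 pull.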

Best Models by Task

General conversation and email: Llama 3.1 8B. Excellent natural language, strong instruction following, fits comfortably in 8–10 GB RAM.

Coding and technical tasks: Qwen 2.5 Coder 7B or DeepSeek Coder V2. Specifically fine-tuned for code; noticeably better than general models on function generation, debugging, and code explanation.

Reasoning and document analysis: Qwen 2.5 14B (needs 16 GB RAM). Strong on analytical tasks, structured output, and following complex multi-step instructions.

Speed-critical use (many quick queries): Phi-3 Mini or Llama 3.2 3B. Sacrifice some quality for near-instant responses. Good for simple classification, short summaries, quick answers.

Maximum quality (64 GB RAM or 24 GB VRAM): Llama 3.1 70B Q4 or Qwen 2.5 32B. Approaches GPT-4-class performance on many tasks with full local privacy.
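The recommendations above can be condensed into a simple lookup: pick the largest recommended model for your task that fits your RAM. The task labels, RAM thresholds, and Ollama model tags below are this guide's illustrative choices, not an Ollama or Skales API; double-check tag names with `ollama list` before relying on them.

```python
# (min_ram_gb, ollama_tag) pairs, ordered smallest to largest.
RECOMMENDATIONS = {
    "chat":      [(8, "llama3.2:3b"), (10, "llama3.1:8b")],
    "coding":    [(8, "llama3.2:3b"), (10, "qwen2.5-coder:7b")],
    "reasoning": [(10, "llama3.1:8b"), (16, "qwen2.5:14b")],
    "speed":     [(4, "phi3:mini")],
}

def pick_model(task: str, ram_gb: int) -> str:
    """Return the largest recommended model that fits in ram_gb."""
    choice = "llama3.2:3b"  # safe fallback for low-RAM machines
    for min_ram, model in RECOMMENDATIONS.get(task, []):
        if ram_gb >= min_ram:
            choice = model
    return choice
```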

See more about fully offline workflows and local AI use cases.

Try it yourself 🦎

Skales is free for personal use. No Docker. No account.

Download Free →