How I set up an AI server in my bedroom

Firstly, why would you want your own AI server?

It’s Exciting

People who call AI a dangerous new species we don’t understand seem to forget that we grew up on a cartoon about a 10-year-old kid wandering around with an electric hamster whose entire vocabulary was just its own name. Some folks buy pit bulls because they look intimidating—don’t mean to brag but my new “pet” is supposedly destined to replace us all.

Financial Considerations

Asking ChatGPT-4o “How many free tokens do I get a day?” followed by “How many tokens did I use on this conversation?” already eats up about 7.5% of the 5-hour window allowance—just for two short answers. On my local setup, I can ask as many questions as I want without watching a meter tick down. That kind of freedom on ChatGPT costs $200/month.

Additionally, I convinced my wife that buying a new GPU was a “career investment” so now I can play Cyberpunk on Ultra details. My career as a futuristic mercenary is indeed flourishing.

Privacy & Data Control

Running AI locally keeps all your prompts and data on your own hardware. That means less risk of leaks, breaches, or third-party snooping—critical for personal projects, businesses with confidential data, or compliance with laws like GDPR. Even Sam Altman, CEO of OpenAI, admitted ChatGPT prompts can be used against you in court. Let’s just say… virtual snitches don’t get stitches.

Customization & Flexibility

A local AI can be shaped exactly to your needs—fine-tuned to specific tasks, datasets, or workflows. You can adjust parameters, integrate niche tools, or build features cloud AIs won’t allow. A few tweaks and—voilà—my kids can’t use AI in all the ways I would have at their age. This post might be partially written by AI, but my child’s essays? Old-fashioned sweat, tears, and terrible grammar.

Offline Access

Local AI doesn’t need the internet. Perfect for remote work, secure facilities, or unreliable networks. Personally, I rarely leave my house—but if you’re a traveler, you could run it on a laptop at 30,000 feet.

Avoiding Vendor Lock-In

Cloud providers can change prices, restrict models, or limit usage. Your own setup means independence—no surprise policy changes, no disappearing features. Just an occasional answer in Chinese to broaden your horizons.

Democratization of AI

It’s important that such a powerful technology isn’t just used and developed by the chosen few. Power tends to corrupt, and absolute power corrupts absolutely—self-hosting is a tiny step towards decentralization of AI.

Learning & Experimentation

This is the most convincing reason for me. Running a local AI isn’t just useful—it’s a hands-on way to explore the fundamentals of AI itself.

AI to go

Fancy a franchise?

When I realized I could set up an AI server at home, I knew I had to do it. The use case can come later. Ollama’s first version dropped back in January ’24, so yeah—I might be a little late to the party but I’ve done my fair share of partying in my 20s and I know that’s actually the best move: you skip the awkward introductions and everybody’s already having fun — you just need to get on their level.

When it comes to requirements, all you really need is a computer. I started playing with AI around the same time I was buying a new PC, but technically you don’t even need a GPU to run some small models. That said, the better your GPU, the more fun you'll have.

And just to set expectations: I never planned for my home setup to replace GPT or Grok — those are the Michelin-star restaurants at the top of the culinary food chain, and you’d be foolish not to visit them when you want the very best.

My AI server is an all-you-can-eat diner, allowing you to stuff your face when no one is looking. You're not going to cry when some of it falls to the floor. Screw it, throw something and see what happens.

Meet our crew:

Ollama is the kitchen staff. They don’t need to be versed in customer service and there’s no need for them to look presentable. They just cook. The food comes out hot and ready, but you probably don’t want to eat it straight out of the pot.

Open WebUI is the waiter. Sleek, polished, and attentive. He takes what the kitchen is churning out and makes it an experience. Special requests? No problem. Unsure what to order? He’ll recommend the chef’s special, maybe even something you didn’t know you wanted.

Docker is the staffing agency. Do you care about the waiter’s work visa, contract type, or how his paycheck gets handled? Of course not. The agency makes sure he shows up, ready to serve, and that he stays happy enough to keep doing the job.

My AI server might not replace the five-star cloud giants, but it’s an endless buffet where I’m both the chef and the customer.

Now grab a plate—we’re cooking.

🪟 Installation on Windows

Many tutorials I came across during my research suggest that Ollama—primarily designed for Linux—performs better when installed within a WSL2 VM. However, given the rapid pace of development in the local AI space (with older Windows versions of Ollama previously labeled as BETA), I suspected this guidance might be outdated. It’s possible that even newer tutorials are just echoing old information.

To test this, I experimented with both setups: Ollama installed in a WSL2 VM and a native Windows installation. Interestingly, the performance difference was relatively minor (97 vs. 93 average tokens/s using Llama3.1:8b) and it was actually the native Windows setup that slightly outperformed the WSL2 configuration, suggesting that the conventional wisdom about WSL2 superiority may no longer hold.

For those who want to test this on their own system, follow the Linux installation steps inside a WSL2 VM.

Ollama

Go to ollama.com/download/windows and press download to get the installer. Once it’s installed you can open the Ollama app, choose one of the models from the dropdown menu and write anything to kick off the model download. Right, Bob’s your uncle, thank you for tuning in.

On a more serious note, we still want some of the bells and whistles offered by self-hosting a local AI, but I genuinely think it’s impressive how simple Ollama has made free access to a basic local LLM with a UI over the past few months. It even supports some multimodal input like PDFs and images.

If the model that you want isn’t in the dropdown menu, you can go to ollama.com/search, click the model and version that interests you, copy the command (for example ollama run llama3.1:8b), and paste it into PowerShell.
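A few more Ollama commands come in handy once you start collecting models; these are standard CLI subcommands you can run from the same PowerShell window:

ollama pull llama3.1:8b    # download a model without starting a chat
ollama list                # show installed models and how much disk they use
ollama ps                  # show which models are currently loaded in memory
ollama rm llama3.1:8b      # delete a model you no longer need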

Docker

Go to docker.com, select the AMD64 version of Docker Desktop for Windows, then download and install it. The ARM64 build is mainly for phones and tablets.

Docker recommends using the WSL 2 backend, and you can install WSL with this command in PowerShell:

wsl --install

You’ll be prompted to create a user account (it gets sudo rights) for the newly created Ubuntu instance in WSL.
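If you want to double-check that the new distro really runs on WSL 2 (the backend Docker Desktop expects), these two PowerShell commands cover it:

wsl -l -v                    # list installed distros and their WSL version
wsl --set-version Ubuntu 2   # upgrade a distro to WSL 2 if it still shows version 1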

Open WebUI

A good UI makes the experience similar to cloud-based AI models and adds functionality like secure role-based access, seamless local and web-enhanced Retrieval-Augmented Generation (RAG), integrated web browsing, dynamic image generation, and the ability to collaborate across multiple models for richer, more personalized chat experiences.

Download and run the container by pasting one of the PowerShell commands below:

To run Open WebUI with Nvidia GPU support:

docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda

If you don’t have an Nvidia GPU:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
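If you’re curious what you just pasted, here is roughly what each flag does (the GPU variant is identical apart from --gpus all):

# -d                                             run the container in the background
# -p 3000:8080                                   expose the UI on port 3000 of your PC (8080 inside the container)
# --gpus all                                     (cuda image only) hand your Nvidia GPU to the container
# --add-host=host.docker.internal:host-gateway   let the container reach Ollama running on the host
# -v open-webui:/app/backend/data                keep chats and settings in a named volume so they survive updates
# --name open-webui                              give the container a recognizable name
# --restart always                               bring it back up automatically after reboots or crashes
# ghcr.io/open-webui/open-webui:main             the image to run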

After installation, you can access Open WebUI at localhost:3000. You will be asked to create an account — the first account becomes the Admin.
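If the page doesn’t load, check whether the container is actually up; these are plain Docker commands and work from any PowerShell window:

docker ps                 # the open-webui container should be listed as Up
docker logs open-webui    # startup output and any errors
docker start open-webui   # start it again if it has stopped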

🐧 Installation for Linux

Ollama

Paste this command in Bash:

curl -fsSL https://ollama.com/install.sh | sh
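The script installs the Ollama binary and, on systemd-based distros like Ubuntu (my assumption here), registers it as a service, so you can sanity-check the install before pulling anything:

ollama --version           # confirms the binary is on your PATH
systemctl status ollama    # the service should show active (running)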

Go to ollama.com/search, pick the model you want, and copy-paste its command into Bash:

Example:

ollama run llama3.1
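Once the download finishes you land in an interactive chat (type /bye to exit). You can also confirm that the Ollama API is listening, by default on port 11434, since that is what Open WebUI will talk to:

curl http://localhost:11434    # should answer with “Ollama is running”
ollama list                    # shows the models you’ve downloaded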

Docker

Set up Docker's apt repository:

# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
      

Then install the Docker packages:

sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
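Two quick follow-ups from Docker’s post-installation notes: verify the engine works, and optionally add yourself to the docker group so you can drop sudo later:

sudo docker run hello-world     # pulls and runs a tiny test image
sudo usermod -aG docker $USER   # optional: run docker without sudo (log out and back in first)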
      

Open WebUI

To run Open WebUI with Nvidia GPU support:

docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda

If you don’t have an Nvidia GPU:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
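Open WebUI is updated frequently, and because your chats live in the open-webui volume you can recreate the container with a newer image without losing anything (swap :main for :cuda if you used the GPU variant):

docker pull ghcr.io/open-webui/open-webui:main    # grab the latest image
docker stop open-webui && docker rm open-webui    # remove the old container; the data volume stays
# then re-run the same docker run command as above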

After installation, you can access Open WebUI at localhost:3000. You will be asked to create an account — the first account becomes the Admin.