I set up a powerful AI server, complete with a web-based interface for easy interaction and a robust API for programmatic access. This guide covers hardware selection, operating system installation, configuration of AI applications (llama.cpp
for inference and embeddings, and Open WebUI), Nginx for secure web access, integration of web search capabilities, and a reliable BorgBackup solution.
The server is custom-built for high-performance AI workloads and general server operations, featuring a powerful and balanced selection of components:
- 2TB NVMe SSD for the operating system (/dev/nvme0n1).
- 2TB NVMe SSD for data (/dev/nvme1n1, mounted as /data).
Ubuntu Server 22.04 LTS was chosen as the operating system. This is a robust, stable, and widely supported Linux distribution, ideal for server environments due to its long-term support (LTS), strong community, and extensive package repositories.
The operating system was installed on the first 2TB NVMe SSD (/dev/nvme0n1), with a dedicated boot partition (/boot) outside LVM and an LVM volume holding the root filesystem (/) and potentially other system directories. The second 2TB NVMe SSD (/dev/nvme1n1) was formatted as ext4 and mounted as /data.
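The resulting layout can be verified with lsblk -f. For the /data mount to persist across reboots, it also needs an /etc/fstab entry; a typical line looks like the following (the UUID is a placeholder for the value lsblk reports):
# /etc/fstab entry for the data drive (UUID is a placeholder)
UUID=replace-with-actual-uuid  /data  ext4  defaults  0  2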
llama.cpp Setup and Configuration
llama.cpp is a C/C++ implementation of LLM inference, originally built around Meta's LLaMA models and optimized for a wide range of hardware, including CPU and GPU (via CUDA/cuBLAS).
The llama.cpp source code was obtained from its official GitHub repository:
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
llama.cpp was compiled with CUDA and cuBLAS support using CMake.
# (Pre-requisite: NVIDIA drivers, CUDA Toolkit, and cmake installed)
# Install cmake if not already present:
# sudo apt install cmake
# Create a build directory and navigate into it
mkdir build
cd build
# Configure CMake to enable CUDA/cuBLAS support
# (recent llama.cpp versions use -DGGML_CUDA=ON; older releases used -DLLAMA_CUBLAS=ON)
cmake .. -DGGML_CUDA=ON
# Build the project
cmake --build . --config Release
This compilation ensures that llama.cpp
can offload computational tasks to the powerful GPU, significantly speeding up inference.
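To confirm that layers are actually offloaded at inference time, one can watch GPU utilization in a second terminal while a prompt is being processed:
# VRAM usage and GPU utilization should rise while a prompt runs
watch -n 1 nvidia-smi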
GGUF model files were downloaded into the llama.cpp/models directory. These models are the core AI components that llama.cpp will run.
# Example: Download the primary models
wget -P models/ https://huggingface.co/mlabonne/gemma-3-27b-it-abliterated-GGUF/resolve/main/mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf
To test the model, run the llama-cli executable (named main in older llama.cpp builds).
# Assuming one is still in the 'build' directory after compilation
./bin/llama-cli -m ../models/mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf -p "Hello, AI!" -n 128
To provide programmatic access to llama.cpp models, one can run llama.cpp in server mode, which exposes an HTTP API.
# Assuming one is in the 'build' directory
./bin/llama-server -m ../models/mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf -c 2048 --host 0.0.0.0 --port 8001
This command starts the llama.cpp server, listening on port 8001. For secure API access, Nginx will be configured as a reverse proxy in front of this server, enforcing API key authentication.
Running llama.cpp as a Systemd Service: To ensure llama.cpp starts automatically on boot and runs reliably in the background, it can be configured as a systemd service.
sudo nano /etc/systemd/system/llama-server.service
Add the following content to the file:
[Unit]
Description=Llama.cpp API Server
After=network.target
[Service]
Type=simple
User=user
Group=user
WorkingDirectory=/home/user/llama.cpp/build
ExecStart=/home/user/llama.cpp/build/bin/llama-server --model /home/user/llama.cpp/models/mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf --port 8001 --host 0.0.0.0 --n-gpu-layers 999
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Reload systemd, enable, and start the service:
sudo systemctl daemon-reload
sudo systemctl enable llama-server.service
sudo systemctl start llama-server.service
Check service status:
sudo systemctl status llama-server.service
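With the service running, a quick sanity check can be made against the server's OpenAI-compatible chat endpoint, bypassing Nginx (the prompt is illustrative):
# Local test of the llama.cpp chat API on port 8001
curl -s http://127.0.0.1:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello, AI!"}], "max_tokens": 64}'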
llama.cpp Embedding Server Setup
Embedding models are crucial for tasks like Retrieval Augmented Generation (RAG), where you need to convert text into numerical vectors (embeddings) to find relevant information. llama.cpp can also run embedding models.
An embedding model such as nomic-embed-text-v1.5.Q4_K_M.gguf was downloaded. Place it in the llama.cpp/models directory.
# Example: Download Nomic-Embed-Text-v1.5
wget -P models/ https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF/resolve/main/nomic-embed-text-v1.5.Q4_K_M.gguf
To serve embeddings, run the llama-server executable with the --embedding flag.
# Assuming one is in the 'build' directory
./bin/llama-server --model ../models/nomic-embed-text-v1.5.Q4_K_M.gguf --embedding --port 8003
This command starts the embedding server on port 8003. To run it persistently, take the same command (still listening on port 8003) and configure it as a systemd service.
sudo nano /etc/systemd/system/llama-embedding-server.service
Add the following content to the file:
[Unit]
Description=Llama.cpp Embedding API Server
After=network.target
[Service]
User=user
Group=user
WorkingDirectory=/home/user/llama.cpp/build
ExecStart=/home/user/llama.cpp/build/bin/llama-server --model /home/user/llama.cpp/models/nomic-embed-text-v1.5.Q4_K_M.gguf --port 8003 --host 0.0.0.0 --embedding --n-gpu-layers 999
StandardOutput=journal
StandardError=journal
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Reload systemd, enable, and start the service:
sudo systemctl daemon-reload
sudo systemctl enable llama-embedding-server.service
sudo systemctl start llama-embedding-server.service
Check service status:
sudo systemctl status llama-embedding-server.service
Once the service is running, you can send a POST request to the `/embeddings` endpoint of this new server for programmatic access:
# Example POST request (using curl) to the llama.cpp embedding API,
# sent directly to the local service on port 8003. For external access,
# an Nginx proxy block with API-key checking (like the port 8002 block
# shown later) can be placed in front of it.
curl -X POST \
  http://127.0.0.1:8003/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "content": "This is a sentence to embed."
  }'
The API will return a JSON object containing the embedding vector. This allows external applications to programmatically generate embeddings using your server's GPU resources.
Open WebUI provides a user-friendly web interface for interacting with various LLMs. It is deployed using Docker for ease of management.
# Install Docker
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo \
"deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
"$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Add the user to the docker group
sudo usermod -aG docker user
newgrp docker # Apply group changes immediately
# Example docker-compose.yml (simplified)
version: '3.8'
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "8080:8080" # Default port for Open WebUI
    volumes:
      - /data/open-webui:/app/backend/data # Persistent data for WebUI
      - /var/run/docker.sock:/var/run/docker.sock # For Docker integration
      - /data/models:/app/backend/models # Mount LLM models
    restart: always # This ensures the container automatically starts on reboot
sudo docker compose up -d
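Confirm the container came up cleanly:
# The container should show as running; logs reveal startup errors
sudo docker compose ps
sudo docker logs --tail 20 open-webui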
After the container is up:
- Navigate to http://the_server_ip:8080 to create an admin user and start interacting with models.
- Open WebUI can then be reached either through Nginx (https://decrai.com) or directly (http://the_server_ip:8080).
- In the admin settings, point the RAG embedding model at the local llama.cpp embedding server's OpenAI-compatible endpoint (http://127.0.0.1:8003/v1).

Once configured, your LLMs in Open WebUI should be able to leverage your local embedding model for RAG tasks and perform real-time web searches using DDGS.
Integrating DuckDuckGo Search (DDGS) allows your LLMs within Open WebUI to perform real-time web searches, providing more up-to-date and factual responses. Running it as a dedicated service enhances stability and resource management.
# First, get the container ID or name of your running Open WebUI container
sudo docker ps -a
# Then, execute a shell inside the container and install the package
sudo docker exec -it open-webui /bin/bash -c "pip install ddgs"
After installation, you might need to restart the Open WebUI container for the changes to take effect:
sudo docker restart open-webui
To run DDGS as a dedicated API service, create a systemd unit:
sudo nano /etc/systemd/system/ddgs.service
Add the following content to the file:
[Unit]
Description=DDGS API Service
After=network.target
[Service]
User=user
Group=user
WorkingDirectory=/home/user/ddgs
ExecStart=/home/user/ddgs/venv/bin/gunicorn -w 4 -b 0.0.0.0:5000 app:app
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
Reload systemd, enable, and start the service:
sudo systemctl daemon-reload
sudo systemctl enable ddgs.service
sudo systemctl start ddgs.service
Check service status:
sudo systemctl status ddgs.service
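The unit file above assumes a small Flask application at /home/user/ddgs/app.py, served by gunicorn from a virtualenv. That application is not shown in the original write-up; a minimal sketch of what it might look like (the route name and JSON shape are illustrative assumptions):
# /home/user/ddgs/app.py - minimal sketch of the search wrapper the
# systemd unit assumes (illustrative, not the original application)
from flask import Flask, jsonify, request
from ddgs import DDGS  # pip install flask gunicorn ddgs (inside the venv)

app = Flask(__name__)

@app.route("/search")
def search():
    query = request.args.get("q", "")
    if not query:
        return jsonify({"error": "missing query parameter 'q'"}), 400
    # Each result is a dict with keys like 'title', 'href', and 'body'
    results = DDGS().text(query, max_results=10)
    return jsonify(results)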
Nginx was set up as a reverse proxy to provide secure, web-standard access to Open WebUI, and to handle SSL/TLS encryption. It is also used to secure the `llama.cpp` API endpoints.
sudo apt install nginx
# /etc/nginx/sites-available/decrai.com

# HTTP to HTTPS Redirect (Standard Port 80)
server {
    listen 80;
    listen [::]:80;
    server_name decrai.com www.decrai.com;
    return 301 https://$host$request_uri;
}

# HTTPS on Standard Port 443 (Hosts Static Content)
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name decrai.com www.decrai.com;

    ssl_certificate /etc/letsencrypt/live/decrai.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/decrai.com/privkey.pem;
    include /etc/letsencrypt/options-ssl-nginx.conf;
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;

    root /var/www/html;
    index index.html index.htm;

    location / {
        try_files $uri $uri/ =404;
    }

    add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload";
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    add_header X-XSS-Protection "1; mode=block";
}

# HTTPS on Custom Port 8000 (Proxies to internal service on 8080)
server {
    listen 8000 ssl http2;
    listen [::]:8000 ssl http2;
    server_name decrai.com www.decrai.com;

    ssl_certificate /etc/letsencrypt/live/decrai.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/decrai.com/privkey.pem;
    include /etc/letsencrypt/options-ssl-nginx.conf;
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }

    add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload";
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    add_header X-XSS-Protection "1; mode=block";
}

# HTTPS on Custom Port 8002 (Proxies to internal service on 8001 with API Key)
server {
    listen 8002 ssl http2;
    listen [::]:8002 ssl http2;
    server_name decrai.com;

    ssl_certificate /etc/letsencrypt/live/decrai.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/decrai.com/privkey.pem;
    include /etc/letsencrypt/options-ssl-nginx.conf;
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;

    if ($api_key_valid = 0) {
        return 403 "Forbidden: Invalid API Key\n";
    }

    location / {
        proxy_pass http://127.0.0.1:8001;
        proxy_set_header Host $host:$server_port;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }

    add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload";
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    add_header X-XSS-Protection "1; mode=block";
}
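The $api_key_valid variable referenced in the port 8002 block is not built into Nginx; it has to be defined with a map in the http context. Since files in sites-enabled are included inside the http block, the map can sit at the top of this same file. A minimal sketch, assuming clients send the key in an X-API-Key header (as in the curl examples above):
# Maps the X-API-Key request header to 1 (valid) or 0 (invalid);
# replace the example value with the real secret key(s)
map $http_x_api_key $api_key_valid {
    default                  0;
    "your_secret_api_key_1"  1;
}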
# Create symlink for the consolidated configuration file
sudo ln -s /etc/nginx/sites-available/decrai.com /etc/nginx/sites-enabled/
sudo nginx -t # Test Nginx configuration for syntax errors
sudo systemctl restart nginx
sudo apt install certbot python3-certbot-nginx
# Run Certbot for the main domain and subdomains:
sudo certbot --nginx -d decrai.com -d www.decrai.com
Certbot automatically modifies the Nginx configuration to include SSL directives and sets up automatic certificate renewal, ensuring sites remain secure.
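Renewal can be verified without waiting for a certificate to near expiry:
# Simulates renewal without issuing a real certificate
sudo certbot renew --dry-run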
sudo ufw allow OpenSSH
sudo ufw allow 'Nginx Full' # Allows both HTTP (80) and HTTPS (443)
sudo ufw allow 8000/tcp # Allow the Open WebUI proxy port
sudo ufw allow 8002/tcp # Allow the llama.cpp API proxy port
sudo ufw enable
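Verify the resulting rule set:
sudo ufw status verbose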
Key Principles for Hardware Firewall ACLs:
- Never expose internal service ports (e.g., 8080 for Open WebUI or 8001 for `llama.cpp`'s internal listening) directly to the internet without Nginx as a proxy. Nginx acts as the secure gateway.
- Consult the specific router or cloud provider documentation for exact instructions on configuring Access Control Lists (ACLs) or port forwarding rules.
After experiencing issues with Timeshift's configuration persistence and its primary focus on system files, BorgBackup was chosen as a more flexible and robust solution for comprehensive system and user data backups.
The strategy: back up the entire root filesystem (/) and all user home directories in a single, unified repository.

sudo apt update
sudo apt install borgbackup

A Borg repository was initialized on the data drive (/data/borg_repo), secured with a strong passphrase.
sudo mkdir -p /data/borg_repo
sudo borg init --encryption=repokey /data/borg_repo
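Because repokey encryption stores the key inside the repository itself, exporting a copy of the key to a separate location is a sensible precaution (the destination path here is an example):
# Keep this key copy somewhere off the server as well
sudo borg key export /data/borg_repo /root/borg_repo_key.backup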
A backup script (/usr/local/bin/backup_all.sh) was created to perform the backup.
sudo nano /usr/local/bin/backup_all.sh
This script includes:
- Creation of a new archive covering / and /home/.
- An integrity check (borg check) after the backup.
- Compaction (borg compact) to reclaim space.
- (The prune command was removed from the script to allow for manual pruning.)

Run the backup manually with:

sudo /usr/local/bin/backup_all.sh

The user is prompted for the Borg repository passphrase during execution.
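A minimal sketch of what such a script might contain (the exclude list and compression choice are assumptions, not the original script):
#!/bin/bash
# /usr/local/bin/backup_all.sh - illustrative sketch, not the original script
set -euo pipefail

REPO=/data/borg_repo
ARCHIVE="system-$(date +%Y-%m-%d_%H-%M)"

# Create a new archive covering / and /home (as described above),
# excluding volatile paths and the repository itself
borg create --stats --compression zstd \
    --exclude /proc --exclude /sys --exclude /dev \
    --exclude /run --exclude /tmp --exclude /data/borg_repo \
    "$REPO::$ARCHIVE" / /home

# Verify repository consistency after the backup
borg check "$REPO"

# Reclaim space from deleted archives (pruning itself is done manually)
borg compact "$REPO"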
Restoring from a Borg backup:
- sudo borg list /data/borg_repo::archive_name shows the files within a specific backup.
- sudo borg mount /data/borg_repo::archive_name /mnt/restore_point allows browsing the backup like a live filesystem.
- sudo borg extract can be used to pull specific files or perform a full system restore (typically from a live Linux environment).

This setup provides a powerful and flexible server environment. The robust hardware supports demanding AI applications, while Ubuntu Server offers a stable foundation. Nginx provides secure web access and API protection, and BorgBackup ensures that both critical system files and valuable personal data (including the installed applications) are reliably backed up and easily restorable, offering peace of mind against data loss or system corruption.
Now fully configured, the AI server provides a powerful and versatile platform for a wide range of applications and integrations. This section outlines how the environment can be leveraged.
The possibilities are vast. The new AI server is now a versatile tool ready to power the next AI-driven project or application.