I set up a powerful AI server, complete with a web-based interface for easy interaction and a robust API for programmatic access. This guide covers hardware selection, operating system installation, configuration of AI applications (llama.cpp
for inference and embeddings, and Open WebUI), Nginx for secure web access, integration of web search capabilities, and a reliable BorgBackup solution.
The server is custom-built for high-performance AI workloads and general server operations, featuring a powerful and balanced selection of components, including dedicated NVMe storage:
- 2TB NVMe SSD for the operating system (/dev/nvme0n1).
- 2TB NVMe SSD for data storage (/dev/nvme1n1, mounted as /data).
Ubuntu Server 22.04 LTS was chosen as the operating system. This is a robust, stable, and widely supported Linux distribution, ideal for server environments due to its long-term support (LTS), strong community, and extensive package repositories.
During installation, the first 2TB NVMe SSD (/dev/nvme0n1) was partitioned for the operating system, with a boot partition (/boot) outside LVM and LVM volumes for the root filesystem (/) and potentially other system directories. The second 2TB NVMe SSD (/dev/nvme1n1) was formatted as ext4 and mounted as /data.

llama.cpp Setup and Configuration
llama.cpp is a C/C++ implementation of LLM inference (originally built around Meta's LLaMA models), optimized for a wide range of hardware, including CPUs and GPUs (via CUDA/cuBLAS).
The llama.cpp source code was obtained from its official GitHub repository:
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
llama.cpp
was compiled with CUDA and cuBLAS support using CMake.
# (Pre-requisite: NVIDIA drivers, CUDA Toolkit, and cmake installed)
# Install cmake if not already present:
# sudo apt install cmake
# Create a build directory and navigate into it
mkdir build
cd build
# Configure CMake to enable CUDA/cuBLAS
# (newer llama.cpp versions have renamed this option; there, use -DGGML_CUDA=ON instead)
cmake .. -DLLAMA_CUBLAS=ON
# Build the project
cmake --build . --config Release
This compilation ensures that llama.cpp
can offload computational tasks to the powerful GPU, significantly speeding up inference.
GGUF model files were downloaded into the llama.cpp/models directory. These models are the core AI components that llama.cpp will run.
# Example: Download the primary models (replace with actual model URLs if different)
# mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf
# Qwen3-30B-A3B-Q5_K_M.gguf
wget -P models/ https://huggingface.co/mlabonne/gemma-3-27b-it-abliterated-GGUF/resolve/main/mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf
wget -P models/ https://huggingface.co/Qwen/Qwen1.5-32B-Chat-GGUF/resolve/main/qwen1_5-32b-chat-Q5_K_M.gguf
A downloaded model can be tested from the command line with the main executable.
# Assuming one is still in the 'build' directory after compilation
./bin/main -m ../models/mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf -p "Hello, AI!" -n 128
# Or for the other model:
# ./bin/main -m ../models/qwen1_5-32b-chat-Q5_K_M.gguf -p "Hello, AI!" -n 128
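Depending on the llama.cpp build, GPU offload may need to be requested explicitly with the --n-gpu-layers (-ngl) flag; without it, inference can stay entirely on the CPU. A hedged example (99 is simply high enough to cover all layers of these models; lower it if VRAM runs out):

# Offload model layers to the GPU; the same flag also works with ./bin/server
./bin/main -m ../models/mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf -p "Hello, AI!" -n 128 -ngl 99

The same flag can be added to the server invocations and systemd ExecStart lines below if GPU offload is wanted there as well.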
To serve the llama.cpp models over the network, one can run llama.cpp in server mode, which exposes an HTTP API.
# Assuming one is in the 'build' directory
./bin/server -m ../models/mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf -c 2048 --host 0.0.0.0 --port 8123
This command starts the llama.cpp server, typically listening on port 8123. For secure API access, Nginx will be configured as a reverse proxy in front of this server, enforcing API key authentication.
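A quick way to confirm the server responds is a request to its native /completion endpoint (shown here against the local port; once Nginx is in place, the same request would go to the HTTPS API domain with an X-API-Key header):

# Ask the local llama.cpp server for a short completion
curl -X POST http://127.0.0.1:8123/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, AI!", "n_predict": 64}'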
Running llama.cpp as a Systemd Service: To ensure llama.cpp starts automatically on boot and runs reliably in the background, it can be configured as a systemd service.
sudo nano /etc/systemd/system/llama-cpp-api.service
Add the following content to the file:
[Unit]
Description=Llama.cpp API Server
After=network.target
[Service]
Type=simple
# Replace 'user' with the actual username in User, Group, and the paths below
User=user
Group=user
# Adjust if the llama.cpp clone is elsewhere
WorkingDirectory=/home/user/llama.cpp/build
ExecStart=/home/user/llama.cpp/build/bin/server -m /home/user/llama.cpp/models/mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf -c 2048 --host 0.0.0.0 --port 8123
# One can change the model above to Qwen3-30B-A3B-Q5_K_M.gguf if that's the primary
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
Reload systemd, enable, and start the service:
sudo systemctl daemon-reload
sudo systemctl enable llama-cpp-api.service
sudo systemctl start llama-cpp-api.service
Check service status:
sudo systemctl status llama-cpp-api.service
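If the service fails to start (for example because a path or model filename differs from the unit file), the service logs are the first place to look:

# Follow the llama.cpp API service logs in real time
sudo journalctl -u llama-cpp-api.service -f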
Embedding models are crucial for tasks like Retrieval Augmented Generation (RAG), where you need to convert text into numerical vectors (embeddings) to find relevant information. llama.cpp can also run embedding models.
Download an embedding model such as nomic-embed-text-v1.5.Q4_K_M.gguf and place it in the llama.cpp/models directory.
# Example: Download Nomic-Embed-Text-v1.5
wget -P models/ https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF/resolve/main/nomic-embed-text-v1.5.Q4_K_M.gguf
Use the embedding executable within llama.cpp to generate embeddings from the command line.
# Assuming one is in the 'build' directory
./bin/embedding -m ../models/nomic-embed-text-v1.5.Q4_K_M.gguf -p "This is a test sentence for embedding."
This command will output a vector of numbers representing the input text. This output can then be used for similarity searches in a vector database or for RAG applications.
To serve embeddings over HTTP, run the embedding model through the llama.cpp server on its own port (e.g., 8124) and configure it as a systemd service.
sudo nano /etc/systemd/system/llama-cpp-embedding-api.service
Add the following content to the file:
[Unit]
Description=Llama.cpp Embedding API Server
After=network.target
[Service]
Type=simple
# Replace 'user' with the actual username in User, Group, and the paths below
User=user
Group=user
# Adjust if the llama.cpp clone is elsewhere
WorkingDirectory=/home/user/llama.cpp/build
# The HTTP embedding endpoint is served by the llama.cpp server binary with embedding mode enabled
# (on newer builds the binary is named llama-server and the flag is --embeddings)
ExecStart=/home/user/llama.cpp/build/bin/server --embedding -m /home/user/llama.cpp/models/nomic-embed-text-v1.5.Q4_K_M.gguf --host 0.0.0.0 --port 8124
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
Reload systemd, enable, and start the service:
sudo systemctl daemon-reload
sudo systemctl enable llama-cpp-embedding-api.service
sudo systemctl start llama-cpp-embedding-api.service
Check service status:
sudo systemctl status llama-cpp-embedding-api.service
Once the service is running, you can send a POST request to the server's embeddings endpoint (`/embeddings` here; some llama.cpp builds expose it as `/embedding` or via the OpenAI-compatible `/v1/embeddings`) for programmatic access:
# Example POST request (using curl) to the llama.cpp Embedding API
# Replace embed-api.your_domain.com:8124 with your actual API endpoint
# Replace "your_secret_api_key_1" with your actual API key (if Nginx is configured for this port)
curl -X POST \
  https://embed-api.your_domain.com:8124/embeddings \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_secret_api_key_1" \
  -d '{
    "content": "This is a sentence to embed."
  }'
The API will return a JSON object containing the embedding vector. This allows external applications to programmatically generate embeddings using your server's GPU resources.
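For a programmatic client, here is a minimal Python sketch that requests embeddings for two sentences and compares them with cosine similarity. The endpoint URL and API key are placeholders from the setup above, and the response field name ("embedding") and its nesting vary between llama.cpp versions, so treat the parsing line as an assumption to verify against your server's actual output.

# embed_client.py - minimal sketch; requires: pip install requests
import math
import requests

API_URL = "https://embed-api.your_domain.com:8124/embeddings"  # placeholder endpoint from the Nginx setup above
API_KEY = "your_secret_api_key_1"                              # replace with a real key

def get_embedding(text: str) -> list[float]:
    resp = requests.post(
        API_URL,
        headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
        json={"content": text},
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()
    # Assumption: the server returns {"embedding": [...]}; adjust if your llama.cpp build nests it differently
    return data["embedding"]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

if __name__ == "__main__":
    e1 = get_embedding("The server runs large language models locally.")
    e2 = get_embedding("Local hardware hosts the LLM inference workload.")
    print(f"cosine similarity: {cosine_similarity(e1, e2):.4f}")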
Open WebUI provides a user-friendly web interface for interacting with various LLMs. It is deployed using Docker for ease of management.
# Install Docker
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo \
"deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
"$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Add the user to the docker group
sudo usermod -aG docker $USER
newgrp docker # Apply group changes immediately
# Example docker-compose.yml (simplified)
version: '3.8'
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "8080:8080" # Default port for Open WebUI
    volumes:
      - /data/open-webui:/app/backend/data # Persistent data for WebUI
      - /var/run/docker.sock:/var/run/docker.sock # For Docker integration
      - /data/models:/app/backend/models # Mount LLM models
    restart: always # Ensures the container automatically starts on reboot
sudo docker compose up -d
After starting the container, browse to http://the_server_ip:8080 to create an admin user and start interacting with models. Open WebUI can then be accessed at http://your_domain.com (via Nginx, described below) or directly at http://the_server_ip:8080. For RAG, point Open WebUI's embedding settings at the local llama.cpp embedding server (http://127.0.0.1:8124/v1 if using the default OpenAI-compatible endpoint, or just http://127.0.0.1:8124 if it's a direct embedding endpoint). Once configured, your LLMs in Open WebUI should be able to leverage your local embedding model for RAG tasks and perform real-time web searches using DDGS.
Integrating DuckDuckGo Search (DDGS) allows your LLMs within Open WebUI to perform real-time web searches, providing more up-to-date and factual responses. Running it as a dedicated service enhances stability and resource management.
# First, get the container ID or name of your running Open WebUI container
sudo docker ps -a
# Then, execute a shell inside the container and install the package
sudo docker exec -it open-webui /bin/bash -c "pip install ddgs"
After installation, you might need to restart the Open WebUI container for the changes to take effect:
sudo docker restart open-webui
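To confirm the package is importable inside the container after the restart, a quick sanity check (assuming the ddgs package exposes the DDGS class, as current releases do):

# Should print "ddgs OK" if the package installed correctly
sudo docker exec -it open-webui python3 -c "from ddgs import DDGS; print('ddgs OK')"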
To run DDGS as a dedicated API service, create a systemd unit file:
sudo nano /etc/systemd/system/ddgs-api.service
Add the following content to the file. This assumes you have a simple Python Flask/FastAPI app that exposes `ddgs` functionality as an API, listening on a port (e.g., `8125`).
[Unit]
Description=DDGS Search API Service
After=network.target
[Service]
Type=simple
# Replace 'user' with the actual username
User=user
Group=user
# Create this directory and place the ddgs API script here
WorkingDirectory=/home/user/ddgs_api
# Assuming app.py runs a simple API for ddgs
ExecStart=/usr/bin/python3 /home/user/ddgs_api/app.py
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
You would need a simple Python script (e.g., `app.py` in `/home/user/ddgs_api/`) that uses the `ddgs` library and exposes it via a web framework like Flask or FastAPI. For example, a basic `app.py`:
# /home/user/ddgs_api/app.py
# Requires Flask and a DuckDuckGo search package on the host, e.g.: pip install flask duckduckgo_search
from flask import Flask, request, jsonify
# If the newer 'ddgs' package is installed instead, import with: from ddgs import DDGS
from duckduckgo_search import DDGS

app = Flask(__name__)

@app.route('/search', methods=['POST'])
def search_ddgs():
    query = request.json.get('query')
    if not query:
        return jsonify({"error": "Query parameter is missing"}), 400
    try:
        results = DDGS().text(keywords=query, max_results=5)  # Adjust max_results as needed
        return jsonify(results)
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8125)  # Run on port 8125
Reload systemd, enable, and start the service:
sudo systemctl daemon-reload
sudo systemctl enable ddgs-api.service
sudo systemctl start ddgs-api.service
Check service status:
sudo systemctl status ddgs-api.service
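With the service running, the /search endpoint defined in app.py above can be exercised directly:

# Query the DDGS API service on its local port
curl -X POST http://127.0.0.1:8125/search \
  -H "Content-Type: application/json" \
  -d '{"query": "latest llama.cpp release"}'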
Nginx was set up as a reverse proxy to provide secure, web-standard access to Open WebUI on ports 80 (HTTP) and 443 (HTTPS) and to handle SSL/TLS encryption. It is also used to secure the llama.cpp API endpoints, hosted on separate ports, by enforcing API key authentication.

Install Nginx:
sudo apt install nginx
API key authentication is implemented with Nginx's map directive, which defines the valid keys and a variable to check against.
# /etc/nginx/conf.d/api_keys.conf
# This map defines valid API keys.
# $http_x_api_key refers to the 'X-API-Key' header sent by the client.
# $api_key_valid will be 1 if the key matches, 0 otherwise.
map $http_x_api_key $api_key_valid {
default 0; # Default to invalid
"your_secret_api_key_1" 1; # Replace with the actual API key
"another_secret_api_key_2" 1; # Add more keys as needed
}
IMPORTANT: Replace "your_secret_api_key_1" and "another_secret_api_key_2" with actual, strong, randomly generated API keys, and keep these keys secure.
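One simple way to generate keys of this kind is to pull random bytes from openssl, for example:

# Generate a 64-character hexadecimal API key
openssl rand -hex 32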
The server blocks for Open WebUI and both llama.cpp APIs live in a single consolidated site configuration:
# /etc/nginx/sites-available/your_domain.com.conf
# HTTP server block to redirect to HTTPS for the main domain
server {
listen 80;
server_name your_domain.com;
return 301 https://$host$request_uri;
}
# HTTPS server block for Open WebUI on port 443
server {
listen 443 ssl;
server_name your_domain.com;
ssl_certificate /etc/letsencrypt/live/your_domain.com/fullchain.pem; # Managed by Certbot
ssl_certificate_key /etc/letsencrypt/live/your_domain.com/privkey.pem; # Managed by Certbot
ssl_trusted_certificate /etc/letsencrypt/live/your_domain.com/chain.pem; # Managed by Certbot
# Include common SSL settings for security
include /etc/letsencrypt/options-ssl-nginx.conf;
ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # Managed by Certbot
# Location for Open WebUI
location / {
proxy_pass http://127.0.0.1:8080; # Proxy to Open WebUI's default port
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# WebSocket support for interactive AI chat
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
}
# HTTPS server block for llama.cpp main API on port 8123
server {
listen 8123 ssl; # Listen directly on port 8123 with SSL
server_name api.your_domain.com; # This is the dedicated API subdomain
ssl_certificate /etc/letsencrypt/live/api.your_domain.com/fullchain.pem; # Managed by Certbot
ssl_certificate_key /etc/letsencrypt/live/api.your_domain.com/privkey.pem; # Managed by Certbot
ssl_trusted_certificate /etc/letsencrypt/live/api.your_domain.com/chain.pem; # Managed by Certbot
# Include common SSL settings for security
include /etc/letsencrypt/options-ssl-nginx.conf;
ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # Managed by Certbot
# Location for llama.cpp API
location / {
# Check if the API key is valid
if ($api_key_valid = 0) {
return 403; # Forbidden if key is invalid or missing
}
proxy_pass http://127.0.0.1:8123; # Proxy to llama.cpp server on its internal port 8123
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-API-Key $http_x_api_key; # Pass the API key header to backend (optional)
# WebSocket support if llama.cpp server uses it
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
}
# HTTPS server block for llama.cpp Embedding API on port 8124
server {
listen 8124 ssl; # Listen directly on port 8124 with SSL
server_name embed-api.your_domain.com; # Dedicated subdomain for embedding API
ssl_certificate /etc/letsencrypt/live/embed-api.your_domain.com/fullchain.pem; # Managed by Certbot
ssl_certificate_key /etc/letsencrypt/live/embed-api.your_domain.com/privkey.pem; # Managed by Certbot
ssl_trusted_certificate /etc/letsencrypt/live/embed-api.your_domain.com/chain.pem; # Managed by Certbot
# Include common SSL settings for security
include /etc/letsencrypt/options-ssl-nginx.conf;
ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # Managed by Certbot
# Location for llama.cpp Embedding API
location / {
# Check if the API key is valid (optional, but recommended for external access)
if ($api_key_valid = 0) {
return 403; # Forbidden if key is invalid or missing
}
proxy_pass http://127.0.0.1:8124; # Proxy to llama.cpp embedding server on its internal port 8124
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-API-Key $http_x_api_key; # Pass the API key header to backend (optional)
}
}
# Ensure any old symlinks are removed if separate files were used
# sudo rm -f /etc/nginx/sites-enabled/openwebui.conf
# sudo rm -f /etc/nginx/sites-enabled/llama-api.conf
# Create symlink for the consolidated configuration file
sudo ln -s /etc/nginx/sites-available/your_domain.com.conf /etc/nginx/sites-enabled/
sudo nginx -t # Test Nginx configuration for syntax errors
sudo systemctl restart nginx
sudo apt install certbot python3-certbot-nginx
# Run Certbot for the main domain:
sudo certbot --nginx -d your_domain.com
# Run Certbot for the API subdomain:
sudo certbot --nginx -d api.your_domain.com
# Run Certbot for the Embedding API subdomain:
sudo certbot --nginx -d embed-api.your_domain.com
Certbot automatically modifies the Nginx configuration to include SSL directives and sets up automatic certificate renewal, ensuring sites remain secure.
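Renewal can be verified without waiting for the certificates to approach expiry:

# Simulate a renewal to confirm the automatic renewal setup works
sudo certbot renew --dry-run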
Configure the host firewall (UFW) to allow only the required ports:

sudo ufw allow OpenSSH
sudo ufw allow 'Nginx Full' # Allows both HTTP (80) and HTTPS (443)
sudo ufw allow 8123/tcp # Allow the main API port
sudo ufw allow 8124/tcp # Allow the embedding API port
sudo ufw enable
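After enabling UFW, it is worth confirming that the active rule set matches the intent:

# Show the active firewall rules and default policies
sudo ufw status verbose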
Key Principles for Hardware Firewall ACLs:
- Forward port 8123 to the server only if external clients need the llama.cpp main API.
- Forward port 8124 to the server only if external clients need the llama.cpp Embedding API.
- Never expose internal application ports (e.g., 8080 for Open WebUI or 8123 for llama.cpp's internal listening) to the internet without Nginx as a proxy. Nginx acts as the secure gateway.

Consult the specific router or cloud provider documentation for exact instructions on configuring Access Control Lists (ACLs) or port forwarding rules.
After experiencing issues with Timeshift's configuration persistence and its primary focus on system files, BorgBackup was chosen as a more flexible and robust solution for comprehensive system and user data backups.
The backup covers both the root filesystem (/) and all user home directories in a single, unified repository.

Install BorgBackup:
sudo apt update
sudo apt install borgbackup
A Borg repository was initialized on the data drive (/data/borg_repo), secured with a strong passphrase.
sudo mkdir -p /data/borg_repo
sudo borg init --encryption=repokey /data/borg_repo
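With repokey encryption the key is stored inside the repository, so it is prudent to export a copy and keep it somewhere safe off the server; a small sketch (the destination path is just an example):

# Export the repository key so the repo can still be decrypted if its header is ever damaged
sudo borg key export /data/borg_repo /root/borg_repo_key_backup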
A backup script (/usr/local/bin/backup_all.sh) was created to perform the backup.
sudo nano /usr/local/bin/backup_all.sh
This script includes:
- Backing up both the root filesystem (/) and all user home directories (/home/).
- An integrity check (borg check) after the backup.
- Compaction (borg compact) to reclaim space.
- (The prune command was removed from the script to allow for manual pruning.)

A sketch of such a script is shown below. Run it manually with:

sudo /usr/local/bin/backup_all.sh
The user is prompted for the Borg repository passphrase during execution.
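Here is a minimal sketch of what backup_all.sh might look like for this repository. The archive naming, compression choice, and exclude list are illustrative assumptions rather than the exact script used here; Borg prompts for the passphrase when the script runs.

#!/bin/bash
# /usr/local/bin/backup_all.sh - minimal sketch of a full-system Borg backup
set -euo pipefail

REPO=/data/borg_repo
ARCHIVE="system-$(date +%Y-%m-%d_%H-%M)"

# Back up the root filesystem and user home directories, excluding volatile paths
# and the repository itself so the backup does not recurse into it
borg create --stats --compression zstd \
    --exclude /proc --exclude /sys --exclude /dev --exclude /run \
    --exclude /tmp --exclude /mnt --exclude /media \
    --exclude /data/borg_repo \
    "$REPO::$ARCHIVE" / /home

# Verify repository and archive consistency after the backup
borg check "$REPO"

# Reclaim space freed by deleted archives
borg compact "$REPO"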
Restoring from a backup:
- sudo borg list /data/borg_repo::archive_name shows the files within a specific backup.
- sudo borg mount /data/borg_repo::archive_name /mnt/restore_point allows browsing the backup like a live filesystem.
- sudo borg extract can be used to pull specific files or perform a full system restore (typically from a live Linux environment).

This setup provides a powerful and flexible server environment. The robust hardware supports demanding AI applications, while Ubuntu Server offers a stable foundation. Nginx provides secure web access and API protection, and BorgBackup ensures that both critical system files and valuable personal data (including the installed applications) are reliably backed up and easily restorable, offering peace of mind against data loss or system corruption.
With the AI server now fully configured, you have a powerful and versatile platform for a wide range of applications and integrations. This section outlines how the environment can be leveraged.
The possibilities are vast. The new AI server is now a versatile tool ready to power the next AI-driven project or application.