Building an AI Server with Web Interface and API Programmatic Access

I set up a powerful AI server, complete with a web-based interface for easy interaction and a robust API for programmatic access. This guide covers hardware selection, operating system installation, configuration of AI applications (llama.cpp for inference and embeddings, and Open WebUI), Nginx for secure web access, integration of web search capabilities, and a reliable BorgBackup solution.

1. Hardware Choices

The server is custom-built for high-performance AI workloads and general server operations, built around an NVIDIA RTX 3090 Ti GPU and a balanced selection of supporting components.

2. Operating System Installation

Ubuntu Server 22.04 LTS was chosen as the operating system. This is a robust, stable, and widely supported Linux distribution, ideal for server environments due to its long-term support (LTS), strong community, and extensive package repositories.

Installation & Partitioning:

3. llama.cpp Setup and Configuration

llama.cpp is a C/C++ implementation of LLM inference (originally built around Meta's LLaMA models), optimized for a wide range of hardware, including CPU and GPU (via CUDA/cuBLAS).

Setup Steps:

  1. Cloning the Repository: The llama.cpp source code was obtained from its official GitHub repository:
    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
  2. Building with CUDA/cuBLAS Support (using CMake): To leverage the RTX 3090 Ti GPU, llama.cpp was compiled with CUDA and cuBLAS support using CMake.
    # (Pre-requisite: NVIDIA drivers, CUDA Toolkit, and cmake installed)
    # Install cmake if not already present:
    # sudo apt install cmake
    
    # Create a build directory and navigate into it
    mkdir build
    cd build
    
    # Configure CMake to enable CUDA/cuBLAS offloading
    # (on recent llama.cpp versions this flag has been renamed; use -DGGML_CUDA=ON there)
    cmake .. -DLLAMA_CUBLAS=ON
    
    # Build the project
    cmake --build . --config Release

    This compilation ensures that llama.cpp can offload computational tasks to the powerful GPU, significantly speeding up inference.
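    A quick way to confirm that the NVIDIA driver and CUDA toolkit are actually visible to the system (a prerequisite for the GPU-enabled build above):

    # Should list the RTX 3090 Ti along with the driver and CUDA runtime version
    nvidia-smi
    # Should print the installed CUDA toolkit version
    nvcc --version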

  3. Downloading Models: Large Language Models (LLMs) in GGUF format were downloaded and placed into the llama.cpp/models directory. These models are the core AI components that llama.cpp will run.
    # Example: Download the primary models
    wget -P models/ https://huggingface.co/mlabonne/gemma-3-27b-it-abliterated-GGUF/resolve/main/mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf
  4. Basic Usage: Models can be run directly from the command line using the compiled main executable (recent llama.cpp builds name it llama-cli instead).
    # Assuming one is still in the 'build' directory after compilation
    # (on recent llama.cpp builds, use ./bin/llama-cli instead of ./bin/main)
    ./bin/main -m ../models/mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf -p "Hello, AI!" -n 128
  5. Server Mode and API Access: For programmatic access to llama.cpp models, one can run llama.cpp in server mode, which exposes an HTTP API (the executable is named server in older builds and llama-server in recent ones).
    # Assuming one is in the 'build' directory
    # (on recent llama.cpp builds, use ./bin/llama-server instead of ./bin/server)
    ./bin/server -m ../models/mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf -c 2048 --host 0.0.0.0 --port 8001

    This command starts the llama.cpp server, listening on port 8001. For secure API access, Nginx will be configured as a reverse proxy in front of this server, enforcing API key authentication.
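    Once the server is up, it can be smoke-tested locally before Nginx is placed in front of it. The /completion endpoint below is part of llama.cpp's built-in HTTP server; adjust the payload as needed.

    # Local test of the llama.cpp HTTP server on port 8001
    curl -s http://127.0.0.1:8001/completion \
      -H "Content-Type: application/json" \
      -d '{"prompt": "Hello, AI!", "n_predict": 64}'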

  6. Running llama.cpp as a Systemd Service: To ensure llama.cpp starts automatically on boot and runs reliably in the background, it can be configured as a systemd service.
    sudo nano /etc/systemd/system/llama-server.service

    Add the following content to the file:

    [Unit]
    Description=Llama.cpp API Server
    After=network.target
    
    [Service]
    Type=simple
    User=user
    Group=user
    WorkingDirectory=/home/user/llama.cpp/build
    ExecStart=/home/user/llama.cpp/build/bin/server --model /home/user/llama.cpp/models/mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf --port 8001 --host 0.0.0.0 --n-gpu-layers 999
    Restart=always
    RestartSec=5
    
    [Install]
    WantedBy=multi-user.target

    Reload systemd, enable, and start the service:

    sudo systemctl daemon-reload
    sudo systemctl enable llama-server.service
    sudo systemctl start llama-server.service

    Check service status:

    sudo systemctl status llama-server.service
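    If the service fails to start, or to watch the model load at boot, the logs can be followed via journald:

    sudo journalctl -u llama-server.service -f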

4. Using the Embedding Model with llama.cpp

Embedding models are crucial for tasks like Retrieval Augmented Generation (RAG), where you need to convert text into numerical vectors (embeddings) to find relevant information. llama.cpp can also run embedding models.

Setup and Usage:

  1. Downloading an Embedding Model: Download a GGUF-formatted embedding model, specifically nomic-embed-text-v1.5.Q4_K_M.gguf. Place it in the llama.cpp/models directory.
    # Example: Download Nomic-Embed-Text-v1.5
    wget -P models/ https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF/resolve/main/nomic-embed-text-v1.5.Q4_K_M.gguf
  2. Running the Embedding Model: Use the server executable with the --embedding flag.
    # Assuming one is in the 'build' directory
    ./bin/server --model ../models/nomic-embed-text-v1.5.Q4_K_M.gguf --embedding --port 8003

    This command will start the embedding server on port 8003.
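    A quick local check that embeddings are actually being produced (the /embeddings endpoint and "content" field match the request format used in step 3 below; OpenAI-compatible clients can use /v1/embeddings with an "input" field instead):

    curl -s http://127.0.0.1:8003/embeddings \
      -H "Content-Type: application/json" \
      -d '{"content": "This is a sentence to embed."}'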

  3. Running the Embedding Model as a Systemd Service (API Access): To expose the embedding model via an API for programmatic access, you can run it in server mode on a dedicated port (e.g., 8003) and configure it as a systemd service.
    sudo nano /etc/systemd/system/llama-embedding-server.service

    Add the following content to the file:

    [Unit]
    Description=Llama.cpp Embedding API Server
    After=network.target
    
    [Service]
    User=user
    Group=user
    WorkingDirectory=/home/user/llama.cpp/build
    ExecStart=/home/user/llama.cpp/build/bin/llama-server --model /home/user/llama.cpp/models/nomic-embed-text-v1.5.Q4_K_M.gguf --port 8003 --host 0.0.0.0 --embedding --n-gpu-layers 999
    StandardOutput=journal
    StandardError=journal
    Restart=always
    RestartSec=5
    
    [Install]
    WantedBy=multi-user.target

    Reload systemd, enable, and start the service:

    sudo systemctl daemon-reload
    sudo systemctl enable llama-embedding-server.service
    sudo systemctl start llama-embedding-server.service

    Check service status:

    sudo systemctl status llama-embedding-server.service

    Once the service is running, you can send a POST request to the server's `/embeddings` endpoint for programmatic access. For secure external access, place this port behind an Nginx reverse proxy with API key enforcement, analogous to the port 8002 proxy shown in Section 7 (which forwards to the chat server on 8001), but forwarding to port 8003 instead. The example below assumes such a proxied endpoint:

    # Example POST request (using curl) to the llama.cpp Embedding API
    # Replace decrai.com:8002 with your actual proxied endpoint
    # (for a local test, use http://127.0.0.1:8003/embeddings and omit the X-API-Key header)
    curl -X POST \
      https://decrai.com:8002/embeddings \
      -H "Content-Type: application/json" \
      -H "X-API-Key: your_secret_api_key_1" \
      -d '{
        "content": "This is a sentence to embed."
      }'

    The API will return a JSON object containing the embedding vector. This allows external applications to programmatically generate embeddings using your server's GPU resources.

5. Open WebUI Setup and Configuration

Open WebUI provides a user-friendly web interface for interacting with various LLMs. It is deployed using Docker for ease of management.

Setup Steps:

  1. Docker and Docker Compose Installation: Docker and Docker Compose were installed on the Ubuntu server to manage containerized applications.
    # Install Docker
    sudo apt-get update
    sudo apt-get install ca-certificates curl gnupg
    sudo install -m 0755 -d /etc/apt/keyrings
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
    sudo chmod a+r /etc/apt/keyrings/docker.gpg
    echo \
      "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
      "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
      sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt-get update
    sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    
    # Add the user to the docker group
    sudo usermod -aG docker user
    newgrp docker # Apply group changes immediately
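    A quick check that Docker is installed correctly and the user can reach the daemon without sudo (this pulls the small hello-world test image):

    docker run --rm hello-world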
  2. Open WebUI Deployment: Open WebUI was deployed using Docker Compose.
    # Example docker-compose.yml (simplified)
    version: '3.8'
    services:
      open-webui:
        image: ghcr.io/open-webui/open-webui:main
        container_name: open-webui
        ports:
          - "8080:8080" # Default port for Open WebUI
        volumes:
          - /data/open-webui:/app/backend/data # Persistent data for WebUI
          - /var/run/docker.sock:/var/run/docker.sock # For Docker integration
          - /data/models:/app/backend/models # Mount LLM models
        restart: always # This ensures the container automatically starts on reboot
  3. Starting Open WebUI:
    sudo docker compose up -d
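    Before moving on, it is worth confirming that the container started cleanly:

    sudo docker compose ps
    sudo docker logs --tail 50 open-webui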
  4. Initial Access: Open WebUI is then accessible at http://the_server_ip:8080, where an admin user can be created and interaction with models can begin.
  5. Enabling Custom Embedding Model and Web Search in Open WebUI: Once Open WebUI is running and you've logged in as an admin, you can configure it to use your custom embedding model and enable web search.
    • Access your Open WebUI interface (e.g., https://decrai.com or http://the_server_ip:8080).
    • Log in with your admin account.
    • Navigate to **Settings** (usually a gear icon in the sidebar).
    • For the Embedding Model:
      • Look for a section like **Connections**, **Models**, or **Integrations**.
      • You may need to add a new "Custom API" or "OpenAI Compatible API" connection.
      • Point the **API URL** to your `llama.cpp` embedding server (e.g., http://127.0.0.1:8003/v1).
      • Give it a descriptive name (e.g., "Nomic Embeddings").
      • Save the connection. You might then need to select this embedding model in the RAG or document processing settings.
    • For Web Search (DDGS):
      • Look for a section related to **Tools** or **Integrations**.
      • Find the option for **Web Search** or **DuckDuckGo Search** and **enable** it.
      • Ensure that the `ddgs` service (which you'll set up in the next section) is running and accessible by the Open WebUI container.
      • Save the changes.

    Once configured, your LLMs in Open WebUI should be able to leverage your local embedding model for RAG tasks and perform real-time web searches using DDGS.

6. Setting up DDGS Search as a Service

Integrating DuckDuckGo Search (DDGS) allows your LLMs within Open WebUI to perform real-time web searches, providing more up-to-date and factual responses. Running it as a dedicated service enhances stability and resource management.

Setup Steps:

  1. Install `ddgs` Python Library: This library is used by Open WebUI to interact with DuckDuckGo. You'll install it inside the Open WebUI container.
    # First, get the container ID or name of your running Open WebUI container
    sudo docker ps -a
    
    # Then, execute a shell inside the container and install the package
    sudo docker exec -it open-webui /bin/bash -c "pip install ddgs"

    After installation, you might need to restart the Open WebUI container for the changes to take effect (note that a package installed this way lives only in the running container; it must be reinstalled if the container is ever recreated, e.g., after an image update):

    sudo docker restart open-webui
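    To verify that the library is importable inside the container (assuming python is on the container's PATH):

    sudo docker exec open-webui python -c "from ddgs import DDGS; print('ddgs OK')"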
  2. Running DDGS as a Systemd Service: A dedicated `ddgs.service` unit keeps the search API wrapper running in the background and restarts it on failure.
    sudo nano /etc/systemd/system/ddgs.service

    Add the following content to the file:

    [Unit]
    Description=DDGS API Service
    After=network.target
    
    [Service]
    User=user
    Group=user
    WorkingDirectory=/home/user/ddgs
    ExecStart=/home/user/ddgs/venv/bin/gunicorn -w 4 -b 0.0.0.0:5000 app:app
    Restart=on-failure
    RestartSec=5
    
    [Install]
    WantedBy=multi-user.target

    Reload systemd, enable, and start the service:

    sudo systemctl daemon-reload
    sudo systemctl enable ddgs.service
    sudo systemctl start ddgs.service

    Check service status:

    sudo systemctl status ddgs.service
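    The unit above assumes a small Gunicorn application at /home/user/ddgs/app.py, which is not shown in this guide. A minimal hypothetical sketch of such a wrapper around the ddgs library (the /search route, its parameters, and the response shape are illustrative only) could look like this:

    # /home/user/ddgs/app.py -- hypothetical minimal search wrapper, not the exact app used here
    # Dependencies inside the venv: pip install flask gunicorn ddgs
    from flask import Flask, jsonify, request
    from ddgs import DDGS

    app = Flask(__name__)

    @app.route("/search")
    def search():
        query = request.args.get("q", "")
        if not query:
            return jsonify({"error": "missing query parameter 'q'"}), 400
        # Top text results; each entry includes title, href, and body fields
        results = list(DDGS().text(query, max_results=5))
        return jsonify(results)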

7. Nginx Setup and Configuration

Nginx was set up as a reverse proxy to provide secure, web-standard access to Open WebUI, and to handle SSL/TLS encryption. It is also used to secure the `llama.cpp` API endpoints.

Setup Steps:

  1. Nginx Installation:
    sudo apt install nginx
  2. Configure Nginx for API Key Enforcement: To enforce API keys, Nginx uses the `map` directive to translate the client-supplied key header into a variable (`$api_key_valid`) that the server blocks below can check. This is configured in the `http` block of `/etc/nginx/nginx.conf`, as sketched below.
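    A minimal sketch of such a map block, placed inside the http { ... } context of /etc/nginx/nginx.conf (the X-API-Key header name and the key value are placeholders; the key shown matches the curl example in Section 4):

    # /etc/nginx/nginx.conf -- inside the http { ... } block
    # Translate the client-supplied X-API-Key header into a flag the server blocks can check
    map $http_x_api_key $api_key_valid {
        default                   0;
        "your_secret_api_key_1"   1;
    }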
  3. Creating a Consolidated Server Block (Virtual Host) for Open WebUI and `llama.cpp` APIs:
    # /etc/nginx/sites-available/decrai.com
    # HTTP to HTTPS Redirect (Standard Port 80)
    server {
        listen 80;
        listen [::]:80;
        server_name decrai.com www.decrai.com;
        return 301 https://$host$request_uri;
    }
    
    # HTTPS on Standard Port 443 (Hosts Static Content)
    server {
        listen 443 ssl http2;
        listen [::]:443 ssl http2;
        server_name decrai.com www.decrai.com;
        ssl_certificate /etc/letsencrypt/live/decrai.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/decrai.com/privkey.pem;
        include /etc/letsencrypt/options-ssl-nginx.conf;
        ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
        root /var/www/html;
        index index.html index.htm;
        location / {
            try_files $uri $uri/ =404;
        }
        add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload";
        add_header X-Frame-Options DENY;
        add_header X-Content-Type-Options nosniff;
        add_header X-XSS-Protection "1; mode=block";
    }
    
    # HTTPS on Custom Port 8000 (Proxies to internal service on 8080)
    server {
        listen 8000 ssl http2;
        listen [::]:8000 ssl http2;
        server_name decrai.com www.decrai.com;
        ssl_certificate /etc/letsencrypt/live/decrai.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/decrai.com/privkey.pem;
        include /etc/letsencrypt/options-ssl-nginx.conf;
        ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
        location / {
            proxy_pass http://127.0.0.1:8080;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_read_timeout 300s;
            proxy_send_timeout 300s;
        }
        add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload";
        add_header X-Frame-Options DENY;
        add_header X-Content-Type-Options nosniff;
        add_header X-XSS-Protection "1; mode=block";
    }
    
    # HTTPS on Custom Port 8002 (Proxies to internal service on 8001 with API Key)
    server {
        listen 8002 ssl http2;
        listen [::]:8002 ssl http2;
        server_name decrai.com;
        ssl_certificate /etc/letsencrypt/live/decrai.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/decrai.com/privkey.pem;
        include /etc/letsencrypt/options-ssl-nginx.conf;
        ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
        if ($api_key_valid = 0) {
            return 403 "Forbidden: Invalid API Key\n";
        }
        location / {
            proxy_pass http://127.0.0.1:8001;
            proxy_set_header Host $host:$server_port;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_read_timeout 300s;
            proxy_send_timeout 300s;
        }
        add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload";
        add_header X-Frame-Options DENY;
        add_header X-Content-Type-Options nosniff;
        add_header X-XSS-Protection "1; mode=block";
    }
  4. Enabling the Server Block and Testing Nginx:
    # Create symlink for the consolidated configuration file
    sudo ln -s /etc/nginx/sites-available/decrai.com /etc/nginx/sites-enabled/
    
    sudo nginx -t # Test Nginx configuration for syntax errors
    sudo systemctl restart nginx
  5. SSL Setup with Certbot (Let's Encrypt): For secure HTTPS access, Certbot was used to automatically obtain and configure SSL certificates from Let's Encrypt.
    sudo apt install certbot python3-certbot-nginx
    # Run Certbot for the main domain and subdomains:
    sudo certbot --nginx -d decrai.com -d www.decrai.com

    Certbot automatically modifies the Nginx configuration to include SSL directives and sets up automatic certificate renewal, ensuring sites remain secure.
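    Renewal can be verified at any time without waiting for the certificate to near expiry:

    sudo certbot renew --dry-run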

  6. Firewall Configuration (UFW): The Uncomplicated Firewall (UFW) was configured to allow necessary traffic (SSH, HTTP, HTTPS, and the new API ports).
    sudo ufw allow OpenSSH
    sudo ufw allow 'Nginx Full' # Allows both HTTP (80) and HTTPS (443)
    sudo ufw allow 8000/tcp # Allow the Open WebUI proxy port
    sudo ufw allow 8002/tcp # Allow the llama.cpp API proxy port
    sudo ufw enable
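    The resulting rule set can then be reviewed:

    sudo ufw status verbose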
  7. Hardware Firewall ACLs (Access Control Lists): Beyond the server's local UFW, it's crucial to configure the network's hardware firewall (e.g., on the router, dedicated firewall appliance, or cloud security group) to restrict incoming traffic.

    Key Principles for Hardware Firewall ACLs:

    • Deny All by Default: The most secure approach is to configure the hardware firewall to block all incoming connections by default.
    • Allow Only Necessary Ports: Explicitly create rules to permit traffic only on the ports the server needs to be accessible from the internet. This would typically include:
      • Port 80 (HTTP): For initial web access and Certbot's HTTP-01 challenge (which redirects to HTTPS).
      • Port 443 (HTTPS): For secure web access to static content.
      • Port 8000 (HTTPS/WebUI): For secure external access to the Open WebUI.
      • Port 8002 (HTTPS/API): For secure external access to the `llama.cpp` APIs.
    • Source IP Restriction (where possible): For services or APIs that should only be accessed from known locations, configure rules to only allow connections from specific source IP addresses.
    • No Internal Port Exposure: Do not directly expose internal application ports (like 8080 for Open WebUI or 8001 for `llama.cpp`'s internal listening) to the internet without Nginx as a proxy. Nginx acts as the secure gateway.

    Consult the specific router or cloud provider documentation for exact instructions on configuring Access Control Lists (ACLs) or port forwarding rules.

8. Backup Solution (BorgBackup)

After experiencing issues with Timeshift's configuration persistence and its primary focus on system files, BorgBackup was chosen as a more flexible and robust solution for comprehensive system and user data backups.

Reasons for Choosing BorgBackup:

Setup and Usage:

  1. Installation:
    sudo apt update
    sudo apt install borgbackup
  2. Repository Initialization: A Borg repository was initialized on the dedicated data partition (/data/borg_repo), secured with a strong passphrase.
    sudo mkdir -p /data/borg_repo
    sudo borg init --encryption=repokey /data/borg_repo
  3. Custom Backup Script: A custom shell script (/usr/local/bin/backup_all.sh) was created to perform the backup.
    sudo nano /usr/local/bin/backup_all.sh

    This script includes the following (an illustrative sketch is shown after this list):

    • Backing up / and /home/.
    • Excluding common temporary, cache, and system-specific directories.
    • Creating a unique archive name based on the hostname and current timestamp.
    • Performing a repository integrity check (borg check) after the backup.
    • Compacting the repository (borg compact) to reclaim space.
    • (Note: The prune command was removed from the script to allow for manual pruning.)
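    An illustrative sketch of such a script, assembled from the points above (the exclude list and paths are assumptions and should be tailored to the actual system), might look like this:

    #!/bin/bash
    # /usr/local/bin/backup_all.sh -- illustrative sketch, not the exact script used here
    set -euo pipefail

    REPO=/data/borg_repo
    ARCHIVE="$(hostname)-$(date +%Y-%m-%d_%H-%M-%S)"

    # Back up / and /home, excluding temporary, cache, and system-specific paths
    borg create --stats --progress \
        --exclude /proc --exclude /sys --exclude /dev --exclude /run \
        --exclude /tmp --exclude /var/tmp --exclude /var/cache \
        --exclude /data/borg_repo --exclude '/home/*/.cache' \
        "$REPO::$ARCHIVE" / /home

    # Verify repository integrity after the backup
    borg check "$REPO"

    # Reclaim space in the repository
    borg compact "$REPO"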
  4. On-Demand Execution: The script is executed manually whenever a backup is desired.
    sudo /usr/local/bin/backup_all.sh

    The user is prompted for the Borg repository passphrase during execution.

  5. Verification and Restoration:
    • Listing Contents: sudo borg list /data/borg_repo::archive_name to see files within a specific backup.
    • Mounting for Browsing: sudo borg mount /data/borg_repo::archive_name /mnt/restore_point allows browsing the backup like a live filesystem.
    • Extracting Specific Files/Full Restore: sudo borg extract can be used to pull specific files or perform a full system restore (typically from a live Linux environment).
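    For example, to browse an archive and copy files out of it (the archive name below is illustrative):

    sudo mkdir -p /mnt/restore_point
    sudo borg list /data/borg_repo    # list available archive names
    sudo borg mount /data/borg_repo::myhost-2024-05-01_12-00-00 /mnt/restore_point
    # ... copy whatever is needed out of /mnt/restore_point ...
    sudo borg umount /mnt/restore_point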

Conclusion

This setup provides a powerful and flexible server environment. The robust hardware supports demanding AI applications, while Ubuntu Server offers a stable foundation. Nginx provides secure web access and API protection, and BorgBackup ensures that both critical system files and valuable personal data (including the installed applications) are reliably backed up and easily restorable, offering peace of mind against data loss or system corruption.



What's Next: Utilizing the AI Server Environment

With the AI server fully configured, it provides a powerful and versatile platform for a wide range of applications and integrations. This section outlines how the environment can be leveraged.

How the Environment Can Be Used:

Potential Applications and Integrations:

The possibilities are vast. The new AI server is now a versatile tool ready to power the next AI-driven project or application.