Building an AI Server with Web Interface and API Programmatic Access

I set up a powerful AI server, complete with a web-based interface for easy interaction and a robust API for programmatic access. This guide covers hardware selection, operating system installation, configuration of AI applications (llama.cpp for inference and embeddings, and Open WebUI), Nginx for secure web access, integration of web search capabilities, and a reliable BorgBackup solution.

1. Hardware Choices

The server is custom-built for high-performance AI workloads and general server operations, featuring a powerful and balanced selection of components:

CPU: AMD Ryzen 9 9950X3D Granite Ridge AM5 4.30GHz 16-Core Boxed Processor.
CPU Cooler: Thermalright Peerless Assassin 120 SE CPU Air Cooler.
Motherboard: MSI B650-P PRO WiFi AMD AM5 ATX Motherboard.
RAM: G.Skill Flare X5 Series 64GB (2 x 32GB) DDR5-6000 PC5-48000 CL30 Dual Channel Desktop Memory.
GPU: NVIDIA GeForce RTX 3090 Ti Founders Edition Dual Fan 24GB GD6X PCIe 4.0 Graphics Card.
Storage (OS Disk): Samsung 990 PRO 2TB Samsung V NAND 3-bit MLC PCIe Gen 4 x4 NVMe M.2 Internal SSD (/dev/nvme0n1).
Storage (Data Disk): Samsung 990 PRO 2TB Samsung V NAND 3-bit MLC PCIe Gen 4 x4 NVMe M.2 Internal SSD (/dev/nvme1n1, mounted as /data).
Power Supply: MSI MAG A850GL PCIE5 850 Watt 80 Plus Gold ATX Fully Modular Power Supply.
Case: Lian Li LANCOOL 217 Tempered Glass ATX Mid-Tower Computer Case - Black.

2. Operating System Installation

Ubuntu Server 22.04 LTS was chosen as the operating system. This is a robust, stable, and widely supported Linux distribution, ideal for server environments due to its long-term support (LTS), strong community, and extensive package repositories.

Installation & Partitioning:

The OS was installed on the 2TB NVMe SSD (/dev/nvme0n1).
The OS disk was configured with Logical Volume Management (LVM), providing flexibility for partition resizing and management. This typically includes:
- A small boot partition (e.g., /boot) outside LVM.
- A physical volume (PV) created on the rest of the disk.
- A volume group (VG) created on the physical volume.
- Logical volumes (LVs) created within the volume group for the root filesystem (/) and potentially other system directories.
The second 2TB NVMe SSD (/dev/nvme1n1) was formatted as ext4 and mounted as /data.

3. `llama.cpp` Setup and Configuration

llama.cpp is a C/C++ implementation of the LLaMA inference, optimized for various hardware, including CPU and GPU (via CUDA/cuBLAS).

Setup Steps:

Cloning the Repository: The llama.cpp source code was obtained from its official GitHub repository:
```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
```

Building with CUDA/cuBLAS Support (using CMake): To leverage the RTX 3090 Ti GPU, llama.cpp was compiled with CUDA and cuBLAS support using CMake.

# (Pre-requisite: NVIDIA drivers, CUDA Toolkit, and cmake installed)
# Install cmake if not already present:
# sudo apt install cmake

# Create a build directory and navigate into it
mkdir build
cd build

# Configure CMake to enable CUDA/cuBLAS
cmake .. -DLLAMA_CUBLAS=ON

# Build the project
cmake --build . --config Release

This compilation ensures that llama.cpp can offload computational tasks to the powerful GPU, significantly speeding up inference.

Downloading Models: Large Language Models (LLMs) in GGUF format were downloaded and placed into the llama.cpp/models directory. These models are the core AI components that llama.cpp will run.

# Example: Download the primary models (replace with actual model URLs if different)
# mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf
# Qwen3-30B-A3B-Q5_K_M.gguf
wget -P models/ https://huggingface.co/mlabonne/gemma-3-27b-it-abliterated-GGUF/resolve/main/mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf
wget -P models/ https://huggingface.co/Qwen/Qwen1.5-32B-Chat-GGUF/resolve/main/qwen1_5-32b-chat-Q5_K_M.gguf

Basic Usage: Models can be run directly from the command line using the compiled main executable.

# Assuming one is still in the 'build' directory after compilation
./bin/main -m ../models/mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf -p "Hello, AI!" -n 128
# Or for the other model:
# ./bin/main -m ../models/qwen1_5-32b-chat-Q5_K_M.gguf -p "Hello, AI!" -n 128

Server Mode and API Access: For programmatic access to llama.cpp models, one can run llama.cpp in server mode, which exposes an HTTP API.
```
# Assuming one is in the 'build' directory
./bin/server -m ../models/mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf -c 2048 --host 0.0.0.0 --port 8123
```
This command starts the llama.cpp server, typically listening on port 8123. For secure API access, Nginx will be configured as a reverse proxy in front of this server, enforcing API key authentication.

Running llama.cpp as a Systemd Service: To ensure llama.cpp starts automatically on boot and runs reliably in the background, it can be configured as a systemd service.

sudo nano /etc/systemd/system/llama-cpp-api.service

Add the following content to the file:

[Unit]
Description=Llama.cpp API Server
After=network.target

[Service]
Type=simple
User=user # Replace 'user' with the actual username
Group=user # Replace 'user' with the actual username
WorkingDirectory=/home/user/llama.cpp/build # Adjust if the llama.cpp clone is elsewhere
ExecStart=/home/user/llama.cpp/build/bin/server -m /home/user/llama.cpp/models/mlabonne_gemma-3-27b-it-abliterated-Q5_K_M.gguf -c 2048 --host 0.0.0.0 --port 8123
# One can change the model above to Qwen3-30B-A3B-Q5_K_M.gguf if that's the primary
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

Reload systemd, enable, and start the service:

sudo systemctl daemon-reload
sudo systemctl enable llama-cpp-api.service
sudo systemctl start llama-cpp-api.service

Check service status:

sudo systemctl status llama-cpp-api.service

4. Using the Embedding Model with `llama.cpp`

Embedding models are crucial for tasks like Retrieval Augmented Generation (RAG), where you need to convert text into numerical vectors (embeddings) to find relevant information. llama.cpp can also run embedding models.

Setup and Usage:

Downloading an Embedding Model: Download a GGUF-formatted embedding model, specifically nomic-embed-text-v1.5.Q4_K_M.gguf. Place it in the llama.cpp/models directory.

# Example: Download Nomic-Embed-Text-v1.5
wget -P models/ https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF/resolve/main/nomic-embed-text-v1.5.Q4_K_M.gguf

Running the Embedding Model: Use the embedding executable within llama.cpp to generate embeddings.
```
# Assuming one is in the 'build' directory
./bin/embedding -m ../models/nomic-embed-text-v1.5.Q4_K_M.gguf -p "This is a test sentence for embedding."
```
This command will output a vector of numbers representing the input text. This output can then be used for similarity searches in a vector database or for RAG applications.

Running the Embedding Model as a Systemd Service (API Access): To expose the embedding model via an API for programmatic access, you can run it in server mode on a dedicated port (e.g., 8124) and configure it as a systemd service.

sudo nano /etc/systemd/system/llama-cpp-embedding-api.service

Add the following content to the file:

[Unit]
Description=Llama.cpp Embedding API Server
After=network.target

[Service]
Type=simple
User=user # Replace 'user' with the actual username
Group=user # Replace 'user' with the actual username
WorkingDirectory=/home/user/llama.cpp/build # Adjust if the llama.cpp clone is elsewhere
ExecStart=/home/user/llama.cpp/build/bin/llama-cpp-embedding -m /home/user/llama.cpp/models/nomic-embed-text-v1.5.Q4_K_M.gguf --host 0.0.0.0 --port 8124
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

Reload systemd, enable, and start the service:

sudo systemctl daemon-reload
sudo systemctl enable llama-cpp-embedding-api.service
sudo systemctl start llama-cpp-embedding-api.service

Check service status:

sudo systemctl status llama-cpp-embedding-api.service

Once the service is running, you can send a POST request to the `/embeddings` endpoint of this new server for programmatic access:

# Example POST request (using curl) to the llama.cpp Embedding API
# Replace api.your_domain.com:8124 with your actual API endpoint
# Replace "your_secret_api_key_1" with your actual API key (if Nginx is configured for this port)
curl -X POST \
  https://api.your_domain.com:8124/embeddings \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_secret_api_key_1" \
  -d '{
    "content": "This is a sentence to embed."
  }'

The API will return a JSON object containing the embedding vector. This allows external applications to programmatically generate embeddings using your server's GPU resources.

5. Open WebUI Setup and Configuration

Open WebUI provides a user-friendly web interface for interacting with various LLMs. It is deployed using Docker for ease of management.

Setup Steps:

Docker and Docker Compose Installation: Docker and Docker Compose were installed on the Ubuntu server to manage containerized applications.

# Install Docker
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Add the user to the docker group
sudo usermod -aG docker $USER
newgrp docker # Apply group changes immediately

Open WebUI Deployment: Open WebUI was deployed using Docker Compose.

# Example docker-compose.yml (simplified)
version: '3.8'
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "8080:8080" # Default port for Open WebUI
    volumes:
      - /data/open-webui:/app/backend/data # Persistent data for WebUI
      - /var/run/docker.sock:/var/run/docker.sock # For Docker integration
      - /data/models:/app/backend/models # Mount LLM models
    restart: always # This ensures the container automatically starts on reboot

Starting Open WebUI:
```
sudo docker compose up -d
```
Initial Access: Open WebUI would then be accessible via http://the_server_ip:8080 to create an admin user and start interacting with models.
Enabling Custom Embedding Model and Web Search in Open WebUI: Once Open WebUI is running and you've logged in as an admin, you can configure it to use your custom embedding model and enable web search.
- Access your Open WebUI interface (e.g., http://your_domain.com or http://the_server_ip:8080).
- Log in with your admin account.
- Navigate to **Settings** (usually a gear icon in the sidebar).
- For the Embedding Model:
  - Look for a section like **Connections**, **Models**, or **Integrations**.
  - You may need to add a new "Custom API" or "OpenAI Compatible API" connection.
  - Point the **API URL** to your `llama.cpp` embedding server (e.g., http://127.0.0.1:8124/v1 if using the default OpenAI-compatible endpoint, or just http://127.0.0.1:8124 if it's a direct embedding endpoint).
  - Give it a descriptive name (e.g., "Nomic Embeddings").
  - Save the connection. You might then need to select this embedding model in the RAG or document processing settings.
- For Web Search (DDGS):
  - Look for a section related to **Tools** or **Integrations**.
  - Find the option for **Web Search** or **DuckDuckGo Search** and **enable** it.
  - Ensure that the `ddgs` service (which you'll set up in the next section) is running and accessible by the Open WebUI container.
  - Save the changes.
Once configured, your LLMs in Open WebUI should be able to leverage your local embedding model for RAG tasks and perform real-time web searches using DDGS.

6. Setting up DDGS Search as a Service

Integrating DuckDuckGo Search (DDGS) allows your LLMs within Open WebUI to perform real-time web searches, providing more up-to-date and factual responses. Running it as a dedicated service enhances stability and resource management.

Setup Steps:

Install `ddgs` Python Library: This library is used by Open WebUI to interact with DuckDuckGo. You'll install it inside the Open WebUI container.

# First, get the container ID or name of your running Open WebUI container
sudo docker ps -a

# Then, execute a shell inside the container and install the package
sudo docker exec -it open-webui /bin/bash -c "pip install ddgs"

After installation, you might need to restart the Open WebUI container for the changes to take effect:

sudo docker restart open-webui

Running DDGS as a Systemd Service: While Open WebUI can call `ddgs` directly, running it as a separate service can provide more stability and dedicated resource management, especially if you plan to use it extensively or from other applications.

sudo nano /etc/systemd/system/ddgs-api.service

Add the following content to the file. This assumes you have a simple Python Flask/FastAPI app that exposes `ddgs` functionality as an API, listening on a port (e.g., `8125`).

[Unit]
Description=DDGS Search API Service
After=network.target

[Service]
Type=simple
User=user # Replace 'user' with the actual username
Group=user # Replace 'user' with the actual username
WorkingDirectory=/home/user/ddgs_api # Create this directory and place your ddgs API script here
ExecStart=/usr/bin/python3 /home/user/ddgs_api/app.py # Assuming app.py runs a simple API for ddgs
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

You would need a simple Python script (e.g., `app.py` in `/home/user/ddgs_api/`) that uses the `ddgs` library and exposes it via a web framework like Flask or FastAPI. For example, a basic `app.py`:

# /home/user/ddgs_api/app.py
from flask import Flask, request, jsonify
from duckduckgo_search import DDGS # Make sure to use DDGS as per your update

app = Flask(__name__)

@app.route('/search', methods=['POST'])
def search_ddgs():
    query = request.json.get('query')
    if not query:
        return jsonify({"error": "Query parameter is missing"}), 400
    
    try:
        results = DDGS().text(keywords=query, max_results=5) # Adjust max_results as needed
        return jsonify(results)
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8125) # Run on port 8125

Reload systemd, enable, and start the service:

sudo systemctl daemon-reload
sudo systemctl enable ddgs-api.service
sudo systemctl start ddgs-api.service

Check service status:

sudo systemctl status ddgs-api.service

7. Nginx Setup and Configuration

Nginx was set up as a reverse proxy to provide secure, web-standard access to Open WebUI, typically on port 80 (HTTP) and 443 (HTTPS), and to handle SSL/TLS encryption. It will also be used to secure the llama.cpp API endpoint using API key enforcement, hosted on a separate port.

Setup Steps:

Nginx Installation:
```
sudo apt install nginx
```

Configure Nginx for API Key Enforcement: To enforce API keys, Nginx uses the map directive to define valid keys and a variable to check against.

# /etc/nginx/conf.d/api_keys.conf
# This map defines valid API keys.
# $http_x_api_key refers to the 'X-API-Key' header sent by the client.
# $api_key_valid will be 1 if the key matches, 0 otherwise.
map $http_x_api_key $api_key_valid {
    default 0; # Default to invalid
    "your_secret_api_key_1" 1; # Replace with the actual API key
    "another_secret_api_key_2" 1; # Add more keys as needed
}

IMPORTANT: Replace "your_secret_api_key_1" and "another_secret_api_key_2" with actual, strong, randomly generated API keys. Keep these keys secure.

Creating a Consolidated Server Block (Virtual Host) for Open WebUI and llama.cpp APIs:

# /etc/nginx/sites-available/the_domain.com.conf
# HTTP server block to redirect to HTTPS for the main domain
server {
    listen 80;
    server_name your_domain.com;
    return 301 https://$host$request_uri;
}

# HTTPS server block for Open WebUI on port 443
server {
    listen 443 ssl;
    server_name your_domain.com;

    ssl_certificate /etc/letsencrypt/live/your_domain.com/fullchain.pem; # Managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/your_domain.com/privkey.pem; # Managed by Certbot
    ssl_trusted_certificate /etc/letsencrypt/live/your_domain.com/chain.pem; # Managed by Certbot

    # Include common SSL settings for security
    include /etc/letsencrypt/options-ssl-nginx.conf;
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # Managed by Certbot

    # Location for Open WebUI
    location / {
        proxy_pass http://127.0.0.1:8080; # Proxy to Open WebUI's default port
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support for interactive AI chat
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

# HTTPS server block for llama.cpp main API on port 8123
server {
    listen 8123 ssl; # Listen directly on port 8123 with SSL
    server_name api.your_domain.com; # This is the dedicated API subdomain

    ssl_certificate /etc/letsencrypt/live/api.your_domain.com/fullchain.pem; # Managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/api.your_domain.com/privkey.pem; # Managed by Certbot
    ssl_trusted_certificate /etc/letsencrypt/live/api.your_domain.com/chain.pem; # Managed by Certbot

    # Include common SSL settings for security
    include /etc/letsencrypt/options-ssl-nginx.conf;
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # Managed by Certbot

    # Location for llama.cpp API
    location / {
        # Check if the API key is valid
        if ($api_key_valid = 0) {
            return 403; # Forbidden if key is invalid or missing
        }

        proxy_pass http://127.0.0.1:8123; # Proxy to llama.cpp server on its internal port 8123
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-API-Key $http_x_api_key; # Pass the API key header to backend (optional)

        # WebSocket support if llama.cpp server uses it
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

# HTTPS server block for llama.cpp Embedding API on port 8124
server {
    listen 8124 ssl; # Listen directly on port 8124 with SSL
    server_name embed-api.your_domain.com; # Dedicated subdomain for embedding API

    ssl_certificate /etc/letsencrypt/live/embed-api.your_domain.com/fullchain.pem; # Managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/embed-api.your_domain.com/privkey.pem; # Managed by Certbot
    ssl_trusted_certificate /etc/letsencrypt/live/embed-api.your_domain.com/chain.pem; # Managed by Certbot

    # Include common SSL settings for security
    include /etc/letsencrypt/options-ssl-nginx.conf;
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # Managed by Certbot

    # Location for llama.cpp Embedding API
    location / {
        # Check if the API key is valid (optional, but recommended for external access)
        if ($api_key_valid = 0) {
            return 403; # Forbidden if key is invalid or missing
        }

        proxy_pass http://127.0.0.1:8124; # Proxy to llama.cpp embedding server on its internal port 8124
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-API-Key $http_x_api_key; # Pass the API key header to backend (optional)
    }
}
```


                Enabling the Server Block and Testing Nginx:
                    # Ensure any old symlinks are removed if separate files were used
# sudo rm -f /etc/nginx/sites-enabled/openwebui.conf
# sudo rm -f /etc/nginx/sites-enabled/llama-api.conf

# Create symlink for the consolidated configuration file
sudo ln -s /etc/nginx/sites-available/your_domain.com.conf /etc/nginx/sites-enabled/

sudo nginx -t # Test Nginx configuration for syntax errors
sudo systemctl restart nginx
                
                SSL Setup with Certbot (Let's Encrypt): For secure HTTPS access, Certbot was used to automatically obtain and configure SSL certificates from Let's Encrypt.
                    sudo apt install certbot python3-certbot-nginx
# Run Certbot for the main domain:
sudo certbot --nginx -d your_domain.com
# Run Certbot for the API subdomain:
sudo certbot --nginx -d api.your_domain.com
# Run Certbot for the Embedding API subdomain:
sudo certbot --nginx -d embed-api.your_domain.com
                    Certbot automatically modifies the Nginx configuration to include SSL directives and sets up automatic certificate renewal, ensuring sites remain secure.
                
                Firewall Configuration (UFW): The Uncomplicated Firewall (UFW) was configured to allow necessary traffic (SSH, HTTP, HTTPS, and the new API ports).
                    sudo ufw allow OpenSSH
sudo ufw allow 'Nginx Full' # Allows both HTTP (80) and HTTPS (443)
sudo ufw allow 8123/tcp # Allow the main API port
sudo ufw allow 8124/tcp # Allow the embedding API port
sudo ufw enable
                
                Hardware Firewall ACLs (Access Control Lists): Beyond the server's local UFW, it's crucial to configure the network's hardware firewall (e.g., on the router, dedicated firewall appliance, or cloud security group) to restrict incoming traffic.
                    Key Principles for Hardware Firewall ACLs:
                    
                        Deny All by Default: The most secure approach is to configure the hardware firewall to block all incoming connections by default.
                        Allow Only Necessary Ports: Explicitly create rules to permit traffic only on the ports the server needs to be accessible from the internet. This would typically include:
                            
                                Port 80 (HTTP): For initial web access and Certbot's HTTP-01 challenge (which redirects to HTTPS).
                                Port 443 (HTTPS): For secure web access to Open WebUI.
                                Port 8123 (HTTPS/API): For secure external access to the llama.cpp main API.
                                Port 8124 (HTTPS/API): For secure external access to the llama.cpp Embedding API.
                            
                        
                        Source IP Restriction (where possible): For services or APIs that should only be accessed from known locations, configure rules to only allow connections from specific source IP addresses.
                        No Internal Port Exposure: Do not directly expose internal application ports (like 8080 for Open WebUI or 8123 for llama.cpp's internal listening) to the internet without Nginx as a proxy. Nginx acts as the secure gateway.
                    
                    Consult the specific router or cloud provider documentation for exact instructions on configuring Access Control Lists (ACLs) or port forwarding rules.



        
            8. Backup Solution (BorgBackup)
            
                After experiencing issues with Timeshift's configuration persistence and its primary focus on system files,
                BorgBackup was chosen as a more flexible and robust solution for comprehensive system and user data backups.
            
            Reasons for Choosing BorgBackup:
            
                Comprehensive Coverage: Capable of backing up the entire root filesystem (/) and all user home directories in a single, unified repository.
                Easy Restoration (Mountable Archives): Borg allows one to "mount" any backup archive as a regular filesystem, enabling easy browsing and granular restoration of specific files or directories using standard file management tools.
                Deduplication: Significantly reduces storage space by only storing unique data blocks across multiple backups, making it very efficient for incremental backups of frequently changing data (like user files and app installations).
                Compression: Further reduces backup size with various compression algorithms.
                Authenticated Encryption: Provides strong, built-in encryption to secure data at rest.
                Command-Line Focused: Ideal for scripting and automation in a server environment.
            
            Setup and Usage:
            
                Installation:
                    sudo apt update
sudo apt install borgbackup
                
                Repository Initialization: A Borg repository was initialized on the dedicated data partition (/data/borg_repo), secured with a strong passphrase.
                    sudo mkdir -p /data/borg_repo
sudo borg init --encryption=repokey /data/borg_repo
                
                Custom Backup Script: A custom shell script (/usr/local/bin/backup_all.sh) was created to perform the backup.
                    sudo nano /usr/local/bin/backup_all.sh
                    This script includes:
                    
                        Backing up / and /home/.
                        Excluding common temporary, cache, and system-specific directories.
                        Creating a unique archive name based on the hostname and current timestamp.
                        Performing a repository integrity check (borg check) after the backup.
                        Compacting the repository (borg compact) to reclaim space.
                        (Note: The prune command was removed from the script to allow for manual pruning.)
                    
                
                On-Demand Execution: The script is executed manually whenever a backup is desired.
                    sudo /usr/local/bin/backup_all.sh
                    The user is prompted for the Borg repository passphrase during execution.
                
                Verification and Restoration:
                    
                        Listing Contents: sudo borg list /data/borg_repo::archive_name to see files within a specific backup.
                        Mounting for Browsing: sudo borg mount /data/borg_repo::archive_name /mnt/restore_point allows browsing the backup like a live filesystem.
                        Extracting Specific Files/Full Restore: sudo borg extract can be used to pull specific files or perform a full system restore (typically from a live Linux environment).
                    
                
            
        

        
            Conclusion
            
                This setup provides a powerful and flexible server environment. The robust hardware supports demanding AI applications,
                while Ubuntu Server offers a stable foundation. Nginx provides secure web access and API protection,
                and BorgBackup ensures that both critical system files and valuable personal data (including the installed applications)
                are reliably backed up and easily restorable, offering peace of mind against data loss or system corruption.
            
        



        
            What's Next: Utilizing the AI Server Environment
            
                With the AI server now fully configured, a powerful and versatile platform for a wide range of applications and integrations exists. This section outlines how the environment can be leveraged.
            

            How the Environment Can Be Used:
            
                Local AI Chatbot: Through Open WebUI, a private, web-based interface to interact with the locally hosted LLMs is available. This is ideal for personal use, research, or small team collaboration without relying on external cloud services.
                Retrieval Augmented Generation (RAG): By combining the local embedding model (Nomic Embed) and the ability to perform web searches (DDGS), sophisticated RAG applications can be built. This allows the LLMs to answer questions using up-to-date information retrieved from the web or from private documents (after embedding them).
                Custom AI Applications: The `llama.cpp` server APIs (for both inference and embeddings) provide programmatic access to the models. Developers can build custom applications, scripts, or services that leverage the server's AI capabilities.
                Data Processing and Analysis: The powerful CPU and GPU, combined with ample RAM and fast storage, make this server suitable for large-scale data processing, machine learning model training (for smaller models), and complex analytical tasks.
            

            Potential Applications and Integrations:
            
                Knowledge Base Chatbot: Create a RAG system by embedding personal documents (notes, research papers, manuals) and using the LLM to answer questions based on this private knowledge base, augmented by web search for external information.
                Automated Content Generation: Integrate the `llama.cpp` API into scripts to automate the generation of text, summaries, or creative content for various purposes.
                Smart Home Integration: Connect the AI server to smart home platforms to enable more intelligent voice commands, contextual automation, or personalized responses.
                Research and Development Sandbox: Use the server as a dedicated environment for experimenting with new AI models, fine-tuning existing ones, and developing custom machine learning solutions.
                Offline AI Assistant: Build a privacy-focused AI assistant that operates entirely on the local network, without sending sensitive data to external cloud providers.
                Educational Tool: Utilize the setup to teach about large language models, embeddings, and server administration in a hands-on environment.
            
            
                The possibilities are vast. The new AI server is now a versatile tool ready to power the next AI-driven project or application.

Building an AI Server with Web Interface and API Programmatic Access

1. Hardware Choices

2. Operating System Installation

Installation & Partitioning:

3. llama.cpp Setup and Configuration

Setup Steps:

4. Using the Embedding Model with llama.cpp

Setup and Usage:

5. Open WebUI Setup and Configuration

Setup Steps:

6. Setting up DDGS Search as a Service

Setup Steps:

7. Nginx Setup and Configuration

Setup Steps:

8. Backup Solution (BorgBackup)

Reasons for Choosing BorgBackup:

Setup and Usage:

Conclusion

What's Next: Utilizing the AI Server Environment

How the Environment Can Be Used:

Potential Applications and Integrations:

3. `llama.cpp` Setup and Configuration

4. Using the Embedding Model with `llama.cpp`