Deploying NudgeBee AI on Ollama (GPU Machine)

Overview

This guide provides detailed steps to deploy the NudgeBee AI model on a GPU machine using Ollama, including configuration for both the NudgeBee RAG server and LLM server to use the deployed model for embedding generation and inference.

Prerequisites

Before proceeding, ensure you have:

A machine with a compatible NVIDIA GPU (CUDA enabled)
Installed Ollama with GPU support
A trained NudgeBee AI model in a compatible format
Docker installed (if using a containerized setup)
The NudgeBee RAG server and LLM server properly configured to interact with the deployed model

Step 1: Install Ollama with GPU Support

Install Ollama:

curl -fsSL https://ollama.ai/install.sh | sh

Verify Ollama installation:
```
ollama --version
```
Ensure GPU support is enabled:
```
nvidia-smi
```
If the GPU is detected, proceed with model deployment.

Step 2: Download and Load the Model into Ollama

Download the NudgeBee model from the NudgeBee store.
Place the NudgeBee model file in an accessible directory.

Import the model into Ollama:

ollama create nudgebee-ai --from nudgebee_model

Verify that the model is available:
```
ollama list
```

Step 3: Run the Ollama Model

Start the Ollama model as a background service:
```
ollama run nudgebee-ai
```
This keeps the model ready to accept queries.

Test the model:

ollama run nudgebee-ai "Hello, how can I help?"

Step 4: Configure NudgeBee RAG Server to Use Ollama

Update the environment variables in the NudgeBee RAG server to connect to the deployed model:

Environment Variables for RAG Server

EMBEDDINGS_PROVIDER=ollama
EMBEDDINGS_MODEL_NAME=<Model name in Ollama>
EMBEDDINGS_PROVIDER_API_ENDPOINT=<Ollama embeddings endpoint URL>

Step 5: Configure NudgeBee LLM Server to Use Ollama

NudgeBee LLM server uses the OpenAI-compatible endpoint provided by Ollama and requires the following configuration:

Environment Variables for LLM Server

LLM_PROVIDER=openai
LLM_MODEL_NAME=<Model name in Ollama>
LLM_PROVIDER_API_KEY=<if Ollama is configured with security>
LLM_PROVIDER_API_ENDPOINT=<Ollama model endpoint URL>

Conclusion

You have successfully deployed the NudgeBee AI model on an Ollama-supported GPU machine and configured both the NudgeBee RAG server and LLM server to use the deployed model for embedding generation and inference.

Overview​

Prerequisites​

Step 1: Install Ollama with GPU Support​

Step 2: Download and Load the Model into Ollama​

Step 3: Run the Ollama Model​

Step 4: Configure NudgeBee RAG Server to Use Ollama​

Environment Variables for RAG Server​

Step 5: Configure NudgeBee LLM Server to Use Ollama​

Environment Variables for LLM Server​

Conclusion​