Deploying NudgeBee AI on Ollama (GPU Machine)
Overview
This guide provides detailed steps to deploy the NudgeBee AI model on a GPU machine using Ollama, including configuration for both the NudgeBee RAG server and LLM server to use the deployed model for embedding generation and inference.
Prerequisites
Before proceeding, ensure you have:
- A machine with a CUDA-enabled NVIDIA GPU
- Ollama with GPU support (Step 1 covers installation)
- A trained NudgeBee AI model in a compatible format
- Docker installed (if using a containerized setup)
- Access to the environment configuration of the NudgeBee RAG server and LLM server, which Steps 4 and 5 update to use the deployed model
Step 1: Install Ollama with GPU Support
- Install Ollama:
curl -fsSL https://ollama.ai/install.sh | sh
- Verify the Ollama installation:
ollama --version
- Ensure GPU support is enabled:
nvidia-smi
If the GPU is detected, proceed with model deployment.
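The checks in this step can be wrapped in a small preflight helper. The following is a minimal sketch, not part of the NudgeBee tooling: the function names check_tool and preflight are hypothetical, and the helper only confirms that the commands are on PATH, not that the GPU is actually usable.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on PATH.
# Returns 0 if found, 1 otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
        return 0
    fi
    echo "missing: $1"
    return 1
}

# Hypothetical helper: check every tool passed as an argument.
# The return code is the number of missing tools (0 means all present).
preflight() {
    missing=0
    for tool in "$@"; do
        check_tool "$tool" || missing=$((missing + 1))
    done
    return "$missing"
}

# Usage (run on the GPU machine after installing Ollama):
#   preflight ollama nvidia-smi && echo "ready for model deployment"
```

A zero exit code only means both commands exist; the nvidia-smi output itself still needs to list the GPU.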
Step 2: Download and Load the Model into Ollama
- Download the NudgeBee model from the NudgeBee store.
- Place the NudgeBee model file in an accessible directory.
- Import the model into Ollama:
ollama create nudgebee-ai --from nudgebee_model
- Verify that the model is available:
ollama list
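Ollama's documented way to import a local model file is a Modelfile whose FROM line points at the file, passed to ollama create with -f. If the import command above is not accepted by your Ollama version, the sketch below may help. It assumes the model file is named nudgebee_model (as in this guide) and is in a format Ollama can import, such as GGUF; the helper name write_modelfile is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal Modelfile.
#   $1: path to the local model file (assumed name: ./nudgebee_model)
#   $2: path where the Modelfile should be written
write_modelfile() {
    printf 'FROM %s\n' "$1" > "$2"
}

# Usage (requires Ollama to be installed):
#   write_modelfile ./nudgebee_model ./Modelfile
#   ollama create nudgebee-ai -f ./Modelfile
#   ollama list
```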
Step 3: Run the Ollama Model
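- Start the model:
ollama run nudgebee-ai
This loads the model and opens an interactive prompt. The Ollama server itself, which the Linux installer normally registers as a background service and which can also be started manually with ollama serve, is what keeps the model ready to accept queries.
- Test the model: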
ollama run nudgebee-ai "Hello, how can I help?"
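Because the RAG and LLM servers reach the model over HTTP rather than through the CLI, it can be useful to test the HTTP API as well. The sketch below uses Ollama's /api/generate route on the default port 11434. The variable OLLAMA_URL and the function names are hypothetical, and the payload builder does not escape quotes or backslashes in the prompt.

```shell
#!/bin/sh
# Base URL of the local Ollama server; 11434 is Ollama's default port.
# OLLAMA_URL is a name chosen for this sketch.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Build the JSON body for a non-streaming /api/generate request.
#   $1: model name, $2: prompt (must not contain quotes or backslashes)
build_generate_payload() {
    printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# Send a prompt to the model over HTTP and print the raw JSON response.
query_model() {
    curl -s "$OLLAMA_URL/api/generate" -d "$(build_generate_payload "$1" "$2")"
}

# Usage (requires the Ollama server to be running):
#   query_model nudgebee-ai "Hello, how can I help?"
```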
Step 4: Configure NudgeBee RAG Server to Use Ollama
Update the environment variables in the NudgeBee RAG server to connect to the deployed model:
Environment Variables for RAG Server
EMBEDDINGS_PROVIDER=ollama
EMBEDDINGS_MODEL_NAME=<Model name in Ollama>
EMBEDDINGS_PROVIDER_API_ENDPOINT=<Ollama embeddings endpoint URL>
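As an illustration, the variables might be filled in as follows for an Ollama instance on the same host. These values are assumptions: the model name nudgebee-ai comes from Step 2, localhost:11434 is Ollama's default address, and whether the RAG server expects the full /api/embeddings path or only the base URL is not stated in this guide. The function check_embeddings is a hypothetical helper.

```shell
#!/bin/sh
# Example values only; adjust host, port, and path to your deployment.
export EMBEDDINGS_PROVIDER=ollama
export EMBEDDINGS_MODEL_NAME=nudgebee-ai
# Assumption: the RAG server accepts Ollama's embeddings route as the endpoint.
export EMBEDDINGS_PROVIDER_API_ENDPOINT="http://localhost:11434/api/embeddings"

# Hypothetical helper: request an embedding for a short test string
# and print the raw JSON response.
check_embeddings() {
    curl -s "$EMBEDDINGS_PROVIDER_API_ENDPOINT" \
        -d "{\"model\":\"$EMBEDDINGS_MODEL_NAME\",\"prompt\":\"test\"}"
}

# Usage (requires the Ollama server to be running):
#   check_embeddings
```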
Step 5: Configure NudgeBee LLM Server to Use Ollama
The NudgeBee LLM server connects through the OpenAI-compatible endpoint provided by Ollama, so its provider is set to openai. It requires the following configuration:
Environment Variables for LLM Server
LLM_PROVIDER=openai
LLM_MODEL_NAME=<Model name in Ollama>
LLM_PROVIDER_API_KEY=<API key, needed only if Ollama is configured with security>
LLM_PROVIDER_API_ENDPOINT=<Ollama model endpoint URL>
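A possible set of values for a local Ollama instance is sketched below. Ollama serves its OpenAI-compatible API under /v1 and does not itself validate the API key, although OpenAI-style clients usually require a non-empty value. Whether the LLM server expects the /v1 base URL or a full route is an assumption here, and check_chat is a hypothetical helper.

```shell
#!/bin/sh
# Example values only; adjust host and port to your deployment.
export LLM_PROVIDER=openai
export LLM_MODEL_NAME=nudgebee-ai
# Assumption: the LLM server takes the OpenAI-compatible base URL.
export LLM_PROVIDER_API_ENDPOINT="http://localhost:11434/v1"
# Placeholder key; replace it if a proxy in front of Ollama enforces authentication.
export LLM_PROVIDER_API_KEY="${LLM_PROVIDER_API_KEY:-ollama}"

# Hypothetical helper: send one chat message through the
# OpenAI-compatible route and print the raw JSON response.
check_chat() {
    curl -s "$LLM_PROVIDER_API_ENDPOINT/chat/completions" \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $LLM_PROVIDER_API_KEY" \
        -d "{\"model\":\"$LLM_MODEL_NAME\",\"messages\":[{\"role\":\"user\",\"content\":\"Hello\"}]}"
}

# Usage (requires the Ollama server to be running):
#   check_chat
```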
Conclusion
You have now deployed the NudgeBee AI model on a GPU machine using Ollama and configured both the NudgeBee RAG server and LLM server to use it for embedding generation and inference.