RUNE DIGITAL // RESEARCH

Multi-Provider AI Infrastructure

Vertex AI + NVIDIA NIM + Ollama Hybrid Stack

Executive Summary: RUNE operates a hybrid AI infrastructure combining Google Vertex AI (Gemini 1.5 Pro), NVIDIA NIM APIs (developer access), and Ollama local inference (rune-ai-v1 on RTX 4070). This multi-provider approach optimizes for cost, latency, and capability: cloud APIs handle complex reasoning, and local models handle high-frequency, low-latency tasks.

Live API Configuration

Active API Integrations

Currently configured and operational in CONFIG/RUNE_CONFIG.json:

NVIDIA Integration

RUNE currently uses NVIDIA developer APIs. A Google Cloud rep recommended the NVIDIA Inception program; the application is pending.

Ollama Local Stack

Three models running locally on RTX 4070 Super (16GB VRAM):

Hardware Infrastructure

Production Rigs

Documented in STATE.json hardware_stack:

Inference Stack

The Google Cloud rep specifically recommended pursuing NVIDIA Inception for additional compute credits and support. Current strategy: Use NVIDIA NIM APIs for cost-effective inference, escalate complex multi-modal tasks to Gemini 1.5 Pro, and run high-frequency orchestration locally via Ollama. This hybrid approach keeps costs near $0 for development while maintaining access to premium capabilities.
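
The routing strategy described above could be sketched roughly as follows. This is a minimal illustration, not RUNE's actual orchestration code: the `Task` flags, the `Provider` names, and the precedence order (multi-modal escalation checked before the high-frequency check) are all assumptions made for the example, and no real API calls are made.

```python
from dataclasses import dataclass
from enum import Enum


class Provider(Enum):
    """The three backends named in the text (enum values are illustrative)."""
    OLLAMA = "ollama"        # local inference
    NIM = "nvidia_nim"       # NVIDIA NIM APIs
    VERTEX = "vertex_ai"     # Gemini 1.5 Pro on Vertex AI


@dataclass(frozen=True)
class Task:
    # Hypothetical flags describing a request; not taken from RUNE_CONFIG.json.
    high_frequency: bool = False
    multimodal: bool = False
    complex_reasoning: bool = False


def choose_provider(task: Task) -> Provider:
    """Pick a backend for a task, following the strategy in the text."""
    # Complex multi-modal work is escalated to Gemini 1.5 Pro.
    if task.multimodal and task.complex_reasoning:
        return Provider.VERTEX
    # High-frequency orchestration stays on the local Ollama stack.
    if task.high_frequency:
        return Provider.OLLAMA
    # Everything else defaults to the cost-effective NIM APIs.
    return Provider.NIM


if __name__ == "__main__":
    for task in (
        Task(high_frequency=True),
        Task(multimodal=True, complex_reasoning=True),
        Task(),
    ):
        print(task, "->", choose_provider(task).value)
```

Keeping the decision in one pure function means the routing rules can be tested without touching any provider, and the precedence can be changed in a single place if the cost or latency trade-offs shift.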

See It Live

Neural Hub

See Vertex AI integration in action. Multi-model orchestration with real-time cost tracking.

Ollama Docs

Documentation for local LLM inference. Run models on consumer hardware at $0/month.

NVIDIA NIM

Developer documentation for NVIDIA inference endpoints. Enterprise-grade model hosting.

Vertex AI Docs

Official Google Cloud documentation for Vertex AI platform and Gemini models.
