Intelligent NRP + Kubernetes Routing & Management System
An intelligent router that classifies user input as command or explanation and dispatches accordingly, integrating NRP LLMs with Kubernetes operations.
Official GSoC 2025 Project
Project ID: fPp1JXbl
👤 Contributor
Manish K Reddy
GSoC 2025 Participant
🏛️ Mentor Organization
UC OSPO
University of California Open Source Program Office
🚀 Program
Google Summer of Code 2025
Open Source Research Experience (OSRE)
📋 Official Project Description
Develop an intelligent router that classifies user input as command (kubectl operations) or explanation (documentation/guidance) and dispatches accordingly. The system integrates NRP LLMs with Kubernetes to provide both actionable operations and contextual answers, delivering a clean, extensible Python package suitable for real cluster use and future observability features.
Project Goals
The primary objective was to develop an intelligent router that classifies user input as command (kubectl ops) or explanation (docs/guidance) and dispatches accordingly, integrating NRP LLMs with Kubernetes to provide both actionable operations and contextual answers.
High-level Objectives
- Build an intelligent router that classifies user input as command (kubectl ops) or explanation (docs/guidance) and dispatches accordingly
- Integrate NRP LLMs with Kubernetes to provide both actionable operations and contextual answers
- Deliver a clean, extensible Python package suitable for real cluster use and future observability features
What I Built
Intent Router
intelligent_router.py: LLM-aided + keyword fallback classification into COMMAND, EXPLANATION, or UNCLEAR.
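To illustrate the keyword fallback, here is a minimal sketch of a classifier that needs no LLM call. The keyword sets and the function name classify_by_keywords are assumptions made for illustration; the project's actual lists and rules may differ.

```python
import re
from enum import Enum


class Intent(Enum):
    COMMAND = "COMMAND"
    EXPLANATION = "EXPLANATION"
    UNCLEAR = "UNCLEAR"


# Hypothetical keyword lists, chosen only for this sketch.
COMMAND_VERBS = {"list", "get", "create", "delete", "describe", "logs", "exec", "scale"}
QUESTION_WORDS = {"how", "what", "why", "when", "which", "can", "should", "explain"}


def classify_by_keywords(text: str) -> Intent:
    """Classify input from its first word and punctuation, without an LLM."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return Intent.UNCLEAR
    first = words[0]
    # A trailing question mark or a leading question word suggests a docs query.
    if text.strip().endswith("?") or first in QUESTION_WORDS:
        return Intent.EXPLANATION
    # A leading kubectl-style verb suggests an operation.
    if first in COMMAND_VERBS:
        return Intent.COMMAND
    return Intent.UNCLEAR


print(classify_by_keywords("list my pods"))
```

Returning UNCLEAR for anything that matches neither set keeps the fallback conservative: ambiguous input leads to a clarification prompt rather than a cluster operation.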
K8s Ops Module
systems/k8s_operations.py: CRUD for pods/deployments, list/describe, logs/exec; defaults to the gsoc namespace.
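As a rough sketch of what a list operation with a default namespace might look like, the function below takes any object that behaves like CoreV1Api from the official kubernetes Python client and calls its list_namespaced_pod method. The names list_pod_summaries and make_fake_api are hypothetical, and the fake client exists only so the example runs without a cluster.

```python
from types import SimpleNamespace

DEFAULT_NAMESPACE = "gsoc"  # default namespace named in this write-up


def list_pod_summaries(core_v1, namespace: str = DEFAULT_NAMESPACE):
    """Return (name, phase) pairs for the pods in one namespace.

    `core_v1` is expected to behave like kubernetes.client.CoreV1Api.
    """
    pod_list = core_v1.list_namespaced_pod(namespace=namespace)
    return [(pod.metadata.name, pod.status.phase) for pod in pod_list.items]


def make_fake_api(pods_by_namespace):
    """Build a tiny stand-in for CoreV1Api (illustration only)."""
    def list_namespaced_pod(namespace):
        items = [
            SimpleNamespace(
                metadata=SimpleNamespace(name=name),
                status=SimpleNamespace(phase=phase),
            )
            for name, phase in pods_by_namespace.get(namespace, [])
        ]
        return SimpleNamespace(items=items)

    return SimpleNamespace(list_namespaced_pod=list_namespaced_pod)


fake_api = make_fake_api({"gsoc": [("web-abcdef-12345", "Running")]})
print(list_pod_summaries(fake_api))
```

Against a real cluster, the same function could presumably be given a client created with kubernetes.config.load_kube_config() followed by kubernetes.client.CoreV1Api(), or with load_incluster_config() when running inside a pod.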
Interactive Shell & CLI
cli.py: one-shot or chat-style interactive usage.
Modular Package Structure
Environment templating (config/default.env), clean separation of core/system logic, and cache isolation.
LLM Integration
core/nrp_init.py: NRP API setup, model selection (e.g., gemma3) with graceful error handling + fallbacks.
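One way the setup step could degrade gracefully is to treat a missing API key as "no LLM available" instead of raising. The sketch below assumes the environment variable names from the Quick Start template (NRP_API_KEY, NRP_BASE_URL, NRP_MODEL); the NRPConfig class and load_nrp_config function are hypothetical and not taken from core/nrp_init.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class NRPConfig:
    api_key: str
    base_url: str
    model: str


def load_nrp_config(env: Optional[Mapping[str, str]] = None) -> Optional[NRPConfig]:
    """Read NRP settings; return None when no API key is configured."""
    env = os.environ if env is None else env
    api_key = env.get("NRP_API_KEY", "").strip()
    if not api_key:
        # No key: the caller can fall back to keyword-only routing.
        return None
    return NRPConfig(
        api_key=api_key,
        # Defaults mirror the example values in the Quick Start template.
        base_url=env.get("NRP_BASE_URL", "https://llm.nrp-nautilus.io/").strip(),
        model=env.get("NRP_MODEL", "gemma3").strip(),
    )


print(load_nrp_config({"NRP_API_KEY": "YOUR_KEY"}))
```

Passing the environment as a mapping keeps the function easy to exercise in tests without touching the real process environment.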
Architecture & Components
nrp_k8s_system/
├── intelligent_router.py # Routing: classify → dispatch (command/explanation)
├── cli.py # CLI entry points (single-shot & REPL)
├── core/
│ └── nrp_init.py # NRP LLM init / model config
├── systems/
│ ├── k8s_operations.py # K8s CRUD ops, logs, exec, list/describe
│ └── qain.py # (optional) doc-answering scaffolding
├── config/
│ └── default.env # Env template (NRP creds, model, base URL)
├── requirements.txt
└── pyproject.toml
Data Flow
User input → intelligent_router (LLM + heuristics) →
• COMMAND → k8s_operations (Python K8s client)
• EXPLANATION → NRP LLM answers with contextual guidance
• UNCLEAR → safe defaults & clarification prompts
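The flow above can be sketched as a classify-then-dispatch function. Everything here is illustrative: route, toy_classify, and the stub handlers are hypothetical stand-ins for the router, the k8s_operations module, and the NRP LLM call.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def route(text: str, classify: Callable[[str], str], handlers: Dict[str, Handler]) -> str:
    """Classify the input, then hand it to the matching handler."""
    label = classify(text)
    # Any label without a registered handler is treated as UNCLEAR.
    handler = handlers.get(label, handlers["UNCLEAR"])
    return handler(text)


# Stub handlers standing in for k8s_operations and the NRP LLM.
handlers = {
    "COMMAND": lambda text: f"[k8s] would run: {text}",
    "EXPLANATION": lambda text: f"[llm] would answer: {text}",
    "UNCLEAR": lambda text: "Could you rephrase that as a command or a question?",
}


def toy_classify(text: str) -> str:
    """Toy classifier: questions are explanations, 'list ...' is a command."""
    if text.strip().endswith("?"):
        return "EXPLANATION"
    if text.lower().startswith("list "):
        return "COMMAND"
    return "UNCLEAR"


print(route("list my pods", toy_classify, handlers))
```

Routing unknown labels to the UNCLEAR handler means a malformed classifier response ends in a clarification prompt instead of an unintended operation.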
Project Timeline
Planning
Architecture design and requirements gathering
Core Development
Router and K8s operations implementation
Integration
NRP LLM integration and testing
Deployment
Testing on Nautilus cluster
Current State
Status: Shipped - a working router + K8s ops package tested on Nautilus (gsoc namespace)
✅ Working Features
- Robustness: LLM unavailable? Fallback classifier keeps command path operational
- Docs & Examples: README includes usage patterns, examples, and troubleshooting
- Intelligent router with LLM + keyword fallback
- Complete K8s operations (CRUD, logs, exec)
- Interactive CLI and one-shot commands
- NRP LLM integration with graceful fallbacks
What's Left / Next Steps
Resource Expansion
Broaden resource coverage: Jobs, StatefulSets, ConfigMaps, Services (advanced), PVCs
Observability
Observability integrations: Prometheus queries, DCGM GPU telemetry, alerting hooks
Safety Features
Uncertainty handling: add Conformal Prediction / CROQ loops for safer tool-calling
CI/CD
CI/CD: unit/integration tests and GitLab pipelines
Code Links
Main Repository: https://gitlab.nrp-nautilus.io/mreddy10/breakwater
Demo Scenarios
Command Path Examples
# List resources
python -m nrp_k8s_system.intelligent_router "list my pods"
# Create deployment
python -m nrp_k8s_system.intelligent_router "create deployment web image=nginx replicas=3"
# Inspect logs
python -m nrp_k8s_system.intelligent_router "logs web-abcdef-12345"
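For a sense of how a command such as the create example might be turned into structured arguments, here is a small parsing sketch. The DeploymentRequest class and parse_create_deployment function are hypothetical; the actual router may rely on the LLM or a different grammar to extract these fields.

```python
from dataclasses import dataclass


@dataclass
class DeploymentRequest:
    name: str
    image: str
    replicas: int = 1


def parse_create_deployment(text: str) -> DeploymentRequest:
    """Parse 'create deployment <name> image=<image> [replicas=<n>]'."""
    tokens = text.split()
    if len(tokens) < 3 or [t.lower() for t in tokens[:2]] != ["create", "deployment"]:
        raise ValueError("expected: create deployment <name> image=<image> [replicas=<n>]")
    name = tokens[2]
    # Collect key=value options; split once so image tags like nginx:1.25 survive.
    options = dict(tok.split("=", 1) for tok in tokens[3:] if "=" in tok)
    if "image" not in options:
        raise ValueError("missing image=<image>")
    return DeploymentRequest(
        name=name,
        image=options["image"],
        replicas=int(options.get("replicas", 1)),
    )


print(parse_create_deployment("create deployment web image=nginx replicas=3"))
```

Raising ValueError on incomplete input gives the caller a clear point at which to ask the user for the missing field before touching the cluster.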
Explanation Path Examples
python -m nrp_k8s_system.intelligent_router "How do I request A100 GPUs?"
python -m nrp_k8s_system.intelligent_router "What are storage best practices on Nautilus?"
✅ Works out-of-the-box with .env and a configured kube context.
Challenges & Learnings
Multi-tenant K8s Safety
Navigating RBAC/service accounts & default namespaces cleanly required careful design of permissions and namespace isolation.
LLM + Ops Integration
Designing a router that stays useful even when the LLM is slow/unavailable required robust fallback mechanisms and graceful degradation.
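One possible shape for this graceful degradation is to run the LLM classification under a timeout and use the keyword classifier when it is slow or fails. The sketch below uses only the standard library; classify_with_fallback and the two demo stubs are hypothetical names, and the project may implement its fallback differently.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def classify_with_fallback(text, llm_classify, keyword_classify, timeout_s=5.0):
    """Try the LLM classifier first; fall back on timeout or any error."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(llm_classify, text)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Covers both a timed-out call and an exception raised by the LLM call.
        return keyword_classify(text)
    finally:
        # Do not block on a slow call; its worker thread finishes in the background.
        executor.shutdown(wait=False)


# Demo stubs simulating a slow and an unavailable LLM (illustration only).
def slow_llm(text):
    time.sleep(0.3)
    return "COMMAND"


def broken_llm(text):
    raise RuntimeError("LLM unavailable")


def keyword_stub(text):
    return "UNCLEAR"


print(classify_with_fallback("list my pods", slow_llm, keyword_stub, timeout_s=0.05))
```

Note that the timeout only stops the caller from waiting; the slow request itself is not cancelled, so a production version would likely also set a request timeout on the HTTP client.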
DX & Extensibility
Keeping the package simple to install, configure, and extend while maintaining powerful functionality required careful architectural decisions.
How to Use / Reproduce
Prerequisites
Quick Start
# 1) Install
pip install -r requirements.txt
# or
pip install -e .
# 2) Configure
cp config/default.env .env
# then edit .env:
# NRP_API_KEY=YOUR_KEY
# NRP_BASE_URL=https://llm.nrp-nautilus.io/
# NRP_MODEL=gemma3
# 3) Run one-shot commands
python -m nrp_k8s_system.intelligent_router "list my pods"
python -m nrp_k8s_system.intelligent_router "create deployment web image=nginx replicas=3"
python -m nrp_k8s_system.intelligent_router "How do I request GPUs?"
# 4) Interactive mode
python -m nrp_k8s_system.intelligent_router
✅ Defaults to the gsoc namespace. Works with local kubeconfig or in-cluster config.
Mentor Organization
UC OSPO - University of California Open Source Program Office
Bolstering academic research through open source
The UC OSPO Network is a groundbreaking initiative that harnesses the collective power of six UC campuses to revolutionize open-source practices in academia. UC Santa Cruz OSPO serves as a mentor organization in Google Summer of Code 2025, supporting students through the Open Source Research Experience (OSRE) program.
🎓 Project Context & Mentorship
This project was developed as part of the National Research Platform (NRP), a distributed cyberinfrastructure for scientific computing. Work was conducted under the guidance of mentors from the San Diego Supercomputer Center (SDSC), with Mohammad Firas Sada providing direct technical mentorship on distributed systems architecture and Kubernetes integration.
🎯 Mission
Institutionalize open source practices across the University of California system while providing students hands-on experience with expert mentors.
📊 Impact
In summer 2024, OSRE supported 40 students working on open source and reproducibility projects through GSoC and the NSF FAIROS RCN program.
🌐 Network
Collaboration across six UC campuses: Santa Cruz, Berkeley, Davis, Los Angeles, Santa Barbara, and San Diego, supported by the Alfred P. Sloan Foundation.
Research Infrastructure
National Research Platform (NRP)
A distributed cyberinfrastructure supporting scientific computing across research institutions. This project contributes to NRP's mission of providing seamless access to computational resources through intelligent routing and management systems.
San Diego Supercomputer Center (SDSC)
Leading computational research facility providing advanced cyberinfrastructure and expert mentorship. SDSC researchers guided this project's development, ensuring alignment with production-scale research computing needs.
🔬 Research Impact
This intelligent routing system addresses real challenges in research computing environments, where users need both operational control and educational guidance when working with complex Kubernetes-based scientific workflows on the NRP infrastructure.
Acknowledgments
🎓 Research Mentorship
Mohammad Firas Sada (SDSC) - For exceptional technical mentorship and guidance on distributed systems architecture throughout the project
🌐 Platform Infrastructure
National Research Platform - For providing the distributed cyberinfrastructure context and real-world research computing environment
🤝 Community
Nautilus Community - Community members who helped validate RBAC configurations and cluster access patterns
🌟 Program
Google Summer of Code 2025 - For this incredible opportunity to contribute to open source research