PodSage-AI

🧠 PodSage AI

AI-Powered Kubernetes Observability & Infrastructure Intelligence

Real-time telemetry • AI-driven anomaly detection • Operational intelligence

🏢 About PodSage AI

PodSage AI is the organization behind this repository. Its flagship open-source project, PodSage-AI, delivers intelligent Kubernetes observability, AI-assisted anomaly detection, and infrastructure intelligence.

📖 Overview

PodSage-AI is an intelligent Kubernetes observability platform that monitors, analyzes, and correlates real-time infrastructure behavior and turns it into AI-powered operational insights.

The project includes a React/Vite frontend dashboard for interactive metric visualization, dependency maps, and AI insights.

Built for the ABB Accelerator 2026 challenge, PodSage-AI combines Kubernetes telemetry, Prometheus metrics, anomaly detection, dependency analysis, and infrastructure intelligence into a unified monitoring ecosystem.

The mission is simple:

Transform raw Kubernetes metrics into actionable operational intelligence.

❓ Why PodSage AI?

Traditional observability platforms expose metrics.

PodSage AI focuses on transforming telemetry into actionable operational intelligence using AI-assisted infrastructure analysis.

Instead of only showing dashboards, PodSage AI helps explain:

why infrastructure issues happen
which services are affected
how anomalies correlate
what actions engineers should take

✨ Core Features

📡 Real-time Kubernetes monitoring
🧠 AI-powered anomaly detection
🔥 Infrastructure intelligence engine
📈 CPU, memory & restart analytics
🔗 Pod dependency mapping
⚡ WebSocket live updates
📊 Prometheus integration
🐳 Dockerized deployment
☸️ Kubernetes-native architecture
🧩 Modular AI service architecture
🛡️ Fault-tolerant metric fallback handling
🚀 Lightweight FastAPI backend

🏗️ System Architecture

```mermaid
flowchart LR

    A["Applications / Microservices"]

    B["Data Collection Layer
    • Prometheus
    • Node Exporter
    • kube-state-metrics
    • cAdvisor"]

    C["AI Intelligence Layer
    • CPU Analysis Engine
    • Memory Analysis Engine
    • Dependency Mapper
    • Correlation Engine"]

    D["Infrastructure Intelligence Layer
    • Prometheus
    • SQLite
    • Loki
    • ML Models"]

    E["Dashboard & Visualization Layer
    • React / Vite
    • Recharts
    • React Flow
    • WebSockets"]

    A --> B
    B --> C
    C --> D
    D --> E
```
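The Dependency Mapper and Correlation Engine in the AI Intelligence Layer exist to answer questions such as which services are affected when one component fails. As an illustration only, here is a breadth-first walk over a service dependency map; the `DEPENDENTS` map, its service names, and the `affected_services` helper are hypothetical and are not taken from the PodSage-AI codebase.

```python
from collections import deque

# Hypothetical dependency map: each service lists the services that call it
# (its dependents). Edge direction and service names are illustrative only.
DEPENDENTS: dict[str, list[str]] = {
    "postgres": ["orders-api", "users-api"],
    "users-api": ["web-frontend"],
    "orders-api": ["web-frontend", "billing-worker"],
    "web-frontend": [],
    "billing-worker": [],
}


def affected_services(failing: str, dependents: dict[str, list[str]]) -> list[str]:
    """Return every service that transitively depends on `failing`, in BFS order."""
    seen = {failing}
    order: list[str] = []
    queue = deque([failing])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, []):
            if dep not in seen:  # visit each service once, even with shared callers
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


print(affected_services("postgres", DEPENDENTS))
# → ['orders-api', 'users-api', 'web-frontend', 'billing-worker']
```

Breadth-first order lists direct dependents before indirect ones, which is a natural order for reporting the blast radius of a failure.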

⚙️ Tech Stack

Backend

Python 3.11
FastAPI
Uvicorn
WebSockets
SQLite

Monitoring & Metrics

Prometheus
Node Exporter
Kubernetes Metrics API
cAdvisor

Infrastructure

Docker
Docker Compose
Kubernetes
Minikube
K3s
MicroK8s

AI & Analysis

AI-assisted anomaly detection
Infrastructure correlation engine
Forecast-ready analytics architecture
Operational intelligence pipeline

Frontend

React
Vite
Recharts
React Flow
WebSocket live dashboard

📁 Project Structure

```
PodSage-AI/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   ├── database/
│   │   ├── models/
│   │   ├── services/
│   │   ├── websocket/
│   │   └── main.py
│   │
│   ├── Dockerfile
│   ├── docker-compose.yml
│   ├── prometheus.yml
│   ├── requirements.txt
│   └── podsage.db
├── frontend/
│   ├── index.html
│   ├── package.json
│   ├── README.md
│   ├── vite.config.js
│   └── src/
│       ├── App.jsx
│       ├── main.jsx
│       ├── api/
│       │   └── client.js
│       ├── components/
│       │   ├── AIInsights.jsx
│       │   ├── AnomalyTable.jsx
│       │   ├── ClusterSummary.jsx
│       │   ├── DependencyGraph.jsx
│       │   ├── Header.jsx
│       │   ├── JsonPreview.jsx
│       │   ├── MetricCard.jsx
│       │   └── SeriesChart.jsx
│       ├── styles/
│       │   └── global.css
│       └── utils/
│           └── metrics.js
├── README.md
├── LICENSE
└── .gitignore
```

🚀 Getting Started

Prerequisites

Before starting, ensure you have:

Python 3.11+
Node.js and npm (for the React/Vite frontend)
Docker & Docker Compose
Kubernetes cluster (optional but recommended)
Prometheus installed or accessible

📦 Installation

1. Clone Repository

```bash
git clone https://github.com/PodSageAI/PodSage-AI.git
cd PodSage-AI/backend
```

2. Create Virtual Environment

```bash
python -m venv venv
```

Linux / macOS

```bash
source venv/bin/activate
```

Windows

```
venv\Scripts\activate
```

3. Install Dependencies

```bash
pip install -r requirements.txt
```

4. Frontend Setup

```bash
cd ../frontend
npm install
```

▶️ Running the Backend

Local Development (run from the backend/ directory)

```bash
uvicorn app.main:app --reload
```

Backend URL:

http://localhost:8000

Swagger Documentation:

http://localhost:8000/docs

ReDoc Documentation:

http://localhost:8000/redoc

▶️ Running the Frontend

Local Development (run from the repository root)

```bash
cd frontend
npm run dev
```

Frontend URL:

http://localhost:5173

🎥 Demo

Live Metrics API

```bash
curl http://localhost:8000/metrics/cpu
```

Open Swagger UI

http://localhost:8000/docs

🐳 Docker Usage

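Start Services (run from the backend/ directory, which contains docker-compose.yml)

```bash
docker compose up --build
```

Stop Services

```bash
docker compose down
```

Run in Detached Mode

```bash
docker compose up -d
```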

☸️ Kubernetes Deployment

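Apply Kubernetes Resources

The project structure above does not list a k8s/ directory, so point the command at wherever your manifests live.

```bash
kubectl apply -f k8s/
```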

Verify Pods

```bash
kubectl get pods
```

Port Forward Backend

```bash
kubectl port-forward svc/podsage-ai 8000:8000
```

📡 API Endpoints

Health Endpoints

| Endpoint | Description |
|---|---|
| `/` | Root status |
| `/health` | Health check |

Metrics Endpoints

| Endpoint | Description |
|---|---|
| `/metrics/cpu` | CPU metrics |
| `/metrics/memory` | Memory metrics |
| `/metrics/restarts` | Restart metrics |

AI & Intelligence Endpoints

| Endpoint | Description |
|---|---|
| `/anomalies` | Detected anomalies |
| `/insights` | AI-generated insights |
| `/dependencies` | Dependency mapping |
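The endpoints above can also be called from Python. This sketch uses only the standard library and assumes the backend is running locally on port 8000; `ENDPOINTS`, `endpoint_url`, and `fetch` are hypothetical helper names, and the decoded shapes depend on what the backend returns.

```python
import json
import urllib.parse
import urllib.request

# Assumed default: backend started locally with uvicorn on port 8000.
BASE_URL = "http://localhost:8000"

# Paths taken from the endpoint tables above.
ENDPOINTS = {
    "root": "/",
    "health": "/health",
    "cpu": "/metrics/cpu",
    "memory": "/metrics/memory",
    "restarts": "/metrics/restarts",
    "anomalies": "/anomalies",
    "insights": "/insights",
    "dependencies": "/dependencies",
}


def endpoint_url(name: str, base_url: str = BASE_URL) -> str:
    """Build the absolute URL for a named endpoint."""
    return urllib.parse.urljoin(base_url, ENDPOINTS[name])


def fetch(name: str, base_url: str = BASE_URL, timeout: float = 5.0):
    """GET an endpoint and decode its JSON body (needs a running backend)."""
    with urllib.request.urlopen(endpoint_url(name, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

With the backend running, `fetch("anomalies")` returns the decoded JSON list, and `endpoint_url("cpu")` yields the same URL used in the curl demo.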

📘 Example API Responses

CPU Metrics

```json
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {},
        "value": [
          1778683850.411,
          "0.2482235237555631"
        ]
      }
    ]
  }
}
```
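The CPU response is a raw Prometheus instant-query result: each entry in `data.result` carries a label set (`metric`) and a `[timestamp, value]` pair whose value is encoded as a string. A minimal parser could look like this, with `vector_values` as a hypothetical helper name; multiplying the fraction by 100 gives the 24.82 % reported in the anomaly example.

```python
import json

# The CPU sample from above, in Prometheus instant-query format.
SAMPLE = json.loads("""
{"status": "success",
 "data": {"resultType": "vector",
          "result": [{"metric": {},
                      "value": [1778683850.411, "0.2482235237555631"]}]}}
""")


def vector_values(response: dict) -> list[tuple[dict, float]]:
    """Extract (labels, value) pairs from an instant-vector response."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Prometheus encodes sample values as strings; convert them to float.
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


labels, cpu_fraction = vector_values(SAMPLE)[0]
print(round(cpu_fraction * 100, 2))  # → 24.82
```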

Anomaly Detection

```json
[
  {
    "type": "High CPU Usage",
    "pod": "node-exporter:9100",
    "value": 24.82,
    "unit": "%"
  }
]
```

AI Insights

```json
[
  {
    "pod": "node-exporter:9100",
    "insight": "Pod node-exporter:9100 is consuming unusually high CPU resources.",
    "recommendation": "Consider scaling replicas or optimizing workload."
  }
]
```
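The insight above reads like a template filled in from an anomaly record. A rule-based sketch that reproduces it follows; `RULES` and `generate_insights` are hypothetical, and this README does not say whether the real insights are templated or model-generated. Only the "High CPU Usage" wording comes from the example; the other rules are illustrative assumptions.

```python
# Hypothetical rule table: anomaly type -> (insight template, recommendation).
RULES = {
    "High CPU Usage": (
        "Pod {pod} is consuming unusually high CPU resources.",
        "Consider scaling replicas or optimizing workload.",
    ),
    "High Memory Usage": (  # illustrative assumption
        "Pod {pod} is consuming unusually high memory.",
        "Check for memory leaks or raise the memory limit.",
    ),
    "Frequent Restarts": (  # illustrative assumption
        "Pod {pod} is restarting repeatedly.",
        "Inspect container logs and liveness probe settings.",
    ),
}


def generate_insights(anomalies: list[dict]) -> list[dict]:
    """Turn anomaly records into insight records, skipping unknown types."""
    insights = []
    for anomaly in anomalies:
        rule = RULES.get(anomaly["type"])
        if rule is None:
            continue
        template, recommendation = rule
        insights.append({
            "pod": anomaly["pod"],
            "insight": template.format(pod=anomaly["pod"]),
            "recommendation": recommendation,
        })
    return insights


print(generate_insights([
    {"type": "High CPU Usage", "pod": "node-exporter:9100", "value": 24.82, "unit": "%"}
]))
```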

🧠 AI Capabilities

Current AI functionality includes:

High CPU usage detection
High memory usage detection
Restart anomaly detection
Infrastructure correlation
Dependency intelligence

Default Thresholds

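```python
CPU_THRESHOLD = 0.2           # CPU usage as a fraction (0.2 = 20 %), per the example responses
MEMORY_THRESHOLD = 500000000  # presumably bytes (about 500 MB)
RESTART_THRESHOLD = 5         # container restart count
```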
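A threshold check of this kind reproduces the anomaly example: the CPU sample of 0.248 exceeds `CPU_THRESHOLD = 0.2` and is reported as 24.82 %. This is a sketch under assumptions, not the project's detector: the `detect_anomalies` name, the strict `>` comparison, the memory unit (bytes, reported here in MB), and the non-CPU anomaly type names are all guesses.

```python
# Default thresholds from above; units are inferred, not documented.
CPU_THRESHOLD = 0.2            # fraction of CPU
MEMORY_THRESHOLD = 500_000_000  # assumed bytes
RESTART_THRESHOLD = 5          # restart count


def detect_anomalies(pod: str, cpu: float, memory: float, restarts: int) -> list[dict]:
    """Compare one pod's latest readings against the static thresholds."""
    anomalies = []
    if cpu > CPU_THRESHOLD:
        anomalies.append({"type": "High CPU Usage", "pod": pod,
                          "value": round(cpu * 100, 2), "unit": "%"})
    if memory > MEMORY_THRESHOLD:  # hypothetical type name and MB unit
        anomalies.append({"type": "High Memory Usage", "pod": pod,
                          "value": round(memory / 1_000_000, 2), "unit": "MB"})
    if restarts > RESTART_THRESHOLD:  # hypothetical type name
        anomalies.append({"type": "Frequent Restarts", "pod": pod,
                          "value": restarts, "unit": "restarts"})
    return anomalies


print(detect_anomalies("node-exporter:9100", cpu=0.2482235237555631,
                       memory=120_000_000, restarts=0))
```

Static thresholds are simple and predictable; the ML-based anomaly scoring on the roadmap would presumably replace these fixed cut-offs with learned baselines.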

🛡️ Fault-Tolerant Monitoring

PodSage AI automatically falls back to node-level metrics when container-level Kubernetes metrics are unavailable.

This ensures monitoring continuity even in partially configured environments.

Example fallback query:

```
1 - avg(rate(node_cpu_seconds_total{mode="idle"}[1m]))
```
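The fallback amounts to "run the container-level query, and if it returns no series, run the node-level one". The sketch below assumes a cAdvisor-based primary query and a hypothetical `run_query` callable; only `NODE_CPU_QUERY` comes from this README. One practical point: `rate(...[1m])` needs at least two samples inside the window, and Prometheus's default `scrape_interval` is 1m, so if the query comes back empty it is worth checking that prometheus.yml sets a shorter interval or widening the range.

```python
from typing import Callable

# Assumed primary query: per-pod CPU usage (in cores) from cAdvisor's
# container_cpu_usage_seconds_total. The README documents only the fallback.
CONTAINER_CPU_QUERY = (
    'sum by (pod) (rate(container_cpu_usage_seconds_total{container!=""}[1m]))'
)
# Node-level fallback query, as documented above (utilisation fraction).
NODE_CPU_QUERY = '1 - avg(rate(node_cpu_seconds_total{mode="idle"}[1m]))'


def cpu_with_fallback(run_query: Callable[[str], list]) -> tuple[str, list]:
    """Return ("container", result) if per-pod data exists, else ("node", result).

    `run_query` (hypothetical) executes a PromQL instant query and returns the
    `data.result` list. The level tag matters because the units differ
    (cores per pod vs. a cluster-wide utilisation fraction).
    """
    result = run_query(CONTAINER_CPU_QUERY)
    if result:
        return "container", result
    # Empty result: container metrics are not being scraped, so fall back.
    return "node", run_query(NODE_CPU_QUERY)


# Stub simulating a cluster without cAdvisor metrics.
def stub_query(promql: str) -> list:
    if promql == NODE_CPU_QUERY:
        return [{"metric": {}, "value": [1778683850.411, "0.2482235237555631"]}]
    return []


print(cpu_with_fallback(stub_query)[0])  # → node
```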

📊 Observability Workflow

Kubernetes metrics are scraped via Prometheus
Metrics are processed by intelligence services
Infrastructure anomalies are detected
Correlation engine generates operational insights
Real-time updates stream through WebSockets
Dashboards visualize cluster intelligence
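The last two workflow steps, streaming updates and keeping dashboards live, amount to a publish/subscribe fan-out. The sketch below models each WebSocket client as an `asyncio.Queue` so it runs without FastAPI; `Broadcaster`, `pipeline`, and the fake CPU readings are hypothetical stand-ins for the real services.

```python
import asyncio


class Broadcaster:
    """Fan out each update to every subscriber queue (stand-in for WebSocket clients)."""

    def __init__(self) -> None:
        self._subscribers: set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue(maxsize=100)
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    def publish(self, message: dict) -> None:
        for queue in self._subscribers:
            if queue.full():
                queue.get_nowait()  # drop the oldest update for a slow client
            queue.put_nowait(message)


async def pipeline(bc: Broadcaster, readings: list[float], interval: float = 0.0) -> None:
    """Scrape -> detect -> publish loop; `readings` fakes Prometheus results."""
    for cycle, cpu in enumerate(readings):
        anomalies = ["High CPU Usage"] if cpu > 0.2 else []
        bc.publish({"cycle": cycle, "cpu": cpu, "anomalies": anomalies})
        await asyncio.sleep(interval)


async def main() -> list[dict]:
    bc = Broadcaster()
    client = bc.subscribe()
    await pipeline(bc, [0.12, 0.18, 0.31])
    return [client.get_nowait() for _ in range(client.qsize())]


received = asyncio.run(main())
print([m["anomalies"] for m in received])  # → [[], [], ['High CPU Usage']]
```

Bounding each queue and dropping the oldest message keeps one slow browser tab from growing memory without limit. In a FastAPI backend, each WebSocket handler would await its queue and forward messages with `send_json`.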

✅ Current Capabilities

Live CPU monitoring
Memory monitoring
Pod restart tracking
AI anomaly detection
Infrastructure insights
Dependency mapping
Prometheus querying
Real-time backend APIs
Node-level fallback monitoring

🧪 Example Use Cases

Detect abnormal pod CPU spikes
Identify memory leaks across services
Correlate pod restart storms
Monitor Kubernetes cluster health
Analyze infrastructure dependencies
Stream live telemetry dashboards

🛣️ Roadmap

🤖 LLM-powered operational intelligence
📚 NLP infrastructure querying
📈 Predictive forecasting
🔗 Advanced dependency graph visualization
🧠 ML-based anomaly scoring
🌐 Multi-cluster observability
⚡ Intelligent auto-remediation
🛰️ eBPF network tracing

🏆 ABB Accelerator 2026

PodSage AI was developed as part of the ABB Accelerator 2026 innovation challenge focused on:

AI-powered infrastructure intelligence
Kubernetes observability
Cloud-native analytics
Operational automation

🤝 Contributing

Contributions are welcome.

Steps to Contribute

1. Fork the Repository

2. Create a Feature Branch

```bash
git checkout -b feature/my-feature
```

3. Commit Changes

```bash
git commit -m "Add new feature"
```

4. Push to Branch

```bash
git push origin feature/my-feature
```

5. Open a Pull Request

👥 Maintainers

Abhrankan Chakrabarti
PodSage AI Team

Release Notes

PodSage AI v0.1.4-alpha

Improvements

Added React/Vite frontend dashboard
Added interactive metrics, anomaly, and dependency visualization
Added WebSocket live update support for the UI
Improved frontend/backend integration and setup documentation
Updated project structure to include the frontend/ app

Frontend Enhancements

Added React-based chart and dashboard components
Added Recharts and React Flow for visual analytics
Added responsive UI for AI insights, anomaly tables, and service dependency maps

Backend Enhancements

Retained resilient Prometheus query handling
Preserved backend stability during Prometheus downtime
Improved API stability for frontend consumption

Infrastructure

Added frontend build and development workflow
Improved documentation for running frontend and backend together

📄 License

See the LICENSE file in the repository root.

📌 Project Status

Version: v0.1.4-alpha
Status: Active Development

🌟 Vision

PodSage AI aims to evolve into a next-generation autonomous infrastructure intelligence platform capable of understanding, predicting, and optimizing Kubernetes environments in real time.

Future versions aim to transition from observability into fully autonomous operational intelligence.

Built with ❤️ for cloud-native infrastructure intelligence
