Documentation

Introduction & Overview

Welcome to the **VisioAI Operations Intelligence** platform. VisioAI is an enterprise cloud control plane that integrates database telemetry, runbook indexes, server logs, and real-time AI incident analysis into a unified dashboard.

Once you point your telemetry streams at VisioAI, the platform automatically diagnoses anomalies, links matching playbooks, and offers automated workflow triggers that scale replica pools, clear deadlocks, and resolve production incidents.

Quick Start Guide

Connect your services to the VisioAI Cloud in less than 5 minutes.

1. Register Your SaaS Workspace

Go to visioai.app/register to provision your tenant space and launch your workspace command center.

2. Generate an Access Token

Navigate to Workspace Settings → API Keys and create a new key with the `ingest:write` permission.

3. Stream Observability Telemetry

Post logs directly to our secure ingest endpoint. Here is an example using `curl`:

curl -X POST https://api.visioai.app/v1/logs \
  -H "Authorization: Bearer <your_workspace_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "level": "error",
    "msg": "Postgres thread pool exhausted",
    "service": "checkout-api",
    "metadata": { "instance_id": "i-89af41d" }
  }'

4. Configure Auto-Mitigation Triggers

In your dashboard, map incoming incident classifications (e.g. database locking) directly to a runbook workflow trigger. When an incoming alert matches a mapped signature, VisioAI can invoke the corresponding automated mitigation routine.
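The classification-to-workflow mapping can be pictured as a small rule table. This is a minimal sketch, not VisioAI's trigger engine: the `TriggerRule` class, the `database_locking` label, and the optional per-service filter are hypothetical. Only the `db-lock-resolver` workflow name comes from the API examples in this document.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TriggerRule:
    """Maps an incident classification to a runbook workflow (hypothetical model)."""
    classification: str                 # e.g. "database_locking" (illustrative label)
    workflow_id: str                    # workflow to invoke when the rule matches
    services: frozenset = frozenset()   # empty set means "any service"


def match_trigger(classification: str, service: str,
                  rules: Sequence[TriggerRule]) -> Optional[str]:
    """Return the workflow ID of the first matching rule, or None."""
    for rule in rules:
        if rule.classification != classification:
            continue
        if rule.services and service not in rule.services:
            continue
        return rule.workflow_id
    return None
```

Rules are checked in order, so more specific rules (those with a service filter) should be listed before catch-all rules for the same classification.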

Workspace & Org Setup

When you create an account, VisioAI provisions a default organization for you. Workspaces are operational partitions within that organization (e.g. `Production`, `Staging`, `Security-Sandbox`).

Organizations: Serve as the primary billing and user management partition.
Workspaces: Serve as the boundary for metrics, runbook ingestion, and active workflows.

Organizations & Isolation

Data privacy is core to VisioAI. Every organization's documents, embeddings, search indexes, and workflow logs are physically or logically separated. Custom keys are isolated at the database schema layer and Redis cache partition layer to guarantee zero cross-tenant leakage.

Knowledge Base & Retrieval

VisioAI processes uploaded manuals, runbook markdown documents, and SOP guidelines by splitting them into semantic chunks, generating 1536-dimensional embeddings, and indexing them inside a secure vector search index.
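The exact semantic chunking strategy is not documented here. As a simpler stand-in, the sketch below splits text into fixed-size word windows that overlap, a common baseline for retrieval pipelines. The `chunk_words` function and its default sizes are illustrative assumptions, and embedding generation is not shown.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into overlapping word windows.

    Each chunk holds up to `chunk_size` words and repeats the last
    `overlap` words of the previous chunk, so a sentence that straddles
    a boundary still appears intact in at least one chunk.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

For example, ten words with `chunk_size=4` and `overlap=1` yield three chunks, each starting on the last word of the previous one.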

When an alert fires, VisioAI queries the index using cosine similarity to extract the most relevant incident resolution instructions.
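Cosine similarity scores a query embedding against each stored chunk embedding as their dot product divided by the product of their norms. The brute-force sketch below ranks an in-memory index by that score. It uses 3-dimensional toy vectors in place of 1536-dimensional embeddings, and the `top_k` helper and index layout are hypothetical; production vector indexes typically use approximate nearest-neighbor search rather than a full scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the vector norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k chunk IDs most similar to the query, best first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```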

AI Reasoning & Memory

The AI reasoning loop runs on top of large language model (LLM) reasoning engines. It keeps a short-term context memory of active incidents and execution-state logs, which it uses to construct mitigation steps dynamically.

Prompt Lifecycle & Providers

The reasoning engine routes operational prompts to one of several providers (e.g. OpenAI GPT-4o or Anthropic Claude 3.5 Sonnet) based on task complexity. Prompt templates are populated at runtime with the active system log state, the matching runbook text chunks, and previous mitigation logs.
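A minimal sketch of template population and complexity-based routing follows. The template text, the `build_prompt` and `route` helpers, the numeric complexity score in [0, 1], and the placeholder provider labels are all assumptions; how VisioAI scores complexity or assigns tasks to a particular provider is not specified above.

```python
from string import Template
from typing import List, Sequence, Tuple

# Hypothetical template; the actual VisioAI prompt templates are not published here.
PROMPT_TEMPLATE = Template(
    "Current system log state:\n$log_state\n\n"
    "Relevant runbook excerpts:\n$runbook_chunks\n\n"
    "Previous mitigation attempts:\n$previous_mitigations\n\n"
    "Diagnose the incident and propose the next mitigation step."
)


def build_prompt(log_state: str, runbook_chunks: Sequence[str],
                 previous_mitigations: Sequence[str]) -> str:
    """Fill the template with live log state, retrieved chunks, and history."""
    return PROMPT_TEMPLATE.substitute(
        log_state=log_state,
        runbook_chunks="\n---\n".join(runbook_chunks),
        previous_mitigations="\n".join(previous_mitigations) or "none",
    )


def route(complexity: float, routes: List[Tuple[float, str]]) -> str:
    """Pick the first provider whose minimum complexity the task meets.

    `routes` must be sorted by descending threshold,
    e.g. [(0.7, "large-model"), (0.0, "fast-model")].
    """
    for threshold, provider in routes:
        if complexity >= threshold:
            return provider
    raise ValueError("no route accepts this complexity score")
```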

Observability & Telemetry

VisioAI continuously records operational telemetry:

Traces: End-to-end execution records linking logs to AI diagnoses and the actions that follow.
Inference Metrics: Input/output token counts, cost limits, and latency spikes across the OpenAI and Anthropic API pipelines.

Knowledge Center

The Knowledge Center lets you upload operational files, manage chunk-overlap settings for text parsing, and monitor the indexing task queue, where each task is PENDING, PROCESSED, or FAILED.
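The three indexing states can be modeled as a small state machine. In this sketch the transitions are assumptions (a `PENDING` task resolves to `PROCESSED` or `FAILED`, and a `FAILED` task may be requeued), and `IndexStatus`, `transition`, and `queue_summary` are hypothetical names.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class IndexStatus(Enum):
    PENDING = "PENDING"
    PROCESSED = "PROCESSED"
    FAILED = "FAILED"


# Assumed transitions: PENDING resolves one way or the other; FAILED can be retried.
ALLOWED = {
    IndexStatus.PENDING: {IndexStatus.PROCESSED, IndexStatus.FAILED},
    IndexStatus.FAILED: {IndexStatus.PENDING},
    IndexStatus.PROCESSED: set(),
}


def transition(current: IndexStatus, new: IndexStatus) -> IndexStatus:
    """Return the new status, rejecting transitions the model does not allow."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


def queue_summary(statuses: Iterable[IndexStatus]) -> Dict[str, int]:
    """Count tasks per status, including statuses with zero tasks."""
    counts = Counter(statuses)
    return {status.value: counts.get(status, 0) for status in IndexStatus}
```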

AI Copilot & Prompts

The AI Copilot is an interactive terminal where operators and team members enter queries, analyze operational bottlenecks, and retrieve structured reasoning logs.

Workflow Builder & DAGs

Design multi-step mitigation routines in the visual DAG editor or write them as declarative YAML files. Configure automatic retry loops and manual validation gates to protect nodes that serve live traffic.
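At its core, running a DAG means executing each step only after its dependencies finish. The sketch below does this with Python's standard `graphlib.TopologicalSorter` (Python 3.9+) and a fixed retry count per step. It is an in-process illustration, not the VisioAI workflow engine or its YAML schema; `run_dag` and its parameters are hypothetical, and it retries immediately without backoff.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List


def run_dag(steps: Dict[str, Callable[[], None]],
            deps: Dict[str, Iterable[str]],
            max_attempts: int = 3) -> List[str]:
    """Run each step after its dependencies, retrying failures.

    `deps` maps a step name to the steps it depends on. Returns the
    execution order. A step that fails `max_attempts` times re-raises
    its last error, which stops the run.
    """
    sorter = TopologicalSorter()
    for name in steps:
        sorter.add(name, *deps.get(name, ()))
    order = list(sorter.static_order())  # raises graphlib.CycleError on cycles
    unknown = set(order) - set(steps)
    if unknown:
        raise KeyError(f"dependencies without a step definition: {sorted(unknown)}")
    for name in order:
        for attempt in range(1, max_attempts + 1):
            try:
                steps[name]()
                break  # step succeeded; move to the next one
            except Exception:
                if attempt == max_attempts:
                    raise
    return order
```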

Observability Console

Inspect active API token logs, provider latencies, and workflow runs. Real-time updates stream directly to the dashboard, providing detailed charts of your operational health.

Triggers & Conditions

Workflows can be triggered by:

Webhook Alerts: Alerts pushed from external monitoring systems (e.g. Prometheus, Datadog).
Metric Thresholds: Internal rules evaluated against active telemetry filters (e.g. error rate > 5%).
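A metric-threshold trigger reduces to comparing an observed value against a rule. This sketch encodes the "error rate > 5%" example with a hypothetical `ThresholdRule` type, expressing 5% as the fraction 0.05; the rule format VisioAI actually uses is not documented here.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ThresholdRule:
    metric: str       # name of the telemetry series, e.g. "error_rate"
    threshold: float  # the rule fires when the observed value is strictly greater


def error_rate(errors: int, total: int) -> float:
    """Fraction of failed requests; 0.0 when there was no traffic."""
    return errors / total if total else 0.0


def should_fire(rule: ThresholdRule, observed: Mapping[str, float]) -> bool:
    """True when the rule's metric is present and exceeds its threshold."""
    value = observed.get(rule.metric)
    return value is not None and value > rule.threshold
```

With `ThresholdRule("error_rate", 0.05)`, 7 errors in 100 requests fires the rule, while exactly 5 in 100 does not, because the comparison is strict.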

AI & Action Nodes

DAG steps consist of:

AI Diagnosis Nodes: Invoke the LLM reasoning engines to analyze incident context, such as database lock state, and output a recommended fix.
Approval Gate Nodes: Pause execution until an operator approves the step in the support dashboard.
Action Nodes: Run tasks such as scale-up scripts, pod restarts, or cache clears.
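The sketch below shows how the three node types might interact in a linear run: diagnosis and action nodes execute, and an approval gate stops the run until its name appears in a set of approvals. The `Node` type, the `kind` labels, and the `execute` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class Node:
    name: str
    kind: str  # "ai_diagnosis", "approval_gate", or "action" (illustrative labels)
    run: Optional[Callable[[dict], None]] = None  # not used by approval gates


def execute(nodes: List[Node], context: dict, approvals: Set[str]) -> str:
    """Run nodes in order, pausing at any approval gate not yet approved.

    Returns "completed", or "paused:<gate name>" when a gate blocks.
    """
    for node in nodes:
        if node.kind == "approval_gate":
            if node.name not in approvals:
                return f"paused:{node.name}"
            continue  # approved gate: fall through to the next node
        if node.run is None:
            raise ValueError(f"node {node.name!r} has no run function")
        node.run(context)
    return "completed"
```

This simplified executor restarts from the first node after approval; a real engine would persist the paused position and resume from the gate.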

Best Practices

Always set strict step timeouts on action nodes, and configure fallback nodes (e.g. a Slack or PagerDuty alert) so that unexpected execution failures are handled gracefully.
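One way to apply a step timeout with a fallback in Python is `concurrent.futures`. This is a minimal sketch: `run_with_timeout` is a hypothetical helper, and the fallback callable stands in for a node that alerts Slack or PagerDuty. Python threads cannot be forcibly stopped, so a timed-out action keeps running in the background; an engine that must cancel work would need a separate process or a cooperative cancellation signal.

```python
import concurrent.futures
from typing import Callable


def run_with_timeout(action: Callable[[], str], timeout_s: float,
                     fallback: Callable[[Exception], str]) -> str:
    """Run `action` with a deadline; call `fallback` on timeout or error.

    The caller stops waiting after `timeout_s` seconds, but the worker
    thread itself is not killed and may finish later.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(action)
    try:
        return future.result(timeout=timeout_s)
    except Exception as exc:  # concurrent.futures.TimeoutError or the action's own error
        return fallback(exc)
    finally:
        executor.shutdown(wait=False)  # do not block on a still-running action
```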

System Architecture

Below is the logical data flow map for VisioAI Cloud:

[Telemetry Ingest] ---> (Redis Buffer) ---> (NestJS Controller)
                                               |
                                               v
[Vector DB] <---------- (AI Diagnostics) <--- [PostgreSQL]
  (Pinecone/PgVector)        |
                             v
                       [Action DAGs] ---> (Kube/Cloud APIs)

RBAC & Security

VisioAI authenticates API requests with JSON Web Tokens (JWTs) sent in request headers. Each user identity is mapped to organization scopes, so a request can only access the organizations that user belongs to.
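The text above does not say which signing algorithm or claim names VisioAI uses. Assuming HS256-signed tokens and a hypothetical `org_ids` claim listing the user's organizations, the standard-library sketch below verifies a token's signature and expiry, then checks organization scope. For production code, prefer a maintained JWT library over hand-rolled verification.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    padding = "=" * (-len(segment) % 4)  # JWTs strip base64 padding
    return base64.urlsafe_b64decode(segment + padding)


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Create an HS256 token (handy for local testing)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims of a valid HS256 token; raise ValueError otherwise."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims


def authorize_org(claims: dict, organization_id: str) -> bool:
    # "org_ids" is a hypothetical claim name for the user's organization scopes.
    return organization_id in claims.get("org_ids", [])
```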

Tenant Isolation Design

Tenant database queries are explicitly scoped with a row-level `organizationId` filter applied through Prisma. Redis keys are prefixed with the matching client workspace token to prevent cross-tenant contamination.
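The pattern described above, where every query carries the tenant's `organizationId` and every cache key carries a tenant prefix, can be enforced in one place. The platform implements this with Prisma (TypeScript); the Python sketch below only illustrates the idea. The dict returned by `where` mirrors the shape of a Prisma `where` filter, and `TenantScope` and its methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class TenantScope:
    organization_id: str
    workspace_token: str  # used as the Redis key prefix, per the description above

    def where(self, **filters: Any) -> Dict[str, Any]:
        """Return a query filter that always pins organizationId to this tenant."""
        requested = filters.get("organizationId", self.organization_id)
        if requested != self.organization_id:
            raise PermissionError("cross-tenant filter rejected")
        return {**filters, "organizationId": self.organization_id}

    def redis_key(self, key: str) -> str:
        """Prefix a cache key so tenants never share a keyspace."""
        return f"{self.workspace_token}:{key}"
```

Routing every query and cache access through one scope object means a forgotten filter becomes impossible rather than merely unlikely.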

Authentication

All HTTP requests to the VisioAI Cloud APIs require a Bearer token passed in the Authorization header.

Authorization: Bearer visio_key_xxxxxxxxxxxxxxxx

Document Endpoints

Upload PDF playbooks or Markdown documents to the knowledge base to provide context for AI diagnostics.

POST /v1/documents
Content-Type: application/json

{
  "title": "Postgres Lock Playbook",
  "content": "To resolve pg_locks, query pg_stat_activity and run pg_terminate_backend..."
}

Workflows Endpoints

Trigger multi-step incident mitigation playbooks and monitor their execution status.

POST /v1/workflows/run/db-lock-resolver
Content-Type: application/json

{
  "vars": { "target_db": "prod-orders-db" }
}

AI & Query Endpoints

Query our AI reasoning engines directly to diagnose operational risks, search processes, and retrieve structured logs.

POST /v1/chat/reasoning
Content-Type: application/json

{
  "prompt": "Why is checkout-service throwing 500 error spikes?"
}
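The endpoints in this section can be wrapped in a small client. The sketch below uses only Python's standard library to build authenticated JSON requests without sending them. The helper names are hypothetical, the payload shapes follow the examples in this document (including the `/v1/logs` example from the Quick Start), and the `/v1/workflows/run/<workflow_id>` path pattern is inferred from the single `db-lock-resolver` example.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

API_BASE = "https://api.visioai.app"


def build_request(token: str, path: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST to the VisioAI API."""
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def ingest_log(token: str, level: str, msg: str, service: str, **metadata: Any):
    payload: Dict[str, Any] = {"level": level, "msg": msg, "service": service}
    if metadata:
        payload["metadata"] = metadata
    return build_request(token, "/v1/logs", payload)


def upload_document(token: str, title: str, content: str):
    return build_request(token, "/v1/documents", {"title": title, "content": content})


def run_workflow(token: str, workflow_id: str, **variables: Any):
    # Assumes the path segment after /run/ is the workflow identifier.
    path = "/v1/workflows/run/" + urllib.parse.quote(workflow_id, safe="")
    return build_request(token, path, {"vars": variables})


def ask_reasoning(token: str, prompt: str):
    return build_request(token, "/v1/chat/reasoning", {"prompt": prompt})
```

To send a request, pass it to `urllib.request.urlopen`; assuming the API replies with JSON, the response body can then be decoded with `json.loads`.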