Developer docs

The Smart Router for LLMs.

SKYLINE INTELLIGENCE gives customer applications a stable OpenAI-compatible gateway, smart model routing, quota-aware billing, and usage visibility across Claude, GPT, Gemini, DeepSeek, Qwen, and local providers.


OpenAI-compatible: drop-in SDK base URL

Smart routing: quality, speed, and cost

Usage ledger: tokens, credits, requests

01 Application: OpenAI SDK or cURL

02 SKYLINE INTELLIGENCE Router: auth, quota, route, bill

Providers: Claude, GPT, Gemini, DeepSeek

POST /v1/chat/completions
Authorization: Bearer sk_aibrg_...
model: "auto"

Build with SKYLINE INTELLIGENCE

Recommended path for new developers

Follow this sequence when integrating a new app. It mirrors the customer console: create credentials, confirm model access, make one request, then monitor usage and quota.

1. Make your first API call: create or rotate a key, copy the Base URL, and call chat completions.
2. Understand the runtime API: use OpenAI-compatible chat first; add Responses or Messages when needed.
3. Choose a model or route automatically: use enabled model codes from your organization or ask the router to select.
4. Watch spend and reliability: review request count, tokens, credits, latency, and degraded fallback headers.

First steps

Make your first API call

The fastest path is to use SKYLINE INTELLIGENCE as the base URL for an existing OpenAI-compatible client. The public site is https://skylineintelligence.top/; customer API calls should use https://api.skylineintelligence.top/v1.

1. Create an API key
Open API Keys, create a key, and keep the raw key from the one-time creation screen. Default customer scopes include model:invoke, chat:read, and chat:write.

2. Copy the Base URL
Use the console connection panel or call GET /api/connection-info with your console session token. The returned baseUrl should be https://api.skylineintelligence.top/v1.

3. Send one chat request
Use model: "auto" for router-owned selection, or choose a specific model from /v1/models or the Model Plaza.

Connect

Use existing OpenAI clients

Most apps only need two changes: set baseURL to SKYLINE INTELLIGENCE and replace the API key. The request body stays familiar.

# Shell: set these once per session and keep the real key out of source control.
export SKYLINE_BASE_URL="https://api.skylineintelligence.top/v1"
export SKYLINE_API_KEY="sk_aibrg_your_key"

curl "$SKYLINE_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $SKYLINE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      { "role": "user", "content": "Explain SKYLINE INTELLIGENCE in one sentence." }
    ]
  }'

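// Node.js (ES module): "import" and top-level await need an .mjs file or "type": "module" in package.json.
import OpenAI from "openai";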

const client = new OpenAI({
  apiKey: process.env.SKYLINE_API_KEY,
  baseURL: process.env.SKYLINE_BASE_URL
});

const response = await client.chat.completions.create({
  model: "auto",
  messages: [
    { role: "user", content: "Write a launch checklist." }
  ],
  stream: false
});

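// content can be null (for example on a tool-call response), so fall back to an empty string.
process.stdout.write(response.choices[0].message.content ?? "");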

# Python, using the openai package's v1+ client interface.
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["SKYLINE_API_KEY"],
    base_url=os.environ["SKYLINE_BASE_URL"],
)

response = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "user", "content": "Summarize this product for a CTO."}
    ],
)

print(response.choices[0].message.content)

API reference

Runtime API surface

Customer traffic goes through the runtime service under /v1. Start with chat completions unless you specifically need Responses, Anthropic-style Messages, or file operations.

GET   /v1/models                 List models visible to the caller's organization.
GET   /v1/models/{modelId}       Fetch one enabled model. Returns not found or forbidden if unavailable.
POST  /v1/chat/completions       Primary OpenAI-compatible chat endpoint. Supports streaming SSE.
POST  /v1/responses              Responses-style input mapped into SKYLINE INTELLIGENCE chat routing.
POST  /v1/messages               Anthropic-style Messages surface for compatible clients.
POST  /v1/messages/count_tokens  Estimate input tokens before sending a Messages request.
POST  /v1/files                  Upload files for file-aware model workflows.

Authentication

Pass Authorization: Bearer sk_aibrg_.... Runtime also accepts x-api-key for Anthropic-style clients.
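The two header styles above can be sketched as a small helper. This is a minimal illustration, not part of any SKYLINE INTELLIGENCE SDK; the function name auth_headers and the style labels "bearer" and "x-api-key" are hypothetical.

```python
def auth_headers(api_key: str, style: str = "bearer") -> dict:
    """Build the auth header for one runtime request.

    style="bearer"    -> Authorization: Bearer <key>  (OpenAI-style clients)
    style="x-api-key" -> x-api-key: <key>             (Anthropic-style clients)
    The style names are labels chosen for this sketch.
    """
    if not api_key:
        raise ValueError("API key is empty")
    if style == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if style == "x-api-key":
        return {"x-api-key": api_key}
    raise ValueError(f"unknown auth style: {style!r}")
```

Send exactly one of the two headers per request; which one your client sends usually follows from the SDK it was built on.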

Request IDs

Responses include a request-id header. Keep it with application logs so support can trace routing and billing events.
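One way to keep the request ID with application logs is sketched below. It assumes your HTTP client exposes response headers as a mapping; the logger name and the helpers request_id_from and log_call are hypothetical.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("skyline.client")  # hypothetical logger name


def request_id_from(headers: Mapping[str, str]) -> Optional[str]:
    """Return the request-id header value, matching the name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "request-id":
            return value
    return None


def log_call(headers: Mapping[str, str], model: str, latency_ms: float) -> Optional[str]:
    """Write one log line per call so support can correlate it later."""
    rid = request_id_from(headers)
    logger.info("request_id=%s model=%s latency_ms=%.0f", rid, model, latency_ms)
    return rid
```

HTTP header names are case-insensitive, so the lookup lowercases the name rather than relying on one spelling.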

Streaming

OpenAI-compatible streams emit SSE chunks and terminate with data: [DONE]. If the client disconnects mid-stream, the gateway keeps draining the provider response, so the usage the provider reports is still billed.
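If you read the stream without an SDK, the parsing loop might look like the sketch below. It assumes each chunk follows the OpenAI streaming shape (choices[].delta.content), which this page does not spell out; iter_stream_text is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from SSE lines and stop at the data: [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end of stream; anything after the sentinel is ignored
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # Assumed OpenAI-style chunk: the text lives in choices[].delta.content.
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Feed it the decoded lines of the response body, for example from an HTTP client's line iterator, and concatenate the yielded pieces to rebuild the full message.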

Models and routing

Choose a model, or let the router choose

SKYLINE INTELLIGENCE separates customer-facing model codes from upstream provider details. Your organization only sees models enabled by an admin. The router then picks an eligible channel based on model access, capability requirements, priority, weight, health, and fallback availability.

Use a specific model

Call GET /v1/models or open Model Plaza, then send a known model code. This is best when your app has strict quality, context, or cost expectations.

{
  "model": "gpt-4o-mini",
  "messages": [{ "role": "user", "content": "Hello" }]
}
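A client can check a model code against the enabled list before sending it. The sketch below assumes GET /v1/models returns an OpenAI-style list body ({"data": [{"id": ...}]}); that shape and the helper names are assumptions, not documented here.

```python
def enabled_model_codes(models_payload: dict) -> set:
    """Collect model codes from an assumed OpenAI-style /v1/models body."""
    return {m["id"] for m in models_payload.get("data", []) if "id" in m}


def choose_model(preferred: str, models_payload: dict) -> str:
    """Use the preferred code when it is enabled; otherwise defer to the router."""
    if preferred in enabled_model_codes(models_payload):
        return preferred
    return "auto"
```

Falling back to "auto" is a design choice for this sketch. An app with strict quality, context, or cost expectations may prefer to raise an error instead of silently switching.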

Use automatic routing

Send model: "auto" when the organization wants SKYLINE INTELLIGENCE to choose an eligible route. The router considers request features like streaming, tool calls, and vision before selecting a target.

{
  "model": "auto",
  "stream": true,
  "messages": [{ "role": "user", "content": "Draft release notes" }]
}

Access: org model enabled

Ability: tools, vision, stream

Route: priority and weight

Protect: quota and circuit state
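To make the selection factors concrete, here is an illustrative sketch of priority-then-weight selection over healthy, capable channels. It is not the router's actual algorithm. The Channel fields, the rule that a lower priority number wins, and weighted random choice within a tier are all assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    priority: int                        # assumption: lower number is tried first
    weight: int                          # relative share within one priority tier
    healthy: bool = True                 # stands in for circuit state
    abilities: frozenset = frozenset()   # e.g. {"stream", "tools", "vision"}


def pick_channel(channels, required=frozenset(), rng=random):
    """Pick one eligible channel, or None when nothing qualifies."""
    eligible = [
        c for c in channels
        if c.healthy and c.weight > 0 and required <= c.abilities
    ]
    if not eligible:
        return None
    best = min(c.priority for c in eligible)
    tier = [c for c in eligible if c.priority == best]
    # Weighted random choice spreads traffic across the best tier.
    return rng.choices(tier, weights=[c.weight for c in tier], k=1)[0]
```

In this sketch an unhealthy channel simply drops out of the eligible set, so the next priority tier acts as the fallback.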

Billing and usage

Every request is authorized before it runs

SKYLINE INTELLIGENCE reserves estimated quota and billing capacity before the provider call, then settles against actual usage after completion. That keeps quota enforcement predictable during concurrent traffic and gives finance teams a request-level ledger.

Before request: quota reservation
Checks API key status, organization status, enabled model access, key rate limits, and estimated cost.

After response: credit settlement
Calculates billable input, output, cache, and tool dimensions from provider usage.

In console: usage analytics
View requests, tokens, credits, latency, status, model usage, and usage by key.
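The reserve-then-settle flow can be pictured with the toy model below. It only illustrates the idea and does not reflect how SKYLINE INTELLIGENCE stores credits; CreditAccount, QuotaExceeded, and the integer credit unit are hypothetical.

```python
from dataclasses import dataclass


class QuotaExceeded(Exception):
    """Raised when an estimate does not fit in the unreserved balance."""


@dataclass
class CreditAccount:
    balance: int       # credits owned
    reserved: int = 0  # credits held for in-flight requests

    def reserve(self, estimated: int) -> int:
        """Hold the estimated cost before the provider call."""
        if estimated > self.balance - self.reserved:
            raise QuotaExceeded("estimated cost exceeds available credits")
        self.reserved += estimated
        return estimated

    def settle(self, reservation: int, actual: int) -> None:
        """Release the hold and charge what the provider actually used."""
        self.reserved -= reservation
        self.balance -= actual
```

Because concurrent requests each hold their estimate up front, a burst of traffic cannot collectively overspend the balance, which is the predictability the section describes.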

GET  /api/usage/summary         Aggregate by model with optional time range and comparison.
GET  /api/usage/summary/by-key  Aggregate by API key and project.
GET  /api/usage/records         Request-level rows with token, latency, credit, status, and time.
GET  /api/billing/ledger        Ledger entries for customer credit movement.

Console workflow

Operate the gateway from the customer UI

The docs map directly to the current console navigation so operators and developers can speak the same language.

USER: API Keys
Create, rotate, disable, and delete keys. Copy the Base URL and first cURL from the quick-connect panel.

USER: Model Plaza
Search enabled models by provider, capability, model code, context window, and per-token pricing.

USER: Usage
Review by model, by key, or at request-detail level with credits, tokens, latency, and status.

ORG ADMIN: Available Models
Enable or disable models distributed to the organization and manage member quotas.

SUPER ADMIN: Provider Services

SUPER ADMIN: Routing
Bind model abilities to channels, priority, weight, provider model codes, and adapter regions.

Errors and fallback

Handle failures explicitly

Runtime errors use a stable JSON shape that includes a request ID. Rate-limit and temporary service-pressure responses include a Retry-After header. Fallback responses expose degradation metadata in headers so your app can decide whether to display, retry, or audit the result.

Error body

{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "API key rate limit exceeded"
  },
  "request_id": "req_..."
}
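A client can parse the error body above and honor Retry-After along these lines. This is a sketch: GatewayError, parse_error, and retry_after_seconds are hypothetical names, and only the delta-seconds form of Retry-After is handled.

```python
import json
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class GatewayError:
    kind: str                  # error.type, e.g. "rate_limit_error"
    message: str               # error.message
    request_id: Optional[str]  # top-level request_id, if present


def parse_error(body: str) -> GatewayError:
    """Parse the JSON error shape shown above."""
    doc = json.loads(body)
    err = doc.get("error") or {}
    return GatewayError(
        kind=err.get("type", "unknown_error"),
        message=err.get("message", ""),
        request_id=doc.get("request_id"),
    )


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Read Retry-After as seconds; fall back to a default when absent."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                return default  # the HTTP-date form is not handled in this sketch
    return default
```

Log the parsed request_id next to the error kind so a support ticket can reference the exact failed call.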

Fallback headers

X-SKYLINE-Degraded: true
X-SKYLINE-Fallback-Reason: channel_unhealthy
X-SKYLINE-Fallback-From: claude-sonnet
X-SKYLINE-Fallback-To: gpt-4o-mini
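Reading these headers on the client might look like the sketch below. The header names come from the example above; the Degradation type and read_degradation helper are hypothetical.

```python
from typing import Mapping, NamedTuple, Optional


class Degradation(NamedTuple):
    reason: Optional[str]
    from_model: Optional[str]
    to_model: Optional[str]


def read_degradation(headers: Mapping[str, str]) -> Optional[Degradation]:
    """Return fallback details when X-SKYLINE-Degraded is true, else None."""
    h = {name.lower(): value for name, value in headers.items()}
    if h.get("x-skyline-degraded", "").strip().lower() != "true":
        return None
    return Degradation(
        reason=h.get("x-skyline-fallback-reason"),
        from_model=h.get("x-skyline-fallback-from"),
        to_model=h.get("x-skyline-fallback-to"),
    )
```

An app could use the result to tag the response in its audit log, or to retry later when the originally requested model matters.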

Before live traffic

Go-live preflight

Use this as a final client-integration check before an app sends real traffic through SKYLINE INTELLIGENCE. It is not a platform production runbook.

- Store API keys in a secret manager, never in code or client-side bundles.
- Log request-id, selected model, latency, and fallback headers for every call.
- Set per-key budgets and expiration dates for services, jobs, and developer sandboxes.
- Use streaming for long responses and set client-side read timeouts.
- Watch usage by key during launch, then tune model choices or automatic routing policy.
- Treat 429, 402, and fallback-degraded responses as separate operational states.
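The last checklist item can be sketched as a small classifier that keeps the three states apart. The CallState names are hypothetical, and reading 402 as exhausted credits or quota is an assumption based on the HTTP "Payment Required" meaning; confirm it against the error body you receive.

```python
from enum import Enum
from typing import Mapping


class CallState(Enum):
    OK = "ok"
    DEGRADED = "degraded"                  # served, but by a fallback route
    RATE_LIMITED = "rate_limited"          # 429: back off, honor Retry-After
    PAYMENT_REQUIRED = "payment_required"  # 402: assumed credits or quota exhausted
    FAILED = "failed"                      # any other 4xx or 5xx


def classify(status: int, headers: Mapping[str, str]) -> CallState:
    """Map one HTTP response to an operational state."""
    if status == 429:
        return CallState.RATE_LIMITED
    if status == 402:
        return CallState.PAYMENT_REQUIRED
    if status >= 400:
        return CallState.FAILED
    degraded = any(
        name.lower() == "x-skyline-degraded" and value.strip().lower() == "true"
        for name, value in headers.items()
    )
    return CallState.DEGRADED if degraded else CallState.OK
```

Keeping the states separate lets alerting differ per state: rate limits call for backoff, 402 needs a billing or quota owner, and degraded responses are successes worth auditing.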