velo-query · GPU profile-query CLI

VeloQ

Pure CLI in / JSON contract out. No GUI required.

An agent-friendly profile-query CLI for Nsight Systems timelines, Nsight Compute kernel reports, and PyTorch/Kineto Chrome traces: one binary, one versioned JSON envelope, one shot per call. Built so a coding agent (or a shell script) can reason about GPU profiles without opening a GUI.


curl -fsSL https://raw.githubusercontent.com/lucifer1004/veloq/main/scripts/install.sh | bash

why VeloQ

Built for the way agents read profiles

VeloQ reads exported profiler evidence; it does not replace nsys or ncu. It gives programs a stable, compact contract for asking profile questions.

// stable contract

Versioned JSON envelope

Every successful call returns a v1 envelope, and every list response uses canonical data.rows[] with a stable per-row key. Parse .data or .error, never stderr.
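A minimal consumer sketch of that rule, reading only the envelope on stdout. It assumes a failed call carries a non-null top-level `error` field; anything inside `error`, and the helper name `handle_envelope`, are hypothetical:

```shell
# Hypothetical helper: read one VeloQ envelope on stdin and branch on it.
# Assumption: failures set a non-null top-level "error"; successes carry
# "data", and list responses keep their results under data.rows[].
handle_envelope() {
  local envelope
  envelope=$(cat)  # stdout of the veloq call; stderr is never consulted
  if printf '%s' "$envelope" | jq -e '.error != null' >/dev/null; then
    # Error path: surface the structured error object, not stderr text.
    printf 'veloq error: %s\n' "$(printf '%s' "$envelope" | jq -c '.error')" >&2
    return 1
  fi
  # Success path: report how many rows the list response returned.
  printf '%s' "$envelope" | jq '.data.rows | length'
}
```

Usage: `veloq stats trace.nsys-rep --type kernel | handle_envelope`. The `jq -e` exit status does the branching, so no text on stderr is ever parsed.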

// token economy

Shaped rows, not dumps

Rows carry headline columns plus truncation signals (count vs. total_matched), so an agent spends tokens on answers instead of scrolling through raw text.
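A sketch of honoring those truncation signals before treating a list as complete. Where `count` and `total_matched` live in the envelope is an assumption here (directly under `.data`), as is the helper name `check_truncation`:

```shell
# Hypothetical helper: compare the rows returned with everything that matched.
# Assumption: "count" and "total_matched" sit directly under .data.
check_truncation() {
  jq -r '
    if .data.count < .data.total_matched
    then "truncated: \(.data.count) of \(.data.total_matched) matches returned"
    else "complete: \(.data.count) matches returned"
    end'
}
```

Usage: `veloq stats trace.nsys-rep --type kernel --limit 3 | check_truncation`. On a truncated result, an agent can raise `--limit` or narrow the query rather than assume it has seen every match.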

// scriptable

One shot, pipe to jq

Stateless calls, CSV/table projections where useful, and stable per-row keys that make captures from two runs diffable.
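A sketch of such a diff, joining two saved envelopes on the per-row key. It assumes each row exposes that key in a field named `key` (a hypothetical name; the real field may differ) and a numeric `total_ns`, as in the stats example on this page. The helper name `diff_captures` is also hypothetical:

```shell
# Hypothetical helper: per-key change in total_ns between two captures.
# Assumptions: rows live under data.rows[], each with a string "key"
# (hypothetical field name) and a numeric "total_ns".
# The jq program first indexes the old capture as {key: total_ns}
# (an empty object if it has no rows), then walks the new capture;
# keys missing from the old capture are treated as 0.
diff_captures() {  # usage: diff_captures before.json after.json
  jq -c -n --slurpfile a "$1" --slurpfile b "$2" '
    ($a[0].data.rows | map({(.key): .total_ns}) | add // {}) as $old
    | $b[0].data.rows[]
    | {key, delta_ns: (.total_ns - ($old[.key] // 0))}'
}
```

Save the envelope from each run (for example, `veloq stats <capture> --type kernel > before.json`), then run `diff_captures before.json after.json`. Rows that appear only in the first capture are not reported by this sketch.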

// extensible

Three sources, one shape

Nsight Systems, Nsight Compute, and PyTorch/Kineto sit behind a pluggable ProfileSource trait. The PyTorch source reads Perfetto-style Chrome traces.

report-ready figures

Static timeline SVGs, generated from the CLI

veloq viz timeline writes a bounded NSys timeline artifact while stdout stays a JSON envelope. Agents can cite the row metadata, use per-track density bins to keep dense windows readable, and embed the SVG directly in a report. See the NSys timeline report example for a scrubbed end-to-end output.

Figure: example `viz timeline` artifact with multi-GPU streams, top-kernel highlights, per-device NVTX lanes, and density bins.

query → jq → answer

A query is one line

Ask a question, get rows, reshape with jq. No session state or scraping.

$ ask

# top 3 kernels by total GPU time
veloq stats trace.nsys-rep --type kernel \
  --sort total:desc --limit 3 \
  | jq -c '.data.rows[]
      | {name, ms: (.total_ns/1e6)}'

→ get

{ "name": "ampere_sgemm_128x64_nn", "ms": 812.4 }
{ "name": "elementwise_kernel",      "ms": 506.1 }
{ "name": "ncclDevKernel_AllReduce",  "ms": 333.9 }
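When a flat table suits the next step better than JSON objects (for `sort`, `column`, or a spreadsheet), jq's built-in `@tsv` filter is one way to project the same rows. A sketch that uses only the `name` and `total_ns` fields shown above; the helper name `rows_to_tsv` is hypothetical:

```shell
# Hypothetical helper: one tab-separated line per row, kernel name then milliseconds.
rows_to_tsv() {
  jq -r '.data.rows[] | [.name, (.total_ns / 1e6)] | @tsv'
}
```

Usage: `veloq stats trace.nsys-rep --type kernel --sort total:desc --limit 3 | rows_to_tsv`.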

how it compares

Compared with common alternatives

Agents often fall back to the Nsight GUI, raw nsys/ncu output pasted into their context, or hand-rolled SQLite + jq pipelines. VeloQ keeps the same work scriptable and contract-shaped.

	Nsight GUI	Raw nsys/ncu text	SQLite + jq	VeloQ
Scriptable / one-shot	✗	~ ad hoc	✓	✓
Token-efficient for an agent	n/a	✗ broad dumps	~	✓ shaped rows
Stable typed contract	✗	✗ free text	✗ schema you own	✓ versioned envelope
Cross-capture diffable	✗	✗	~	✓ stable per-row key
Zero setup per query	✓	✓	✗	✓

When not to use VeloQ: use the Nsight GUI for interactive timeline exploration or one-off visual inspection. Use VeloQ for programmatic, repeatable, agent- or script-driven querying.

the surface

One tool, one envelope, many verbs

NSys verbs are hoisted to the top level (also under veloq nsys ...); NCU and PyTorch/Kineto verbs live under veloq ncu ... and veloq pytorch ....

// Nsight Systems

summary · stats · search · inspect · correlate · gaps · timeline · viz timeline · concurrency · graph-replays · slices · hardware · metrics · ncu-command · prep · correlation-stats · schema

// Nsight Compute

summary · launches · inspect · metrics · disasm · ranges · graphs · sources · source-metrics · warp-stalls · schema

// PyTorch/Kineto

summary · search · inspect · stats · correlate · timeline · slices · collectives · prep · schema

// meta

info · sources · clean · recipes · self-update

install

One script, three skills

Installs the veloq binary plus three Agent Skills (nsys-profile-analysis, ncu-profile-analysis, and pytorch-profile-analysis) under ~/.agents/skills/. Supported platforms: Linux x86_64/aarch64 and macOS x86_64/arm64.

# quick install (binary + skills)
curl -fsSL https://raw.githubusercontent.com/lucifer1004/veloq/main/scripts/install.sh | bash

# keep it current (binary AND bundled Agent Skills)
veloq self-update

# build from source
cargo build --release -p veloq   # → target/release/veloq

Re-running the installer or veloq self-update refreshes bundled Agent Skills and removes stale skill files.

To install the Codex plugin from a VeloQ checkout, run codex plugin marketplace add ., then codex plugin add veloq@veloq. The plugin installs Agent Skills only; the skills still need the VeloQ CLI for evidence extraction.

Claude Code plugin metadata is also provided: run /plugin marketplace add https://github.com/lucifer1004/veloq.git, then /plugin install veloq@veloq. Codex plugin metadata lives under .codex-plugin/ and Claude plugin metadata under .claude-plugin/. Full documentation is in the README.