Tuesday, June 23, 2026
Merged timeline of 266 items: blog publish times and listing timestamps, cut at midnight. Page 1 of 6.
- LLM
Mistral OCR 4 extracts and structures content from documents, featuring bounding boxes, block classification, and inline confidence scores in 170 languages. It excels in multilingual document processing and is designed…
Unlimited OCR is designed for one-shot long-horizon parsing of documents. It enhances the capabilities of previous OCR models, enabling efficient document processing.
Apply UX thinking to improve product decisions and user flows.
Pose classification using ST-GCN (Spatial Temporal Graph Convolutional Network). Classifies skeleton sequences
Add a new cuTile GPU kernel operator to TileGym. Covers dispatch registration in ops.py, cuTile backend implementation, __init__.py exports, test creation, and benchmark in tests/benchmark. Use when adding, creating, or…
Converts cuTile Python GPU kernels (@ct.kernel) to cuTile.jl Julia equivalents. Handles kernel syntax translation, 0-indexed to 1-indexed conversion, broadcasting differences, memory layout (row-major to column-major),…
Sparse4D for multi-camera temporal 3D object detection and tracking. Uses sparse queries with deformable
Run `tao-daft validate` to check NVIDIA TAO DAFT datasets for structure, schema, and cross-reference errors. Do
Standard single-step train/eval/export workflow for any TAO model. Use when training a TAO model on a dataset
Expert cuTile programming assistant. Write high-performance GPU kernels using cuTile's tile-based programming model with proper validation and optimization. Supports deep agent orchestration for complex multi-kernel tas…
Use to summarize a recorded video via the LVS summarization microservice (HITL-gated) with a VLM fallback. Not for report generation or live RTSP captioning.
Use when adding, modifying, optimizing, or debugging cuTile autotuning code. Trigger signals: `exhaustive_search` / `replace_hints` / `hints_fn` / `cuda.tile.tune` in code, `autotune` in filenames, or correctness/perfor…
Use to deploy the vss-behavior-analytics service standalone (entrypoint, config-source, optional calibration). Not for the full warehouse deploy.
Use to deploy the vss-video-analytics-api REST service standalone (config-source, data-log bind, Elasticsearch, optional Kafka). Not for the full warehouse deploy.
Use this skill to run top-level VSS fusion search on archived video, or to ingest video files / RTSP streams for search. Do NOT use for ad-hoc visual Q&A (use vss-ask-video), live captioning (use vss-deploy-dense-captio…
Use this skill when reading video-analytics metrics, incidents, alerts, and sensor data via the VA-MCP server (port 9901). Not for live VLM or incident-range narrative reports.
Use to call the VIOS REST API (sensor list, timelines, clip extraction, snapshots, add/delete sensors and streams). Not for VLM inference or search.
Use for VSS alert workflows — real-time monitoring, Alert-Bridge subscriptions, Slack notifications, incident queries, camera onboarding. Not for non-alert analytics.
Use to run AutoMagicCalib on local MP4s, RTSP, or the bundled sample dataset, and to deploy vss-auto-calibration when needed. Do not use for non-AMC calibration or runtime analytics.
OneFormer for universal image segmentation. Unifies panoptic, instance, and semantic segmentation with a
Visual ChangeNet for binary image classification and segmentation in AOI defect detection. Use when training,
Optical Inspection for defect detection using Siamese networks. Compares image pairs to detect manufacturing
OCRNet for scene text recognition. Recognizes text content from cropped text-region images and supports CTC
Person re-identification (ReID). Learns discriminative embeddings to match the same person across different
RT-DETR (Real-Time DEtection TRansformer) for 2D object detection. Designed for real-time inference with
Iteratively optimize cuTile kernel performance through systematic profiling, bottleneck analysis, IR comparison, and targeted tuning. Covers tile sizes, occupancy, autotune configs, TMA, latency hints, persistent schedu…
Converts cuTile GPU kernels (@ct.kernel) to Triton (@triton.jit). Handles standard in-repo conversion, debugging (cudaErrorIllegalAddress, shape mismatch, numerical mismatch), and mapping cuTile idioms (ct.load/ct.store…
Integrate TileGym kernels into Hugging Face `transformers` models by replacing the library's submodule(s) and certain class(es)' implementations, and patching certain class(es)' init/forward/load weight methods prior to…
Use this skill when deploying standalone RT-VLM dense captioning or calling its REST API (uploads, captions, streams, chat-completions, Kafka). Not for VSS profile deploy or video-search ingestion.
Use this skill to ask the VSS agent's video_understanding tool a fresh visual question about a recorded clip. Not for prior tool output, search hits, or metadata-answerable questions.
Use to select, configure, deploy, verify, debug, or tear down a VSS profile (base, search, lvs, warehouse, edge). Not for standalone microservices — use the vss-deploy-* skill.
Use this skill when the user wants to deploy, run, debug, tear down, or call the REST API of the RTVI-CV 2D detection / tracking microservice. Trigger when the user says things like 'deploy rtvi-cv', 'start warehouse 2d…
PointPillars for 3D object detection from LiDAR point clouds. Encodes point clouds into a pseudo-image via a
Use this skill when producing a VSS analysis report — Mode A per-clip VLM, Mode B incident-range via video-analytics. Not for standalone video summarization, real-time alerts or ad-hoc Q&A.
SegFormer for semantic segmentation. Lightweight transformer-based architecture with hierarchical feature
Four-step image referring-expression pipeline: turns images plus KITTI bounding-box labels into region
OCDNet for scene text detection. Detects arbitrary-oriented text regions in natural images using a
Run `tao-daft convert` to convert NVIDIA TAO DAFT datasets between supported formats. Do not use for non-DAFT data.
NVPanoptix3D for panoptic 3D scene reconstruction from posed RGB images. Produces 3D panoptic segmentation
NVDINOv2 for self-supervised visual representation learning. Trains vision transformers via self-distillation
Metric-learning recognition (ml-recog) for fine-grained visual recognition. Learns embeddings for
Mask2Former for universal image segmentation (panoptic, instance, and semantic). Transformer-based with
Mask Grounding DINO for grounded instance segmentation. Extends Grounding DINO with a mask-prediction head for
MAL (Mask Auto-Label) for weakly-supervised segmentation. Produces segmentation masks from minimal annotations
Extract false-positive and false-negative gaps from VLM binary-classification-question (BCQ, yes/no) predictions.
Two-step image grounding pipeline: extracts referring expressions from (image, caption) pairs and grounds them
Real-time stereo depth estimation using FastFoundationStereo (FFS), the distilled bp2 commercial variant of
Runs the DEFT embed-then-mine workflow for VCN AOI iterations — embeds the gap-analysis target parquet, embeds a source pool, and mines nearest-neighbour source images for downstream augmentation. Use as the immediate n…