146 indexed skills · max 10 per page
agent-evaluation
davila7/claude-code-templates · Productivity
Behavioral testing and reliability metrics for LLM agents, catching production failures benchmarks miss.
- Covers five core evaluation areas: agent testing, benchmark design, capability assessment, reliability metrics, and regression testing
- Emphasizes statistical test evaluation (multiple runs, result distribution analysis) and behavioral contract testing over single-run or string-matching approaches
- Includes adversarial testing patterns to actively probe agent failure modes and ident
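The multiple-run evaluation mentioned in this description can be illustrated with a minimal Python sketch. It is not taken from the skill itself: `evaluate_runs`, `flaky_agent`, the prompt, and the 80% success rate are all hypothetical stand-ins chosen for illustration.

```python
import random
import statistics
from typing import Callable


def evaluate_runs(agent: Callable[[str], str],
                  check: Callable[[str], bool],
                  prompt: str,
                  runs: int = 20) -> dict:
    """Run the agent several times and summarize the pass/fail distribution."""
    outcomes = [1 if check(agent(prompt)) else 0 for _ in range(runs)]
    pass_rate = statistics.mean(outcomes)
    # Standard error of a proportion: a rough measure of spread, not a full interval.
    std_err = (pass_rate * (1 - pass_rate) / runs) ** 0.5
    return {"runs": runs, "passes": sum(outcomes),
            "pass_rate": pass_rate, "std_err": std_err}


def flaky_agent(prompt: str, _rng=random.Random(0)) -> str:
    # Hypothetical stand-in for a real agent: correct on roughly 80% of calls.
    return "4" if _rng.random() < 0.8 else "5"


report = evaluate_runs(flaky_agent, lambda out: out.strip() == "4", "What is 2 + 2?")
print(report)
```

A single run of `flaky_agent` would report either a pass or a fail; the aggregated pass rate is what shows how reliable the behavior is.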
agent-tools
inference-sh/skills · Productivity
Access 150+ cloud-based AI apps via CLI: image generation, video creation, LLMs, search, 3D modeling, and Twitter automation.
- Supports major models including FLUX, Veo, Claude, Gemini, Grok, Seedance, OmniHuman, Tavily, and Exa with no local GPU required
- Automatic local file upload for images, audio, and video inputs; run apps synchronously or asynchronously with task status tracking
- Covers six capability categories: image generation, video generation, LLM inference, web search, 3D mo
agent-teams-simplify-and-harden
pskoett/pskoett-ai-skills · Productivity
A two-phase team loop that produces production-quality code: implement, then audit using simplify + harden passes, then fix audit findings, then re-audit, repeating until the codebase is solid or the loop cap is reached.
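The implement, audit, fix, re-audit cycle described here can be sketched as a capped loop. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the function names, the callback signatures, and the toy audit rules are all hypothetical.

```python
from typing import Callable, List, Tuple


def simplify_and_harden_loop(implement: Callable[[], str],
                             audit: Callable[[str], List[str]],
                             fix: Callable[[str, List[str]], str],
                             max_rounds: int = 3) -> Tuple[str, int, bool]:
    """Implement once, then alternate fix and re-audit until clean or capped."""
    code = implement()
    findings = audit(code)
    rounds = 0
    while findings and rounds < max_rounds:
        code = fix(code, findings)
        findings = audit(code)  # re-audit after every fix pass
        rounds += 1
    # The final flag is True only if the last audit reported no findings.
    return code, rounds, not findings


def toy_audit(code: str) -> List[str]:
    # Hypothetical audit: flag two markers a simplify/harden pass might reject.
    return [marker for marker in ("eval(", "TODO") if marker in code]


def toy_fix(code: str, findings: List[str]) -> str:
    # Hypothetical fix: address one finding per round.
    return code.replace(findings[0], "")


code, rounds, clean = simplify_and_harden_loop(lambda: "TODO eval(x)", toy_audit, toy_fix)
print(rounds, clean)
```

Returning the clean flag alongside the round count lets a caller tell "solid" apart from "loop cap reached".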
agent-browser
everyinc/compound-engineering-plugin · Productivity
The CLI uses Chrome/Chromium via CDP directly. Install via npm i -g agent-browser, brew install agent-browser, or cargo install agent-browser. Run agent-browser install to download Chrome. Run agent-browser upgrade to update to the latest version.
claude-agent-sdk
jezweb/claude-skills · AI/ML
agent-browser
jezweb/claude-skills · Productivity
Every browser automation follows this pattern:
sub-agent-patterns
jezweb/claude-skills · Productivity
Delegate specialized tasks to isolated AI assistants with custom tools, models, and system prompts.
- Sub-agents preserve main context by isolating verbose tool outputs and intermediate reasoning, enabling longer sessions and cleaner conversations
- Three built-in agents available: Explore (Haiku, read-only codebase search), Plan (Sonnet, plan-mode research), and General-Purpose (Sonnet, full read/write access)
- Create custom agents in .claude/agents/ with YAML frontmatter and markdown pr
cloudbase-agent-ts
tencentcloudbase/skills · Cloud
TypeScript SDK for deploying AI agents as HTTP services with AG-UI protocol support.
- Supports three adapter patterns: LangGraph for stateful graph-based workflows, LangChain for chain-based agents, and custom adapters via AbstractAgent interface
- Includes @cloudbase/agent-server for HTTP service deployment with built-in CORS, logging, and observability configuration
- Provides UI client libraries for web applications (@ag-ui/client) and WeChat Mini Programs (@cloudbase/agent-ui-minip
agent-evaluation
sickn33/antigravity-awesome-skills · Productivity
Framework for testing LLM agents across behavioral, capability, and reliability dimensions with production-focused evaluation patterns.
- Covers five core evaluation areas: agent testing, benchmark design, capability assessment, reliability metrics, and regression testing
- Emphasizes statistical test evaluation (multiple runs with distribution analysis) and behavioral contract testing over single-run or string-matching approaches
- Includes adversarial testing patterns and guards against
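To illustrate what behavioral contract testing can look like in contrast to string matching, here is a minimal Python sketch. The contract itself (a JSON object with an `action` from an allowed set and a refund limit of 100) is a hypothetical example invented for illustration, not something the skill prescribes.

```python
import json
from typing import List

ALLOWED_ACTIONS = {"refund", "escalate", "reply"}  # hypothetical contract


def check_contract(raw_output: str) -> List[str]:
    """Return contract violations; an empty list means the output conforms."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]

    violations = []
    if data.get("action") not in ALLOWED_ACTIONS:
        violations.append("action is outside the allowed set")
    if data.get("action") == "refund":
        amount = data.get("amount")
        # Hypothetical business rule: refunds must be numeric and at most 100.
        if not isinstance(amount, (int, float)) or not 0 < amount <= 100:
            violations.append("refund amount is outside the 0-100 limit")
    return violations


# Two differently worded outputs satisfy the same contract.
print(check_contract('{"action": "refund", "amount": 50, "message": "Done!"}'))
print(check_contract('{"action": "refund", "amount": 50, "message": "Refund issued."}'))
print(check_contract('{"action": "refund", "amount": 500}'))
```

The check ignores the free-text `message` field entirely, so rephrased but well-behaved outputs pass while a policy-breaking one fails.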
agent-tool-builder
sickn33/antigravity-awesome-skills · Frontend
Design LLM-facing tool schemas that prevent hallucination, silent failures, and token waste.
- Focuses on JSON Schema design, input examples, and error handling patterns that help LLMs use tools correctly
- Emphasizes explicit documentation and clear descriptions over implementation details, since LLMs only see the schema
- Identifies anti-patterns like vague descriptions, silent failures, and tool overload that cause agent failures
- Covers function-calling, MCP tools, and tool validatio
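As an illustration of the schema-design points in this description, the sketch below defines a tool with explicit descriptions and an inline input example, plus a small validator that reports errors instead of failing silently. The `get_weather` tool, the `input_schema` wrapper key, and `validate_args` are hypothetical; real function-calling and MCP formats differ in their outer field names.

```python
from typing import Any, Dict, List

# Hypothetical tool definition: the model only sees this schema, so every
# field carries an explicit description and an example value.
GET_WEATHER_TOOL: Dict[str, Any] = {
    "name": "get_weather",
    "description": "Return the current temperature for one city. "
                   "Fails with an explicit error if the city is unknown.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string",
                     "description": "City name, e.g. 'Lisbon'."},
            "unit": {"type": "string",
                     "enum": ["celsius", "fahrenheit"],
                     "description": "Temperature unit, e.g. 'celsius'."},
        },
        "required": ["city"],
    },
}


def validate_args(schema: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Hand-rolled check covering only required, string, and enum rules."""
    errors = []
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required field '{name}'")
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            errors.append(f"unknown field '{name}'")
        elif spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"field '{name}' must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"field '{name}' must be one of {spec['enum']}")
    return errors


schema = GET_WEATHER_TOOL["input_schema"]
print(validate_args(schema, {"city": "Lisbon", "unit": "celsius"}))
print(validate_args(schema, {"unit": "kelvin"}))
```

Returning the error messages to the model, rather than swallowing them, gives it something concrete to correct on the next call.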