Bulletin · UTC

Merged timeline: 9 items (blog publish time and listing createdAt in UTC). For registry-only weekly slices, use /new.

  1. ExplainX is a comprehensive hub for discovering and monetizing AI skills, agents, tools, and MCP servers. With over 10,000 indexed skills and 100,000 AI tools, it provides a ranked directory, community feedback, and res…

    by MCP @ Explainx · skills
  2. Postiz

    Postiz is an open-source, self-hosted social media scheduling tool that supports platforms like X, Bluesky, Mastodon, and Discord. It offers post scheduling, analytics, and team collaboration features.

    by MCP @ Explainx · social-media
  3. Multi-Agent LLM Financial Trading Framework.

    by MCP @ Explainx · Finance
  4. AI benchmarking in 2026 has reached a critical inflection point. Traditional benchmarks like MMLU and HellaSwag are saturated, with top scores above 88% and 95% respectively, while frontier models cluster within statistical noise of one another. This comprehensive guide covers every major benchmark category—from language understanding to agent evaluation—the 37% lab-to-production gap, benchmark gaming vulnerabilities, and what actually matters for production AI systems.

  5. Separating a viral screenshot from Anthropic’s published rules—conversation-ending for persistent abuse, account actions under the Usage Policy, and why “hurt the AI’s feelings” is the wrong mental model.

  6. What shipped in Codex’s agent UI, how custom pets are packaged through OpenAI’s hatch-pet skill, and why a little dock-side animation can still be a serious product bet.

  7. Two vendor postures on the same open-source agent stack: OpenAI leaning into subscription-backed access for OpenClaw, while Anthropic enforces first-party surfaces for subscription entitlements and bills third-party tools differently.

  8. A practical tour of Sim—visual agent orchestration, vector-backed knowledge, managed Copilot for flow editing on self-hosted installs, and how it differs from harness-first tools like OpenClaw.

  9. Terminal-Bench 2.0 has become the de facto standard for AI agent evaluation since May 2025—used by virtually every frontier lab. This deep dive covers the 89-task benchmark, its evolution from version 1.0, the Harbor framework powering it, and why frontier models still struggle below 65% accuracy on tasks humans complete routinely.