ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces
Explore ClawsBench, a benchmark that evaluates the capability and safety of LLM agents in high-fidelity simulated productivity environments such as Gmail ...