Evaluating Functions From a Graph Worksheet

philschmid/ai-agent-benchmark-compendium

Compendium of over 50 benchmarks for evaluating AI agents, categorized into Function Calling & Tool Use, General Assistant & Reasoning, Coding & Software Engineering, and Computer Interaction. - ...
