Benchmarks are in alpha.

Our benchmarks site is currently in alpha and is intended solely for research purposes. The benchmarks on this site are built and maintained with care and attention to accuracy; however, both the benchmarks and the testing methodology are likely to change many times before full release.

Red Teaming Resistance Benchmark

Our code, data, and methodology can be found in our GitHub repo.

Adversarial Datasets

We've compiled some of the most commonly used red-teaming datasets and evaluated their attack success rates against a range of open-source models.
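For intuition, attack success rate (ASR) can be computed as the fraction of adversarial prompts that elicit a policy-violating completion rather than a refusal. Below is a minimal sketch, not the repo's actual harness; the `generate` and `is_attack_successful` callables are hypothetical stand-ins for the model under test and the judging logic:

```python
from typing import Callable, List


def attack_success_rate(
    prompts: List[str],
    generate: Callable[[str], str],
    is_attack_successful: Callable[[str, str], bool],
) -> float:
    """Fraction of adversarial prompts that elicit a harmful completion.

    `generate` queries the model under test; `is_attack_successful` is a
    judge (e.g. a classifier or an LLM grader) that decides whether the
    response violates policy instead of refusing. Both are assumptions,
    standing in for the evaluation harness in the repo.
    """
    if not prompts:
        return 0.0
    successes = sum(is_attack_successful(p, generate(p)) for p in prompts)
    return successes / len(prompts)
```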

Adversarial Content

Using OpenAI's 14 Risk Policies, we've classified each prompt from our collected datasets under its respective content-violation policy and measured model performance on each category.
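Conceptually, this amounts to tagging each prompt with one of the 14 policy categories and aggregating attack success rate per category. A minimal sketch under that reading, reusing the `attack_success_rate` helper from the sketch above; `classify_policy` is a hypothetical prompt-to-category classifier:

```python
from collections import defaultdict
from typing import Callable, Dict, List


def per_policy_success_rates(
    prompts: List[str],
    classify_policy: Callable[[str], str],
    generate: Callable[[str], str],
    is_attack_successful: Callable[[str, str], bool],
) -> Dict[str, float]:
    """Attack success rate broken down by content-violation policy."""
    # Group prompts by the policy category they are judged to violate.
    by_policy: Dict[str, List[str]] = defaultdict(list)
    for p in prompts:
        by_policy[classify_policy(p)].append(p)
    # Score each category independently with the ASR helper above.
    return {
        policy: attack_success_rate(ps, generate, is_attack_successful)
        for policy, ps in by_policy.items()
    }
```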

Leaderboard

[Interactive leaderboard table with columns Model Name and Average; models can be submitted for evaluation via Submit model.]
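The leaderboard's Average column presumably aggregates a model's per-dataset attack success rates; the page does not specify the aggregation, so here is a one-liner sketch assuming an unweighted mean:

```python
from statistics import mean
from typing import Dict


def leaderboard_average(dataset_asrs: Dict[str, float]) -> float:
    """Unweighted mean ASR across datasets (assumed aggregation rule)."""
    return mean(dataset_asrs.values()) if dataset_asrs else 0.0
```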

More benchmarks coming soon