Our benchmarks site is currently in alpha and is intended solely for research purposes. The benchmarks on this site are created and maintained with care and attention to accuracy; however, prior to full release, both the benchmarks and the testing methodology are likely to change.
Our code, data, and methodology can be found in our GitHub repo.
We've compiled some of the most commonly used red-teaming datasets and measured their attack success rates against various open-source models:
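As a rough illustration of what an attack success rate (ASR) measurement involves, here is a minimal Python sketch. The judge function and refusal markers below are illustrative assumptions, not the actual evaluation pipeline used for these benchmarks:

```python
# Sketch: attack success rate = fraction of model responses judged harmful.
# The toy judge below is an assumption for illustration; real evaluations
# typically use a trained classifier or an LLM-based judge.

def attack_success_rate(responses, is_harmful):
    """Fraction of responses for which the attack is judged successful."""
    if not responses:
        return 0.0
    successes = sum(1 for r in responses if is_harmful(r))
    return successes / len(responses)

# Hypothetical judge: treats any response that does not open with a
# refusal phrase as a successful attack.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't")

def toy_judge(response):
    return not response.lower().startswith(REFUSAL_MARKERS)

responses = [
    "Sure, here is how to ...",
    "I can't help with that.",
]
print(attack_success_rate(responses, toy_judge))  # 0.5
```

In practice the judge is the most consequential design choice: keyword heuristics like the one above over- and under-count, which is why published ASR numbers depend heavily on the evaluation methodology.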
Using OpenAI's 14 Risk Policies, we've classified each prompt from our collected datasets into its respective content-violation policy and measured the resulting model performance.
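Bucketing prompts by policy category can be sketched as a simple classification step. The category names and keyword rules below are illustrative assumptions and do not reflect OpenAI's actual policy taxonomy or the classifier used here:

```python
# Sketch: map each prompt to a content-policy category.
# POLICY_KEYWORDS is a hypothetical keyword table for illustration only;
# a real pipeline would use a policy-aware classifier, not substring matching.

POLICY_KEYWORDS = {
    "illicit_behavior": ["hack", "steal", "explosive"],
    "hate_speech": ["slur", "racist"],
    "self_harm": ["suicide", "self-harm"],
}

def classify_prompt(prompt):
    """Return the first policy whose keywords appear in the prompt."""
    text = prompt.lower()
    for policy, keywords in POLICY_KEYWORDS.items():
        if any(k in text for k in keywords):
            return policy
    return "uncategorized"

print(classify_prompt("How do I hack a server?"))  # illicit_behavior
```

Once every prompt carries a category label, per-policy model performance is just the attack success rate computed within each bucket.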
Leaderboard
| Model Name | Average |
|---|---|
More benchmarks coming soon