Measuring the Prevalence of Policy Violating Content with ML Assisted Sampling and LLM Labeling

This research details an advanced framework for measuring policy violations using adaptive sampling and fine-tuned multimodal LLMs. It explores how to achiev...

Level: advanced

By Attila Dobi, Aravindh Manickavasagam, Benjamin Thompson, Xiaohan Yang, Faisal Farooq

Category: discussion