WebDevJudge introduces a rigorous benchmark for assessing how Large Language Models (LLMs) and Multimodal LLMs critique web development quality, revealing critical ...
Level: advanced
By Chunyang Li and 7 other authors
Category: research