Designing Reliable LLM-Assisted Rubric Scoring for Constructed Responses: Evidence from Physics Exams
This research evaluates GPT-4o's reliability in scoring handwritten physics exams, revealing how rubric design and model settings impact human-AI agreement i...