Home | Awesome-Rubrics

Overview

What this project organizes

Rubrics are increasingly used in three connected settings: as explicit judge criteria for evaluation, as structure for inference-time verification and candidate selection, and as supervision or reward signals for training and post-training.

This site organizes the area as a survey, with emphasis on a taxonomy, representative papers, and the practical distinctions between general-purpose and domain-specific rubric design.
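To make the three settings concrete, here is a minimal sketch of how one rubric score could serve as an evaluation metric, an inference-time selection key, and a scalar reward. It is an illustration under stated assumptions, not the method of any paper listed on this site: the names `Criterion`, `rubric_score`, `select_best`, and the toy `RUBRIC` are hypothetical, and the keyword-based `check` functions stand in for what would normally be an LLM judge or a human grader.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Criterion:
    """One rubric item (hypothetical structure for illustration)."""
    description: str
    weight: float
    # Stand-in for an LLM judge or human grader: True if the response
    # satisfies this criterion.
    check: Callable[[str], bool]


def rubric_score(rubric: Sequence[Criterion], response: str) -> float:
    """Weighted fraction of satisfied criteria, in [0, 1].

    The same number can be reported as an evaluation score or passed
    to a trainer as a scalar reward.
    """
    total = sum(c.weight for c in rubric)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    earned = sum(c.weight for c in rubric if c.check(response))
    return earned / total


def select_best(rubric: Sequence[Criterion], candidates: Sequence[str]) -> str:
    """Inference-time use: keep the candidate with the highest rubric score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda r: rubric_score(rubric, r))


# Toy rubric with assumed criteria and weights.
RUBRIC = [
    Criterion("States a final answer", 2.0, lambda r: "answer:" in r.lower()),
    Criterion("Cites a source", 1.0, lambda r: "[" in r and "]" in r),
    Criterion("Stays under 200 characters", 1.0, lambda r: len(r) < 200),
]

if __name__ == "__main__":
    candidates = ["no idea", "Answer: 42", "Answer: 42 [1]"]
    for text in candidates:
        print(f"{rubric_score(RUBRIC, text):.2f}  {text}")
    print("selected:", select_best(RUBRIC, candidates))
```

In this sketch the three settings differ only in who consumes the score: a benchmark report, a best-of-N selector, or a training loop. Real systems typically replace the boolean checks with graded judge outputs and may use a different rubric at each stage.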

Taxonomy

Three organizing axes

By Use Case

Evaluating answer quality and agent trajectories
Generating preference, reward, and post-training data

By Model Stage

Training-time and post-training supervision
Inference-time selection, reflection, and verification
Evaluation-time judge criteria

By Generation Strategy

Direct generation
Retrieval-augmented generation
Preference-driven extraction
Refinement and expert-in-the-loop design

Representative Works

A compact reading list

• Learning to Judge

• RubricRAG

• Auto-Rubric

• Rethinking Rubric Generation

• Reflect-and-Revise / iRULER

• SedarEval / XpertBench

Reading Paths

Start from the question you care about

If you study rubric generation

Focus on direct generation, retrieval-augmented methods, preference-driven extraction, refinement, and expert-in-the-loop pipelines.

If you study rubric application

Focus on evaluation criteria, agent trajectory assessment, reward modeling, and domain-specific quality standards.