apps/learning-api/evals-playground/TEST_SET.md

Quiz Eval Test Set

The quiz eval test set is a 45-deck manifest defined in test_set.py. It is used by the eval pipeline to run paired comparisons across audience, subject, and language slices.

Composition

Audience:

  • university: 18 decks
  • highschool: 18 decks
  • other: 9 decks

Language:

  • English: 23 decks
  • German: 10 decks
  • French: 7 decks
  • Dutch: 5 decks

Subject:

  • base/highschool: 18 decks
  • other: 9 decks
  • business: 4 decks
  • law: 4 decks
  • medicine: 4 decks
  • humanities: 2 decks
  • science: 2 decks
  • social sciences: 2 decks
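
Each of the three breakdowns above should sum to the 45 decks in the manifest. A minimal sketch of that sanity check follows. The counts are copied from this section; the names `COMPOSITION`, `slice_totals`, and `mismatched_dimensions` are hypothetical and are not taken from test_set.py.

```python
# Slice counts copied from the Composition section of this document.
# The dictionary layout is an assumption made for illustration only.
COMPOSITION = {
    "audience": {"university": 18, "highschool": 18, "other": 9},
    "language": {"English": 23, "German": 10, "French": 7, "Dutch": 5},
    "subject": {
        "base/highschool": 18,
        "other": 9,
        "business": 4,
        "law": 4,
        "medicine": 4,
        "humanities": 2,
        "science": 2,
        "social sciences": 2,
    },
}

EXPECTED_TOTAL = 45  # size of the manifest as stated above


def slice_totals(composition):
    """Sum the deck counts within each dimension."""
    return {dim: sum(counts.values()) for dim, counts in composition.items()}


def mismatched_dimensions(composition, expected_total=EXPECTED_TOTAL):
    """Return the dimensions whose counts do not add up to expected_total."""
    totals = slice_totals(composition)
    return [dim for dim, total in totals.items() if total != expected_total]


if __name__ == "__main__":
    print(slice_totals(COMPOSITION))
    print(mismatched_dimensions(COMPOSITION))
```

An empty list from `mismatched_dimensions` means every deck is counted exactly once along each dimension.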

Local Source Files

Raw source documents live in a local-only folder:

Test-set/

That folder is gitignored because it contains eval materials and can be large. The expected layout is:

Test-set/
  university/{subject}/{language}/{deck_id}__{name}.pdf
  university/{subject}/{language}/{deck_id}__{name}.docx
  university/{subject}/{language}/{deck_id}__{name}.pptx
  highschool/{language}/{deck_id}__{name}.pdf
  other/{language}/other/{deck_id}__{name}.pdf
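
The layout above can be read back into slice labels. The following is a rough sketch, assuming paths are given relative to Test-set/ and that a double underscore separates the deck id from the name. `parse_source_path` is a hypothetical helper, not the ingestion script's actual code, and the deck ids in the usage line are made up.

```python
from pathlib import PurePosixPath


def parse_source_path(rel_path):
    """Split a path relative to Test-set/ into its layout components.

    Hypothetical helper: it mirrors the three layouts listed above and
    raises ValueError for anything else.
    """
    parts = PurePosixPath(rel_path).parts
    if not parts:
        raise ValueError("empty path")

    audience = parts[0]
    if audience == "university" and len(parts) == 4:
        # university/{subject}/{language}/{file}
        subject, language = parts[1], parts[2]
    elif audience == "highschool" and len(parts) == 3:
        # highschool/{language}/{file}; no subject folder in this layout
        subject, language = None, parts[1]
    elif audience == "other" and len(parts) == 4:
        # other/{language}/other/{file}; language comes before the subject
        language, subject = parts[1], parts[2]
    else:
        raise ValueError(f"unexpected layout: {rel_path}")

    filename = PurePosixPath(parts[-1])
    deck_id, sep, name = filename.stem.partition("__")
    if not sep:
        raise ValueError(f"missing '__' separator: {rel_path}")

    return {
        "audience": audience,
        "subject": subject,
        "language": language,
        "deck_id": deck_id,
        "name": name,
        "extension": filename.suffix,
    }


if __name__ == "__main__":
    # "deck-001" is an invented id used only for this example.
    print(parse_source_path("university/law/German/deck-001__contracts.pdf"))
```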

The local ingestion script picks up only files whose names start with a deck id defined in test_set.py.
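
The filename rule can be sketched as a prefix filter over the local folder. This is an illustrative sketch only: `find_deck_files` is a hypothetical function, and the ingestion script's real matching logic is not shown in this document. It assumes the deck id is the text before the first double underscore, which prevents one id from matching a longer id that merely begins with the same characters.

```python
from pathlib import Path


def find_deck_files(root, deck_ids):
    """Map each known deck id to the files under root that belong to it.

    Hypothetical helper. A file matches when the part of its name before
    the first '__' is exactly one of deck_ids; all other files are skipped.
    """
    deck_ids = set(deck_ids)
    matches = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        prefix, sep, _ = path.name.partition("__")
        if sep and prefix in deck_ids:
            matches.setdefault(prefix, []).append(path)
    return matches


if __name__ == "__main__":
    # The deck ids here are invented; the real ones are defined in test_set.py.
    for deck_id, files in find_deck_files("Test-set", {"deck-001"}).items():
        print(deck_id, [str(f) for f in files])
```

Files without the separator, or with an id that is not in the manifest, are ignored rather than reported, which matches the "only files ... are picked up" wording above.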