apps/learning-api/evals-playground/reports/2026-04-12-quiz-summary-feedback-current-state.md

Quiz and Summary Feedback Current State

Date: 2026-04-12
Source: unified_feedback_enriched.csv — all negative-sentiment feedback with body text ≥5 chars.
Method: every row read and classified by LLM (no keyword heuristics). 2,309 entries total.

Quiz Feedback (605 negative entries)

| Category | Count | % | Description |
|---|---:|---:|---|
| too_few_questions | 222 | 36.7% | User requested N questions, got far fewer (often 1–3) |
| too_easy | 75 | 12.4% | Distractors too obvious, answers identifiable by length/position |
| content_mismatch | 72 | 11.9% | Questions not about the uploaded material |
| repetitive_questions | 51 | 8.4% | Same questions repeated across or within quizzes |
| not_working | 47 | 7.8% | Quiz didn't generate or load at all |
| unclear_questions | 35 | 5.8% | Poorly worded, confusing, or overly long questions |
| incorrect_answers | 30 | 5.0% | Wrong answer marked correct, factual errors |
| too_superficial | 25 | 4.1% | Surface-level, not exam-relevant, covers meta/admin info |
| other | 22 | 3.6% | Doesn't fit above (too hard, feature requests, etc.) |
| wrong_language | 18 | 3.0% | Quiz generated in wrong language |
| rendering_bug | 8 | 1.3% | Formulas/formatting broken or displayed incorrectly |

Key Findings — Quiz

  1. 36.7% of complaints = too few questions. This is the single largest issue by far. Users set a max question count and receive far fewer — sometimes just 1. This is likely a generation pipeline issue (content extraction → question generation throughput).
  2. Quantity + generation failures account for ~53%. Combining too_few_questions (36.7%), repetitive_questions (8.4%), and not_working (7.8%) — over half of all negative feedback is about not getting enough usable quiz content.
  3. Quality issues account for ~39%. too_easy (12.4%), content_mismatch (11.9%), unclear_questions (5.8%), incorrect_answers (5.0%), and too_superficial (4.1%) together indicate quiz quality problems — distractors that don't work, questions from the wrong part of the material, and factual errors.
  4. too_easy is structurally exploitable. Multiple users report that the longest answer option is always correct, or that true/false answers always appear on the same side. This is a prompt/generation bias, not a content issue.

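The structural tells behind too_easy are cheap to measure offline. A minimal sketch of such a check, assuming each generated question is a dict with an `options` list and a `correct` index — a hypothetical shape, not the actual pipeline schema:

```python
from collections import Counter

def answer_bias_report(questions):
    """Flag the two structural tells users report: the longest option
    being the correct one, and the correct answer clustering in one
    position. questions: list of {'options': list[str], 'correct': int}.
    """
    longest_correct = sum(
        1 for q in questions
        if len(q["options"][q["correct"]]) == max(len(o) for o in q["options"])
    )
    positions = Counter(q["correct"] for q in questions)
    return {
        "longest_option_correct_rate": longest_correct / len(questions),
        "position_counts": dict(positions),
    }
```

A rate near 1.0, or position counts piled onto one index, would confirm the bias users describe and give a regression metric once the prompt is fixed.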
Summary Feedback (1,704 negative entries)

| Category | Count | % | Description |
|---|---:|---:|---|
| wants_bullets_structure | 298 | 17.5% | Wants bullet points, headings, organized layout — not prose |
| missing_content | 293 | 17.2% | Summary skipped chapters/sections from source material |
| too_long | 251 | 14.7% | Summary too lengthy/verbose |
| too_short | 216 | 12.7% | Summary too brief, not detailed enough |
| other | 168 | 9.9% | Vague complaints, feature requests, unclear feedback |
| wrong_language | 109 | 6.4% | Summary in wrong language |
| not_generated | 93 | 5.5% | Empty page, nothing generated |
| content_mismatch | 84 | 4.9% | Summary not about the uploaded document at all |
| too_complex_language | 81 | 4.8% | Language too difficult, hard to understand for user's level |
| rendering_formatting | 68 | 4.0% | Font size, layout, display issues, broken rendering |
| missing_visuals | 22 | 1.3% | Diagrams/images from source not included |
| unwanted_sections | 21 | 1.2% | Reflection questions or sections user didn't ask for |

Key Findings — Summary

  1. Format is the #1 complaint. 17.5% of users explicitly want bullet points or structured layout instead of continuous prose. Many EU students (Dutch, French, German) use summaries for exam prep and need scannable, structured content.
  2. Coverage is the #2 complaint. 17.2% say the summary skipped chapters or sections. Common pattern: user uploads 50–80 page document, summary only covers the first few pages/chapters.
  3. Length calibration is broken in both directions. 14.7% say too long, 12.7% say too short — together 27.4%. The system isn't matching user expectations for detail level. Users who want "detailed" get walls of text; users who want "concise" get multi-page output.
  4. Wrong language is significant at 6.4%. Many non-English EU users (Dutch, Swedish, French, German) receive summaries in English or a different language than their source material.
  5. Generation failures at 5.5%. Users report empty pages or nothing generated — a reliability issue.
  6. too_complex_language (4.8%) reveals a user-level mismatch. Secondary school students (vmbo, 6de leerjaar) receive university-level language. The system doesn't adapt to the user's educational level.
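One common mitigation for the coverage failure in finding 2 is map-reduce summarization: split the source at chapter boundaries and summarize each section separately, so later chapters cannot be silently dropped. A minimal sketch; the heading pattern and the `summarize_section` callable are placeholders, not the actual pipeline:

```python
import re

# Hypothetical heading pattern; real documents would need per-format detection.
HEADING = re.compile(r"(?m)^(?:Chapter\s+\d+\b.*|#{1,3}\s+.*)$")

def split_sections(source: str):
    """Split source text at heading boundaries so every section gets
    its own summarization pass (preventing 'only the first chapters
    covered'). Text before the first heading becomes its own section."""
    starts = [m.start() for m in HEADING.finditer(source)] or [0]
    if starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(source)]
    return [source[a:b].strip() for a, b in zip(bounds, bounds[1:]) if source[a:b].strip()]

def summarize_document(source, summarize_section):
    # summarize_section stands in for the model call; one call per
    # section guarantees every chapter contributes to the output.
    return "\n\n".join(summarize_section(s) for s in split_sections(source))
```

The same split also gives a cheap coverage metric: count sections in versus sections represented in the final summary.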

Actionable Priorities

Quiz — High Impact

  1. Fix question count generation — investigate why the pipeline produces far fewer questions than requested
  2. Improve distractor quality — eliminate structural tells (longest answer = correct, positional bias in T/F)
  3. Improve content coverage — questions should span the full uploaded material, not just the first section
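For priority 1, one standard mitigation is a top-up loop: when a generation round returns fewer questions than requested, ask again for the shortfall, deduplicating as you go (which also addresses repetitive_questions). A sketch assuming a hypothetical `generate_batch(material, n, exclude)` model call — not the real pipeline API:

```python
def generate_quiz(material, requested, generate_batch, max_rounds=3):
    """Keep requesting questions until the user's count is met or
    rounds run out. generate_batch is a placeholder for the model
    call and is assumed to return a list of question dicts."""
    questions = []
    seen = set()
    for _ in range(max_rounds):
        missing = requested - len(questions)
        if missing <= 0:
            break
        for q in generate_batch(material, missing, exclude=seen):
            key = q["question"].strip().lower()
            if key not in seen:  # drop duplicates across rounds
                seen.add(key)
                questions.append(q)
    return questions[:requested]
```

If the loop still falls short, the shortfall itself becomes a measurable signal for where the extraction → generation throughput breaks down.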

Summary — High Impact

  1. Default to structured/bullet format — or respect user format preferences more reliably
  2. Improve full-document coverage — ensure all chapters/sections are represented
  3. Better length calibration — map user detail preferences to actual output length
  4. Fix language detection/respect — generate in the source material's language

Source Data

  • quiz_feedback_classified.csv — 605 negative quiz feedback entries with complaint_category column
  • summary_feedback_classified.csv — 1,704 negative summary feedback entries with complaint_category column

Both files include all original columns from unified_feedback_enriched.csv plus the classification. They are source artifacts and are intentionally not tracked in this PR.
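The category tables above can be reproduced from either classified CSV with a few lines of stdlib Python, assuming only the `complaint_category` column described above:

```python
import csv
from collections import Counter

def category_table(path, column="complaint_category"):
    """Rebuild a Category/Count/% table from a classified CSV,
    sorted by count descending."""
    with open(path, newline="", encoding="utf-8") as f:
        counts = Counter(row[column] for row in csv.DictReader(f))
    total = sum(counts.values())
    return [(cat, n, round(100 * n / total, 1))
            for cat, n in counts.most_common()]
```

Running it against quiz_feedback_classified.csv or summary_feedback_classified.csv should match the tables in this report row for row.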