Content pillar misses subtopics in dense documents
Status: Open
Categories: content-pillars, mindmaps, flashcards
Document types: Medical lectures, physics textbooks, dense academic papers
Deck IDs: 21ec69d5 (Rheumatology), f37842c8 (Rotational Dynamics), 9d0ed3b0 (Virtue Ethics), 559bdab8 (Sleep & Memory)
Receipts
Source documents (docling markdown) and generated mindmap trees for each deck:
| Deck | Source | Mindmap |
|---|---|---|
| Rheumatology | source_rheumatology.md | mindmap_rheumatology.txt |
| Rotational Dynamics | source_rotational_dynamics.md | mindmap_rotational_dynamics.txt |
| Virtue Ethics | source_virtue_ethics.md | mindmap_virtue_ethics.txt |
| Sleep & Memory | source_sleep_memory.md | mindmap_sleep_memory.txt |
| WW2 (control — good) | source_ww2.md | mindmap_ww2.txt |
Problem
When a document is dense (many distinct concepts across many pages), the content pillar generator produces too few subtopics. Entire sections of the source material get no subtopic assignment, meaning downstream features (mindmaps, flashcards, quizzes) have no coverage of those concepts.
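One way to make "no subtopic assignment" checkable is a naive coverage test. The sketch below is illustrative only: `uncovered_concepts` is a hypothetical helper, not part of the pipeline, and the substring match is an assumption standing in for whatever matching the real system would need.

```python
def uncovered_concepts(concepts, subtopics):
    """Return the concepts that no subtopic name mentions (case-insensitive)."""
    lowered = [s.lower() for s in subtopics]
    return [c for c in concepts if not any(c.lower() in s for s in lowered)]


# Concept names taken from the Rotational Dynamics evidence in this report.
concepts = ["Torque", "Parallel Axis Theorem", "Conservation of Angular Momentum"]
# Only one of that deck's subtopic names is quoted in this report, so this
# list is deliberately incomplete and serves as illustration only.
subtopics = ["Torque and Its Effects"]

missing = uncovered_concepts(concepts, subtopics)
print(missing)
```

A substring match will miss paraphrased subtopic names, so a real check would likely need fuzzy or embedding-based matching; the point here is only that uncovered concepts can be listed mechanically per deck.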
Evidence
Rheumatology (46k chars, 22 subtopics)
Missing from content pillars entirely:
- Osteoarthritis — symptoms, diagnosis, RA vs OA comparison, management (extensive source coverage)
- Psoriatic arthritis, reactive arthritis, SLE — seronegative/seropositive distinction, post-infection mechanism, butterfly rash, ANA/anti-dsDNA
- Motor units — types (slow S/type I, fast fatigue-resistant FR/type IIA, fast fatigable FF/type IIB), recruitment, rate coding, size principle
- Isometric/concentric/eccentric contraction types
- Tendon/ligament injury — tendinopathy classification, ligament injury grades I/II/III
- Bone healing — primary vs secondary, Wolff's Law, callus formation
These topics have substantial source content but no subtopic in the pillar structure.
Rotational Dynamics (38k chars, 10 subtopics)
Missing:
- Parallel Axis Theorem — has its own section in the source
- Conservation of Angular Momentum — dedicated section with figure-skater example
- Moment of inertia table — critical reference for exams (formulas for cylinders, hoops, spheres, rods)
- Newton's Second Law for Rotation (τ = Iα) — lumped into broad "Torque and Its Effects"
Virtue Ethics (18k chars, 11 subtopics)
Missing as named nodes:
- Phronesis (practical wisdom) — standalone section in source, key Aristotelian concept
- Eudaimonia — core concept, not surfaced
- Held's three rethinkings — collapsed into one subtopic name that only mentions "rethinking reason"
Sleep & Memory (63k chars, 11 subtopics)
Missing:
- Hippocampal replay/reactivation — central mechanistic claim, buried in generic "Neurophysiological Evidence"
- Named neuromodulators and hormones (cortisol, ACh, serotonin, norepinephrine) — heavily discussed but not surfaced
- Sleep deprivation effects — extensive source coverage, no dedicated node
- Specific experimental paradigms (VDT, finger tapping task) — exam-relevant, absent
Root cause
The content pillar LLM sees only chunk headings + first 60 chars per chunk. For documents where:
- Many distinct concepts exist under the same heading
- The first 60 chars don't distinguish between sub-concepts
- The document contains more distinct concepts than the model tends to create subtopics for
...the result is too few, too broad subtopics that miss important content.
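The effect of the 60-character window can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: `Chunk`, `build_preview`, and the sample chunk texts are all hypothetical, and the texts are invented for the example rather than taken from the source documents.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str
    text: str


def build_preview(chunk: Chunk, preview_chars: int = 60) -> str:
    """Heading plus the first `preview_chars` characters of the chunk body."""
    return f"{chunk.heading} | {chunk.text[:preview_chars]}"


# Invented sample chunks: same heading, same opening words, different concepts.
chunks = [
    Chunk(
        "Joint disease",
        "In this part of the lecture we continue our discussion of joint "
        "disease, turning to osteoarthritis: symptoms, diagnosis, management.",
    ),
    Chunk(
        "Joint disease",
        "In this part of the lecture we continue our discussion of joint "
        "disease, turning to psoriatic arthritis and SLE.",
    ),
]

previews = [build_preview(c) for c in chunks]
distinct_previews = set(previews)
print(len(previews), "chunks,", len(distinct_previews), "distinct preview(s)")
```

Both chunks collapse to the same preview, so a model that sees only this view has no signal that two separate concepts exist under the heading.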
Impact
- Mindmaps miss entire branches a student would need for exam prep
- Flashcards can't be generated for missing concepts
- Worse for dense professional/medical/technical documents than for simpler ones
- WW2 (simpler, well-sectioned document) scored well; Rheumatology (dense medical lecture) had the worst gaps