Backlog
- OCR interaction split: `ReaderSettings` currently persists OCR interaction through `ocrEnabled` and `ocrAutoScanImages`, which supports Auto, Tap or hover, and Off. Add an explicit OCR trigger schema before exposing separate Click, Hover, and Manual OCR choices in settings or the puck radial menu.
- Page scanning mode split: users do not understand "manual scan only" because OCR and DOM text scanning are separate concerns. Add a three-state page scan control (`Off`, `Auto`, `Manual`) in onboarding and settings, with a visible shortcut field when `Manual` is selected.
- Generic layout safety: Yomu must not push, clip, wrap, or resize host UI when adding highlights/furigana. Current reports include BookWalker galleries/carousels, Wikibooks, ChatGPT/Claude/Google login/search inputs, Investing mobile, Discord names/messages, Crunchyroll consent text, Polymarket, and YouTube chrome. Treat compact controls, buttons, nav, inputs, and cards as passive scan targets without ruby unless the user explicitly looks them up.
- Text-entry controls: placeholders and aria labels in empty editable controls should not be mirrored into the page as visible Yomu text. Reports include ChatGPT's composer placeholder and Google account chooser/login UI.
- Contrast and dark-mode readability: default OCR/highlight colors must remain readable on any page background. Canna's feedback showed white text on pale highlight is unreadable, while dark-mode text can become black or disappear on Discord-like UIs.
- BookWalker reader: continuous scroll mode currently becomes laggy and fails to OCR images; normal page mode can stop triggering OCR after several page turns and still need a refresh. The BookWalker settings popover has furigana wrapping/misalignment, and the homepage image carousel/gallery can break when passive scan highlights are active.
- Yomu PDF: scanned PDFs must not render noisy OCR overlays over already-readable scan text. Text PDFs should use native PDF text parsing; scanned PDFs should use Yomu OCR in a readable in-place layer, with Playwright coverage for both cases.
- Study/newtab: remove the large answer-card section under the prompt and keep the study surface minimal. Reveal-side terms need furigana/pitch, Jiten/Kanji sections should left-align consistently, the study audio button must work without CORS errors, and the frequency number should appear as a lookup pill rather than hardcoded header text.
- Newtab/homepage PWA install UX: move install guidance into the overflow menu, make the overflow menu match the main docs menu, add install/update actions where available, and avoid inert standalone install icons.
- Settings help: show the installed Yomu version, whether an update is available, and a one-click reinstall/update action so users do not have to find the website again.
- Yomu Video / YouTube: side panel alignment and resizing must be instant and must not stretch or push videos; auto-scroll must keep the current subtitle visible; fullscreen controls must be themed, correctly placed, and not block native fullscreen escape; subtitles must not follow the page into comments; font sizing must be stable across short and long captions; subtitle settings controls need reset/defaults and must not click through to the video.
- Caption interaction: pause on caption click is P0, optional pause on hover needs a setting, and both must apply to Yomu Video and the "Read captions in any player" demo.
- Audio playback reliability: repeated speaker presses should always play the current word, interrupting/restarting cleanly when needed; hover audio should not stop after several words; avoid premature fallback to TTS when the real source arrives shortly after.
- Pointer and pencil support: Apple Pencil taps should behave like normal pointer/click activation for dictionary kanji links, show/hide trace buttons, fullscreen OCR word taps, and other interactive controls.
- Gaming app: build an actual Yomu-branded Electron app with invisible overlay behavior, first-run onboarding, capture shortcut setup, settings sync, whole-screen instant capture, in-place OCR overlay, and release artifacts. Docs must not advertise competitors as the solution; gaming docs should explain how to install and use Yomu itself on PC/Steam Deck.
- Onboarding: add a Game capability slot ("Install the Yomu app to use in games or anywhere on the PC"), expose scan modifier key setup beside immersion defaults, and include capture shortcut setup in Electron.
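
Sketch for the OCR interaction split item: one possible shape for an explicit OCR trigger schema, with a migration from the two existing flags. Only `ocrEnabled` and `ocrAutoScanImages` come from the notes above. The type names, the `schemaVersion` field, and the flag-to-mode mapping are assumptions for illustration and should be checked against how `ReaderSettings` actually interprets the flags.

```typescript
// Sketch only: every name here except ocrEnabled and ocrAutoScanImages is hypothetical.

// Individual ways OCR can be started; an empty list means OCR is off.
type OcrTrigger = "auto" | "click" | "hover" | "manual";

// The two flags the backlog item says ReaderSettings persists today.
interface LegacyOcrFlags {
  ocrEnabled: boolean;
  ocrAutoScanImages: boolean;
}

// Hypothetical explicit schema, versioned so later migrations can be detected.
interface OcrTriggerSettings {
  schemaVersion: 1;
  triggers: OcrTrigger[];
}

// Assumed mapping from the legacy flags to the three modes named above:
//   ocrEnabled = false                          -> Off
//   ocrEnabled = true, ocrAutoScanImages = true  -> Auto
//   ocrEnabled = true, ocrAutoScanImages = false -> Tap or hover
function migrateOcrTriggers(legacy: LegacyOcrFlags): OcrTriggerSettings {
  if (!legacy.ocrEnabled) {
    return { schemaVersion: 1, triggers: [] };
  }
  if (legacy.ocrAutoScanImages) {
    return { schemaVersion: 1, triggers: ["auto"] };
  }
  return { schemaVersion: 1, triggers: ["click", "hover"] };
}
```

Storing triggers as a list rather than a single enum keeps the legacy "Tap or hover" mode representable as `["click", "hover"]` while still allowing Click, Hover, and Manual to be offered separately later.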
Verified Slices
- Yomu Gaming Electron onboarding is now part of the Electron app, inside the Yomu Settings shell. It exposes the capture shortcut, local OCR endpoint, first-run capture actions, and compact feature status (`data-yomu-gaming-first-run`, `data-capture-shortcut-input`, `data-action="test-capture-overlay"`, `data-action="start-overlay"`). Verified by `npm run smoke:gaming` and visual smoke screenshots.
- Yomu Gaming package and release commands now run the gaming smoke before packaging. Verified by `npm run package:gaming`, which produced `dist-gaming/packages/mac-arm64/Yomu Gaming.app`.
- Yomu Gaming app icon now uses a generated 512px Yomu asset (`public/app-icons/yomu-gaming-512.png`) instead of the default Electron icon or the too-small extension icon.
- Generic passive chrome scan slice: compact UI controls now suppress ruby and use passive inline scan styling, while regular prose keeps ruby. Verified with `npm test -- tests/reader/reader-theme.test.ts tests/reader/styles.test.ts tests/reader/nested-text-parse.test.ts tests/reader/bookwalker-site-scan.test.ts tests/reader/visible-page-scanner.test.ts`, `npm run smoke:mobile-prose-overflow`, `npm run smoke:compact-chrome`, and `npm run smoke:bookwalker-carousel`.
Remaining Gaps
- Steam Deck / gamescope / exclusive fullscreen capture still needs physical hardware validation. The smoke records this honestly in `qa-artifacts/gaming-hardware-gap.txt`.
- Live-site browser sweeps are still required after each release for BookWalker, YouTube, ChatGPT/Claude, Wikibooks, Google login/search, and mobile news pages because the failures depend on host-page layout and mutation behavior.