Translation
Playto offers multiple ways to translate game text — from classic OCR to AI vision, with local and cloud options.
Text Reading
The default translation mode. Playto captures a region of your screen and reads the text using Windows built-in OCR (WinRT), then sends it to a local LLM for translation.
Pipeline: Screen capture → WinRT OCR text recognition → text stability check → LLM translation → overlay display.
Text stability: Playto waits briefly for the text to settle before translating, and skips re-translation when the text barely changes to avoid flicker.
Text Reading +
Combines Text Reading's OCR with an AI vision model for translation. The OCR engine extracts text from the screen (fast and lightweight), then a local vision model produces the translation — useful when OCR gets the characters right but you want more natural phrasing than a pure text LLM can give.
Best for: Text-heavy games where Text Reading captures correctly but translation quality needs a lift.
Image Recognition
Instead of OCR, Playto sends the screen image directly to a Vision Language Model. The AI reads the image and translates in one pass — no OCR step needed.
Best for: Stylized fonts, text on complex backgrounds, handwritten-style text, and UI elements that OCR can't handle.
Streaming: the overlay updates as the model generates translation tokens, so text appears without waiting for the full response.
Cloud AI
If you don't have a dedicated GPU, you can use cloud API providers for translation. Add your API key in Settings and Playto will route requests to the cloud.
Supported providers:
- Google Gemini — free tier available
- OpenAI
- OpenRouter — access multiple models through one API
- Custom endpoint — any OpenAI-compatible API
Cost control: By default, Cloud AI uses a manual shutter — Playto only calls the API when you press the shutter button (or the shortcut), keeping cost predictable. You can switch to auto-capture in Settings if you prefer continuous translation.
Dialogue handling: Multi-line dialogue is automatically merged to reduce API calls and produce more natural translations.
Prompt Patterns
Playto offers three prompt patterns that control how captured text is sent to the AI for translation:
- Auto — Playto decides the best pattern based on the text structure. Best for most games.
- Dialogue — Treats the captured text as a continuous subtitle. Lines are merged into a single paragraph before translation. Best for dialogue-heavy games and cutscenes.
- UI — Each line is translated independently. Best for menus, item lists, and HUD text where lines are unrelated.
Configure per cursor preset or fixed region in the Capture Area card. You can also set per-game overrides in Game Packs.
Conversation Context
Playto passes the previous translation as context to keep tone consistent across rapid captures.
Local LLM Setup
Playto runs a llama.cpp server locally for translation inference. Models are downloaded with one click and tuned automatically for your GPU — advanced users can adjust GPU layers, context size, threads, and other knobs manually in Settings.
Text Reading and Image Recognition can run on separate servers simultaneously, so you can use both modes without switching models.