Translation

Playto offers multiple ways to translate game text — from classic OCR to AI vision, with local and cloud options.

Text Reading

The default translation mode. Playto captures a region of your screen and reads the text using Windows built-in OCR (WinRT), then sends it to a local LLM for translation.

Pipeline: Screen capture → WinRT OCR text recognition → text stability check → LLM translation → overlay display.

Text stability: Playto waits briefly for the text to settle before translating, and skips re-translation when the text barely changes to avoid flicker.

Text Reading +

Combines Text Reading's OCR with an AI vision model for translation. The OCR engine extracts text from the screen (fast and lightweight), then a local vision model produces the translation — useful when OCR gets the characters right but you want more natural phrasing than a pure text LLM can give.

Best for: Text-heavy games where Text Reading captures correctly but translation quality needs a lift.

Image Recognition

In this optional mode, Playto sends the screen image directly to a Vision Language Model, which reads and translates it in one pass. It's the heaviest of the three modes — best kept for the cases the OCR-based modes struggle with.

Best for: Stylized fonts, text on complex backgrounds, handwritten-style text, and UI elements that plain OCR struggles with.

Streaming: the overlay updates as the model generates translation tokens, so text appears without waiting for the full response.

Lightweight NMT

A compact neural machine translation engine that runs after the OCR step in place of a local LLM. It translates in tens of milliseconds and needs very little VRAM — useful on lower-end hardware or when you want the fastest possible overlay.

Trade-off: the quality ceiling is lower than the AI translation modes and it carries no conversation context. NMT covers a fixed set of language pairs — Japanese ↔ English, plus English → several European languages. See Language Support for the full list; for any other pair Playto falls back to AI translation.

Best for: low-VRAM setups, or pairing fast lightweight translation with demanding games.

Prompt Patterns

Playto offers three prompt patterns that control how captured text is sent to the AI for translation:

Auto — Playto decides the best pattern based on the text structure. Best for most games.
Dialogue — Treats the captured text as a continuous subtitle. Lines are merged into a single paragraph before translation. Best for dialogue-heavy games and cutscenes.
UI — Each line is translated independently. Best for menus, item lists, and HUD text where lines are unrelated.

Configure per cursor preset or fixed region in the Capture Area card. You can also set per-game overrides in Game Packs.

Conversation Context

Playto passes the previous translation as context to keep tone consistent across rapid captures.

Local LLM Setup

Playto runs a llama.cpp server locally for translation inference. Models are downloaded with one click and tuned automatically for your GPU — advanced users can adjust GPU layers, context size, threads, and other knobs manually in Settings.

Text Reading and Image Recognition can run on separate servers simultaneously, so you can use both modes without switching models.