In Post 1, we covered the i18n frameworks and infrastructure we built. With this in place, the next challenge was wrapping every user-facing string in our codebase with translation utility calls to enforce compile-time extraction and runtime lookups, across Python, TypeScript, Thrift, and Protobuf files.
In this post we’ll cover how we migrated at scale, looking at the AI-assisted approach that replaced months of manual work, the codemods that handled Thrift and Proto generated code, the linter rules that prevented regressions, and the multi-layered QA process that caught what automation missed.
Our codebase had user-facing strings spread across Python services, TypeScript frontend code, and Thrift and Protobuf schema definitions.
Performing the i18n migration completely manually would be months of tedious, error-prone work. We instead relied on three approaches, suited to different parts of the codebase.
This was our primary approach for both frontend and backend files. We built scripts that crawl a directory file by file, passing each file's contents along with a detailed migration prompt to an LLM, and receiving back the transformed code.
We didn't get this right on the first try. Here's what we learned across models:
GPT-4o (~60% accuracy): Our first attempt. It frequently made errors: removing comments or lint-ignores, breaking bracket matching on unrelated code, over-refactoring beyond what was asked. About 40% of files needed spot fixes.
o1-mini (slightly better): Fewer hallucinations but still inconsistent. It would sometimes remove lint-ignore comments, try to "fix" code unrelated to the migration, or fail to properly use pluralization tooling.
o1-preview (90%+ accuracy): The breakthrough. Despite initial concerns about cost and speed, the API performance was comparable to the other models, and the accuracy improvement was dramatic. Even complex string concatenation scenarios were handled correctly.
Here's a representative transformation the AI performed:
```tsx
// Before
<ZText bold small>
  '{foo.bar}' is being used in {a}
  {foo.count === 1 ? 'approval' : 'approvals'}
</ZText>
```

```tsx
// After (AI-generated)
<ZText bold small>
  <Trans>
    '{foo.bar}' is being used in {a}{" "}
    <Plural one="approval" other="approvals" value={foo.count} />
  </Trans>
</ZText>
```

The AI correctly identified the ternary as a pluralization pattern and converted it to Lingui's `<Plural>` component, something a simple codemod would struggle with.
```python
# Simplified version of our migration approach
async def migrate_file(file_path: str, model: str = "o1-preview"):
    # Skip files that don't need migration (tests, admin, generated code)
    if should_skip(file_path):
        return
    content = read_file(file_path)
    response = await llm_call(
        model=model,
        prompt=MIGRATION_PROMPT,
        file_content=content,
    )
    if response != "NO_CHANGES_NEEDED":
        write_file(f"{file_path}.migrated", response)
```

The migration prompt was iteratively refined based on observed AI mistakes. Key rules included:
```text
WRAPPING RULES:
- Wrap all user-facing strings with t() or <Trans>
- Use .format() for variables, NEVER f-strings
- Module-level strings: wrap with t() for extraction
- Skip: snake_case identifiers, log messages, dict keys,
  technical strings, GraphQL field descriptions

PLURAL HANDLING:
- Convert ternary count checks to ngettext/Plural
- Use %(num)d format for Python ngettext

DO NOT:
- Modify comments
- Change code logic unrelated to i18n
- Remove or add imports beyond i18n ones
- Wrap strings that are only used for comparison/logic
```

When we noticed a pattern of mistakes (e.g., the AI modifying @ts-expect-error comments), we'd add a specific counter-example to the prompt. This iterative refinement was key to reaching 90%+ accuracy.
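To make the plural rule concrete, here is roughly what the conversion of a ternary count check looks like in Python. This sketch uses the stdlib `gettext.ngettext` as a stand-in for our internal pluralization utility; without a loaded catalog it falls back to the English singular/plural forms.

```python
from gettext import ngettext  # stand-in for our internal i18n utility


def approval_label(count: int) -> str:
    # Before: "approval" if count == 1 else "approvals"
    # After: ngettext selects the plural form and is statically extractable
    return ngettext("%(num)d approval", "%(num)d approvals", count) % {"num": count}


print(approval_label(1))  # 1 approval
print(approval_label(3))  # 3 approvals
```

The `%(num)d` placeholder matters: extraction tools need the literal message IDs, and translators need the count to be repositionable within the translated string.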
We went one step further: using AI to verify AI's work. A separate verification script analyzed the git diff of each migrated file and flagged potential issues, such as deleted comments or logic changes unrelated to the migration.
This two-pass approach (migrate with one model, verify with another) caught an additional ~5% of issues that slipped through the initial migration.
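A minimal sketch of the verification pass is below. The function names, review prompt, and model choice are illustrative, not our exact implementation; `llm_call` is the same helper used in the migration script above.

```python
import subprocess

VERIFY_PROMPT = (
    "You are reviewing an automated i18n migration diff. Flag removed "
    "comments or lint-ignores, logic changes unrelated to i18n, f-strings "
    "inside t(), and import changes beyond i18n ones. Reply OK if clean."
)


def get_migration_diff(file_path: str) -> str:
    # Diff of just the migrated file against HEAD
    result = subprocess.run(
        ["git", "diff", "--", file_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_verification_request(diff: str) -> dict:
    # Package the diff and review prompt for a second-model llm_call
    return {"model": "o1-mini", "prompt": VERIFY_PROMPT, "file_content": diff}
```

Using a different model for verification than for migration reduces the chance that both passes share the same blind spot.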
Migrating ~500 frontend files cost roughly $15-20 in API calls. The full codebase migration came in well under $500, a small fraction of the engineering time a fully manual migration would have cost.
For all their accuracy, the migration scripts failed to fully use context across files and directories. For example, we do not want to localize strings that we then persist to the database, since our i18n frameworks are optimized for forward lookups only. In the example below, we would persist "Config name" in the active user's language instead of in base English.
```python
def save_config():
    new_config.create().name(
        t("Config name")
    ).save()
```

This issue was pervasive in Python, making our backend migration much more white-glove.
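The fix is to persist the base-English string and translate only at display time. A minimal sketch, with a toy `t()` and catalog simulating the active-locale forward lookup (names and the French translation are illustrative):

```python
# Toy catalog simulating an active non-English locale
_CATALOG = {"Config name": "Nom de la configuration"}


def t(s: str) -> str:
    # Forward lookup only: base English -> active locale
    return _CATALOG.get(s, s)


# Wrong: the localized string is persisted, and t() cannot reverse it later
persisted_wrong = t("Config name")

# Right: persist base English; call t() only when rendering
persisted_right = "Config name"
displayed = t(persisted_right)

print(persisted_wrong)  # Nom de la configuration (stored in French!)
print(displayed)        # Nom de la configuration (rendered, not stored)
```

Because the catalog only maps English to the active locale, a value stored as "Nom de la configuration" can never be re-localized for a different user.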
Thrift and Proto files required a different approach. These are schema definition files that get code-generated into Python and TypeScript. We can't wrap strings in the schema files themselves, so we need the generated code to include translation calls.
Thrift files define static string maps like:
```thrift
const map<BillStatus, string> BillStatusString = {
    BillStatus.INITIAL: "Uncoded",
    BillStatus.PENDING_APPROVAL: "Pending approval",
    BillStatus.APPROVED: "Ready for payment",
    BillStatus.REJECTED: "Rejected",
}
```

Running thriftgen produces Python:
```python
BillStatusString = {
    BillStatus.INITIAL: "Uncoded",
    BillStatus.PENDING_APPROVAL: "Pending approval",
    BillStatus.APPROVED: "Ready for payment",
    BillStatus.REJECTED: "Rejected",
}
```

We need the generated code to instead be:
```python
from i18n.utils import t

BillStatusString = {
    BillStatus.INITIAL: t("Uncoded"),
    BillStatus.PENDING_APPROVAL: t("Pending approval"),
    BillStatus.APPROVED: t("Ready for payment"),
    BillStatus.REJECTED: t("Rejected"),
}
```

We considered two approaches:

1. Fork the Thrift compiler and Proto generator (rejected). We prototyped modifications to the IDL Thrift compiler to inject t() calls during code generation. This worked, but would be a maintainability nightmare: we'd need to carry a custom Thrift fork forever, and it only solved Thrift, not Proto. (For Proto, we already have a custom AST visitor, and updating the Python and TypeScript codegen visitors to wrap strings was straightforward for that case.)
2. Post-processing codemod (chosen). Instead of modifying the generators, we run a codemod as a post-processing step after every thriftgen and protogen call. The script finds the string values in the generated code, wraps each one in t(), and adds the i18n import.

This approach handles both Thrift and Proto with a single script, runs automatically as part of the code generation pipeline, and requires no changes to upstream compilers.
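A minimal sketch of the post-processing step, using a single regex pass over generated dict literals. Our real codemod is more robust; the pattern and import path here are illustrative.

```python
import re

I18N_IMPORT = "from i18n.utils import t"

# Match dict values that are plain string literals: `KEY: "text",`
VALUE_RE = re.compile(r'(:\s*)("(?:[^"\\]|\\.)*")')


def wrap_generated_strings(source: str) -> str:
    # Wrap every matched string value in t(...)
    wrapped, count = VALUE_RE.subn(r"\1t(\2)", source)
    # Prepend the import only if we actually wrapped something
    if count and I18N_IMPORT not in wrapped:
        wrapped = f"{I18N_IMPORT}\n{wrapped}"
    return wrapped
```

Because generated code has a narrow, predictable shape, a targeted pattern like this is far safer here than it would be on hand-written source.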
The AI migration, the codemods, and a deep QA process got our static strings migrated initially. Maintaining 99%+ localization coverage over time, however, is its own challenge.
Ruff (Python): The f-string-in-get-text-func-call rule catches the most common backend mistake: wrapping an f-string in t(), which silently breaks extraction.
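Here is the mistake that rule catches, and the fix. The `t()` below is a stand-in for a gettext-style utility: extraction tools scan for literal arguments to `t()`, so the argument must be a constant string.

```python
def t(s: str) -> str:
    # Stand-in for a gettext-style lookup; without a catalog it is identity
    return s


name = "Acme"

# Broken: the f-string is evaluated before t() sees it, so the extractor
# never finds a stable message ID (flagged by Ruff's
# f-string-in-get-text-func-call rule)
bad = t(f"Welcome, {name}")

# Correct: a literal message ID, with variables applied after lookup
good = t("Welcome, {name}").format(name=name)
```

Both lines produce the same English output at runtime, which is exactly why the bug is "silent" and needs a linter to catch it.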
Lingui ESLint plugin (TypeScript): Catches unwrapped JSX text, incorrect macro usage, missing @lingui/macro imports, and strings that should use <Plural> instead of ternary expressions.
TypeScript compiler: Catches type errors introduced by the migration, for example if wrapping a string in t() changes its type from string to string | undefined.
The Lingui ESLint plugin ensures high coverage for TypeScript. For Python, however, we built out a suite of linters, enforced in a pre-commit hook, to ensure backend engineers were properly localizing. Primarily, we enforce that sentence-case strings are at least extracted to our message catalog.
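A simplified version of the sentence-case check can be built on the stdlib `ast` module. The heuristic and helper names here are illustrative, not our production linter:

```python
import ast
import re

# Heuristic: starts with a capitalized word and contains a space ->
# likely user-facing sentence-case copy, not an identifier or dict key
SENTENCE_CASE = re.compile(r"^[A-Z][a-z].* ")


def find_unwrapped_strings(source: str) -> list[str]:
    """Return sentence-case string literals not already inside a t() call."""
    tree = ast.parse(source)
    # Collect the exact Constant nodes that appear as arguments to t(...)
    wrapped = {
        id(arg)
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "t"
        for arg in node.args
    }
    return [
        node.value
        for node in ast.walk(tree)
        if isinstance(node, ast.Constant)
        and isinstance(node.value, str)
        and SENTENCE_CASE.match(node.value)
        and id(node) not in wrapped
    ]
```

Running this over a file flags strings like "Ready for payment" while ignoring snake_case identifiers and anything already wrapped.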
Our pseudo-locale (pseudo-LOCALE) replaces all strings extracted to message catalogs with accented versions on read: "Pending approval" becomes "P̂ëñd̂ïñg̈ àp̂p̂r̈ôv̈àl̂". This makes untranslated strings visually obvious during manual QA, as any plain English text is a missed string.
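The accenting transform itself is simple to sketch with Unicode combining diacritics; the exact character mapping our pseudo-locale uses is not shown here, so this one is illustrative:

```python
import unicodedata

# Combining marks cycled over letters to produce the pseudo-locale look
_MARKS = ["\u0302", "\u0308", "\u0303"]  # circumflex, diaeresis, tilde


def pseudolocalize(message: str) -> str:
    out = []
    for i, ch in enumerate(message):
        out.append(ch)
        if ch.isalpha():
            # Attach a combining mark after each letter
            out.append(_MARKS[i % len(_MARKS)])
    return unicodedata.normalize("NFC", "".join(out))


print(pseudolocalize("Pending approval"))
```

Crucially, the transform only runs on catalog reads, so it is lossless: stripping the combining marks recovers the base-English string, and any string that renders as plain ASCII never went through the catalog at all.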
The combination of AI migration, codemods, and multi-layered QA let us internationalize the codebase with a small team across both frontend and backend. The AI approach was the multiplier. It turned what would have been a massive effort across multiple teams of manual wrapping into a few months for two people.
Static strings are only half the story. In Post 3, we'll tackle a larger problem: translating user-generated content.