Frontend

i18n Part 2: Migrating 10,000+ Strings with AI, Codemods, and a Multi-Layered QA Process

[Apr 07, 2026 - 14 min read]

In Post 1, we covered the i18n frameworks and infrastructure we built. With this in place, the next challenge was wrapping every user-facing string in our codebase with translation utility calls, enabling compile-time extraction and runtime lookups across Python, TypeScript, Thrift, and Protobuf files.

In this post we’ll cover how we migrated at scale, looking at the AI-assisted approach that replaced months of manual work, the codemods that handled Thrift and Proto generated code, the linter rules that prevented regressions, and the multi-layered QA process that caught what automation missed.

The Scale of the Problem

Our codebase had strings spread across:

  • 10,000+ TypeScript/TSX files in the React frontend
  • 10,000+ Python files in the Flask backend
  • 64 Thrift files containing 1,485 static strings across 243 maps
  • Protobuf schema files with additional constant definitions

Performing the i18n migration completely manually would be months of tedious, error-prone work. We instead relied on three approaches, suited to different parts of the codebase.

Method 1: AI-Assisted Code Migration

This was our primary approach for both frontend and backend files. We built scripts that crawl a directory file by file, passing each file's contents along with a detailed migration prompt to an LLM, and receiving back the transformed code.

The Evolution of Model Accuracy

We didn't get this right on the first try. Here's what we learned across models:

GPT-4o (~60% accuracy): Our first attempt. It frequently made errors: removing comments or lint-ignores, breaking bracket matching on unrelated code, over-refactoring beyond what was asked. About 40% of files needed spot fixes.

o1-mini (slightly better): Fewer hallucinations but still inconsistent. It would sometimes remove lint-ignore comments, try to "fix" code unrelated to the migration, or fail to properly use pluralization tooling.

o1-preview (90%+ accuracy): The breakthrough. Despite initial concerns about cost and speed, the API performance was comparable to the other models, and the accuracy improvement was dramatic. Even complex string concatenation scenarios were handled correctly.

Here's a representative transformation the AI performed:

// Before
<ZText bold small>
  '{foo.bar}' is being used in {a}
  {foo.count === 1 ? 'approval' : 'approvals'}
</ZText>

// After (AI-generated)
<ZText bold small>
  <Trans>
    '{foo.bar}' is being used in {a}{" "}
    <Plural one="approval" other="approvals" value={foo.count} />
  </Trans>
</ZText>

The AI correctly identified the ternary as a pluralization pattern and converted it to Lingui's <Plural> component, something a simple codemod would struggle with.

The Migration Script

# Simplified version of our migration approach
async def migrate_file(file_path: str, model: str = "o1-preview"):
    content = read_file(file_path)

    # Skip files that don't need migration
    if should_skip(file_path):  # tests, admin, generated code
        return

    response = await llm_call(
        model=model,
        prompt=MIGRATION_PROMPT,
        file_content=content,
    )

    if response != "NO_CHANGES_NEEDED":
        write_file(f"{file_path}.migrated", response)
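A driver for migrate_file then only needs to crawl the tree with a concurrency cap so parallel API calls stay under provider rate limits. A minimal sketch of what that driver could look like (the skip list, glob, and limit are illustrative, not our exact values):

```python
import asyncio
from pathlib import Path

SKIP_PARTS = {"tests", "admin", "generated"}  # illustrative skip list

async def migrate_file(path: Path) -> Path:
    # Stand-in for the LLM-backed migrate_file sketched above;
    # here it just records which file would be migrated.
    return path

async def crawl(root: Path, concurrency: int = 8) -> list[Path]:
    # Semaphore caps the number of in-flight API calls.
    sem = asyncio.Semaphore(concurrency)

    async def run(path: Path) -> Path:
        async with sem:
            return await migrate_file(path)

    files = [
        p for p in sorted(root.rglob("*.py"))
        if not any(part in SKIP_PARTS for part in p.parts)
    ]
    return await asyncio.gather(*(run(p) for p in files))
```

The same shape works for the TypeScript tree by swapping the glob pattern.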

The Prompt Engineering

The migration prompt was iteratively refined based on observed AI mistakes. Key rules included:

WRAPPING RULES:
- Wrap all user-facing strings with t() or <Trans>
- Use .format() for variables, NEVER f-strings
- Module-level strings: wrap with t() for extraction
- Skip: snake_case identifiers, log messages, dict keys,
  technical strings, GraphQL field descriptions

PLURAL HANDLING:
- Convert ternary count checks to ngettext/Plural
- Use %(num)d format for Python ngettext

DO NOT:
- Modify comments
- Change code logic unrelated to i18n
- Remove or add imports beyond i18n ones
- Wrap strings that are only used for comparison/logic

When we noticed a pattern of mistakes (e.g., the AI modifying @ts-expect-error comments), we'd add a specific counter-example to the prompt. This iterative refinement was key to reaching 90%+ accuracy.

AI-Assisted Verification

We went one step further: using AI to verify AI's work. A separate verification script analyzed the git diff of each migrated file and flagged potential issues:

  • Unwrapped strings that should have been wrapped
  • Incorrectly wrapped strings (e.g., snake_case keys)
  • f-string mishandling
  • Plural form errors
  • Unintended comment modifications

This two-pass approach (migrate with one model, verify with another) caught an additional ~5% of issues that slipped through the initial migration.
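The verifier model handles the semantic review, but the purely mechanical mistakes can be flagged even earlier with a deterministic pre-filter over the diff's added lines. A sketch of what such a filter might look like (the regexes are illustrative, not our production rules):

```python
import re

# Hypothetical pre-filter run on a unified diff before the LLM verification pass.
FSTRING_IN_T = re.compile(r'\bt\(\s*f["\']')                          # f-string passed to t()
SNAKE_CASE_IN_T = re.compile(r'\bt\(\s*["\'][a-z]+(_[a-z]+)+["\']')   # identifier wrapped in t()

def flag_diff(diff: str) -> list[str]:
    issues = []
    for line in diff.splitlines():
        # Only inspect added lines; skip the "+++" file header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        if FSTRING_IN_T.search(line):
            issues.append(f"f-string in t(): {line[1:].strip()}")
        if SNAKE_CASE_IN_T.search(line):
            issues.append(f"snake_case wrapped: {line[1:].strip()}")
    return issues
```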

Cost

Migrating ~500 frontend files cost roughly $15-20 in API calls. The full codebase migration came in well under $500, a small fraction of what the engineering time for a fully manual migration would have cost.

What AI Missed

Even at high accuracy, the migration scripts could not use context that spans files and directories. For example, we do not want to localize strings that we then persist to the database, since our i18n frameworks are optimized for forward lookups only.

In the example below, we would persist "Config name" in the active user’s language instead of in base English:

def save_config():
    new_config.create().name(
        t("Config name")
    ).save()

This issue was pervasive in Python, making our backend migration much more white-glove.
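The correct pattern here is to persist the untranslated base-English string and translate only at display time, when the viewing user's locale is known. A minimal sketch of that pattern (the catalog, t() stub, and helpers are illustrative stand-ins, not our actual APIs):

```python
# Illustrative catalog entry; real lookups go through our i18n framework.
CATALOG = {"Config name": "Nom de la configuration"}

def t(msgid: str) -> str:
    # Stand-in runtime lookup; falls back to the msgid on a miss.
    return CATALOG.get(msgid, msgid)

def save_config(store: dict) -> None:
    # Persist the untranslated msgid so the stored row is language-independent.
    store["name"] = "Config name"

def render_config(store: dict) -> str:
    # Translate on read, in the active user's locale.
    return t(store["name"])
```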

Method 2: Codemods for Thrift and Protobuf

Thrift and Proto files required a different approach. These are schema definition files that get code-generated into Python and TypeScript. We can't wrap strings in the schema files themselves, so we need the generated code to include translation calls.

The Problem

Thrift files define static string maps like:

const map<BillStatus, string> BillStatusString = {
    BillStatus.INITIAL: "Uncoded",
    BillStatus.PENDING_APPROVAL: "Pending approval",
    BillStatus.APPROVED: "Ready for payment",
    BillStatus.REJECTED: "Rejected",
}

Running thriftgen produces Python:

BillStatusString = {
    BillStatus.INITIAL: "Uncoded",
    BillStatus.PENDING_APPROVAL: "Pending approval",
    BillStatus.APPROVED: "Ready for payment",
    BillStatus.REJECTED: "Rejected",
}

We need the generated code to instead be:

from i18n.utils import t

BillStatusString = {
    BillStatus.INITIAL: t("Uncoded"),
    BillStatus.PENDING_APPROVAL: t("Pending approval"),
    BillStatus.APPROVED: t("Ready for payment"),
    BillStatus.REJECTED: t("Rejected"),
}

Approaches We Explored

1. Fork the Thrift compiler and Proto generator (rejected). We prototyped modifications to the Thrift IDL compiler to inject t() calls during code generation. This worked, but would be a maintainability nightmare: we'd need to carry a custom Thrift fork forever, and it only solved Thrift, not Proto. (Proto was the easier half, since we already own a custom AST visitor there, and updating its Python and TypeScript codegen visitors to wrap strings was straightforward.)

2. Post-processing codemod. Instead of modifying generators, we run a codemod as a post-processing step after every thriftgen and protogen call. The script:

  • Scans generated files for string map values
  • Checks if values are in Sentence Case (indicating user-facing text)
  • Wraps matching values with t() and adds the import
  • Skips snake_case values (identifiers, not display text)

This approach handles both Thrift and Proto with a single script, runs automatically as part of the code generation pipeline, and requires no changes to upstream compilers.
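A rough sketch of such a post-processing pass over generated Python (the regex and heuristics are simplified illustrations of the real codemod):

```python
import re

# Matches quoted map values like `BillStatus.INITIAL: "Uncoded",`.
MAP_VALUE = re.compile(r'(:\s*)"([A-Z][^"]*)"')

def is_sentence_case(s: str) -> bool:
    # User-facing text starts with a capital and contains no underscores.
    return s[:1].isupper() and "_" not in s

def wrap_strings(source: str) -> str:
    def repl(m: re.Match) -> str:
        prefix, value = m.groups()
        if is_sentence_case(value):
            return f'{prefix}t("{value}")'
        return m.group(0)  # leave identifiers untouched

    out = MAP_VALUE.sub(repl, source)
    # Add the import only if we actually wrapped something.
    if 't("' in out and "from i18n.utils import t" not in out:
        out = "from i18n.utils import t\n\n" + out
    return out
```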

Preventing Regressions

Automation, the AI-assisted migration, and a deep QA process got our static strings migrated in the first place. Maintaining 99%+ localization coverage as the codebase evolves is its own challenge, though.

Layer 1: Out-of-the-box linters

Ruff (Python): The f-string-in-get-text-func-call rule catches the most common backend mistake: wrapping an f-string in t(), which silently breaks extraction.
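To see why this breaks extraction silently: the f-string interpolates before t() ever runs, so the rendered string never matches a catalog msgid. A minimal illustration (the catalog and t() are stand-ins):

```python
# Illustrative catalog; extraction tooling can only record stable msgids.
CATALOG = {"Hello, {name}": "Bonjour, {name}"}

def t(msgid: str) -> str:
    # Stand-in runtime lookup; falls back to the msgid on a miss.
    return CATALOG.get(msgid, msgid)

name = "Ada"
broken = t(f"Hello, {name}")                    # looks up "Hello, Ada": catalog miss
correct = t("Hello, {name}").format(name=name)  # stable msgid first, then interpolate
```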

Lingui ESLint plugin (TypeScript): Catches unwrapped JSX text, incorrect macro usage, missing @lingui/macro imports, and strings that should use <Plural> instead of ternary expressions.

TypeScript compiler: Catches type errors introduced by the migration, for example if wrapping a string in t() changes its type from string to string | undefined.

Layer 2: Custom linters

The Lingui ESLint plugin ensures high coverage for TypeScript. For Python, however, we built a suite of custom linters, run from a pre-commit hook, to ensure backend engineers localize properly. The primary rule checks that sentence-case strings are at least extracted to our message catalog.
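A minimal sketch of what such a catalog-coverage check could look like using Python's ast module (the heuristic is illustrative; the real linter carries more exclusions, e.g. for docstrings and log calls):

```python
import ast

def looks_user_facing(s: str) -> bool:
    # Heuristic from the post: sentence-case text with a space reads as UI copy.
    return s[:1].isupper() and " " in s

def lint_source(source: str, catalog: set[str]) -> list[str]:
    """Flag sentence-case string literals missing from the extracted catalog."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if looks_user_facing(node.value) and node.value not in catalog:
                issues.append(f"line {node.lineno}: unextracted string {node.value!r}")
    return issues
```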

Layer 3: Pseudo-Locale Testing

Our pseudo-locale (pseudo-LOCALE) replaces all strings extracted to message catalogs with accented versions on read: "Pending approval" becomes "P̂ëñd̂ïñg̈ àp̂p̂r̈ôv̈àl̂". This makes untranslated strings visually obvious during manual QA, as any plain English text is a missed string.
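The transform itself can be as simple as appending combining diacritics to each letter on catalog read. A rough sketch (the choice and rotation of marks is illustrative):

```python
# Combining circumflex, diaeresis, and grave accents.
MARKS = ["\u0302", "\u0308", "\u0300"]

def pseudolocalize(msg: str) -> str:
    # Append a rotating combining mark after each letter; non-letter
    # characters (spaces, punctuation, digits) pass through unchanged.
    out = []
    for i, ch in enumerate(msg):
        out.append(ch)
        if ch.isalpha():
            out.append(MARKS[i % len(MARKS)])
    return "".join(out)
```

Because only combining marks are added, stripping them recovers the original string exactly, which keeps the transform lossless for debugging.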

Results

The combination of AI migration, codemods, and multi-layered QA let us internationalize the codebase with a small team across both frontend and backend. The AI approach was the multiplier. It turned what would have been a massive effort across multiple teams of manual wrapping into a few months for two people.

What's Next

Static strings are only half the story. In Post 3, we'll tackle a larger problem: translating user-generated content.