The conversation about AI in software engineering tends toward two poles. On one side: AI will replace most of what engineers do, and organizations that do not adopt it immediately will be outcompeted by those that do. On the other: AI-generated code is unreliable, the quality risks are underestimated, and the productivity claims do not hold up under serious scrutiny.
Both of these positions miss the more useful observation. AI tooling is genuinely valuable in software engineering work — but the value is specific, not universal. It concentrates in the parts of the work that are systematic and scale-dependent. It does not extend to the parts of the work that require judgment, domain understanding, and the ability to reason about whether a result is correct in a context that the tool cannot understand.
This distinction matters practically, and it shows up most clearly in modernization work on domain-heavy platforms.
What domain-heavy means in this context
A domain-heavy platform is one where the business logic embedded in the code reflects specialized knowledge that engineers developed over time — knowledge that is not captured in any external reference, is not self-evident from the code structure, and cannot be reconstructed without understanding the domain itself.
Genomics platforms are a clear example. A DNA testing platform does not just move data from one format to another. It calculates breed probability distributions across allele combinations. It scores health markers using weighting models grounded in veterinary genetics research. It interprets lineage relationships in ways that affect result classification. The logic that does these things encodes real domain knowledge — knowledge that has to be correct because the outputs go to customers making decisions about their animals' health.
This kind of platform is not unusual. Healthcare clinical decision support systems encode medical knowledge. Insurance underwriting platforms encode actuarial logic. Public health surveillance systems encode epidemiological processing rules. Scientific computing platforms encode domain-specific analytical methods. In each case, the code is not just an implementation of generic business rules. It is a representation of specialized knowledge, and its correctness is judged against domain standards that have nothing to do with the code itself.
What this means for modernization is that the work has two distinct competence requirements: software engineering discipline and domain understanding. Both are required. Neither is sufficient alone.
Where AI actually helps in modernization
Modernization of a long-lived, domain-heavy platform begins with understanding what the platform actually contains. This sounds straightforward. In practice, it is the slow, expensive part of the engagement.
A codebase that has been in active development for several years accumulates patterns that take time to map. Where does the domain logic actually live? Which functions carry business rules that should be in service classes? Which endpoints have accumulated multiple responsibilities over successive product cycles? Where are the validation, transformation, and persistence concerns interleaved in ways that make them impossible to test independently? What does the call graph look like for the processing path that handles the highest-risk operations?
Answering these questions manually requires reading a large amount of code systematically. In a substantial codebase, that work takes weeks. It requires maintaining a mental model of the whole system while reading its parts — a cognitively expensive task that is easy to do imprecisely.
AI tooling compresses this phase. Traversing a large codebase to identify all the places a particular type of logic appears, mapping the dependency relationships between modules, surfacing every endpoint that shares a specific structural pattern, identifying processing chains where a change to one component would propagate to others — these are tasks that AI handles quickly and consistently. The output is not a complete understanding of the system. It is evidence: a structured map of what the codebase contains and where attention is warranted. Experienced engineers then interpret that evidence, apply judgment about what is significant, and design the modernization approach from an informed position rather than an incomplete one.
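As a small, deterministic illustration of the kind of structured evidence described above, the sketch below uses Python's standard `ast` module to list which functions each function in a module calls. It is not the AI tooling itself, only an example of what a map of the code can look like. The sample source and every name in it (`handle_result_request`, `score_health_markers`, `repository.save_result`, and so on) are hypothetical.

```python
import ast
from collections import defaultdict


def map_calls(source: str) -> dict[str, set[str]]:
    """Return {function name: names of the functions it calls} for one module."""
    tree = ast.parse(source)
    calls: dict[str, set[str]] = defaultdict(set)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            called = calls[node.name]  # register the function even if it calls nothing
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call):
                    func = inner.func
                    if isinstance(func, ast.Name):         # plain call: f(x)
                        called.add(func.id)
                    elif isinstance(func, ast.Attribute):  # attribute call: obj.f(x)
                        called.add(func.attr)
    return dict(calls)


# Hypothetical module in which one handler mixes validation, scoring, and persistence.
SAMPLE = '''
def handle_result_request(request):
    validate_payload(request)
    scores = score_health_markers(request.markers)
    repository.save_result(scores)
    return scores

def score_health_markers(markers):
    return [apply_weight(m) for m in markers]
'''

call_map = map_calls(SAMPLE)
for name, called in sorted(call_map.items()):
    print(name, "->", sorted(called))
```

A handler that calls validation, scoring, and persistence functions in one body is a candidate for closer reading. Whether it is actually a problem remains an engineering judgment.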
Test generation is the second high-value application. In a tightly coupled codebase, writing tests is expensive because the test has to exercise more of the system than the behavior it is trying to validate. When domain logic lives in serializers and view handlers alongside validation and persistence, a test for a single business rule requires setting up API contexts, database state, and serialization paths just to reach the logic being tested. AI tooling can generate initial test scaffolding that handles that setup overhead — producing a starting structure that exercises the right code paths and produces readable test cases. Engineers then do the work that actually matters: validating that the expected outputs in the tests are correct, adding the edge cases that the AI did not anticipate, and confirming that the tests are testing domain correctness rather than just code execution.
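One hypothetical way to keep generated scaffolding separate from verified tests is to treat every expected value as a draft until a domain-aware engineer signs off on it. The sketch below assumes a simple in-memory structure. `ScaffoldedCase`, `run_reviewed_cases`, and `count_risk_alleles` are illustrative names, and the allele-counting function is a stand-in, not real scoring logic.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ScaffoldedCase:
    """One generated test case; the expected value is a draft until reviewed."""
    name: str
    inputs: tuple
    expected: object
    reviewed_by: Optional[str] = None  # set by a domain-aware engineer


def run_reviewed_cases(func: Callable, cases: list[ScaffoldedCase]) -> dict[str, str]:
    """Run only reviewed cases; report the rest as pending rather than passing."""
    report = {}
    for case in cases:
        if case.reviewed_by is None:
            report[case.name] = "pending review"
        elif func(*case.inputs) == case.expected:
            report[case.name] = "pass"
        else:
            report[case.name] = "fail"
    return report


def count_risk_alleles(genotype: str, risk_allele: str) -> int:
    """Hypothetical stand-in for domain logic: count copies of one allele."""
    return genotype.count(risk_allele)


cases = [
    ScaffoldedCase("two_copies", ("AA", "A"), 2, reviewed_by="domain engineer"),
    ScaffoldedCase("one_copy", ("AG", "A"), 1),  # generated, not yet reviewed
]
report = run_reviewed_cases(count_risk_alleles, cases)
print(report)
```

The design choice here is that an unreviewed case never counts as a pass, so coverage numbers cannot be inflated by expected values nobody has checked.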
Documentation acceleration is a related case. Long-lived codebases almost universally contain functions that have never been documented. AI tooling can analyze a function, infer its purpose from its behavior, and produce an initial description. Engineers review that description, correct the parts that are semantically wrong (which requires domain knowledge), and refine the output into something accurate and useful. This is significantly faster than writing documentation from scratch, and more useful than leaving the functions undocumented.
Where AI cannot substitute for judgment
The limitation of AI tooling in domain-heavy modernization is precise: it cannot validate domain correctness.
A test that passes is not the same as a test that is correct. If the expected output in a test for a breed probability calculation is wrong — not because the code is wrong, but because the test was written by someone who did not understand how breed probability distributions work — the test will pass and provide false confidence. AI-generated tests carry this risk in domain-heavy contexts. The scaffolding is structurally reasonable. The expected values may be wrong in ways that only someone with domain knowledge can detect.
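The gap between passing and correct can be made concrete with a deliberately simplified sketch. `breed_probabilities` is a hypothetical function with a planted bug and does not represent any real platform's logic. The first check pins whatever the code returns today. The second checks a property that holds for any probability distribution: the values sum to 1.

```python
import math


def breed_probabilities(raw_scores: dict[str, float]) -> dict[str, float]:
    """Hypothetical: turn raw per-breed scores into a probability distribution.

    Planted bug for illustration: divides by the largest score, not the total.
    """
    largest = max(raw_scores.values())
    return {breed: score / largest for breed, score in raw_scores.items()}


def scaffolded_check() -> bool:
    # Expected values copied from what the code currently returns.
    # This passes, and in doing so it pins the bug in place.
    result = breed_probabilities({"beagle": 2.0, "collie": 6.0})
    return math.isclose(result["beagle"], 1 / 3) and math.isclose(result["collie"], 1.0)


def domain_invariant_check() -> bool:
    # A reviewer who knows the output must be a distribution checks the sum.
    result = breed_probabilities({"beagle": 2.0, "collie": 6.0})
    return math.isclose(sum(result.values()), 1.0)


scaffold_passes = scaffolded_check()
invariant_passes = domain_invariant_check()
print(scaffold_passes, invariant_passes)  # True False
```

Both checks execute the same code path. Only the one written with knowledge of what the output is supposed to mean exposes the defect.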
This is not a deficiency that better AI tooling will resolve. It is a fundamental property of the task. Validating domain correctness requires understanding the domain. A genomics platform engineer who does not understand how allele-based health marker scoring works cannot verify that a test for that scoring logic is testing the right thing, regardless of what tools are available. The knowledge requirement does not disappear because the tool is sophisticated.
Refactoring carries the same constraint. Extracting genomics domain logic from a view handler into a service class changes where the code lives, not what it does. But in a domain-heavy system, verifying that the extraction has preserved semantic correctness requires more than running the tests. It requires understanding what the logic is supposed to do and whether the refactored version still does it — in edge cases, at boundary conditions, and in the scenarios that the test suite does not cover. That understanding is human.
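One way to gather evidence that an extraction preserved behavior is to run the legacy path and the refactored path on the same inputs, including boundary values. The sketch below is a minimal example under stated assumptions: the handlers, the `classify_marker` service function, and the 0.5 threshold are hypothetical placeholders, not actual genetics rules.

```python
from typing import Any


def legacy_view_handler(payload: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical legacy handler: validation, domain rule, and response mixed."""
    score = payload.get("marker_score")
    if not isinstance(score, (int, float)):
        return {"status": 400, "error": "marker_score must be a number"}
    # Domain rule buried in the handler (placeholder threshold, not real genetics).
    label = "at_risk" if score >= 0.5 else "clear"
    return {"status": 200, "label": label}


def classify_marker(score: float) -> str:
    """Extracted service function: only the domain rule."""
    return "at_risk" if score >= 0.5 else "clear"


def refactored_view_handler(payload: dict[str, Any]) -> dict[str, Any]:
    """Handler after extraction: validates, then delegates to the service."""
    score = payload.get("marker_score")
    if not isinstance(score, (int, float)):
        return {"status": 400, "error": "marker_score must be a number"}
    return {"status": 200, "label": classify_marker(score)}


# Boundary and malformed inputs; which boundaries matter is a domain decision.
CASES = [{"marker_score": s} for s in (0.0, 0.49, 0.5, 0.51, 1.0)] + [
    {"marker_score": None},
    {},
]

mismatches = [c for c in CASES if legacy_view_handler(c) != refactored_view_handler(c)]
print("mismatches:", mismatches)
```

Agreement on these inputs shows that the two versions behave identically for the cases listed. It does not show that either version is correct, and it says nothing about inputs nobody thought to include, which is where domain understanding comes in.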
Architectural decisions are entirely in the human domain. How should the system be structured? Which responsibilities belong in which layer? What are the right service boundaries for a genomics processing platform? What validation architecture is appropriate for a system where incorrect outputs have downstream consequences? AI tooling can surface evidence relevant to those decisions. It cannot make the decisions.
The combination that makes modernization work
The practical pattern that works in AI-enabled modernization is a clear division between what AI does and what engineers do — with the division made deliberately rather than discovered by accident.
AI handles the scale problem. Codebase archaeology across a large system. Initial test scaffolding for a defined set of functions. Documentation drafts for a large number of undocumented routines. Dependency analysis across a complex module graph. These are tasks where the challenge is primarily coverage and consistency rather than judgment. AI handles them quickly and without the cognitive overhead that makes manual equivalents slow.
Engineers handle the judgment problem. Interpreting what the archaeological evidence means for the modernization approach. Validating that AI-generated tests are testing the right thing given domain knowledge. Determining the correct architectural direction. Making the refactoring decisions that require understanding what the code does, not just what it looks like. Communicating the findings and the plan to organizational stakeholders in terms that connect technical decisions to operational outcomes.
This division has a practical implication for how modernization engagements should be staffed and scoped. AI tooling creates leverage on the scale-dependent phases, but it does not reduce the requirement for engineers who combine software engineering discipline with domain understanding — or who can develop domain understanding quickly enough in the engagement to be effective. A modernization engagement that replaces domain-aware engineering judgment with AI analysis will produce technically plausible output that fails the correctness standard the platform actually requires.
The combination, used well, changes the economics of modernization work. Assessment work that would require several weeks of manual effort becomes a focused engagement cycle. Test coverage that would require extensive setup work to write correctly becomes achievable at a reasonable pace. Documentation that would realistically never get written gets drafted and refined into something accurate and useful. Every stage still requires engineering judgment. The work moves faster because AI handles the scale-dependent tasks within each stage that do not.
What this looks like in a genomics platform engagement
In a production DNA testing platform, the combination played out across several phases of the modernization work.
The assessment phase used AI analysis to map where breed probability, health marker scoring, and allele interpretation logic was distributed across the codebase — identifying the API endpoints carrying the most responsibility overlap and surfacing the processing paths that a refactoring would need to handle most carefully. This analysis took days rather than weeks. It produced a structured picture of the system that informed every subsequent decision in the engagement.
Test generation produced initial scaffolding for the highest-risk processing functions. Engineers reviewed every expected value in those scaffolded tests against their understanding of how the genomics logic was supposed to work. Some expected values were wrong — the AI had inferred plausible outputs that did not reflect the actual domain logic. Those were corrected before any refactoring proceeded. The test suite that resulted was reliable precisely because the review step was not skipped.
Refactoring used the verified test suite as validation. Extracting genomics domain logic into explicit service boundaries, separating the API layer from processing logic, and refactoring endpoints to single responsibilities — each step was validated against tests that engineers had confirmed were testing the right thing. The refactoring moved confidently because the validation layer was trustworthy.
Documentation for previously undocumented genomics processing functions was drafted with AI assistance and reviewed against domain expertise. Some drafts were accurate. Others described what the code appeared to do without capturing why — the domain rationale that a new engineer would need to understand to work on the function safely. Those were rewritten by engineers who understood both the code and the genetics context it reflected.
The CI/CD foundation (automated test execution, linting, and build validation on pull requests) was established as the last structural step, so that all subsequent development would run against an automated gate that had not existed before the engagement.
The knowledge transfer that makes it durable
An AI-enabled modernization engagement that delivers a better platform without transferring the practices to the internal team has a limited useful life. The platform is in better condition at handoff. Over the following months, the same patterns that required the modernization begin to re-accumulate, because the team does not have a way to apply the practices that improved the system.
The practices are transferable. AI-accelerated codebase analysis is a repeatable workflow. AI-assisted test generation is a technique that can be applied to any new function added to the system. Documentation drafting and review is a process that can be built into the development workflow. These are not advanced capabilities requiring specialized tooling access. They are approaches that engineering teams can apply with the same tools available to any practitioner.
Knowledge transfer in this context means more than handing over a codebase and documentation. It means the internal team understanding what the AI-enabled practices are, having used them in the engagement rather than watching them applied, and having a workflow model they can apply independently. The outcome of a well-executed AI-enabled modernization engagement is not just a better platform — it is an internal team better equipped to keep the platform improving after the engagement ends.
Protabyte applies AI-enabled engineering practices in modernization engagements for domain-heavy platforms across life sciences, genomics, healthcare, and regulated industries — combining AI-accelerated analysis with the engineering judgment that domain-specific correctness requires. If your platform carries the kind of specialized logic that conventional modernization approaches do not account for, we are available for a direct conversation.