
Field guide · Modernization

What a real modernization assessment looks like

Most assessments stop at the version numbers. A real one maps the full surface of risk — and tells you what to do about it, in what order.

March 2026 · 12 min read

Most organizations that commission a modernization assessment know they have a problem. The codebase is old. The dependencies are out of date. The team moves slowly and deploys nervously. Something needs to change.

What they often receive in response is a framework upgrade recommendation and a list of libraries to update. Maybe a note about moving to containers, or a suggestion to adopt a newer frontend toolchain. Version numbers go up. The system looks more current. The underlying risk surface is largely the same.

This gap between what organizations get from most assessments and what they actually need is what this document is about. A real modernization assessment does not start with frameworks. It starts with risk — specifically, a systematic account of where risk is actually concentrated across the full surface of the system. That means security posture, runtime model, dependency health, test coverage, CI/CD maturity, configuration architecture, frontend stack, and deployment model. All of it. Not as a checklist, but as an integrated picture of what is fragile, what is exposed, and what is upstream of what.

The context for this guide is a long-running engagement with a public health software platform. Long-lived codebase. Django and React stack. Seasonal operational cadence with real consequences for disruption. A team operating under real capacity constraints. The output was a phased modernization roadmap grounded in direct codebase examination across every dimension covered here.

Security and compliance posture

Security posture in mature systems is rarely a single finding. It is a cluster — and the items in the cluster are often small individually and significant together.

In the public health platform context, that cluster included: password hasher configuration that retained legacy algorithms capable of validating credentials that should have required re-authentication; session and CSRF cookie security flags present in the configuration file but commented out, creating conditions where a reviewer could reasonably conclude they were enabled; and machine-to-machine API endpoints protected by CSRF exemption decorators with environment header checks rather than token-based authentication — a pattern that works until an assumption about header integrity is violated.
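As a concrete sketch of what the remediated cluster might look like: the setting names below are Django's real ones, but the values are illustrative, not the platform's actual configuration.

```python
# Hypothetical settings.py fragment: a sketch of the remediated
# configuration, not the platform's actual file.

# Django hashes new passwords with the first entry; later entries exist
# only so existing credentials can still be verified (and upgraded on
# next login). Dropping a legacy hasher outright invalidates its
# stored passwords.
PASSWORD_HASHERS = [
    "django.contrib.auth.hashers.Argon2PasswordHasher",
    "django.contrib.auth.hashers.PBKDF2PasswordHasher",
]

# The cookie flags enabled for real, not left commented out.
SESSION_COOKIE_SECURE = True    # session cookie sent only over HTTPS
SESSION_COOKIE_HTTPONLY = True  # session cookie unreadable from JavaScript
CSRF_COOKIE_SECURE = True       # CSRF cookie sent only over HTTPS
```

The ordering of PASSWORD_HASHERS is the subtle part: it lets the platform adopt a stronger algorithm without locking out users whose credentials were hashed under the legacy one.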

None of these items is catastrophic in isolation. Together they represent a security posture that has never been systematically reviewed against the standard the platform currently needs to meet. That distinction matters. The absence of systematic review means the team does not have confidence in what is and is not properly secured. That uncertainty is its own risk.

For regulated and government-adjacent systems, security posture carries additional compliance implications. Systems handling protected health data operate under documentation requirements that extend beyond the technical configuration itself. A security review in this context is not just an analysis of what the system does — it is also an input to the compliance documentation that validates that the system meets its regulatory obligations. Work done without that framing is work that may need to be redone when the compliance question is asked formally.

The practical output of security assessment is not a list of things to patch. It is a prioritized remediation plan that distinguishes between items that carry immediate risk and can be addressed quickly (commented-out security flags), items that require careful upgrade sequencing (dependency version upgrades with active CVEs), and items that require architectural decisions (access control patterns on API endpoints). Those three categories require different responses and different sequencing.

Dependency health and runtime model

Dependency health in long-lived systems is almost always worse than it appears from the outside. The problem is not that teams are careless — it is that dependency upgrades carry real change risk in systems with insufficient test coverage, and so teams defer them rationally until the deferral itself becomes the risk.

In the public health platform, this manifested as a combination of pinned dependencies with documented CVEs, an application server that required a custom patch applied via shell commands at Docker build time to address a Python path incompatibility, and a runtime model that had not been revisited since the platform was first deployed. The patch situation is instructive: a third-party library needed to be modified at image build time because the installed WSGI server had a path dependency on a Python version that the base image no longer provided in the expected location. The patch worked. It also meant that any base image update could silently break the build — and that there was no clean path to retiring the WSGI server without first resolving the dependency that required the patch.

This is the runtime model question. What application server is the platform running on, under what configuration, with what dependencies, and what would it take to change any of those things? Long-lived platforms often run on server configurations that were appropriate when they were chosen and have not been revisited since. Worker model, concurrency model, async request handling — or the absence of it — and containerization approach all need to be evaluated together, because they constrain each other.

A runtime model assessment also forces a conversation about infrastructure direction. If the organization intends to move toward Kubernetes-based orchestration, the application server configuration needs to be compatible with that model. If the platform currently relies on file-based session storage, that assumption needs to be addressed before a multi-instance deployment becomes viable. These are infrastructure implications that show up in the modernization roadmap as sequencing constraints, not optional improvements.
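The session-storage constraint can be made concrete. A minimal sketch, assuming Django 4.0+'s built-in Redis cache backend and an invented Redis service name:

```python
# Hypothetical settings fragment: sessions move off the instance-local
# filesystem into a shared backend, so any instance can serve any
# request in a multi-instance deployment.

# Before (single-instance assumption): sessions live on local disk.
# SESSION_ENGINE = "django.contrib.sessions.backends.file"

# After: sessions in a shared cache. (Database-backed sessions are the
# simpler alternative if no cache tier exists yet.)
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://redis:6379/1",  # hypothetical service name
    }
}
```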

Configuration sprawl and deployment model

Configuration architecture is one of the most reliable indicators of a system's operational maturity — and one of the most consistently underexamined dimensions in modernization assessments.

In a long-lived Django application, the settings architecture tells you how the platform has been managed across its operational history. A monolithic settings file that manages all deployment environments through branching conditional logic — development, staging, production, and any number of intermediate configurations — is not just messy. It is a source of deployment risk. The risk is not theoretical: it is the risk of applying the wrong configuration to the wrong environment because the logic that distinguishes them is buried in a large file that requires careful reading to understand. It is the risk of debugging a production issue that turns out to be a configuration difference rather than a code problem. It is the risk of onboarding a new team member who makes a change to the settings that has environmental consequences they did not anticipate.

In the public health platform, the settings architecture included a 337-line monolithic file and five Docker Compose files for different deployment targets that had diverged over time with substantial duplication. Understanding what any specific deployment target actually configured required reading through multiple files simultaneously and reconciling discrepancies. That is not where team attention should be going during a deployment or an incident.

The deployment model assessment asks related questions at the infrastructure level. How are releases promoted through environments? What is the validation step before a change reaches production? What does a rollback look like? For government and regulated-environment systems, the answers to these questions have compliance implications — changes to systems handling protected health data may require documented validation procedures, and a deployment model that relies on informal processes rather than defined procedures creates documentation risk alongside operational risk.

Configuration and deployment model modernization typically does not require exotic changes. It requires organization: a settings architecture that separates environment-specific concerns explicitly, a Compose configuration that uses base-plus-override rather than full-file duplication, and a deployment process that is documented, repeatable, and does not depend on specific individuals being available.
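A minimal sketch of the base-plus-override idea as it applies to settings. In a real project each environment would be its own module (settings/base.py, settings/production.py doing `from .base import *`); the dictionaries below simulate that layout inline so the override-only-the-differences pattern is visible. All names and values are illustrative.

```python
# settings/base.py: defaults shared by every environment.
base = {
    "DEBUG": False,
    "TIME_ZONE": "UTC",
    "SESSION_COOKIE_SECURE": True,
}

# settings/development.py: only the differences from base.
development = {**base, "DEBUG": True, "SESSION_COOKIE_SECURE": False}

# settings/production.py: only the differences from base.
production = {**base, "ALLOWED_HOSTS": ["surveillance.example.gov"]}
```

The point is that reading production configuration no longer requires mentally executing branching logic: each environment file states exactly, and only, what differs from the shared baseline.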

Test maturity and regression risk

Test coverage is one of those metrics that is easy to misread. A team with a test suite that runs in CI on every pull request sounds like a team with a good testing posture. The question is what the tests actually cover.

In the public health platform, the test suite ran and passed. It was composed almost entirely of HTTP response-code smoke tests: calls to API endpoints that verified a 200 or 404 response code, with no examination of whether the response payload was correct, whether the business logic behaved as expected under varied inputs, or whether the system would degrade gracefully under error conditions. The Playwright end-to-end test suite existed and provided meaningful coverage at the UI layer — but it was not connected to the CI pipeline, which meant it ran manually, infrequently, and had no integration with the change validation workflow.

This is a specific failure mode: test infrastructure that provides the appearance of validation without the substance of it. A team that runs its tests and sees green builds naturally develops confidence that the system is being validated. That confidence is not warranted if the tests are not actually testing behavior. The dangerous part is not the absence of coverage — it is the confidence that the absent coverage displaces.
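The gap between the two kinds of test can be sketched with an invented example. The function and both tests below are hypothetical, not the platform's code; the smoke test stands in for the suite's status-code-only checks:

```python
# Hypothetical reporting helper: aggregate raw case records into
# per-week counts. Names are invented for illustration.

def weekly_case_totals(cases):
    """Aggregate raw case records into {week: count} for the report."""
    totals = {}
    for case in cases:
        totals[case["week"]] = totals.get(case["week"], 0) + 1
    return totals

# A smoke test only proves the call does not blow up -- the analogue of
# asserting a 200 response code:
def test_smoke():
    weekly_case_totals([])  # passes even if the aggregation is wrong

# A behavioral test pins down the actual contract of the output:
def test_behavior():
    cases = [{"week": "2026-W01"}, {"week": "2026-W01"}, {"week": "2026-W02"}]
    assert weekly_case_totals(cases) == {"2026-W01": 2, "2026-W02": 1}
```

Both tests show up as green in CI; only the second one would catch a regression in the aggregation logic.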

A test maturity assessment maps three things: what is currently covered and at what level, what carries the highest regression risk if changed, and what coverage would need to exist to make the highest-risk changes safely. In a seasonal surveillance platform, the highest-risk paths are the data ingestion, transformation, and reporting workflows that run during peak periods. Those paths need behavioral coverage before anyone touches them. The assessment identifies those paths explicitly and sequences testing investment accordingly — starting where the risk is highest, not where the coverage is easiest to write.

CI/CD maturity

CI/CD maturity is the dimension that most directly determines how fast a team can safely improve a system. A team with strong CI/CD automation can make changes, validate them quickly, and deploy with confidence. A team without it is in a different situation: every change is higher-stakes because there is less automated validation standing between a merged pull request and a production deployment.

In the public health platform, the CI/CD state was close to baseline. The only GitHub Actions workflow in place was a dependency review scan. There was no automated test execution gate, no linting, no type-checking pipeline, and no build validation on pull requests. Changes could be merged and deployed with no automated validation of code quality, correctness, or configuration integrity at the point of change.

This creates a compounding effect on all other modernization work. Introducing TypeScript to the frontend is only partially valuable without a CI type-check gate — the type safety only applies to developers who run the checker locally. Building meaningful test coverage is less valuable without CI executing those tests on every change. Upgrading dependencies safely requires being able to validate that the upgrade has not altered behavior, which requires tests that run consistently against a defined environment.

CI/CD improvement is therefore a foundational step in the modernization sequence, not a later-stage improvement. It does not have to reach a sophisticated state before other work can proceed — a basic automated test execution gate, linting, and build validation on pull requests gets most of the foundational value. But it needs to be functional before the improvements that depend on it are invested in.
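A sketch of what that baseline gate might look like, assuming GitHub Actions (already in use for the dependency review scan), with pytest and ruff as illustrative tool choices rather than the platform's actual ones:

```yaml
# Hypothetical .github/workflows/ci.yml -- a baseline gate, not the
# platform's actual pipeline. Tool and file names are assumptions.
name: ci
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt -r requirements-dev.txt
      - run: ruff check .                       # linting gate
      - run: pytest                             # test execution gate
      - run: python manage.py check --deploy    # Django config validation
```

Roughly fifteen lines of workflow configuration is the entire distance between "no automated validation at the point of change" and the foundational state the rest of the roadmap depends on.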

Frontend and UX technical debt

Frontend debt in long-lived systems tends to accumulate in three overlapping categories: accessibility and compliance gaps, maintainability constraints, and build toolchain complexity.

Accessibility is not optional in federally affiliated systems. Section 508 compliance is a legal obligation, and a frontend that does not meet it carries real institutional risk. In the public health platform, the primary compliance gap was in datatable implementations — a technically specific but practically important failure point for a data-intensive surveillance platform where end users navigate and interact with tabular health data regularly. Accessibility is also not purely a compliance concern. An interface that is not accessible provides a degraded experience for users with assistive technology needs — a category that includes a meaningful portion of government and public health workforce users.

Maintainability constraints in the public health frontend included the absence of TypeScript across the React layer, reliance on a component library that had entered maintenance-only status, and state management patterns that had accumulated ad-hoc rather than being designed. These constraints do not block the system from functioning. They increase the cost of every change. A frontend developer adding a new feature to a TypeScript-absent codebase has less compiler assistance for catching errors. A developer building on a maintenance-only component library is working against a component set that does not evolve.

Build toolchain complexity is a maintenance tax. The public health platform had a heavily customized Webpack configuration with fifteen build-specific development dependencies. Most of those dependencies existed to solve problems that current build toolchains solve natively. Each one is a dependency that can fall out of maintenance, introduce a compatibility issue with a Node upgrade, or require specialized knowledge to debug when something goes wrong. Resolving this is not just about developer experience — it is about removing a category of fragility that modern toolchains have simply designed away.

The practical implication of frontend assessment is sequencing. In the public health engagement, the 508 compliance remediation created the mandate for introducing TypeScript and modern React patterns. Once that foundation was in place, the scope expanded naturally to new feature delivery — Advanced Search functionality, task assignment workflows — built on the TypeScript and React foundation the compliance work had established. The Webpack configuration was resolved as part of this track. This is how frontend modernization often works when sequenced correctly: a compliance or maintainability driver creates a structural improvement that enables delivery work that was otherwise blocked.

Application architecture assessment

Application architecture in long-lived Django systems tends toward the same set of patterns regardless of the domain. Views acquire too many responsibilities. Serializers carry business logic that should live elsewhere. Validation, transformation, persistence, and side effects become interleaved rather than separated. The tests that would catch regressions in any of those areas are absent, or absent for the concerning paths, because writing isolated tests requires isolated behavior — and the behavior is not isolated.

The assessment question is not whether these patterns are present. They almost certainly are in any system that has been operating for several years under real delivery pressure. The question is where they are most consequential and what the refactoring path looks like.

In the public health context, the application architecture assessment focuses particularly on the concern separation in the data handling layer. The surveillance platform processes seasonal health data across ingestion, validation, transformation, and reporting steps. Each of those steps has a clear domain purpose, and each requires a different kind of correctness guarantee. Ingestion correctness is about completeness and format integrity. Validation correctness is about domain rule enforcement. Transformation correctness is about producing the expected output format for downstream consumption. Reporting correctness is about the accuracy of aggregate representations. When those concerns are distributed across a mix of model methods, serializer logic, and view-layer processing, testing any one of them requires exercising all of the others — which makes tests expensive to write and brittle to maintain.
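A minimal sketch of that separation, with invented names: each stage becomes a plain function with its own contract, so each correctness property can be tested without exercising the others.

```python
# Hypothetical pipeline stages -- names and formats are invented for
# illustration, not taken from the platform's code.

def ingest(raw_row):
    """Ingestion: completeness and format integrity of a raw row."""
    week, count = raw_row.split(",")
    return {"week": week, "count": int(count)}

def validate(record):
    """Validation: domain rule enforcement, e.g. no negative counts."""
    if record["count"] < 0:
        raise ValueError("case count cannot be negative")
    return record

def transform(record):
    """Transformation: produce the downstream output format."""
    return {"week": record["week"], "cases": record["count"]}

def report(records):
    """Reporting: accuracy of the aggregate representation."""
    return sum(r["cases"] for r in records)

# The view or management command then only composes the stages:
def run_pipeline(raw_rows):
    return report([transform(validate(ingest(r))) for r in raw_rows])
```

Each function can now be tested against its own correctness guarantee in isolation — which is exactly what the interleaved model-method-and-serializer version prevents.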

An architecture assessment maps where those concerns currently live and what a refactoring sequence would look like that progressively separates them without requiring a rewrite. This is practical modernization work — not framework upgrades, not cloud migrations, but the structural changes that make the system easier to reason about, easier to test, and safer to change.

How AI fits into the assessment process

AI-assisted analysis has become a practical accelerant for modernization assessment work in a way that was not true even two years ago. The task that benefits most is codebase archaeology: understanding what a large, long-lived system actually contains before the substantive assessment work begins.

A codebase that has been in active development for several years accumulates patterns that take time to map manually. Dependency relationships between modules, the distribution of business logic across layers, the location of security-sensitive operations, the consistency of configuration patterns across files — these are not things that can be read from a README or an architecture diagram. They require systematic examination of the code itself, typically through a combination of structural analysis and targeted reading.

AI tooling substantially compresses this phase. Surfacing all the places a particular configuration value is read across a large codebase, identifying all the endpoints that share a specific authentication pattern, mapping the call graph for a data processing function to understand what depends on it — AI-assisted tools handle these tasks in seconds where a human reviewer would need hours or days. That time compression is not theoretical; in the public health engagement, it made the difference between an assessment that could cover the full surface of the system in a focused engagement cycle and one that would have required a much longer and more expensive investment.

The important qualifier is what AI does not replace. Engineering judgment — about which findings are significant, how to sequence remediation, what the right architectural approach is for a given refactoring need, how an organization's specific operational constraints should shape a roadmap — is not something a tool can provide. The output of AI-assisted codebase analysis is evidence. Human engineers are still responsible for interpreting that evidence, forming the recommendations, and communicating them in a way that decision-makers can act on.

The combination is more capable than either alone. AI handles the scale problem: covering a large surface quickly and consistently. Human engineers handle the judgment problem: knowing what matters, understanding the constraints, and designing a sequenced path forward that will actually work.

The roadmap as the deliverable

The output of a rigorous modernization assessment is not a findings document. Findings documents get filed. The output is a sequenced roadmap — one that is specific enough to act on and grounded enough in system realities that the sequence is defensible.

Sequencing is the hard part. It requires understanding the dependency relationships between improvement areas: which improvements are prerequisites for others, which improvements are safe to pursue in parallel, and which should wait until the prerequisites are in place. Getting the sequence wrong is not just inefficient — it can mean doing work twice or introducing instability that undermines the improvement effort.

In the public health engagement, the sequence was: security posture and immediate risk remediation first, because those items carry concrete risk that accumulates with every passing week. CI/CD foundational automation second, because almost everything else depends on it. Dependency stabilization and settings architecture decomposition third, now supported by the CI safety net that makes those changes less risky to execute. Test coverage expansion starting from the highest-risk operational paths. Frontend modernization and new feature delivery last — built on a foundation that is secure, tested, and deployable with confidence.
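The prerequisite structure in that sequence can be expressed as a dependency graph. A sketch using Python's standard library, with the edges taken from the sequence above and the phase labels as shorthand:

```python
# The roadmap's prerequisite relationships as a dependency graph.
# Any valid ordering of the work must respect these edges.
from graphlib import TopologicalSorter

prereqs = {
    "security remediation": set(),
    "ci/cd baseline": set(),
    "dependency stabilization": {"ci/cd baseline"},
    "settings decomposition": {"ci/cd baseline"},
    "test coverage": {"ci/cd baseline", "dependency stabilization"},
    "frontend modernization": {"ci/cd baseline", "test coverage"},
}

# static_order() yields the phases in an order satisfying every edge.
order = list(TopologicalSorter(prereqs).static_order())
print(order)
```

The graph also makes the parallelism visible: security remediation depends on nothing and can start immediately, while settings decomposition and dependency stabilization can proceed side by side once the CI/CD baseline exists.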

That sequence is not arbitrary. Each step creates the conditions for the next one to be safer and more effective. A team that attempts the frontend modernization before the CI/CD foundation is in place is working without a type-check gate. A team that attempts major dependency upgrades before test coverage is meaningful is deploying unvalidated changes. A team that attempts configuration architecture decomposition before the deployment process is documented is restructuring configuration whose full implications are not yet understood.

The roadmap also needs to account for capacity. Government and regulated-environment teams operate under real capacity constraints that do not bend to fit an idealized improvement plan. A roadmap that assumes full-time modernization effort will not be executed if the team also has operational responsibilities. Effort estimates calibrated to actual team capacity, with clear priorities at each stage, give leadership a realistic view of what can be accomplished in what timeframe — which is what decisions about investment and resourcing actually require.

What this kind of assessment tells you that a standard review does not

A standard technical review tells you what the system contains. A modernization assessment in the sense described here tells you what the system is ready to do next.

That distinction has practical value at the business and program level. Engineering directors making resourcing decisions need to know whether their team's current velocity is a reflection of platform constraints or team capacity constraints — because the interventions are different. Program owners responsible for compliance need to know which technical gaps create regulatory exposure versus which are development inconveniences. Technical architects planning additions to the platform need to know which architectural decisions are upstream of other decisions and need to be resolved first.

The assessment also produces a shared picture of risk that did not previously exist in explicit, documented form. Long-lived systems carry institutional knowledge about what is fragile and what is risky that is held informally, distributed across individuals, and often partially lost as team composition changes. A rigorous assessment captures that knowledge in a form that is transferable — which is itself a risk reduction outcome independent of whatever technical improvements follow.

Finally, a well-executed modernization assessment changes the conversation with stakeholders who control investment decisions. Abstract requests for technical improvement rarely receive the investment they need. A specific account of where risk is concentrated, what it would take to address it, in what sequence, and what the platform will be capable of at the conclusion of each phase — that is a conversation that connects technical decisions to business outcomes in a way that decision-makers can actually act on.

Protabyte conducts modernization assessments for technology organizations operating long-lived, mission-critical platforms — including public sector, federally affiliated, and regulated-environment systems. If your platform has accumulated debt across multiple dimensions and you need a clear-eyed, sequenced account of what it would take to address it, we are available for a direct conversation.