Anthropic Says Claude Code Can Analyze COBOL. Here's Why Analysis Isn't Proof

On February 23, 2026, Anthropic published a sweeping claim: Claude Code is "exceptionally proficient at deciphering and modernizing legacy languages like COBOL." IBM's stock took a hit. The enterprise media cycle went into orbit.

So let's talk about what Anthropic actually announced — what it means, what it doesn't mean, and why the distinction matters enormously for enterprises with billions of dollars of business logic locked inside their mainframes.

What Anthropic Is Claiming

Claude Code, Anthropic's agentic coding assistant, has been positioned as a tool that can automate the "exploration and analysis phases of COBOL modernization." Specifically, Anthropic and its partners (including Deloitte in a November 2025 webinar) have claimed the tool can:

  • Map dependencies across thousands of lines of code
  • Document "long-forgotten workflows"
  • Translate COBOL logic to modern languages
  • Reduce modernization timelines from years to quarters

These are not small claims. And to be clear: we respect Anthropic enormously. Claude is a remarkable language model. But there is a profound difference between a system that analyzes COBOL and a purpose-built system that truly understands it. That difference is not academic — it is the difference between a project that succeeds and one that fails catastrophically in production.

The Completeness Problem

Anthropic's claims for Claude Code are more substantive than simple code reading. The playbook describes tracing execution paths, mapping data flows across modules, and discovering implicit dependencies created by shared files, databases, and global state. That's real structural analysis, and we take it seriously.

But here's the question that matters for your core banking system: Can it guarantee completeness?

Every large language model — Claude included — is a probabilistic system at its foundation. Even when augmented with structural analysis, it produces outputs calibrated to be correct in the overwhelming majority of cases. For general software modernization, that's genuinely impressive and often sufficient.

Modernizing a COBOL transaction processing system that clears billions of dollars annually is not a general software task.

A typical Fortune 500 mainframe application contains:

  • Millions of lines of COBOL spanning 40+ years of ad-hoc modification, with complex memory implications
  • Hundreds of copybooks with implicit data contracts
  • GOTO-heavy control flow that creates "spaghetti" logic, requiring careful formal analysis to trace completely
  • Business rules embedded in compute logic, often with no corresponding documentation
  • JCL job streams, DB2 binds, VSAM file structures, and CICS transactions forming a dependency web that no single developer holds in their head

When a system like this is analyzed probabilistically — even with structural augmentation — it will find most dependencies. It may find 95% of them. For mission-critical systems processing trillions of dollars annually, that remaining 5% is not a rounding error. It is a catastrophe waiting to happen.
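To make that concrete, here is a back-of-the-envelope sketch. The dependency count and the recall figure are illustrative assumptions, not measurements of any system or tool:

```python
# Back-of-the-envelope illustration. Both numbers below are assumptions
# chosen for illustration, not measurements of any real system or tool.
total_dependencies = 50_000   # hypothetical dependency count in a large mainframe app
recall = 0.95                 # hypothetical fraction a probabilistic scan finds

missed = round(total_dependencies * (1 - recall))
print(missed)  # 2500 dependencies silently absent from the analysis
```

At that scale, "95% found" means thousands of relationships the analysis never surfaces, and no flag telling you which ones.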

IBM itself put it plainly in response to the Anthropic announcement: "merely translating COBOL code does not fully address the complex architectural and integration challenges of mission-critical mainframe workloads."

What Understanding Actually Entails

COBOL Colleague was built on a fundamentally different premise: that true understanding of legacy code requires automated reasoning, not probabilistic inference.

Our approach constructs a normalized, functionally equivalent causal model and a mathematically verified knowledge graph of your application through formal analysis of your actual source code — not from a model's training data, not from statistical likelihood, but from slicing the causal model for the "effects" of the application. This rigorous approach draws on cognitive science, several classes of formal logic, and patented mathematical techniques.

Every relationship in that graph is derived from, and traceable back to, specific lines of code. These are causal dependencies: each one follows an unbroken line of cause and effect through the code. Memory is explicitly modeled, not guessed.

We won't claim Claude Code finds nothing — it may find a great deal. What we can tell you is what makes our approach categorically different:

  • Our results are auditable. Every finding links bi-directionally to the exact line of source code that produced it. You can verify any claim we make. You can show it to a regulator. You can run the analysis twice and get the same answer.
  • We tell you what we don't know. A deterministic system has a defined scope. If something falls outside our analysis, we explicitly say so. A probabilistic system will produce a confident-sounding answer regardless, and you won't always know when it's guessing.
  • Our knowledge graph is yours. The structured, verifiable, reasoned dataset we build from your codebase can be interrogated, integrated, and audited independently of any AI interface. It's not a black box producing outputs — it's a transparent artifact you own.
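To illustrate what line-level provenance can look like in practice, here is a minimal sketch. The data model, names, and file references are invented for illustration; this is not COBOL Colleague's actual implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    """One causal dependency, carrying the source evidence that produced it."""
    source: str       # e.g. a COBOL paragraph or data item (names here are invented)
    target: str
    kind: str         # e.g. "reads", "writes", "calls"
    evidence: tuple   # (file, line) pairs pointing at the code that proves it

# Hypothetical findings. In a provenance-backed graph, no edge exists
# without at least one (file, line) pair behind it.
graph = [
    Edge("CALC-PREMIUM", "WS-RATE-TABLE", "reads", (("RATES.CBL", 212),)),
    Edge("CALC-PREMIUM", "PREMIUM-OUT", "writes", (("CALC.CBL", 87),)),
]

def audit(edge: Edge) -> str:
    """Render a finding as a claim a reviewer can check against the code."""
    where = ", ".join(f"{f}:{ln}" for f, ln in edge.evidence)
    return f"{edge.source} {edge.kind} {edge.target} (see {where})"

for e in graph:
    print(audit(e))
```

Because every edge carries its evidence, re-running the analysis on the same code yields the same graph, and any single claim can be checked by opening the cited file at the cited line.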

The question isn't whether Claude Code's analysis is good. It may be very good. The question is: can you prove it? For a regulated financial institution facing an audit, a modernization post-mortem, or a production failure, "the AI was confident" is not an answer.

Our approach is foundationally anchored in science and engineering, not probability. That’s what makes it deterministic and why it’s provable from code to compliance.

The Real Risk of "Good Enough"

Perhaps the most dangerous outcome of the Anthropic announcement is not the technology itself — it's the expectation it may set.

Enterprise buyers are already under enormous pressure to modernize. Boards and CFOs are excited by the prospect of AI shortcutting a process that has historically been expensive, slow, and risky. An AI that claims to "understand COBOL in quarters, not years" is exactly what they want to hear.

But when that AI produces a translation that is subtly wrong — when the edge case in the premium calculation logic is misread, when the copybook inheritance chain is incompletely traced, when the GOTO that looked like dead code turns out to control a critical batch job — the consequences are not a bad code review. They are regulatory violations, financial system failures, and data loss events that take years to remediate.

We have seen this pattern before. In the 2010s, a generation of COBOL "automated translation" tools promised the same thing with rule-based compilers. The projects they launched still litter the enterprise landscape as cautionary tales.

Where Claude Code Fits — Honestly

We don't believe the right answer is to dismiss Claude Code. For many modernization tasks, it represents a genuine leap forward. Structural analysis, dependency mapping, documentation generation, boilerplate acceleration — these are real capabilities that can meaningfully compress timelines on projects where "very good" is good enough.

The honest question is: what happens at the boundary where "very good" becomes "catastrophically wrong"?

That boundary exists in every mission-critical COBOL system. It lives in the edge case inside the premium calculation logic. In the copybook inheritance chain that's incompletely traced. In the GOTO that looks like dead code but controls a critical batch job. These aren't hypothetical failure modes — they're the specific patterns that have turned a generation of automated COBOL translation projects into cautionary tales.

This is precisely why COBOL Colleague uses LLM capabilities — including models like Claude — for natural language generation. Once our knowledge graph has deterministically established every relationship, every dependency, every causal path with mathematical certainty, an LLM is the ideal tool for translating those verified facts into fluent prose that business analysts and architects can act on.

That's the right division of labor: let probabilistic models do what they do brilliantly — communicate. Let deterministic analysis do what only it can do — guarantee.
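A simplified sketch of that division of labor, with invented facts and no real model call. The deterministic layer supplies the claims; the LLM is constrained to wording them:

```python
# Hypothetical pipeline stage: deterministic facts in, prose generation out.
# The facts below are invented for illustration. In a real pipeline they would
# come from the verified knowledge graph, and only the final wording step
# would be handed to an LLM.
verified_facts = [
    "CALC-PREMIUM reads WS-RATE-TABLE (RATES.CBL:212)",
    "CALC-PREMIUM writes PREMIUM-OUT (CALC.CBL:87)",
]

prompt = (
    "Summarize the following verified dependencies for a business analyst. "
    "Do not add, remove, or infer any relationship beyond this list:\n"
    + "\n".join(f"- {fact}" for fact in verified_facts)
)
print(prompt)
```

Every claim the model is allowed to state is pinned to a source line before the model ever sees it; the model's job is fluency, not discovery.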

The fluent voice of a deterministic mind. Not the other way around.

What This Moment Means

Anthropic's announcement — and IBM's stock reaction — confirm what we have believed for years: the market now recognizes that AI-driven COBOL modernization is not only possible but also urgent. That is genuinely good news for the industry and for the enterprises that have been waiting for a credible path forward.

Our job, as the company that has actually built the right solution for this problem, is to make sure buyers understand what they're evaluating. The question is not "Can AI analyze COBOL?" Claude Code has demonstrated it can. The question is: "Can you prove what it found — and prove what it missed?"

Phase Change is dedicated to providing the essential logical verification layer that anchors the high-velocity, probabilistic outputs of Generative AI agents in deterministic, auditable fact.


COBOL Colleague is available for an enterprise demonstration. If your organization is evaluating COBOL modernization approaches in light of recent announcements, we welcome the conversation. Contact us to schedule a technical briefing.