This is a meaningful transition, and the good news is you don’t need to do it all at once. Here’s a pragmatic, phased approach that works well for an existing multi-microservice system.
Phase 0: Foundation (Before Any AI Work)
Before handing anything to AI, your codebase needs to be AI-readable. This is the single biggest predictor of success.
Create a /docs directory at the repo root (or per service) containing:
- ARCHITECTURE.md — service boundaries, data flow, sync vs async communication, tech stack per service
- CLAUDE.md (or AGENTS.md for Codex) — coding conventions, build/test commands, do’s and don’ts, repo layout
- DOMAIN.md — business glossary, entities, invariants
- ADRs (Architecture Decision Records) for past major decisions
- API contracts (OpenAPI/Protobuf) checked into the repo
- Per-service README.md with run/test/deploy steps
Without these, AI agents will hallucinate context. With these, they perform like a senior engineer who’s been onboarded.
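As a concrete starting point, a minimal CLAUDE.md might look like the sketch below. Every command and convention here is illustrative; replace them with your repo’s actual ones.

```markdown
# CLAUDE.md

## Build and test (illustrative commands)
- Build: make build
- Lint: make lint
- Unit tests: make test
- Integration tests: make integration-test (requires Docker)

## Conventions
- Public APIs live in /api/openapi.yaml; regenerate stubs, never hand-edit them
- Every behavior change ships with tests in the same PR

## Don'ts
- Don't touch /migrations without an approved design doc
- Don't add new dependencies without flagging them in the PR description
```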
Phase 1: Map Your Current SDLC Stages
For each stage, identify the inputs, outputs, and humans involved today:
| Stage | Current Output | Owner |
|---|---|---|
| Plan | Roadmap, epics | PM |
| Architecture | C4 diagrams, ADRs | Architect |
| HLD | Service-level design doc | Tech Lead |
| LLD | Class/sequence diagrams, API specs | Senior Dev |
| Development | Code + PR | Dev |
| Testing | Unit/integration/E2E | Dev + QA |
| Security | SAST/DAST/dependency scan | SecOps |
This table becomes your migration checklist.
Phase 2: Tool Assignment
A reasonable split based on each tool’s strengths:
Claude (claude.ai, Claude Code, API) — better for ambiguous, document-heavy, reasoning-heavy work:
- Plan refinement, requirement decomposition
- Architecture and HLD/LLD generation
- ADR drafting, threat modeling
- Code review, security review of diffs
- Test strategy and test case generation
Codex (OpenAI’s coding agent) — better for in-IDE, repo-bound code generation:
- Implementation tasks against tickets
- Refactors, boilerplate, scaffolding
- Writing the actual unit/integration tests
You don’t have to pick one — many teams run both and let each do what it’s best at.
Phase 3: Migrate Stage-by-Stage (Don’t Big-Bang It)
Start with the lowest-risk, highest-leverage stage and expand. Suggested order:
Step 1 — Testing & Security Scanning first. These have objective pass/fail criteria, so AI mistakes are caught automatically. Have Claude generate test cases from your API specs; have Codex implement them. Wire SAST (Semgrep, Snyk) into CI and have Claude triage findings.
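As a sketch of the CI wiring, assuming GitHub Actions and the Semgrep CLI (job layout and flags are illustrative and may differ across Semgrep versions):

```yaml
# .github/workflows/sast.yml -- illustrative, not a drop-in config
name: sast
on: [pull_request]
jobs:
  semgrep:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python3 -m pip install semgrep
      # --error exits non-zero on findings so they block the merge;
      # the findings output is what you would hand to Claude for triage
      - run: semgrep scan --config auto --error
```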
Step 2 — LLD generation. Feed Claude an HLD + the relevant service code and ask for sequence diagrams, API contracts, and data models. Human reviews. Low blast radius if wrong.
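The request itself can stay simple. A hypothetical prompt (service and feature names invented for illustration):

```text
Attached: the HLD for the orders service and the current code under
services/orders/. Produce an LLD containing:
1. A sequence diagram for the new partial-refund flow
2. The OpenAPI diff for any endpoint changes
3. Data model changes, including a draft migration
4. The invariants from DOMAIN.md that this change touches
```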
Step 3 — Development on small tickets. Pick well-scoped tickets (bug fixes, small features in one service). Use Codex or Claude Code with your CLAUDE.md/AGENTS.md in place. Require human PR review.
Step 4 — HLD and Architecture. Higher stakes. Use Claude to draft, but keep a human architect as final approver. Always produce ADRs.
Step 5 — Planning. Last, because it requires the most business context. AI assists with breakdown and estimation; humans set direction.
Phase 4: Guardrails (Non-Negotiable)
Regardless of stage, put these in place before going live:
- Branch protection + mandatory human PR review on main. AI proposes, human disposes.
- CI must pass: lint, unit tests, integration tests, SAST, dependency scan, container scan.
- Sandbox execution: Claude Code and Codex should run in isolated environments, not against prod.
- Secrets hygiene: never put live credentials in prompts or context. Use vault references.
- Audit trail: log every AI-generated PR with the prompt and model used (a sketch follows this list).
- Rollback plan: per-service feature flags so AI-authored changes can be killed fast.
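One possible shape for that audit trail is an append-only JSON-lines log written from CI. The field names and log location below are assumptions, not a standard:

```python
import json
import sys
import time
from pathlib import Path

AUDIT_LOG = Path("audit/ai_prs.jsonl")  # illustrative location

def record_ai_pr(pr_number: int, model: str, prompt: str, reviewer: str) -> None:
    """Append one audit record per AI-generated PR as a JSON line."""
    AUDIT_LOG.parent.mkdir(parents=True, exist_ok=True)
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "pr": pr_number,
        "model": model,    # e.g. the exact Claude or Codex model identifier
        "prompt": prompt,  # the prompt that produced the change
        "human_reviewer": reviewer,
    }
    with AUDIT_LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")

if __name__ == "__main__":
    # Typically invoked from CI with values pulled from PR metadata
    record_ai_pr(int(sys.argv[1]), sys.argv[2], sys.argv[3], sys.argv[4])
```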
Phase 5: Define the Workflow
A typical end-to-end loop once mature:
1. A ticket is filed. Claude reads it plus the relevant service docs and produces an LLD with API changes, data migrations, and a test plan.
2. A human tech lead approves the LLD.
3. Codex picks up the LLD and implements the change in a feature branch, generating tests as it goes.
4. CI runs SAST, dependency scanning, and the full test suite.
5. Claude reviews the resulting diff against the LLD and the security checklist, leaving comments on the PR.
6. A human engineer does final review and merges.
Phase 6: Measure and Iterate
Track these from day one so you know if it’s working:
- PR cycle time (ticket → merge)
- Defect escape rate to staging/prod
- Percentage of AI PRs merged without rework
- Security findings caught pre-merge vs post-merge
- Cost per ticket (token spend)
If escape rate goes up, pull back scope. If cycle time and quality both improve, expand.
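A minimal sketch of computing two of these metrics from exported PR records; the field names are assumptions, so map them to whatever your tracker actually exports:

```python
from datetime import datetime
from statistics import median

# Hypothetical export: one dict per merged PR
prs = [
    {"ticket_opened": "2024-05-01T09:00:00", "merged": "2024-05-02T15:30:00",
     "ai_generated": True, "rework_commits": 0},
    {"ticket_opened": "2024-05-03T10:00:00", "merged": "2024-05-06T11:00:00",
     "ai_generated": True, "rework_commits": 3},
]

def hours(start: str, end: str) -> float:
    """Elapsed hours between two ISO-ish timestamps."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds() / 3600

cycle_times = [hours(p["ticket_opened"], p["merged"]) for p in prs]
ai_prs = [p for p in prs if p["ai_generated"]]
clean = sum(1 for p in ai_prs if p["rework_commits"] == 0)

print(f"median cycle time: {median(cycle_times):.1f}h")
print(f"AI PRs merged without rework: {clean / len(ai_prs):.0%}")
```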
Practical Starting Point This Week
Pick one microservice, ideally one with decent test coverage and clear boundaries. Write its CLAUDE.md and ARCHITECTURE.md. Migrate just the testing and security-scanning stages for that one service. Run it for two sprints. Then expand.
Trying to migrate all stages across all services simultaneously is the most common failure mode — the context burden becomes unmanageable and the team loses trust in the output.