From kickoff to working proof of concept in a week with Claude Code
Greenfield discovery used to mean weeks of read-only research before anything ran. With Claude Code feeding off the kickoff transcript and a deliberately stupid mock layer, I had a runnable front end and back end in front of the team by the end of week one.
The week-one problem on greenfield work
Greenfield projects with unfamiliar third-party systems have a familiar shape in the first month. The team reads documentation. Someone draws boxes on a whiteboard. A senior engineer disappears for two weeks to "do discovery". Nothing runs.
I have done that version of the job many times. It is not wasted work, but the gap between "we have a brief" and "the team has something to build on" tends to stretch to three or four weeks. On a programme with an inspection deadline driving everything else, that gap is expensive.
On a recent education-sector engagement I joined as Solutions Architect, I closed that gap to a week. The tool that made it possible was Claude Code. The discipline that made it work was treating every AI output as a draft to be verified against the actual systems.
This post is about how that week ran, what I asked the AI to do, and where I had to push back on it. It is the first in a short series. The second will cover the mock layer in detail. The third will cover what survived my departure from the team.
What discovery normally looks like
The brief on this engagement was a unified learner dashboard backed by a progress engine, integrated with an external e-portfolio platform and feeding a legacy VLE through SAML SSO. Three third-party systems I had not used before. A team of around nine developers, predominantly junior, including several bootcamp graduates. An inspection deadline that the wider programme had to evidence progress against.
The traditional path here is roughly:
- Read the docs for each third-party platform. Build a shared glossary.
- Draw an architecture diagram. Argue about it.
- Pick a stack. Set up repositories. Wire up auth.
- Write a thin vertical slice through one screen to prove the shape is right.
- Hand the rest of the team something concrete to extend.
That sequence usually takes three to four weeks on greenfield work. Not because any single step is hard, but because each one has to land before the next can start, and the team is mostly idle while one or two seniors do the reading.
The cost is not just time. The longer the team waits for something to build on, the more likely they are to start building speculatively against assumptions that will turn out to be wrong. Junior developers in particular need a running system to point at. Architecture diagrams alone do not give them that.
Feeding the kickoff straight in
The kickoff workshops on this engagement ran over two days. Domain experts walked through the learner journey, the progress model, and the integration points. I recorded the sessions with consent and produced transcripts.
Those transcripts went straight into Claude Code as context. Not summarised by me first. The full thing.
I then asked it specific questions. What are the named entities the domain experts kept returning to? Where did they describe a state transition? Where did two people use different words for what sounded like the same concept? Which integration points did they describe as "we already have that" versus "we need to build that"?
The answers were not always right. The model conflated two of the third-party platforms more than once because the domain experts themselves used overlapping language. I had to go back and ask "where in the transcript does the speaker actually distinguish between platform A and platform B" and walk through the quoted lines. That correction loop is the work. It is also far faster than reading 90 minutes of transcript end to end myself.
By the end of day two I had a draft domain glossary, a draft list of bounded contexts, and a draft list of integration points. Each one cited the transcript line it came from, so the senior domain expert on the client side could verify or reject each entry in about an hour.
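For concreteness, each entry carried roughly this shape. The field names are invented for this post; the real artefact was a plain table:

```typescript
// Hypothetical shape of one draft glossary entry -- field names invented for illustration.
interface GlossaryEntry {
  term: string;                 // the domain expert's word for the concept
  draftDefinition: string;      // written by the model, never trusted as-is
  transcriptRefs: string[];     // transcript lines the definition was derived from
  status: "unverified" | "confirmed" | "rejected"; // set by the client's domain expert
}
```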
Design screenshots into a component library
The kickoff also produced a set of design screenshots. Not a full design system, just annotated mockups of the main learner views.
I fed those screenshots in and asked Claude Code to derive a Vue.js component library from them. What components recur? What variants does each have? What is the minimum set of primitives that would let the team rebuild every screen shown?
The first pass was over-eager. It invented components that did not appear in the screenshots and missed two that did. I corrected it by pointing at specific screenshot regions and asking it to reconcile its component list with what was actually visible. Three iterations got me to a list I was willing to defend.
The output was a scaffolded Vue.js component library with sensible names, prop shapes, and stub implementations. Not production code. A starting point that meant the front-end developers on the team did not have to argue about naming conventions before they could write their first component.
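To give a flavour of what those stubs looked like, here is an illustrative component in the same spirit. The name and prop shapes are invented for this post, not taken from the client's library:

```typescript
// Illustrative stub only -- component name and props are invented for this post.
import { defineComponent, h, type PropType } from "vue";

type ProgressState = "behind" | "on-track" | "ahead";

export default defineComponent({
  name: "ProgressCard",
  props: {
    learnerName: { type: String, required: true },
    state: { type: String as PropType<ProgressState>, required: true },
    percentComplete: { type: Number, default: 0 },
  },
  setup(props) {
    // Stub render: enough to put the component on a screen and agree the
    // prop shape, nothing more. Real markup and styling come later.
    return () =>
      h("div", { class: ["progress-card", props.state] },
        `${props.learnerName}: ${props.percentComplete}%`);
  },
});
```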
Parallel research across unfamiliar platforms
Three third-party platforms I had not worked with before: a learner experience platform, an e-portfolio system, and an internal LMS. Each had its own data model, its own auth story, and its own opinions about what a "learner" was.
I ran parallel research agents, one per platform. Each agent had a narrow brief: read the public documentation, identify the entities relevant to our integration points, produce a data mapping back to our domain glossary, flag any ambiguity.
The outputs were data mapping tables with citations. Every row in every table linked back to the documentation page that supported it. Where the documentation was silent or contradictory, the agent said so explicitly rather than guessing.
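The rows themselves were nothing clever. Something like this shape, with every name here invented for illustration:

```typescript
// Hypothetical shape of one mapping row -- names and URL are placeholders,
// not the real platforms or their documentation.
interface MappingRow {
  ourTerm: string;       // entry from the domain glossary
  platformTerm: string;  // the third-party platform's name for it
  sourceUrl: string;     // documentation page that supports this row
  ambiguity?: string;    // stated explicitly where the docs were silent or contradictory
}

const example: MappingRow = {
  ourTerm: "Enrolment",
  platformTerm: "registration",
  sourceUrl: "https://docs.example-platform.test/api/registrations",
  ambiguity: "Docs do not say whether a registration survives a programme transfer",
};
```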
This is the verification discipline that matters. An AI mapping with no citations is worse than no mapping, because it looks authoritative. A mapping with citations is a draft a domain expert can confirm in an afternoon. The senior domain expert on the client side did exactly that, and corrected about a fifth of the rows. The other four-fifths held up.
A failure mode worth naming
The one place the AI struggled, and kept struggling, was telling where one third-party platform ended and the next began. Two of the platforms had overlapping concepts of "course" and "enrolment". Their documentation used the same words for different things.
I could not get the model to reliably separate them from documentation alone. What worked was showing it the actual API responses from each platform, side by side, and asking it to produce a mapping rooted in the response shapes rather than the prose.
This is a useful pattern in general. When AI output is unreliable on a topic, the fix is usually to ground it in something concrete the model can quote. Documentation prose is the worst grounding. Real data is the best.
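To illustrate with invented, simplified shapes (not the real platforms' schemas), the same word can name two different objects:

```typescript
// Invented, simplified shapes to show the overlap -- not the real platforms' schemas.
// Platform A: a "course" is a catalogue entry.
interface PlatformACourse {
  course_id: string;
  title: string;
  unit_ids: string[];
}

// Platform B: a "course" is a learner's enrolment on a programme.
interface PlatformBCourse {
  courseId: number;
  learnerRef: string;
  startDate: string; // ISO 8601
  status: "active" | "withdrawn";
}
```

Put side by side, the shapes settle in a glance what the prose kept blurring.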
The deliberately stupid mock layer
By the end of week one I had a runnable Vue.js front end and a matching C# backend-for-frontend, orchestrated locally with .NET Aspire. Auth0 was wired up. SAML SSO into the legacy VLE was stubbed, but the shape was right.
The thing that made it runnable was a deliberately stupid mock service layer. Every endpoint returned canned responses. No database. No third-party calls. No conditional logic beyond "if the route matches, return this fixture".
That sounds trivial. It is the most important decision I made in week one.
Because every endpoint returned something plausible, the team could navigate the application end to end on day one of week two. They could click through the learner dashboard, see the progress engine outputs, follow the e-portfolio link out and back. None of it was real. All of it had the shape of the real thing.
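The real mocks lived in the C# BFF, but the idea fits in a few lines of any stack. Here is the same shape sketched in TypeScript with Express, with the route and fixture invented for this post:

```typescript
// TypeScript/Express sketch of the mock idea. The real layer was C# inside
// the BFF; the route and fixture here are invented for illustration.
import express from "express";

// The fixture's type doubles as the data contract the real back end honours.
interface LearnerProgress {
  learnerId: string;
  percentComplete: number;
  state: "behind" | "on-track" | "ahead";
}

const fixture: LearnerProgress = {
  learnerId: "demo-learner",
  percentComplete: 42,
  state: "on-track",
};

const app = express();

// Deliberately stupid: if the route matches, return the fixture. No database,
// no third-party calls. The :id parameter is ignored on purpose.
app.get("/api/learners/:id/progress", (_req, res) => {
  res.json(fixture);
});

app.listen(3000);
```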
The data contract baked into those mocks became the contract the back end was built against. There was no rebuild between PoC and MVP. The MVP shipped roughly two weeks after the inspection itself, and the build state at the inspection date was enough for the client to evidence the gap was being actively addressed.
I am going to write the mock layer up properly in the next post in this series. The short version: stupidity is a feature. The moment a mock layer starts to have logic, it starts to lie about what the real system will do.
What this approach does not replace
It would be easy to read the above and conclude that AI replaces architectural judgement. It does not.
Every decision that mattered in week one was mine. Which boundaries to draw between services. Which third-party platform owned which concept. What the data contract should look like. Where to put the seams that would let the team move in parallel.
Claude Code accelerated the reading, the drafting, and the scaffolding. It did not make the calls. When the originally planned upstream data pipeline turned out to be an explicitly labelled proof of concept, not viable for production, I had to pivot the architecture to read directly from the source SFTP. No AI was going to make that judgement for me. It needed someone who had seen that pattern fail before.
The same goes for the team shape. A senior teammate left mid-engagement. A junior went off sick. The architecture had to be one that the remaining people, mostly juniors, could keep extending without me sitting on every PR. That is an architectural choice rooted in knowing the team, not in any prompt I could write.
What it makes possible
The honest claim is narrower than "AI does discovery for you". It is this: AI lets one experienced architect produce, in a week, the artefacts that used to take a small team three or four weeks to produce. Glossaries, mappings, component libraries, scaffolded code, runnable mocks.
The team gets something to build on faster. The client sees motion sooner. The architect spends more time on the calls that actually need a human, and less time on the reading that does not.
The discipline is everything. Verify every output against the real system. Cite sources. Show the team where the AI was wrong on the actual codebase, so they learn to distrust it in the right places. Treat the model as a fast, slightly unreliable junior who needs supervision, not as an oracle.
If you are starting a greenfield project against a hard deadline and you are nervous about the discovery phase eating your runway, the workflow above is a reasonable starting point. I am happy to talk it through if it would help. Drop me a line and we can compare notes on what your first week could look like.
