Case Study

Building Product Aggregation Infrastructure for Sustainability Comparison

How web scraping and data normalisation powered a cross-vendor comparison platform

Client: UK sustainability comparison startup

Industry: Sustainability & GreenTech Solutions

Services: Architecture Advisory

Key results at a glance

  • Approach: 2 scraping methods (HTTP + Selenium)
  • Tech: 3 platforms (Docker, K8s, GCP)
  • Duration: 4 months

The challenge

The Problem

An Innovate UK-funded startup was building a sustainability-focused product comparison platform. The vision: a Chrome extension that detects products users view online and surfaces alternatives from more sustainable sources, complete with price comparisons and sustainability ratings.

The Technical Challenge

To power comparisons, the platform needed product data from across the e-commerce landscape:

  • Multiple retailers: Each with different site structures
  • Traditional sites: Standard HTML that could be parsed
  • Single-page applications: JavaScript-rendered content requiring different approaches
  • Data normalisation: Disparate formats needed standardisation for comparison

The Architecture

The platform was built on a true microservices architecture: Docker containers orchestrated by Kubernetes on Google Cloud. It was our first exposure to containerisation at this scale.

The results

Key results

  • Web scraping infrastructure for traditional sites and SPAs
  • Data normalisation pipeline feeding comparison algorithms
  • First microservices experience with Docker, Kubernetes, GCP
  • React Chrome extension showing sustainable alternatives
  • Technical success despite the startup's product-market fit challenges

Outcomes

Technical Delivery

  • Scraping infrastructure handling traditional and SPA sites
  • Data normalisation pipeline feeding comparison algorithms
  • Chrome extension displaying sustainable alternatives

Architecture Learning

First experience with true microservices architecture:

  • Docker containers for isolation
  • Kubernetes for orchestration
  • Google Cloud for hosting

This engagement was transformative for our technical development: learning containerisation properly shaped subsequent architecture decisions.

Honest Outcome

The startup ultimately folded. The core challenge they couldn't solve was sourcing reliable sustainability ratings. They tried multiple approaches and eventually considered manual human research, a fundamentally unscalable solution.

When Innovate UK funding dried up, the team was downsized. The technical infrastructure worked; the product-market fit problem proved insurmountable. This experience provides valuable perspective when advising startups on technical versus business risk.

The solution

Our Approach

We built the data ingestion pipeline feeding the comparison engine.

Web Scraping Infrastructure

For traditional sites:

  • C# with HttpClient for page retrieval
  • HTML Agility Pack for parsing and extraction
  • Structured data extraction into common format
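
The retrieve, parse, and extract steps above can be sketched as follows. The original stack was C# with HttpClient and HTML Agility Pack; this sketch swaps in Python's standard library purely for illustration. The CSS class names, the field mapping, and the User-Agent string are hypothetical; a real retailer needs its own selectors.

```python
from html.parser import HTMLParser
from typing import Dict, Optional
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Retrieve a page over plain HTTP (the role HttpClient played in the C# stack)."""
    request = Request(url, headers={"User-Agent": "product-scraper-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class ProductParser(HTMLParser):
    """Collects the text of elements whose class matches a known field."""

    # Hypothetical per-retailer mapping from CSS class to common field name.
    FIELD_CLASSES = {"product-title": "name", "product-price": "price"}

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, str] = {}
        self._current: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # An element may carry several classes; check each one.
        for css_class in (dict(attrs).get("class") or "").split():
            if css_class in self.FIELD_CLASSES:
                self._current = self.FIELD_CLASSES[css_class]

    def handle_data(self, data):
        # Record the first non-empty text found inside a matched element.
        if self._current and data.strip():
            self.fields[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        # Leaving an element without text clears the pending field.
        self._current = None


def extract_product(html: str) -> Dict[str, str]:
    """Parse raw HTML into a flat dictionary of product fields."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Calling `extract_product(fetch_html(url))` chains the two steps. Keeping retrieval and extraction separate means the extractor can be exercised against saved HTML without touching the network.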

For single-page applications:

  • Selenium running headless Chrome
  • JavaScript execution to render content
  • Same extraction patterns once content available
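
A minimal sketch of the headless-browser path, written in Python for illustration rather than in the project's original C#. It assumes the `selenium` package and a local Chrome installation; the `ready_selector` parameter and the `scrape` helper are hypothetical names introduced here. The point it illustrates is that only the fetch step differs between site types, so the extractor is passed in and reused unchanged.

```python
from typing import Callable, Dict


def fetch_rendered_html(url: str, ready_selector: str, timeout: float = 15.0) -> str:
    """Render a JavaScript-driven page in headless Chrome and return the final DOM."""
    # Imported lazily so the rest of the module loads without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # assumes a recent Chrome build
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the element we intend to extract has been rendered.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ready_selector))
        )
        return driver.page_source
    finally:
        # Always release the browser process, even if the wait times out.
        driver.quit()


def scrape(
    url: str,
    fetch: Callable[[str], str],
    extract: Callable[[str], Dict[str, str]],
) -> Dict[str, str]:
    """Fetch with whichever strategy suits the site, then run the shared extractor."""
    return extract(fetch(url))
```

For an SPA, `fetch` could be `functools.partial(fetch_rendered_html, ready_selector=".product-title")`; for a traditional site, a plain HTTP fetcher fits the same slot.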

Data Normalisation

Product data from disparate sources needed standardisation:

  • Common product schema
  • Price normalisation
  • Category mapping
  • Availability status
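
The four concerns above could be combined along these lines. This is a sketch under stated assumptions, in Python rather than the project's C#: the `Product` fields, the category table, the in-stock phrases, and the fixed GBP currency are all hypothetical, and the price parser assumes UK-style formatting (comma as thousands separator).

```python
import re
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Dict, Optional


@dataclass(frozen=True)
class Product:
    """Hypothetical common schema shared by every retailer feed."""
    retailer: str
    name: str
    price: Optional[Decimal]
    currency: str
    category: str
    in_stock: bool


# Hypothetical mapping from retailer-specific labels to shared categories.
CATEGORY_MAP = {
    "dental care": "personal-care",
    "oral hygiene": "personal-care",
    "kitchenware": "home",
}

# Hypothetical phrases treated as "available".
IN_STOCK_PHRASES = {"in stock", "available", "add to basket"}


def parse_price(raw: str) -> Optional[Decimal]:
    """Strip currency symbols and thousands separators; None if no valid number remains."""
    digits = re.sub(r"[^\d.]", "", raw)
    try:
        return Decimal(digits)
    except InvalidOperation:
        return None


def normalise(retailer: str, raw: Dict[str, str]) -> Product:
    """Map one retailer's raw field dictionary onto the common schema."""
    category_key = raw.get("category", "").strip().lower()
    availability = raw.get("availability", "").strip().lower()
    return Product(
        retailer=retailer,
        name=" ".join(raw.get("name", "").split()),  # collapse stray whitespace
        price=parse_price(raw.get("price", "")),
        currency="GBP",  # assumption: UK retailers only
        category=CATEGORY_MAP.get(category_key, "uncategorised"),
        in_stock=availability in IN_STOCK_PHRASES,
    )
```

Using `Decimal` rather than `float` avoids rounding drift when prices are later compared across retailers, and returning `None` for an unparseable price keeps bad data visible instead of silently becoming zero.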

Chrome Extension Contribution

We also contributed to the React-based Chrome extension, which displayed a side panel whenever users viewed a product in the catalogue and showed sustainable alternatives with ratings and prices.

Microservices Architecture

First proper exposure to:

  • Docker containerisation
  • Kubernetes orchestration
  • Google Cloud Platform deployment
