Building Product Aggregation Infrastructure for Sustainability Comparison
How web scraping and data normalisation powered a cross-vendor comparison platform
Client: UK sustainability comparison startup
Industry: Sustainability & GreenTech Solutions
Services: Architecture Advisory
The challenge
The Problem
An Innovate UK-funded startup was building a sustainability-focused product comparison platform. The vision: a Chrome extension that detects products users view online and surfaces alternatives from more sustainable sources, complete with price comparisons and sustainability ratings.
The Technical Challenge
To power comparisons, the platform needed product data from across the e-commerce landscape:
- Multiple retailers: Each with different site structures
- Traditional sites: Standard HTML that could be parsed
- Single-page applications: JavaScript-rendered content requiring different approaches
- Data normalisation: Disparate formats needed standardisation for comparison
The Architecture
The platform was built on true microservices - Docker containers orchestrated by Kubernetes on Google Cloud. This was our first exposure to containerisation at this scale.
The results
Key results
- Web scraping infrastructure for traditional sites and SPAs
- Data normalisation pipeline feeding comparison algorithms
- First microservices experience with Docker, Kubernetes, GCP
- React Chrome extension showing sustainable alternatives
- Technical success despite the startup's product-market fit challenges
Outcomes
Technical Delivery
- Scraping infrastructure handling traditional and SPA sites
- Data normalisation pipeline feeding comparison algorithms
- Chrome extension displaying sustainable alternatives
Architecture Learning
This was our first experience with a true microservices architecture:
- Docker containers for isolation
- Kubernetes for orchestration
- Google Cloud for hosting
This engagement was formative for our technical practice: learning containerisation properly shaped our subsequent architecture decisions.
Honest Outcome
The startup ultimately folded. The core challenge they couldn't solve was sourcing reliable sustainability ratings. They tried multiple approaches, eventually considering manual human research - a fundamentally unscalable solution.
When Innovate UK funding dried up, the team was downsized. The technical infrastructure worked; the product-market fit problem proved insurmountable. This experience provides valuable perspective when advising startups on technical versus business risk.
The solution
Our Approach
We built the data ingestion pipeline feeding the comparison engine.
Web Scraping Infrastructure
For traditional sites:
- C# with HttpClient for page retrieval
- HTML Agility Pack for parsing and extraction
- Structured data extraction into common format
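The traditional-site flow above (retrieve the page, parse the HTML, extract fields into a common shape) can be sketched as follows. The project itself used C# with HttpClient and HTML Agility Pack; this sketch substitutes Python's standard library purely for illustration. The class names `product-title` and `product-price`, the `User-Agent` string, and all function names are hypothetical, and each real retailer would need its own field mapping.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Hypothetical mapping from a retailer's CSS class names to common field names.
FIELD_CLASSES = {"product-title": "name", "product-price": "price"}


class ProductParser(HTMLParser):
    """Collects the text of elements whose class appears in FIELD_CLASSES."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # name of the field being captured, if any

    def handle_starttag(self, tag, attrs):
        for cls in (dict(attrs).get("class") or "").split():
            if cls in FIELD_CLASSES:
                self._current = FIELD_CLASSES[cls]
                self.fields[self._current] = ""
                break

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: capture stops at the first closing tag, so markup
        # nested inside a field is not handled by this sketch.
        self._current = None


def fetch_html(url, timeout=10.0):
    """Retrieves a server-rendered page as text."""
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_product(html, source_url):
    """Turns raw HTML into a small dictionary; missing fields become None."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return {
        "source_url": source_url,
        "name": parser.fields.get("name", "").strip() or None,
        "price": parser.fields.get("price", "").strip() or None,
    }
```

Keeping retrieval (`fetch_html`) separate from extraction (`extract_product`) means the extraction step can be exercised against saved HTML without touching the network.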
For single-page applications:
- Selenium running headless Chrome
- JavaScript execution to render content
- Same extraction patterns once content available
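One way to picture "same extraction patterns once content available" is to treat page retrieval as a swappable step that always returns HTML, so the extractor does not care whether a browser was involved. The sketch below is an illustration in Python rather than the project's C#; `SPA_HOSTS`, `choose_fetcher`, and `scrape` are hypothetical names, and the host list is invented. The Selenium calls assume the `selenium` package and a local Chrome installation.

```python
from typing import Callable
from urllib.parse import urlparse

Fetcher = Callable[[str], str]      # URL in, HTML out
Extractor = Callable[[str], dict]   # HTML in, product fields out

# Hypothetical list of retailers known to render products client-side.
SPA_HOSTS = {"spa-shop.example"}


def render_with_headless_chrome(url: str) -> str:
    """Returns the page DOM after JavaScript has run, using headless Chrome."""
    # Imported lazily so static-only runs do not need Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # A production scraper would typically wait for a known element
        # to appear before reading the DOM; omitted here for brevity.
        return driver.page_source
    finally:
        driver.quit()


def choose_fetcher(url: str, static_fetch: Fetcher, rendered_fetch: Fetcher) -> Fetcher:
    """Picks the browser-based fetcher only for hosts listed in SPA_HOSTS."""
    host = urlparse(url).hostname or ""
    return rendered_fetch if host in SPA_HOSTS else static_fetch


def scrape(url: str, static_fetch: Fetcher, rendered_fetch: Fetcher,
           extract: Extractor) -> dict:
    """Fetches HTML with the appropriate strategy, then runs shared extraction."""
    fetch = choose_fetcher(url, static_fetch, rendered_fetch)
    return extract(fetch(url))
```

Because a headless browser is far slower and heavier than a plain HTTP request, routing only the sites that need it through Selenium is a common design choice; whether the original system dispatched this way is an assumption.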
Data Normalisation
Product data from disparate sources needed standardisation:
- Common product schema
- Price normalisation
- Category mapping
- Availability status
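The four normalisation concerns listed above could fit together roughly as in the sketch below. It is written in Python for illustration and is not the project's actual schema: the `Product` fields, the category and availability mappings, and the assumption that all prices are already in pounds with a decimal point are hypothetical.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Optional


class Availability(Enum):
    IN_STOCK = "in_stock"
    OUT_OF_STOCK = "out_of_stock"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Product:
    """Hypothetical common schema shared by every retailer's data."""
    retailer: str
    name: str
    price_gbp: Optional[Decimal]
    category: str
    availability: Availability


# Hypothetical mappings from retailer-specific labels to shared values.
CATEGORY_MAP = {"oral care": "personal-care", "cleaning": "household"}
AVAILABILITY_MAP = {
    "in stock": Availability.IN_STOCK,
    "available": Availability.IN_STOCK,
    "out of stock": Availability.OUT_OF_STOCK,
    "sold out": Availability.OUT_OF_STOCK,
}


def parse_price(raw: Optional[str]) -> Optional[Decimal]:
    """Pulls the first number out of a price string; None if there is none.

    Assumes a decimal point and comma thousands separators.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw or "")
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))


def normalise(raw: dict, retailer: str) -> Product:
    """Maps one retailer's scraped fields onto the common schema."""
    category = (raw.get("category") or "").strip().lower()
    stock = (raw.get("availability") or "").strip().lower()
    return Product(
        retailer=retailer,
        name=" ".join((raw.get("name") or "").split()),  # collapse whitespace
        price_gbp=parse_price(raw.get("price")),
        category=CATEGORY_MAP.get(category, "uncategorised"),
        availability=AVAILABILITY_MAP.get(stock, Availability.UNKNOWN),
    )
```

Unrecognised categories and stock labels fall back to explicit `uncategorised` and `UNKNOWN` values rather than raising, so one retailer's unusual wording does not stop the rest of a batch from being compared.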
Chrome Extension Contribution
We also contributed to the React-based Chrome extension, which displayed a side panel when users viewed products in the catalogue, showing sustainable alternatives with ratings and prices.
Microservices Architecture
This was our first proper exposure to:
- Docker containerisation
- Kubernetes orchestration
- Google Cloud Platform deployment