The Company: A global luxury fashion marketplace working with 150 in-scope partner seller sites that list products on the platform and also sell those same products on their own websites.
The commercial team faced a problem they couldn't quantify. Products were missing from their marketplace: items sellers had on their own sites but hadn't listed on the platform. Every missing item meant lost GMV (gross merchandise value).
But no one could tell sellers which products were missing. Or how many. Or for which brands and categories. Account managers walked into negotiations with hunches instead of numbers. Sellers had no reason to change anything.
300,000 Pages. 10% Complete. 90% Outdated.
Before working with us, account managers collected assortment data manually. At this scale (150 sellers, hundreds of brands, multiple categories), that's roughly 300,000 pages to check every week. Coverage reached about 10%.
Over 120 hours per week of senior commercial time across 20 account managers went not to strategy but to data collection that still couldn't be trusted. The data that did exist was 3-4 weeks old by the time it reached decision-makers. In luxury fashion, where inventory turns weekly, that meant negotiating with information several inventory cycles out of date.
48-Site Proof in ~48 Hours
For the proof of concept, we scraped 48 of their third-party (3P) seller sites. Within 48 hours, structured data showed product counts by brand, by category, and by seller. Accuracy was 98% in that proof scope.
The commercial leadership team could immediately see where gaps existed — and more importantly, could show sellers the exact same data.
Use Case 1: Assortment Intelligence (Weekly)
Every week, structured data arrives across all 150 in-scope seller sites:
| Field | Example |
|---|---|
| Store | Partner Boutique (UK) |
| Country | United Kingdom |
| Category | Men |
| Subcategory | Clothing |
| Brand | Gucci |
| Total Products | 66 |
| Out of Stock | 0 |
These fields make assortment planning actionable. Account managers walk into negotiations knowing precisely what to ask for.
The Complexity Behind Weekly Delivery
Luxury e-commerce sites are notoriously difficult to scrape. Here's what weekly delivery actually requires:
Anti-bot protection: Many luxury retailer sites actively resist automated collection with CAPTCHAs, rate limiting, and behavioral analysis. We use multiple approaches to maintain scheduled delivery reliability across all in-scope sites.
Site navigation: Every site structures its catalog differently, using infinite scroll, pagination, "load more" buttons, or nested categories. Complete product discovery requires handling all of these patterns.
Constant changes: Dozens of those 150 sites require maintenance in any given week due to HTML or URL changes. Detection and fixes happen before scheduled delivery.
Cross-region normalization: Some seller sites are in Italian, French, or Spanish. Naming and mapping normalization ensures results are comparable across regions.
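As a rough illustration of the naming normalization described above, the sketch below maps localized category labels onto one canonical name. The alias table and the `normalize_category` function are hypothetical and chosen only for illustration; they are not the production mapping.

```python
# Hypothetical alias table: localized storefront labels -> canonical category.
CATEGORY_ALIASES = {
    "uomo": "Men",     # Italian
    "homme": "Men",    # French
    "hombre": "Men",   # Spanish
    "donna": "Women",  # Italian
    "femme": "Women",  # French
    "mujer": "Women",  # Spanish
}


def normalize_category(label: str) -> str:
    """Map a localized category label to its canonical name.

    Labels with no known alias are returned trimmed but otherwise unchanged,
    so they can be reviewed rather than silently dropped.
    """
    cleaned = label.strip()
    return CATEGORY_ALIASES.get(cleaned.casefold(), cleaned)


for raw in [" Uomo ", "Femme", "Men"]:
    print(raw.strip(), "->", normalize_category(raw))
```

A lookup table like this keeps the mapping auditable: an unmapped label passes through as-is and shows up as a new category in the output, which makes gaps in the mapping easy to spot.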
The Result: "You Have 29 Gucci. Average is 258."
That's the kind of conversation account teams can now have.
Before, negotiations were vague: "We think you're missing some products." Now they're specific: "For Gucci menswear, you have 29 products listed. The average across our sellers is 258. Here's the gap by category."
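The comparison behind that sentence is simple arithmetic over the weekly counts. The sketch below is a minimal illustration, assuming rows shaped like the weekly delivery fields; the seller names and the two non-29 counts are made up so that the average works out to 258.

```python
from statistics import mean

# Hypothetical weekly rows: (seller, brand, category, total_products).
rows = [
    ("Seller A", "Gucci", "Men", 29),
    ("Seller B", "Gucci", "Men", 310),
    ("Seller C", "Gucci", "Men", 435),
]


def assortment_gaps(rows):
    """Compare each seller's count with the cross-seller average per brand/category."""
    by_segment = {}
    for seller, brand, category, total in rows:
        by_segment.setdefault((brand, category), []).append((seller, total))

    report = []
    for (brand, category), entries in by_segment.items():
        average = round(mean(total for _, total in entries))
        for seller, total in entries:
            report.append({
                "seller": seller,
                "brand": brand,
                "category": category,
                "listed": total,
                "average": average,
                "gap": average - total,  # positive means below the average
            })
    return report


first = assortment_gaps(rows)[0]
print(f"You have {first['listed']} {first['brand']}. Average is {first['average']}.")
# prints: You have 29 Gucci. Average is 258.
```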
Assortment completion went from 50% to 90-98%. Sellers started adding products they hadn't realized were missing — or had kept exclusive to their own sites.
- 120+ hours per week returned to the commercial team. Twenty account managers, no longer doing manual data collection.
- Negotiation conversations changed. When you have exact numbers by brand and category, sellers respond differently than when you're estimating.
Use Case 2: Cross-Border D2C Pricing
Once the assortment team had weekly, trusted data, the pricing team explored the same delivery model for a different challenge.
Two years into the engagement, the Global Senior Director of Operations and Senior Pricing Lead reached out. After COVID, luxury brands had shifted focus to their direct-to-consumer channels. Several top luxury brands were running sales on their own sites — with no visibility into that pricing from the marketplace side.
Without D2C pricing data, the marketplace risked either leaving money on the table (pricing below market) or losing customers (pricing above competitors). In luxury fashion, even a 5% pricing gap can drive customers to D2C checkout.
Very reliable — data we don't have to second-guess.
Global Senior Director of Operations, recommending us to the pricing team
That internal recommendation is why they came to us instead of starting over with a new vendor.
This engagement was larger: 300 brands across 15 countries — roughly 4,500 brand-country combinations tracked every two weeks.
| Field | Example |
|---|---|
| Product ID | 1575825 |
| Product Name | Example product name (redacted) |
| Category | Hats |
| Full Price | $176.00 |
| Sale Price | $123.20 |
| Discount | 30% |
| Status | Active |
| Sizes | One Size |
These fields feed directly into the data warehouse. Analysts match scraped D2C prices against their own catalog by brand ID, build BI dashboards for commercial and catalog teams, and inform pricing decisions based on real market data — not assumptions.
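One check an analyst might run on rows like the example above is that the Discount field agrees with the Full Price and Sale Price fields. The sketch below is an assumption about how such a check could look, not a description of the actual pipeline; `discount_percent` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP


def discount_percent(full_price: str, sale_price: str) -> Decimal:
    """Return the discount as a whole-number percentage of the full price."""
    full = Decimal(full_price.replace("$", "").replace(",", ""))
    sale = Decimal(sale_price.replace("$", "").replace(",", ""))
    if full <= 0:
        raise ValueError("full price must be positive")
    percent = (full - sale) / full * 100
    return percent.quantize(Decimal("1"), rounding=ROUND_HALF_UP)


# Values taken from the example row: $176.00 full price, $123.20 sale price.
print(discount_percent("$176.00", "$123.20"))  # prints: 30
```

Using `Decimal` rather than floats avoids rounding surprises when the result is compared with the scraped discount value.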
The Complexity of Cross-Border Pricing
Tracking 300 brands across 15 countries creates challenges most tools can't handle:
Currency and format handling: Each country displays prices differently, for example 1,005.00 USD vs 1.000,55 EUR vs 1'000.00 CHF. Everything is normalized into comparable formats.
Regional URL variations: Sometimes the URL stays the same while the displayed country and price change. Region detection and scraping of the correct market are handled automatically.
Heavy anti-bot protection: Luxury D2C sites use aggressive anti-scraping measures. These defenses are handled to maintain access across regions.
Product ID extraction: Product IDs can appear in different parts of the page. We extract the identifier that maps to their internal catalog.
Scale maintenance: When tracking 4,500 brand-country combinations, dozens change their structure in any given period. Detection and fixes happen before scheduled delivery.
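To make the currency and format handling point concrete, here is a minimal sketch of parsing differently formatted price strings into one numeric form. The heuristic (a final `.` or `,` followed by one or two digits is the decimal separator) and the `parse_price` name are assumptions for illustration; a production parser would more likely rely on each site's known locale.

```python
import re
from decimal import Decimal


def parse_price(text: str) -> Decimal:
    """Parse a localized price string into a Decimal.

    Heuristic: a trailing '.' or ',' followed by 1-2 digits is treated as the
    decimal separator; every other '.', ',' or apostrophe is a grouping mark.
    """
    digits = re.sub(r"[^\d.,']", "", text).replace("'", "")
    match = re.search(r"[.,](\d{1,2})$", digits)
    if match:
        integer_part = re.sub(r"[.,]", "", digits[:match.start()])
        return Decimal(f"{integer_part}.{match.group(1)}")
    return Decimal(re.sub(r"[.,]", "", digits))


for raw in ["1,005.00", "1.000,55", "1'000.00"]:
    print(raw, "->", parse_price(raw))
```

The heuristic is ambiguous for strings such as `1.000`, which it reads as one thousand; that is one reason to prefer explicit per-market locale rules where they are known.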
Two Years, Two Use Cases, One Relationship
This has been a 2+ year customer relationship. It started with 48 sites and 100 brands for assortment tracking. Now two parallel data programs run continuously:
| Program | Scope | Frequency |
|---|---|---|
| Assortment Intelligence | 150 sites × 200 brands × 10+ categories | Weekly |
| D2C Pricing Intelligence | 300 brands × 15 countries | Every two weeks |
Both data streams feed directly into their data warehouse, in tables accessible to analysts, commercial teams, and catalog managers. No manual transformation. No CSV cleanup.
The combined impact:
- Assortment completion: 50% → 90-98%
- Commercial team time saved: 120+ hours/week
- Pricing algorithms: fresher D2C inputs across regions, with less reliance on assumptions
- Parity decisions: made with D2C visibility across 15 countries
- Brand pricing strategies: visible across all tracked markets
Who This Helps
This story resonates with fashion and luxury teams facing similar challenges:
- Marketplaces that work with third-party sellers and need assortment visibility
- Brands or retailers tracking D2C competitors across multiple regions
- Commercial teams negotiating without reliable data
- Pricing teams building algorithms that need accurate market inputs
See What This Looks Like for Your Catalog
We'll scrape your actual products from your actual competitors or partners. You'll see real data for an agreed proof scope within 48 hours.
Request a Sample Delivery
No commitment. No setup on your end.