
From fragmented data to AI-driven purchasing for a top 100 seller

Use case: Data Engineering, Forecasting, ML
Industry: E-Commerce

A top 100 Amazon reseller specializes in athletic apparel and footwear, with an annual revenue of approximately $25 million. Their catalog spans over 4,200 product listings across 16 brands, with Nike representing roughly 88% of their sales volume.

On a typical day, they move around 1,450 units and maintain approximately 98,000 units in stock. Their core product categories include athletic apparel (jackets, joggers, tees, hoodies, shorts), footwear (training shoes, cleats), and sports accessories, with smaller segments in outdoor gear, luggage, and home goods.

The business is highly seasonal, with Q4 holiday demand driving nearly 4x the volume of an average month — December alone accounts for over 160,000 units shipped.


The story

The company (redacted for privacy) is a high-volume Amazon seller managing thousands of ASINs across competitive product categories. They came to Islands with a clear goal: stop making buying decisions based on gut feel and spreadsheets, and instead build a system, backed by data and machine learning, that tells them what to buy, how much, and when.

The problem? None of that was possible with what they had. There was no structured historical database. Data was scattered across disconnected sources with gaps, no unified schema, and no automated collection. Purchasing decisions were based on manual checks and basic equations.

They needed the entire stack built from scratch: the data layer, the intelligence layer, and the decision layer. That’s what we built.

The challenge

No Data Infrastructure
Data was scattered across disconnected sources. Sales history had gaps spanning months, and inventory records were incomplete. Of all products, 99.4% had zero recorded sales, so the system couldn’t distinguish a dead product from a data gap.

No Forecasting Capability
Without clean time-series data, there was no way to train, evaluate, or deploy a forecasting model. Purchasing was entirely manual.

No Market Visibility
They also could not tell if a new opportunity was real. Portfolio expansion was based on intuition, not intelligence.

Phase 1

Data Foundation

We designed a three-layer PostgreSQL data warehouse (dimensions, raw facts, and daily aggregates) that pulls from two complementary sources: the Amazon SP-API, which provides seller data including orders, inventory, returns, traffic, and fees; and Keepa, whose marketplace data is parsed into 18 structured tables that feed 28 marketplace features into our models.
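To make the three layers concrete, here is a minimal sketch of how dimensions, raw facts, and daily aggregates could relate. It is not the production schema: the table and column names (`dim_asin`, `fact_order_item`, `agg_daily_sales`) are hypothetical, and Python's built-in `sqlite3` stands in for PostgreSQL so the example runs anywhere.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for PostgreSQL so the sketch is self-contained.
SCHEMA = """
-- Layer 1: dimensions (one row per product)
CREATE TABLE dim_asin (
    asin  TEXT PRIMARY KEY,
    brand TEXT NOT NULL
);
-- Layer 2: raw facts (one row per order line, as ingested)
CREATE TABLE fact_order_item (
    order_id   TEXT NOT NULL,
    asin       TEXT NOT NULL REFERENCES dim_asin(asin),
    order_date TEXT NOT NULL,  -- ISO date, e.g. '2024-12-01'
    quantity   INTEGER NOT NULL
);
-- Layer 3: daily aggregates (one row per ASIN per day)
CREATE TABLE agg_daily_sales (
    asin       TEXT NOT NULL,
    sale_date  TEXT NOT NULL,
    units_sold INTEGER NOT NULL,
    PRIMARY KEY (asin, sale_date)
);
"""


def rebuild_daily_aggregates(conn: sqlite3.Connection) -> None:
    """Recompute layer 3 from layer 2; safe to re-run after a backfill adds raw facts."""
    conn.execute("DELETE FROM agg_daily_sales")
    conn.execute(
        """
        INSERT INTO agg_daily_sales (asin, sale_date, units_sold)
        SELECT asin, order_date, SUM(quantity)
        FROM fact_order_item
        GROUP BY asin, order_date
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO dim_asin VALUES ('B000TEST01', 'Nike')")
conn.executemany(
    "INSERT INTO fact_order_item VALUES (?, ?, ?, ?)",
    [
        ("o1", "B000TEST01", "2024-12-01", 2),
        ("o2", "B000TEST01", "2024-12-01", 1),
        ("o3", "B000TEST01", "2024-12-02", 4),
    ],
)
rebuild_daily_aggregates(conn)
rows = conn.execute(
    "SELECT sale_date, units_sold FROM agg_daily_sales ORDER BY sale_date"
).fetchall()
```

Keeping raw facts separate from aggregates means a backfill can fill gaps in layer 2 and simply rebuild layer 3, rather than patching daily totals by hand.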

To keep data flowing reliably, we built custom retry logic with exponential backoff, enforced cooldowns between endpoint types, and resumable backfill loops that survive crashes. We also built an Active ASIN Guard that filters 31,460 ASINs down to roughly 4,174 active ones, so enrichment tokens are spent only where they count.
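The retry and resume pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: `TransientAPIError`, the delay parameters, the fixed `cooldown`, and the JSON checkpoint file are all hypothetical choices.

```python
import functools
import json
import os
import random
import time


class TransientAPIError(Exception):
    """Hypothetical stand-in for a throttled (HTTP 429) or temporary server-side failure."""


def call_with_retry(fn, *, max_attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(); on a transient error, wait base_delay * 2**attempt (capped, plus jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientAPIError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))


def resumable_backfill(days, fetch_day, checkpoint_path, *, cooldown=0.0, sleep=time.sleep):
    """Fetch each day once, checkpointing after every success so a crashed run resumes where it stopped."""
    done = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = set(json.load(f))
    fetched = {}
    for day in days:
        if day in done:
            continue  # completed by an earlier run
        fetched[day] = call_with_retry(functools.partial(fetch_day, day), sleep=sleep)
        # A real pipeline would write fetched[day] to the warehouse here, before checkpointing.
        done.add(day)
        with open(checkpoint_path, "w") as f:
            json.dump(sorted(done), f)
        if cooldown:
            sleep(cooldown)  # fixed pause between calls to the same endpoint type
    return fetched
```

Injecting `sleep` keeps the logic testable without real waiting; in production it would default to `time.sleep`.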

| Metric | Before | After |
| --- | --- | --- |
| Connected data sources | 0 | 2 (SP-API + Keepa) |
| Unified database tables | 0 | 34 tables, 3 layers |
| SP-API history depth | ~12 months (gapped) | 18 months (complete) |
| Keepa history depth | 0 | 36 months |
| ASINs tracked daily | 0 | 4,174 |
| Products with usable sales data | 0.6% | 100% of active catalog |
| Manual intervention required | Constant | Zero (fully automated) |
| Active ASIN filtering | None (31,460 ASINs) | Smart guard (~4,174) |
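The case study does not spell out the Active ASIN Guard's rules, so the following is only a hypothetical illustration of the idea: keep an ASIN only if its listing is live and it either has stock or sold within a lookback window. The `AsinSnapshot` fields and the 180-day window are assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class AsinSnapshot:
    asin: str
    listing_active: bool            # hypothetical flag: the offer is currently live
    units_in_stock: int             # current sellable inventory
    last_sale_date: Optional[date]  # None if no sale was ever recorded


def is_active(s: AsinSnapshot, today: date, lookback_days: int = 180) -> bool:
    """Hypothetical guard rule: keep a live listing that has stock or sold within the lookback window."""
    if not s.listing_active:
        return False
    sold_recently = (
        s.last_sale_date is not None
        and today - s.last_sale_date <= timedelta(days=lookback_days)
    )
    return s.units_in_stock > 0 or sold_recently


def active_asins(snapshots: List[AsinSnapshot], today: date) -> List[str]:
    """Return only the ASINs worth spending enrichment tokens on."""
    return [s.asin for s in snapshots if is_active(s, today)]
```

Because enrichment is metered, a cheap filter like this runs before any token-consuming call.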

Phase 2

ASIN Discovery & Enrichment

We built a discovery pipeline using Keepa's Finder API, combining a custom sliding window with binary search to scan full brand catalogs on Amazon and bypass the platform's 10,000-result cap. This allowed us to discover over 1.39 million ASINs across multiple brands.
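The sliding-window plus binary-search idea can be sketched as follows. The source does not say which filter the window slides over; this sketch assumes a single integer filter (for example, sales rank) and a hypothetical `count(a, b)` query that reports how many products fall in the range `[a, b]`. Each window is grown by binary search to the widest range that stays under the cap, then the window slides forward.

```python
from bisect import bisect_left, bisect_right
from typing import Callable, List, Tuple

RESULT_CAP = 10_000  # per-query result limit described above


def plan_windows(
    lo: int, hi: int, count: Callable[[int, int], int], cap: int = RESULT_CAP
) -> List[Tuple[int, int]]:
    """Split the integer range [lo, hi] into consecutive windows that each return <= cap results.

    `count(a, b)` is a hypothetical stand-in for a Finder query that reports how many
    products fall in [a, b] on one numeric filter.
    """
    windows = []
    start = lo
    while start <= hi:
        # Binary search for the widest window starting at `start` that stays under the cap.
        # This works because count(start, end) never decreases as `end` grows.
        left, right, best = start, hi, None
        while left <= right:
            mid = (left + right) // 2
            if count(start, mid) <= cap:
                best, left = mid, mid + 1
            else:
                right = mid - 1
        if best is None:
            raise ValueError(f"value {start} alone exceeds the cap; add another filter to split it")
        windows.append((start, best))
        start = best + 1  # slide the window forward
    return windows


# Demo data: 25,000 synthetic products whose filter values are 0, 0, 0, 1, 1, 1, ...
values = sorted(i // 3 for i in range(25_000))


def count_in(a: int, b: int) -> int:
    """Count synthetic products with a filter value in [a, b]."""
    return bisect_right(values, b) - bisect_left(values, a)


windows = plan_windows(0, values[-1], count_in)
```

Each probe in the binary search is itself a query, so in practice the count calls would also go through the retry and token-budget logic.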

From there, the enrichment pipeline pulls the full product payload for each ASIN and parses it into 25 structured relational tables. To date, 620,000 ASINs have been fully enriched with complete time-series history.
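Turning a nested time-series payload into relational rows usually means flattening each history array into long format (one row per ASIN, series, and timestamp). The sketch below assumes a Keepa-style layout in which a history is a flat list alternating `[keepa_minutes, value, ...]`, with minutes counted from 2011-01-01 UTC and `-1` meaning "no data". The payload shape (`{"asin": ..., "csv": [...]}`) and the series names are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Assumption: Keepa-style timestamps count minutes since 2011-01-01 UTC.
KEEPA_EPOCH = datetime(2011, 1, 1, tzinfo=timezone.utc)


def flatten_history(asin: str, series_name: str, history: List[int]) -> List[Dict]:
    """Turn one alternating [time, value, time, value, ...] array into long-format rows."""
    if len(history) % 2:
        raise ValueError("history must contain (time, value) pairs")
    rows = []
    for i in range(0, len(history), 2):
        minutes, value = history[i], history[i + 1]
        rows.append({
            "asin": asin,
            "series": series_name,
            "observed_at": KEEPA_EPOCH + timedelta(minutes=minutes),
            "value": None if value == -1 else value,  # -1 assumed to mean "no data"
        })
    return rows


def flatten_product(payload: Dict, series_names: List[str]) -> List[Dict]:
    """Hypothetical payload shape: {'asin': ..., 'csv': [history_0, history_1, ...]}; None = series absent."""
    rows = []
    histories: List[Optional[List[int]]] = payload["csv"]
    for idx, name in enumerate(series_names):
        history = histories[idx] if idx < len(histories) else None
        if history:
            rows.extend(flatten_history(payload["asin"], name, history))
    return rows
```

Storing full histories this way is what allows the warehouse to hold time series rather than snapshots.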

| Metric | After |
| --- | --- |
| New ASINs discovered | 1.39 million discovered, 620,000 enriched |
| Data points per ASIN | 25 structured tables per ASIN, 200+ variables |
| Data categories covered | 12 (pricing, demand, competition, reviews, etc.), 2B+ rows |
| History type | Full time-series (not just snapshots) |
| Pipeline resilience | Crash-safe, resumable, token-aware |
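A "token-aware" pipeline checks its remaining budget before each metered request. A minimal sketch, assuming tokens refill at a steady per-minute rate up to a fixed capacity (a token bucket); the `TokenBudget` class and its parameters are hypothetical, and the clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


class TokenBudget:
    """Token bucket: refills `refill_per_minute` tokens over time, never exceeding `capacity`."""

    def __init__(self, capacity: float, refill_per_minute: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = refill_per_minute / 60.0  # tokens per second
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def seconds_until(self, cost: float) -> float:
        """How long to wait before a request costing `cost` tokens can run (0 if it can run now)."""
        if cost > self.capacity:
            raise ValueError("request costs more than the bucket can ever hold")
        self._refill()
        return max(0.0, (cost - self.tokens) / self.rate)

    def spend(self, cost: float) -> bool:
        """Deduct `cost` tokens if available; return False (spending nothing) otherwise."""
        self._refill()
        if self.tokens < cost:
            return False
        self.tokens -= cost
        return True
```

A scheduler would call `seconds_until` before each enrichment batch, sleep for that long, then `spend`, rather than firing requests and handling rejections afterwards.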

Phase 3

Forecasting & Purchase Order Intelligence

With 91.9% of daily sales values at zero, the data is heavily zero-inflated, so accurate forecasting required a careful approach. We experimented with multiple model families before settling on a per-ASIN ensemble that selects the single best model for each product.
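Per-ASIN selection needs a common yardstick. The sketch below uses WAPE (weighted absolute percentage error, the metric reported in this case study) on a holdout period and picks the lowest-scoring model for each ASIN. The holdout setup, the data shapes, and the handling of all-zero actuals are assumptions.

```python
from typing import Dict, List, Sequence


def wape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Weighted absolute percentage error: sum(|y - y_hat|) / sum(|y|)."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    abs_error = sum(abs(y - f) for y, f in zip(actual, forecast))
    volume = sum(abs(y) for y in actual)
    if volume == 0:
        # Undefined when nothing sold. Assumption: a perfect all-zero forecast scores 0, anything else inf.
        return 0.0 if abs_error == 0 else float("inf")
    return abs_error / volume


def select_best_model(
    actuals: Dict[str, List[float]],
    forecasts: Dict[str, Dict[str, List[float]]],
) -> Dict[str, str]:
    """For each ASIN, pick the model name with the lowest holdout WAPE."""
    best = {}
    for asin, actual in actuals.items():
        scores = {name: wape(actual, fc) for name, fc in forecasts[asin].items()}
        best[asin] = min(scores, key=scores.get)
    return best
```

WAPE weights errors by sales volume, so a few zero-sale days with small misses do not blow up the score the way per-day percentage errors would.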

The top performers are LightGBM Two-Stage, which pairs a binary classifier (AUC 0.807) with a Tweedie regressor across 111 features and achieves ~30% WAPE (weighted absolute percentage error), and Amazon Chronos-2, a 120M-parameter model fine-tuned with LoRA that achieves ~29% WAPE. For sparse ASINs, a dedicated binary classifier predicts the probability of a sale within the next 7, 14, and 30 days, with AUC scores ranging from 0.878 to 0.889. Trained on 70 engineered features drawn from 18 months of data, the ensemble achieves 18.8% WAPE on top ASINs and 29.9% overall.
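Two pieces of the sparse-demand setup can be sketched without any ML library. First, horizon labels for a "will it sell within h days" classifier. Second, one common way to combine a two-stage (hurdle) model: expected units = P(sale) × expected units given a sale. The source does not say exactly how its two LightGBM stages are combined, so `hurdle_expected_units` assumes the regressor predicts units conditional on a sale; both function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def horizon_labels(
    daily_units: Sequence[int], horizons: Sequence[int] = (7, 14, 30)
) -> Dict[int, List[Optional[int]]]:
    """label[h][t] = 1 if any unit sells in days t+1..t+h, else 0.

    Days whose window would run past the end of the data get None (no label).
    """
    n = len(daily_units)
    # prefix[i] = total units sold in days 0..i-1, so each window sum is O(1).
    prefix = [0]
    for u in daily_units:
        prefix.append(prefix[-1] + u)
    labels = {}
    for h in horizons:
        row: List[Optional[int]] = []
        for t in range(n):
            if t + h >= n:
                row.append(None)
            else:
                row.append(1 if prefix[t + h + 1] - prefix[t + 1] > 0 else 0)
        labels[h] = row
    return labels


def hurdle_expected_units(p_sale: float, units_if_sale: float) -> float:
    """Stage 1 gives P(sale); stage 2 gives expected units given a sale (assumed)."""
    if not 0.0 <= p_sale <= 1.0:
        raise ValueError("p_sale must be a probability")
    return p_sale * units_if_sale
```

Splitting "will it sell" from "how much" lets the classifier absorb the zeros, so the quantity model is not dominated by the 91.9% of zero-sale days.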

| Metric | Before | After |
| --- | --- | --- |
| Forecast accuracy (WAPE), dense ASINs | ~100% (no forecast) | 18.8% (top) / 29.9% (all) |
| Forecast horizon | None | 45 days |
| Engineered ML features | 0 | 70 |
| Hyperparameter optimization | None | Automated (Optuna) |
| Forecast error reduction | N/A | ~81% |
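The before/after figures can be checked against the standard WAPE definition. Assuming the "no forecast" baseline means predicting zero units every day, its WAPE is exactly 100%, and the ~81% reduction corresponds to the top-ASIN result:

```latex
\mathrm{WAPE} = \frac{\sum_t \lvert y_t - \hat{y}_t \rvert}{\sum_t \lvert y_t \rvert},
\qquad
\hat{y}_t = 0 \;\Rightarrow\; \mathrm{WAPE} = \frac{\sum_t \lvert y_t \rvert}{\sum_t \lvert y_t \rvert} = 100\%

\text{error reduction} = \frac{100\% - 18.8\%}{100\%} = 81.2\% \approx 81\%
```

Measured against the overall figure (29.9%), the same calculation gives a reduction of about 70%.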

Business Logic Layers

On top of the forecasting layer, we built a set of business logic systems to drive real purchasing decisions. Dump signal guardrails flag candidates for liquidation based on four converging signals: demand decline, seasonal timing, competition growth, and inventory depth.
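The "converging signals" rule can be sketched as a simple vote. The case study names the four signals but not their inputs or thresholds, so every field and cutoff below (`DumpSignals`, 0.7, 30 days, 20%, 90 days) is hypothetical and only illustrates the shape of the guardrail.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DumpSignals:
    demand_trend: float             # hypothetical: recent 30-day units / prior 30-day units
    days_to_season_end: int         # hypothetical seasonal-timing input
    competitor_count_growth: float  # hypothetical: fractional growth in competing offers
    days_of_cover: float            # units in stock / forecast daily units


def dump_flags(s: DumpSignals) -> Dict[str, bool]:
    """Evaluate each signal against hypothetical thresholds."""
    return {
        "demand_decline": s.demand_trend < 0.7,
        "seasonal_timing": s.days_to_season_end <= 30,
        "competition_growth": s.competitor_count_growth > 0.2,
        "inventory_depth": s.days_of_cover > 90,
    }


def is_dump_candidate(s: DumpSignals, required: int = 4) -> bool:
    """Flag for liquidation only when at least `required` of the four signals converge."""
    return sum(dump_flags(s).values()) >= required
```

Requiring several signals at once is what makes this a guardrail: a single soft week of demand should not, on its own, trigger liquidation.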

The automated PO engine uses newsvendor optimization with a 45-day sell-through cap across five checkpoints. Kill switches halt orders in the event of negative margin, a lost Buy Box, or stale data; final quantities are rounded to meet MOQ (minimum order quantity) rules, and budgets are allocated across the portfolio by urgency, margin, and days to stockout. Alongside this sits a market intelligence layer that scores all 1.39 million discovered ASINs across demand, competition, and margin potential, producing a ranked list of products worth adding to the catalog.
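A simplified version of the PO logic might look like the sketch below. It assumes normally distributed demand over the ordering window and uses the textbook newsvendor rule: order up to the demand quantile at the critical ratio Cu / (Cu + Co). The kill-switch conditions mirror the ones named above, but the parameters, the 48-hour staleness limit, and the treatment of MOQ as a plain minimum are hypothetical. The five checkpoints and portfolio budget allocation are not modeled.

```python
import math
from statistics import NormalDist


def newsvendor_quantity(mean_demand: float, sd_demand: float,
                        unit_cost: float, price: float, salvage: float) -> float:
    """Order-up-to level at the critical ratio Cu / (Cu + Co), assuming normal demand.

    Assumes price > unit_cost > salvage and sd_demand > 0.
    """
    underage = price - unit_cost   # margin lost for each unit of unmet demand
    overage = unit_cost - salvage  # loss on each unsold unit cleared at salvage value
    critical_ratio = underage / (underage + overage)
    return NormalDist(mean_demand, sd_demand).inv_cdf(critical_ratio)


def purchase_order_qty(*, daily_forecast: float, sd_window: float, window_days: int,
                       on_hand: int, unit_cost: float, price: float, salvage: float,
                       moq: int, owns_buy_box: bool, data_age_hours: float,
                       max_data_age_hours: float = 48, sell_through_days: int = 45) -> int:
    """Hypothetical PO rule: kill switches, newsvendor target, 45-day cap, then MOQ."""
    # Kill switches: non-positive margin, lost Buy Box, or stale data halt the order.
    if price <= unit_cost or not owns_buy_box or data_age_hours > max_data_age_hours:
        return 0
    target = newsvendor_quantity(daily_forecast * window_days, sd_window,
                                 unit_cost, price, salvage)
    cap = daily_forecast * sell_through_days  # never stock more than ~45 days of forecast sales
    need = min(target, cap) - on_hand
    if need <= 0:
        return 0
    # Assumption: MOQ is a simple minimum order size (case-pack rounding would be a similar step).
    return max(math.ceil(need), moq)
```

When margin and overstock cost are equal, the critical ratio is 0.5 and the target is simply the mean forecast; higher margins push the target above the mean, and the 45-day cap keeps that from turning into excess inventory.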

Conclusion

When everything is wired together, the company gets a daily system that answers: what’s selling, what’s dying, what should I restock, what should I dump, how much should I buy, when does it need to arrive, and what new products should I be looking at, all backed by ML forecasts, real-time market data, and hard business constraints.

Islands built this entire system from scratch: the data warehouse, the discovery engine, the ML pipeline, and the business logic layer. We didn’t hand off requirements to a vendor or plug into a SaaS tool. We built each layer around this business’s constraints: strict API rate limits, token-metered enrichment that costs money, very sparse sales data, and complex buying rules.

Database: PostgreSQL
Data sources: Amazon SP-API, Keepa API
ML models: LightGBM, Amazon Chronos-2 with LoRA, Binary Classifier
Optimization: Optuna for hyperparameter search
Ensemble: Per-ASIN model selection
Architecture: 3-layer warehouse (Dimensions → Raw Facts → Daily Aggregates)
