From fragmented data to AI-driven purchasing for a top 100 seller
A top 100 Amazon reseller specializes in athletic apparel and footwear, with an annual revenue of approximately $25 million. Their catalog spans over 4,200 product listings across 16 brands, with Nike representing roughly 88% of their sales volume.
On a typical day, they move around 1,450 units and maintain approximately 98,000 units in stock. Their core product categories include athletic apparel (jackets, joggers, tees, hoodies, shorts), footwear (training shoes, cleats), and sports accessories, with smaller segments in outdoor gear, luggage, and home goods.
The business is highly seasonal, with Q4 holiday demand driving nearly 4x the volume of an average month — December alone accounts for over 160,000 units shipped.
4,100+ ASINs tracked daily with automated data collection
1.3 million new product opportunities discovered
~81% reduction in forecast error on top products
36 months of historical marketplace data integrated
70 engineered ML features powering forecasts
0 manual intervention required, fully automated
The story
The company (redacted for privacy) is a high-volume Amazon seller managing thousands of ASINs across competitive product categories. They came to Islands with a clear goal: stop making buying decisions based on gut feel and spreadsheets. They wanted a system, backed by data and machine learning, that tells them what to buy, how much, and when.
The problem? None of that was possible with what they had. There was no structured historical database. Data was scattered across disconnected sources with gaps, no unified schema, and no automated collection. Purchasing decisions were based on manual checks and basic equations.
They needed the entire stack built from scratch: the data layer, the intelligence layer, and the decision layer. That’s what we built.
The challenge
No Data Infrastructure
Data was scattered across disconnected sources. Sales history had gaps spanning months. Inventory records were incomplete. 99.4% of products had zero recorded sales, so the system couldn’t distinguish a dead product from a data gap.
No Forecasting Capability
Without clean time-series data, there was no way to train, evaluate, or deploy a forecasting model. Purchasing was entirely manual.
No Market Visibility
There was no way to tell whether a new product opportunity was real. Portfolio expansion was based on intuition, not intelligence.
Phase 1
Data Foundation
We designed a three-layer PostgreSQL data warehouse that pulls from two complementary sources: the Amazon SP-API, which provides seller data including orders, inventory, returns, traffic, and fees; and Keepa, parsed into 18 structured tables that feed 28 marketplace features into our models.
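To make the three layers concrete, here is a minimal sketch of the Dimensions → Raw Facts → Daily Aggregates flow. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and every table name, column name, and sample value is a hypothetical chosen for illustration, not the production schema.

```python
import sqlite3

# Hypothetical three-layer schema: one dimension, one raw fact, one daily aggregate.
SCHEMA = """
CREATE TABLE dim_asin (
    asin  TEXT PRIMARY KEY,
    brand TEXT NOT NULL
);
CREATE TABLE fact_order_item (
    order_id     TEXT NOT NULL,
    asin         TEXT NOT NULL REFERENCES dim_asin(asin),
    purchased_at TEXT NOT NULL,   -- ISO 8601 timestamp
    quantity     INTEGER NOT NULL
);
CREATE TABLE agg_daily_sales (
    asin      TEXT NOT NULL REFERENCES dim_asin(asin),
    sale_date TEXT NOT NULL,
    units     INTEGER NOT NULL,
    PRIMARY KEY (asin, sale_date)
);
"""


def rebuild_daily_sales(conn):
    """Recompute the daily aggregate layer from the raw fact layer."""
    conn.execute("DELETE FROM agg_daily_sales")
    conn.execute(
        """
        INSERT INTO agg_daily_sales (asin, sale_date, units)
        SELECT asin, date(purchased_at), SUM(quantity)
        FROM fact_order_item
        GROUP BY asin, date(purchased_at)
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO dim_asin VALUES ('B000TEST01', 'Nike')")  # made-up ASIN
conn.executemany(
    "INSERT INTO fact_order_item VALUES (?, ?, ?, ?)",
    [
        ("o1", "B000TEST01", "2024-12-01T09:15:00", 2),
        ("o2", "B000TEST01", "2024-12-01T18:40:00", 1),
        ("o3", "B000TEST01", "2024-12-02T11:00:00", 4),
    ],
)
rebuild_daily_sales(conn)
rows = conn.execute(
    "SELECT sale_date, units FROM agg_daily_sales ORDER BY sale_date"
).fetchall()
```

Keeping raw facts separate from daily aggregates means the aggregate layer can be rebuilt at any time without calling the source APIs again.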
To keep data flowing reliably, we built custom retry logic with exponential backoff, mandatory cooldowns between endpoint types, and resumable backfill loops that survive crashes. We also built an Active ASIN Guard that filters 31,460 ASINs down to roughly 4,174 active ones, ensuring we only spend enrichment tokens where it counts.
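The retry and resume pattern can be sketched roughly as follows. This is an illustration under assumptions, not the production code: `TransientAPIError`, `call_with_backoff`, `resumable_backfill`, the delay values, and the JSON checkpoint file are all hypothetical names and choices, and the cooldowns between endpoint types are left out for brevity.

```python
import json
import random
import time
from pathlib import Path


class TransientAPIError(Exception):
    """Hypothetical error standing in for throttling or timeout responses."""


def call_with_backoff(fetch, *, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call fetch(), retrying transient failures with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except TransientAPIError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))  # add up to 10% jitter


def resumable_backfill(days, fetch_day, checkpoint: Path, sleep=time.sleep):
    """Fetch each day in order, saving progress so a crashed run can resume."""
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    for day in days:
        if day in done:
            continue  # already fetched in an earlier run
        call_with_backoff(lambda: fetch_day(day), sleep=sleep)
        done.add(day)
        checkpoint.write_text(json.dumps(sorted(done)))  # persist after every day
```

Writing the checkpoint after every completed day is the simplest way to make a restart skip finished work; a production system would more likely track progress in the warehouse itself.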
Phase 2
ASIN Discovery & Enrichment
We built a discovery pipeline using Keepa's Finder API, combining a custom sliding window with binary search to scan full brand catalogs on Amazon and bypass the platform's 10,000-result cap. This allowed us to discover over 1.39 million ASINs across multiple brands.
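One way to picture the sliding window plus binary search is shown below: slide a window across some numeric filter range, and for each window start, binary-search the widest end that still returns no more than the cap. This is a sketch under assumptions. `count_in_range` is a hypothetical callable standing in for a query that reports how many results a range matches; it is not a Keepa API signature, and the filter attribute actually used in production is not specified here.

```python
RESULT_CAP = 10_000  # result cap described in the case study


def plan_windows(count_in_range, lo, hi, cap=RESULT_CAP):
    """Split the inclusive integer range [lo, hi] into consecutive windows,
    each sized so that it matches at most `cap` results where possible.

    count_in_range(a, b) is a hypothetical callable returning the number of
    results for the inclusive range [a, b]; it must not decrease as b grows.
    """
    windows = []
    start = lo
    while start <= hi:
        # Binary search for the largest end that keeps the window under the cap.
        low, high, best = start, hi, start
        while low <= high:
            mid = (low + high) // 2
            if count_in_range(start, mid) <= cap:
                best = mid
                low = mid + 1
            else:
                high = mid - 1
        # If a single value alone exceeds the cap, it cannot be split on this
        # attribute; it is emitted as a one-value window for separate handling.
        windows.append((start, best))
        start = best + 1  # slide to the next uncovered value
    return windows
```

Each returned window can then be queried on its own, so no single query runs into the cap and the union of windows covers the full range.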
From there, the enrichment pipeline pulls the full product payload for each ASIN and parses it into 25 structured relational tables. To date, 620,000 ASINs have been fully enriched with complete time-series history.
Phase 3
Forecasting & Purchase Order Intelligence
With 91.9% of daily sales values at zero, accurate forecasting required a careful approach. We experimented with multiple model families before settling on a per-ASIN ensemble that selects the single best model for each product.
The top performers are LightGBM Two-Stage and Amazon Chronos-2. LightGBM Two-Stage pairs a binary classifier (AUC 0.807) with a Tweedie regressor across 111 features and achieves ~30% weighted absolute percentage error (WAPE). Chronos-2, a 120M-parameter model fine-tuned with LoRA, achieves ~29% WAPE. For sparse ASINs, a dedicated Binary Classifier predicts 7-, 14-, and 30-day sale probability with AUC scores ranging from 0.878 to 0.889. Trained on 70 engineered features drawn from 18 months of data, the ensemble achieves 18.8% WAPE on top ASINs and 29.9% overall.
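For readers unfamiliar with the metric, WAPE is the sum of absolute forecast errors divided by the sum of actual values. The sketch below shows the metric, one common way to combine a two-stage model's outputs, and per-ASIN selection of the lowest-error candidate. The function names and sample numbers are hypothetical, and multiplying sale probability by conditional quantity is an assumption about how the two stages are combined, not a description of the production pipeline.

```python
def wape(actual, forecast):
    """Weighted absolute percentage error: sum(|a - f|) / sum(|a|)."""
    total = sum(abs(a) for a in actual)
    if total == 0:
        raise ValueError("WAPE is undefined when all actuals are zero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / total


def two_stage_forecast(p_sale, units_if_sale):
    """Expected units = P(any sale) * expected units given a sale (assumed combination)."""
    return [p * q for p, q in zip(p_sale, units_if_sale)]


def select_best_model(actual, candidates):
    """Return (name, wape) for the candidate with the lowest validation WAPE."""
    scores = {name: wape(actual, fc) for name, fc in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Made-up validation week for a single ASIN, mostly zero-sales days.
actual = [0, 0, 3, 0, 5, 0, 2]
candidates = {
    "two_stage": two_stage_forecast(
        [0.1, 0.1, 0.8, 0.1, 0.9, 0.2, 0.7],  # stage 1: probability of a sale
        [2, 2, 3, 2, 5, 2, 3],                # stage 2: units if a sale occurs
    ),
    "flat": [1] * 7,                          # naive baseline: one unit per day
}
best_name, best_score = select_best_model(actual, candidates)
```

Because WAPE normalizes by total actual volume rather than by each day's value, it stays well defined on days with zero sales, which matters when most daily values are zero.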
Business Logic Layers
On top of the forecasting layer, we built a set of business logic systems to drive real purchasing decisions. Dump signal guardrails flag candidates for liquidation based on four converging signals: demand decline, seasonal timing, competition growth, and inventory depth.
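A simple way to express "four converging signals" is to evaluate each signal independently and flag a product only when enough of them fire together. The sketch below assumes all four must fire; the field names, thresholds, and the `min_signals` parameter are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsinSnapshot:
    demand_trend_pct: float        # % change in recent demand vs. the prior period
    days_to_season_end: int        # days until the product's selling season ends
    offer_count_change_pct: float  # % change in the number of competing offers
    days_of_cover: float           # on-hand units / forecast daily demand


def dump_signals(s, *, decline=-20.0, season_window=30, competition=25.0, cover=90.0):
    """Evaluate the four signals separately (all thresholds are illustrative)."""
    return {
        "demand_decline": s.demand_trend_pct <= decline,
        "seasonal_timing": s.days_to_season_end <= season_window,
        "competition_growth": s.offer_count_change_pct >= competition,
        "inventory_depth": s.days_of_cover >= cover,
    }


def is_dump_candidate(s, min_signals=4, **thresholds):
    """Flag for liquidation only when at least `min_signals` signals fire together."""
    return sum(dump_signals(s, **thresholds).values()) >= min_signals
```

Requiring the signals to converge keeps a single noisy metric, such as one slow week, from triggering a liquidation flag on its own.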
The automated PO engine uses Newsvendor optimization with a 45-day sell-through cap across five checkpoints. Kill switches halt orders in the event of negative margin, a lost buy box, or stale data. Final quantities are rounded to meet MOQ rules, and budgets are allocated across the portfolio by urgency, margin, and days to stockout. Sitting alongside this is a market intelligence layer that scores all 1.39 million ASINs across demand, competition, and margin potential, producing a ranked list of products worth adding to the catalog.
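In the classic Newsvendor model, the target stock level is the demand quantile at the critical ratio cu / (cu + co), where cu is the profit lost per unit of unmet demand and co is the loss per unsold unit. The sketch below combines that with the kill switches, the 45-day cap, and MOQ rounding. It rests on several assumptions: demand is treated as normally distributed and independent across days, quantities are rounded down to a multiple of the MOQ, fees are ignored, and all function and parameter names are hypothetical. The five checkpoints and the budget allocation step are not modeled.

```python
import math
from statistics import NormalDist

SELL_THROUGH_CAP_DAYS = 45  # sell-through cap described in the case study


def newsvendor_quantity(mean_demand, std_demand, unit_price, unit_cost, salvage_value=0.0):
    """Demand quantile at the critical ratio, assuming normally distributed demand."""
    underage = unit_price - unit_cost    # profit lost per unit of unmet demand
    overage = unit_cost - salvage_value  # loss per unsold unit
    if overage <= 0:
        raise ValueError("salvage_value must be below unit_cost")
    if underage <= 0:
        return 0.0
    critical_ratio = underage / (underage + overage)
    if std_demand <= 0:
        return max(0.0, mean_demand)
    return max(0.0, NormalDist(mean_demand, std_demand).inv_cdf(critical_ratio))


def plan_order(daily_forecast, daily_std, horizon_days, on_hand, unit_price, unit_cost,
               moq, *, salvage_value=0.0, has_buy_box=True, data_age_hours=0.0,
               max_data_age_hours=24.0):
    """Return an order quantity in units, or 0 when a kill switch fires."""
    # Kill switches: non-positive margin, lost buy box, or stale data.
    if unit_price <= unit_cost or not has_buy_box or data_age_hours > max_data_age_hours:
        return 0
    target = newsvendor_quantity(
        daily_forecast * horizon_days,
        daily_std * math.sqrt(horizon_days),  # assumes independent daily demand
        unit_price, unit_cost, salvage_value,
    )
    # Never stock more than 45 days of forecast demand.
    target = min(target, SELL_THROUGH_CAP_DAYS * daily_forecast)
    needed = max(0.0, target - on_hand)
    return int(needed // moq) * moq  # round down to a whole multiple of the MOQ
```

Rounding down rather than up keeps the final quantity inside the sell-through cap; a supplier that enforces a hard minimum would need the opposite rule for small orders.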
Conclusion
When everything is wired together, the company gets a daily system that answers: what’s selling, what’s dying, what should I restock, what should I dump, how much should I buy, when does it need to arrive, and what new products should I be looking at, all backed by ML forecasts, real-time market data, and hard business constraints.
Islands built this entire system from scratch: the data warehouse, the discovery engine, the ML pipeline, and the business logic layer. We didn’t hand off requirements to a vendor or plug into a SaaS tool. We built each layer to fit this business’s constraints: strict API rate limits, token-metered enrichment that costs money, very sparse data, and complex buying rules.
Data Sources: Amazon SP-API, Keepa API | ML Models: LightGBM, Amazon Chronos-2 with LoRA, Binary Classifier.
Optimization: Optuna for hyperparameter search | Ensemble: Per-ASIN model selection.
Architecture: 3-layer warehouse (Dimensions → Raw Facts → Daily Aggregates)