
Why Logistics GenAI Stalls Before It Delivers — and the Data Foundation Question Every Supply Chain Leader Needs to Ask First

May 14, 2026 | Author: Christian Gylseth
[Illustration: a blue globe labeled "GenAI" connected by lines to trucks and buildings, symbolizing the integration of logistics and artificial intelligence.]

Every supply chain leader has seen this story play out. A GenAI pilot demos beautifully in the boardroom: the demand forecast looks sharper, the route plan looks tighter, the customs paperwork seems to read itself. Six months later, it has quietly disappeared from operations. No public failure. No internal post-mortem. Just a slow drift back to spreadsheets, exception queues, and dispatcher gut-feel.

The reflexive read is that the model was wrong. The actual answer, almost every time, is that the foundation underneath it was never stress-tested before the build began. MIT’s NANDA initiative reports that 95% of enterprise GenAI pilots produce no measurable P&L impact. The reason isn’t the model; it’s the data the model was asked to operate on.

The pattern behind every stall

Logistics GenAI doesn’t usually fail at the model layer. It fails upstream, at the data layer, where leaders rarely look until the build is already underway. Gartner has now placed GenAI in the trough of disillusionment on its 2025 Supply Chain Strategy Hype Cycle, and the firm separately predicts that 60% of AI projects will be abandoned through 2026 because they aren’t supported by AI-ready data.

Three failure patterns repeat across logistics deployments: demand forecasts trained on incomplete ERP data, route optimisation built on historical patterns without live network signals, and document processing that cannot handle the variability of real freight paperwork. Each is a data foundation failure. Each is predictable before the build starts.

Demand forecasting built on incomplete ERP data

ERP data feels comprehensive until you forecast at the SKU-lane level, where replenishment decisions actually get made. Promotion flags missing. Returns sitting in a separate system. Substitutions managed in a planner’s spreadsheet. Exception handling done outside the ERP and never written back.

The aggregate forecast on the dashboard looks reasonable. The forecast that drives the actual purchase order doesn’t.

McKinsey’s 2024 Global Supply Chain Leader Survey found that just over half of supply chain leaders rate their planning-system data as adequate, and that advanced planning system implementations consistently get bogged down on master data, with only half of those projects ultimately delivering the business case originally promised. A model layered on top of that data inherits every gap silently.

The data foundation question this raises for any supply chain leader: is our demand signal actually complete, or are we forecasting on the half of reality our ERP happened to capture?
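One way to make that question concrete is to audit the demand signal itself before any model training starts. The sketch below is a minimal, illustrative completeness check at the SKU-lane level; the field names (`promotion_flag`, `returns_qty`, `substituted_sku`) are hypothetical stand-ins for whatever signals your forecast actually depends on, and real ERP extracts will differ.

```python
from collections import defaultdict

# Hypothetical field names; substitute the signals your forecast depends on.
REQUIRED_FIELDS = ("promotion_flag", "returns_qty", "substituted_sku")

def demand_signal_coverage(records):
    """Per SKU-lane, report the share of records carrying each signal
    the forecast expects. Gaps here become silent forecast errors."""
    totals = defaultdict(int)
    present = defaultdict(lambda: defaultdict(int))
    for rec in records:
        key = (rec["sku"], rec["lane"])
        totals[key] += 1
        for field in REQUIRED_FIELDS:
            if rec.get(field) is not None:
                present[key][field] += 1
    return {
        key: {f: present[key][f] / n for f in REQUIRED_FIELDS}
        for key, n in totals.items()
    }

orders = [
    {"sku": "A1", "lane": "OSL-HAM", "promotion_flag": True,
     "returns_qty": 0, "substituted_sku": None},
    {"sku": "A1", "lane": "OSL-HAM", "promotion_flag": None,
     "returns_qty": None, "substituted_sku": None},
]
report = demand_signal_coverage(orders)
print(report[("A1", "OSL-HAM")]["promotion_flag"])  # 0.5
```

A coverage report like this, run before the build, surfaces exactly which half of reality the ERP happened to capture.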

Route optimisation without live network signals

Historical patterns are necessary for route optimisation. They are not sufficient. Without live signals (port congestion, customs queues, carrier capacity, weather, traffic), the model produces routes that are mathematically optimal and operationally impossible.

The disconnect is the one every dispatcher already knows: the system says route through X, while the team on the ground knows X has been gridlocked since Tuesday.

The numbers behind this gap are stark. Sea-Intelligence puts global container schedule reliability at around 62% across 2025, meaning roughly four in ten containers don’t arrive when scheduled. In US trucking, ATRI’s 2025 cost analysis shows empty miles still running at 16.7% of total mileage. McKinsey reports that companies take an average of two weeks to plan and execute a response to a major supply chain disruption, far longer than the weekly S&OP cycle the response is supposed to inform.

The data foundation question: does our optimisation engine know what our dispatchers know?
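A practical first step toward answering it is checking whether the live feeds an optimiser consumes are actually live. The sketch below is illustrative only: the feed names and staleness thresholds are assumptions, not recommendations, and any real deployment would tune them to the operation.

```python
import datetime as dt

# Hypothetical feeds and thresholds: an optimiser fed stale signals
# is optimising yesterday's network. Tune these to your operation.
MAX_AGE = {
    "port_congestion": dt.timedelta(hours=1),
    "carrier_capacity": dt.timedelta(hours=6),
    "traffic": dt.timedelta(minutes=15),
}

def stale_feeds(last_updated, now):
    """Return the feeds whose last update is older than its threshold."""
    return [name for name, ts in last_updated.items()
            if now - ts > MAX_AGE.get(name, dt.timedelta(hours=24))]

now = dt.datetime(2026, 5, 14, 12, 0)
feeds = {
    "port_congestion": dt.datetime(2026, 5, 14, 11, 30),  # 30 min old: fine
    "traffic": dt.datetime(2026, 5, 14, 9, 0),            # 3 hours old: stale
}
print(stale_feeds(feeds, now))  # ['traffic']
```

If a check like this fires regularly, the optimiser is not working from the network the dispatchers see, whatever the historical data says.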

Document processing that can’t handle real freight variability

Bills of lading. Commercial invoices. Customs declarations. Proofs of delivery. Freight documentation is staggeringly heterogeneous across carriers, across geographies, across shippers, even across a single shipper’s lanes.

A McKinsey study of trade documentation found that a single shipment can require up to 50 sheets of paper exchanged across as many as 30 stakeholders. As of January 2025, the Digital Container Shipping Association reported electronic bill of lading adoption at just 5.7%. The rest is still paper, scans, faxes, and PDFs annotated by hand.

Models trained on clean, standardised samples collapse when the real long tail arrives. The exception queue that automation was supposed to shrink starts growing instead, and operations teams find themselves pulled back into the loop within months.

The data foundation question here isn’t about model accuracy. It is: have we tested this against our worst documents, or only our cleanest?
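One simple way to enforce that test is to stratify extraction accuracy by document condition instead of reporting a single average. The harness below is a sketch under assumed inputs (hand-labelled condition tags and per-document correctness flags); it shows how an average can hide exactly the failure mode that fills the exception queue.

```python
# Illustrative evaluation harness: stratify document-extraction accuracy
# by document condition rather than reporting one blended number.
def stratified_accuracy(results):
    """results: list of (condition, correct) pairs, e.g.
    ('clean_pdf', True) or ('handwritten_scan', False)."""
    buckets = {}
    for condition, correct in results:
        hits, total = buckets.get(condition, (0, 0))
        buckets[condition] = (hits + correct, total + 1)
    return {c: hits / total for c, (hits, total) in buckets.items()}

results = [
    ("clean_pdf", True), ("clean_pdf", True), ("clean_pdf", True),
    ("handwritten_scan", True), ("handwritten_scan", False),
    ("fax", False), ("fax", False),
]
acc = stratified_accuracy(results)
# The blended average looks passable; the worst bucket exposes the gap.
print(min(acc, key=acc.get))  # 'fax'
```

A deployment gate tied to the worst bucket, not the average, is the difference between testing your cleanest documents and testing your worst.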

Why these failures are predictable

The thread is the same across all three patterns. The data foundation question was either skipped entirely or answered by the wrong people, typically IT, in isolation from operations.

McKinsey’s research on digital transformation outcomes is unambiguous: the projects that survive are the ones where business operations led the data work, not the ones where IT delivered a technology and handed it over. RAND Corporation’s analysis of why AI projects fail puts inadequate data and infrastructure among the top root causes alongside misunderstood problem definitions and a fixation on the technology rather than the problem.

Data foundation work is not an IT exercise. It is an operations exercise. Operations is where the consequences land.

The foundation assessment every leader should run first

Before approving any logistics GenAI build, five questions belong on the table:

  • Where does this data live, and how many systems are we stitching together to feed the model?
  • How fresh is the data in operational terms, not IT terms?
  • What does the long tail of edge cases look like, and is any of it represented in our training set?
  • Who owns the data lineage end-to-end, from system of record to model input?
  • What does “complete enough” actually mean for this specific use case?

If those questions can’t be answered cleanly, the model isn’t the next investment. The foundation is.
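For teams that want to operationalise this, the five questions can be turned into an explicit gate in the approval process. The sketch below is purely illustrative; the field names are hypothetical labels for the questions above, and "answered" here stands in for "answered with documented evidence".

```python
# Hypothetical readiness gate mapping the five questions to checks.
# Each key stands in for a question answered with documented evidence.
FOUNDATION_QUESTIONS = [
    "source_systems_mapped",       # where the data lives
    "freshness_meets_ops_need",    # fresh in operational, not IT, terms
    "edge_cases_in_training_set",  # long tail represented
    "lineage_owner_assigned",      # system of record to model input
    "completeness_defined",        # 'complete enough' for this use case
]

def foundation_ready(answers):
    """Return (ready, open_items): the build proceeds only when
    every question has a documented 'yes'."""
    open_items = [q for q in FOUNDATION_QUESTIONS if not answers.get(q)]
    return (not open_items, open_items)

ready, gaps = foundation_ready({
    "source_systems_mapped": True,
    "freshness_meets_ops_need": True,
    "edge_cases_in_training_set": False,
    "lineage_owner_assigned": True,
    "completeness_defined": True,
})
print(ready, gaps)  # False ['edge_cases_in_training_set']
```

The value is less in the code than in the discipline: an open item blocks the build, and the open item names the foundation work that comes first.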

The question that matters

The shift in the question that matters is small but decisive. It is not “can we build this?” Most things can be built. The question is whether the foundation is ready to support what we build on top of it.

The GenAI projects that deliver in logistics are the ones where that question was asked first. The ones that stall are the ones where it was asked last.