The client is a major educational publishing house that aggregates and distributes textbooks from various publishers. Their annual performance depends on one mission-critical question: which schools are assigning which books.
To answer that question at scale, they process two data streams every academic year: incoming publisher catalogues and outgoing school book lists. The two sources differ in structure and data quality, which creates significant operational complexity.