This blog is co-authored by Divya Chatty (Senior Data Scientist at Gramener) and Praneel Nihar (Lead Data Scientist at Gramener)
Ice cream sandwiches are a delightful treat, but more often than not, one must really earn them.
Ask tiny toddlers who really want their mothers to buy them ice cream, and they’ll be able to tell you all the groundwork that goes into successfully getting one.
The timing and the precision of the request are of utmost importance. Messing up either element will most definitely cost them that treat.
All good data scientists need to be like children: curious and relentless, constantly expanding their understanding of the world. Last year, we had the rare privilege of tirelessly, relentlessly, and doggedly pursuing excellence in quality and delivery while expanding our love for data and its mysteries. Day after day.
The words below must show whether or not we succeeded in getting our spiffing ice cream sandwiches at the end of this year.
Who’s Spaceman Spiff?
For the uninitiated, Spaceman Spiff is an astronaut alter-ego of Calvin – the titular character of the comic strip ‘Calvin & Hobbes’. Spiff, in his multiple adventures, explores new planets and fights aliens. Whether or not Spiff manages to gain decisive victories each time can most definitely be called into question, but every one of his adventures is funny, dynamic, and crazy.
Spiff has also been an integral and vibrant part of the co-author’s (Divya’s) childhood, so she’s always happy to reference him.
The Ask in 2023
We worked on a smart transportation solution for intelligent customer order consolidation for a warehousing client in the US.
The solution addresses the complex problem of smartly grouping customer orders together so that the delivery trucks (loads) sent from the warehouses (where the product is stored) to the retail vendors (where the product needs to go) are used as cost- and space-efficiently as possible while simultaneously adhering to various operational constraints.
Distance, shipping timeline, truck capacity, product type, customer-specific constraints, and delivery parameters are just a few of the operational constraints that need to be considered while optimizing trucks.
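To make the shape of the problem concrete, here is a deliberately simplified sketch of how orders might be greedily packed into loads under just two of these constraints: shared destination and truck capacity. The order fields, the capacity figure, and the greedy rule are illustrative assumptions for this post, not the client's data or the production model.

```python
from dataclasses import dataclass

# Hypothetical order representation; the real solution tracks many more attributes.
@dataclass
class Order:
    order_id: str
    destination: str    # retail vendor the product must reach
    volume_cuft: float  # space the order occupies in the truck
    ship_by_day: int    # latest day the order can leave the warehouse

TRUCK_CAPACITY_CUFT = 3_000  # illustrative capacity, not the client's actual figure

def consolidate(orders: list[Order]) -> list[list[Order]]:
    """Greedy first-fit consolidation: orders going to the same destination are
    packed into as few trucks as possible without exceeding capacity. The real
    optimizer also weighs distance, timelines, product type, and customer rules."""
    loads: list[list[Order]] = []
    # Pack larger orders first so the smaller ones fill the gaps.
    for order in sorted(orders, key=lambda o: -o.volume_cuft):
        for load in loads:
            same_destination = all(o.destination == order.destination for o in load)
            remaining = TRUCK_CAPACITY_CUFT - sum(o.volume_cuft for o in load)
            if same_destination and order.volume_cuft <= remaining:
                load.append(order)
                break
        else:
            loads.append([order])  # no existing truck fits, so open a new one
    return loads
```

Even this toy version hints at why planning loads by hand gets hard quickly: every additional constraint multiplies the combinations a planner has to weigh.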
Until the introduction of this solution, specific employees had to manually sift through hundreds of orders each day and lay out the daily shipping plan purely based on their experience.
The client organizes its transportation business into “regions,” each pivoting around a specific group of warehouses. Initially, the solution was developed and rolled out for one region.
In 2023, the main task was to extend the solution US-wide, to all the regions of the transportation operations, showcasing its value each time and ensuring seamless delivery and integration.
What Challenges Did We Face?
The five sections below cover some of the most critical aspects of the work and the challenges we faced in fulfilling the ask.
Validation is the Foundation
Bringing each new region into the solution is a complex, multi-step task. But the central idea always remains the same. We always want to answer one single question: “How stable is the model with this new change?”
The hypothesis we seek to validate in each round of onboarding is that the core of the solution should not change, but that the region’s operational behaviors can be incorporated into the model by examining its historical data.
We have defined a process for validating this hypothesis region after region by devising a stability test that compares the model’s recommendations with historical loads (trucks) and quantifies the differences in dollars saved.
We turn to this stability test when we onboard a region or enhance the model because it can accurately tell us the impact of a change in understandable cost terms.
To put it simply, we:
- Look at the cost savings at the macro level, and
- Validate the model’s recommendations at the micro level
to identify areas of improvement. We then feed our insights back into the model by tweaking hyperparameters and repeat the process till the best possible configuration is achieved.
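To give a flavour of the macro-level check, here is a minimal sketch of costing both plans with the same yardstick. The flat per-mile rate, the column names, and the cost model itself are assumptions made for this illustration; the actual stability test uses the client's cost structure.

```python
import pandas as pd

COST_PER_MILE = 2.5  # illustrative freight rate, used only for this sketch

def plan_cost(loads: pd.DataFrame) -> float:
    """Cost of a shipping plan given one row per load with a 'miles' column.
    A real cost model would also account for stop-offs, fuel surcharges, etc."""
    return float((loads["miles"] * COST_PER_MILE).sum())

def stability_report(historical_loads: pd.DataFrame, model_loads: pd.DataFrame) -> dict:
    """Macro-level comparison: how many loads, and how many dollars, does the
    model's plan save over what was historically shipped?"""
    hist_cost = plan_cost(historical_loads)
    model_cost = plan_cost(model_loads)
    return {
        "historical_loads": len(historical_loads),
        "model_loads": len(model_loads),
        "loads_saved": len(historical_loads) - len(model_loads),
        "dollars_saved": round(hist_cost - model_cost, 2),
        "pct_saved": round(100 * (hist_cost - model_cost) / hist_cost, 1),
    }
```

The micro-level pass then drills into the individual loads where the two plans diverge the most.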
This validation has thus truly been the foundation each time, because it:
- Gives us a quantifiable way to measure impact and improve the model.
- Allows us to communicate the model’s abilities to the client in an effective, understandable way and to build trust.
Aerodynamically Speaking
Picture the Wright Flyer created by the Wright Brothers and contrast it with a state-of-the-art airplane today. Deliberate efforts to make the design aerodynamic have made a monumental difference in aircraft sleekness, speed, and efficiency.
While the stability test is a powerful tool for validating the model’s goodness, it is highly complex in all its stages—prep, execution, and post-analysis.
Through the course of the year, one thing that really helped us carry out these steps again and again was our effort to make these processes as repeatable as possible.
Even though this effort required carving out time from our regular commitments, we consciously sought to identify areas of our work that could be stripped of most of their frills and automated.
For example, while the detailed analysis of the outcome of the stability test is a manual process, the validation of the important constraints of the model could be automated.
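Schematically, such an automated check is just a set of assertions over the recommended loads. The schema, thresholds, and constraints below are invented stand-ins for the real ones:

```python
import pandas as pd

def validate_loads(loads: pd.DataFrame, max_volume: float, max_stops: int) -> pd.DataFrame:
    """Flag recommended loads that break hard constraints. Expects one row per
    load with 'volume', 'stops', 'ship_day', and 'due_day' columns (an
    illustrative schema). Returns only the offending rows for review."""
    return loads[
        (loads["volume"] > max_volume)            # truck capacity exceeded
        | (loads["stops"] > max_stops)            # too many drop-offs on one route
        | (loads["ship_day"] > loads["due_day"])  # shipping timeline missed
    ]

# Hypothetical usage during a stability run: fail fast if any hard constraint breaks.
# offending = validate_loads(model_loads, max_volume=3_000, max_stops=3)
# assert offending.empty, f"{len(offending)} loads violate hard constraints"
```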
While there is scope for us to streamline things further this year, the automation we managed to achieve helped us save time and allowed us to focus on other complex aspects of our work.
So aerodynamically speaking:
- Properly streamlining processes requires time and significant effort but pays rich dividends in the long run.
- Creating generic checklists requires thought but cuts down on common mistakes more often than not.
Birds of a Feather Sometimes Don’t Flock Together
We’ve deliberately played on the words of this adage because it has been one of our big learnings this year. The solution was built for one region of operations, but its central assumptions were intended to be region-agnostic.
As we started to work on each region’s data, examining historical patterns and observing whether our model could capture them, we realized that even though all the regions operate similarly, the same optimal parameters don’t hold well for all of them.
In essence, we needed to make the model work for every region by making its parameters as configurable as possible (a minimal sketch follows the list below). This reconfigurability helped in:
- Creating specific levers to aid experimentation.
- Making the codebase generic and robust.
- Accommodating region-specific behavior with minimal interference with other regions.
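Here is a minimal sketch of what that configurability can look like in practice; the parameter names, regions, and values are invented for illustration and are not the model's actual levers:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class RegionConfig:
    """Levers that tend to vary by region; names and defaults are illustrative."""
    max_stops_per_load: int = 3
    consolidation_radius_miles: float = 150.0
    min_truck_utilization: float = 0.80
    allow_mixed_product_types: bool = True

# Region-agnostic defaults, overridden only where a region's historical data demands it.
DEFAULT = RegionConfig()
REGION_CONFIGS = {
    "region_a": DEFAULT,
    "region_b": replace(DEFAULT, consolidation_radius_miles=200.0),
    "region_c": replace(DEFAULT, max_stops_per_load=2, allow_mixed_product_types=False),
}

def config_for(region: str) -> RegionConfig:
    # Unknown regions fall back to the region-agnostic core behaviour.
    return REGION_CONFIGS.get(region, DEFAULT)
```

Keeping the overrides in data rather than in code is what lets one region’s quirks be accommodated without touching the others.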
Birds of the same feather sometimes nest in the same tree but perch on faraway branches.
- Data always helps identify the “faraway branches,” and so the code must be ready to accommodate those changes.
- Realizing that and making our model ready to handle the “faraway branches” has made our solution robust and maintainable in the face of change.
Curveballs Ahead!
The presence of checklists and streamlined processes can help in preparedness, but that doesn’t stop curveballs from coming in. The ability to absorb these curveballs and respond appropriately and quickly can differentiate one team from another.
One of the major changes we saw this year was a change in the historical data source. The data source is our starting point while onboarding any region, and our stability analysis and the model depend heavily on it. The change was unanticipated and required time and effort to realign all the processes.
Carving out time and spending effort on the following points helped us absorb the impact of this curveball:
- Ensuring that the new data source contains all the information required for the analysis.
- Validating the result of the new data source against the old one and measuring impact.
- Communicating the results of the data retention study explicitly.
- Creating new data cleaning pipelines to remove erroneous records (because the old process wouldn’t apply anymore).
- Making the new data cleaning pipeline as configurable as possible (a minimal sketch follows this list).
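As a minimal sketch of the kind of configurable cleaning step we mean (the rules, thresholds, and column names are hypothetical):

```python
import pandas as pd

# Cleaning rules kept as configuration, so a change of region or data source means
# editing a dictionary rather than the pipeline code. All rules here are illustrative.
CLEANING_RULES = {
    "drop_if_missing": ["order_id", "destination", "ship_date"],  # rows missing these are unusable
    "positive_only": ["volume", "weight"],                        # non-positive values are data errors
    "dedupe_on": ["order_id"],
}

def clean(raw: pd.DataFrame, rules: dict = CLEANING_RULES) -> pd.DataFrame:
    """Apply configurable cleaning rules to a raw historical extract."""
    df = raw.dropna(subset=rules["drop_if_missing"])
    for col in rules["positive_only"]:
        df = df[df[col] > 0]
    return df.drop_duplicates(subset=rules["dedupe_on"])
```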
There will be curveballs thrown regardless of prior preparedness. It is thoroughness in the response to the unexpected that matters.
It Comes Back to the Problem
One of the biggest challenges we faced last year was handling a few edge cases in which the model’s recommendations left something to be desired. While some edge cases could be handled by modifying hyperparameters, others required a deeper study of the model’s core assumptions.
We found it easy to fall into rabbit holes while investigating solutions for those edge cases. We eventually realized that it was important to keep the breadth of the model in mind while digging deeper into any one anomaly.
It must come back to the problem: the problem the model already solves well, the problem the edge case is causing for the end user, and the problems that might arise if we modify too much. Subsequently, we introduced fixes that resolved the edge cases without disturbing much of the rest of the model architecture.
As data scientists, always dialing back to the problem helps in:
- Putting the bigger picture in perspective
- Understanding the customer’s issues better
- Making meaningful changes with less invasive impact
Reflective Notes: Why Spiff?
Last year, we had to tread the uncharted and dodge the unexpected – all simultaneously. The challenges we faced helped dismantle a lot of preconceptions. In solving them, through our individual efforts and through the inputs of our team, we learned so much about:
- The stability of the model and measuring impact,
- The good sense in streamlining processes,
- The necessity to make a solution configurable,
- The response to the unexpected and
- The importance of keeping the problem in perspective.
Like toddlers relentlessly, doggedly, and strategically trying to get what they want, we pursued our ambitious goals with a lot of zeal.
Our ability to deliver on time with quality has truly been the payoff for our efforts.
Oh! The fact that we demonstrated our solution is poised to give the client millions of dollars in savings each year is an incredible cherry on this cake.
But why did we use Spaceman Spiff as a leitmotif in this piece?
In one of his intrepid adventures, Spaceman Spiff is stranded on a remote planet with challenge after challenge to overcome. In Calvin’s world, his school cafeteria offers precious few options that excite him gastronomically. In the end, it is a glorious ice cream sandwich that saves him.
So, did we manage to get an ice cream sandwich at the end of it all? We certainly think so!
This blog is written as part of our CoCreate Internal Blog Writing Competition to foster innovation and collaboration. Participants from technical teams share ideas and solutions, showcasing their creativity and expertise. Through this competition, we aim to highlight the power of collective intelligence and the potential of co-creation in solving complex challenges.