Friday, March 18, 2011

Predictive Analytics - The New Avenue for Cost Savings at Transport and Logistics Companies

It is hard to squeeze blood out of a stone, as the saying goes - and the CFOs of most logistics companies around the world probably find this saying particularly appropriate right about now. However, CFOs who think they've run out of opportunities to cut costs may just find themselves pleasantly surprised.

This discussion will center around using predictive analytics to better forecast resource requirements at transport and logistics companies. Predictive analytics is the technique of exploiting patterns in transactional data to make predictions ahead of time. In this case we will look at it in the context of predicting package volumes, on the assumption that if you know your volumes, you can more efficiently schedule your resources.

The case for looking at this aggressively is quite strong. If we look at a company with $4 million per month of variable cost, and we can manage to reduce this by 1% through more accurate forecasting, we are talking about a saving of $40,000 per month ($480,000 p.a.). If we can get this up to 5%, the figure moves to $200,000 per month ($2.4 million p.a.). It is not unreasonable to expect that we can improve our forecasting accuracy enough to land within this range.
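
For anyone who wants to plug in their own numbers, the arithmetic above can be sketched in a few lines of Python. The function name is just an illustration; the $4 million figure and the 1% and 5% reductions are the ones used above.

```python
def projected_savings(monthly_variable_cost, reduction_percent):
    """Return (monthly, annual) savings for a given percentage cost reduction."""
    monthly = monthly_variable_cost * reduction_percent / 100
    return monthly, monthly * 12


# The two scenarios discussed above: 1% and 5% off a $4 million monthly base.
for percent in (1, 5):
    monthly, annual = projected_savings(4_000_000, percent)
    print(f"{percent}% reduction: ${monthly:,.0f} per month, ${annual:,.0f} p.a.")
```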

Generally, business executives prefer to operate a business that is well scheduled as opposed to reactive. There are a number of good reasons for this. It is less expensive to deploy scheduled resource than emergency resource. All things being equal, you have a better chance of maintaining high service levels in a scheduled operation than in a reactive one. And, not to be underestimated, a scheduled operation seems to increase the chances of management and staff alike maintaining their sanity.

We can safely say that a business running closer to the scheduled end of the scheduled/reactive continuum is better for all stakeholders (including the forklift driver's wife, who knows whether her husband will be home for dinner or not).

So, let's begin by looking at the high-level tasks and resources:

If you are a transportation company, your job is to deliver goods from point A to point B. The fundamental steps are:

1. Pick goods up from your customer.
2. Deliver to hub for sorting.
3. Deliver goods to your customer's customer.

Clearly a gross over-simplification, but sometimes it pays to keep things simple.

So what are our resources?
  • Trucks (our own)
  • Trucks (third parties, regularly scheduled)
  • Trucks (third parties, scheduled at short notice)
  • Labor
In keeping with our spirit of over-simplification, let's also assume that our resource requirement scales in a somewhat linear fashion, though it has capacity steps in it. That is to say, to deliver 10,000 packages we will require approximately double the resource that we would require to deliver 5,000 packages. Again this over-simplifies the case, but it does not affect the fundamentals of our discussion.
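
To make 'linear with capacity steps' concrete, here is a minimal sketch. The figure of 500 packages per truck is purely an assumption for illustration, not a number from any real operation.

```python
import math


def trucks_required(packages, packages_per_truck=500):
    """Whole trucks needed for a given volume.

    Roughly linear in volume, but stepped: a partly filled truck
    is still a whole truck. packages_per_truck is a made-up capacity.
    """
    return math.ceil(packages / packages_per_truck)


print(trucks_required(5_000))   # 10
print(trucks_required(10_000))  # 20, double the trucks for double the volume
print(trucks_required(5_001))   # 11, one extra package triggers a capacity step
```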

This benefits us, in that it makes our objective very clear - if we can accurately predict package volume, we can accurately predict resource requirement. So our objective is to predict package volume as accurately as we possibly can.

Now we will add one qualification to this objective. It is infinitely better to be able to forecast resource requirements 24 hours out rather than 1 hour out; in fact, one week out would be better still. So we can state our objective (for example) as:

 To increase our ability to predict package volumes 24 hours ahead of time, so that we can more efficiently schedule resources.

The CFO will like this. She is sick of paying for labor that was not required. Not nearly as sick, mind you, as she is of paying the invoices from trucking companies that had to be called in at the 11th hour to provide extra capacity and get orders out on time. With this objective we are hoping to get closer to the theoretical nirvana of 'no tasks without resources, and no resources without tasks'.

So now we have established what we are trying to do, and why, let's look at how.

The how is to bring in techniques that have been in use in another industry - the insurance industry - since 1762. Just as insurance companies take an actuarial approach to the expected payout for any given customer, we are going to take a similar approach to forecasting the volumes on any given day.

To clarify how it works in insurance, let's use the example of life insurance. Life insurance companies don't typically have one premium rate for all customers. Rather, they ask you specific questions: your age, your gender, whether you are a smoker, etc. Then they go back and compare what you have told them with historical data on how long people of your age, gender and smoker status tend to live. Obviously, if you are statistically likely to live longer, your premium should be lower than that of someone who is statistically likely to live for a shorter period.

But returning to our case: rather than taking into account age, smoking, gender, etc., we are going to analyze our data at a granular level and find the drivers that affect package volumes. We are then going to build a predictive model that helps us determine what our volumes are statistically likely to be tomorrow, and this will become an important reference for our resource planning for the next day.
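
To give a feel for what 'a predictive model' can mean at its very simplest, here is a deliberately naive sketch that predicts tomorrow's volume as the historical average for that day of the week. It is a stand-in for illustration only, far simpler than anything a real analytics package would build, and the history below is made up.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def fit_weekday_averages(history):
    """history: list of (date, packages) pairs. Returns {weekday: mean volume}."""
    by_weekday = defaultdict(list)
    for day, packages in history:
        by_weekday[day.weekday()].append(packages)
    return {weekday: mean(volumes) for weekday, volumes in by_weekday.items()}


def predict(model, target_day):
    """Predict the volume for target_day from the average for its weekday."""
    return model[target_day.weekday()]


# Made-up history, for illustration only.
history = [
    (date(2011, 3, 4), 6100),   # Friday
    (date(2011, 3, 7), 5200),   # Monday
    (date(2011, 3, 11), 6300),  # Friday
    (date(2011, 3, 14), 5400),  # Monday
]
model = fit_weekday_averages(history)
forecast = predict(model, date(2011, 3, 18))  # a Friday
print(forecast)  # average of the two Fridays in the history
```

A real model would weigh many drivers at once rather than just one, but the principle is the same: learn from what happened historically under similar conditions.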

So far all this sounds good. In fact it makes a lot of sense. Rather than making general, unscientific assumptions about our volumes, we will take a scientific approach to it. However, it is about this point that most executives at transportation companies instinctively start coming up with reasons why, while this may work in other industries, it wouldn't work in theirs.

We don't really have much data. An insurance company can ask all those questions. All we have really is 'date and number of packages shipped'.

If I had a dollar for every industry that thought its data was different, this blog would be written from a super yacht in the Mediterranean. Yes - it is a fact that every industry is different, but this does not mean there is not more data there than you would initially imagine, nor that there are not patterns in that data which can be exploited. For example, most people look at a date field like '03/26/2011' and see one piece of information. Actually, you would be surprised how much information we can extract out of a seemingly innocuous field like '03/26/2011':

  • Year (2011)
  • Quarter (First)
  • Month (March)
  • Day of Month (26)
  • Day of Week (Saturday)
  • Week of Year (12)
  • Week of Quarter (12)
  • Week of Month (3)
  • Public Holiday (No)
  • Weekend (Yes)
  • Days Since Last Public Holiday (36)
  • Days Until Next Public Holiday (31)
  • Days Until End of Month (5)
  • Season (Spring) (or if in Southern Hemisphere, Autumn)
So there you go, we have miraculously transformed our date field into 14 pieces of more granular information. More importantly, the granularity allows us to consider which things correlate with package volumes much better than '03/26/2011' ever did alone. An executive may intuitively observe:

  • We do tend to transport more at the end of the month, when our clients are doing end of month promotions.
  • There is a great deal of seasonality in our volumes.
  • Quarter ends are especially busy times.
  • We ship nothing on public holidays.
  • Fridays are always busy.
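
Most of the date breakdown above can be derived mechanically. The following is a minimal sketch using only Python's standard library; the field names are my own, and the public holiday fields are left out because they need a holiday calendar for your particular country.

```python
import calendar
from datetime import date

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def date_features(d):
    """Break a single date into more granular pieces of information."""
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return {
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "day_of_month": d.day,
        "day_of_week": DAY_NAMES[d.weekday()],
        "week_of_year": d.isocalendar()[1],   # ISO week number
        "is_weekend": d.weekday() >= 5,       # Saturday or Sunday
        "days_until_end_of_month": days_in_month - d.day,
    }


features = date_features(date(2011, 3, 26))
for name, value in features.items():
    print(f"{name}: {value}")
```
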
So that was just the date. Now let's look at the next thing - historical package volumes:

Let's say our info says: 03/26/2011  5,920 Packages Shipped to 3,450 Locations

Let's see what we can do with that:

  • Volume today.
  • Volume yesterday.
  • Volume today - volume yesterday.
  • Volume one week ago.
  • Volume today - volume one week ago.
  • Volume one month ago.
  • Volume today - volume one month ago.
  • Volume one year ago.
  • Volume today - volume one year ago.
  • Etc
As you can see, these things are also likely to be pertinent, as they will capture and reflect growth trends, etc.
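
The volume comparisons listed above can be sketched as follows. Two assumptions of mine: 'one month ago' is taken as 28 days back and 'one year ago' as 364 days back, so that we always compare the same day of the week. Apart from the 5,920 figure used above, the volumes are made up.

```python
from datetime import date, timedelta

# Assumed look-back periods, chosen so the weekday always matches.
LAGS = (("yesterday", 1), ("one_week_ago", 7),
        ("four_weeks_ago", 28), ("one_year_ago", 364))


def lag_features(volumes, day):
    """volumes: {date: packages}. Returns past volumes and changes for `day`.

    Entries are None where the history does not reach back far enough.
    """
    today = volumes[day]
    features = {"volume_today": today}
    for name, days_back in LAGS:
        past = volumes.get(day - timedelta(days=days_back))
        features[f"volume_{name}"] = past
        features[f"change_vs_{name}"] = None if past is None else today - past
    return features


volumes = {
    date(2011, 3, 19): 5700,  # made up
    date(2011, 3, 25): 6400,  # made up
    date(2011, 3, 26): 5920,  # the figure from the example above
}
features = lag_features(volumes, date(2011, 3, 26))
print(features["change_vs_yesterday"])     # -480
print(features["change_vs_one_week_ago"])  # 220
```
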

We can also look at making our data more granular by extracting data at a customer level.

Finally, we also have data in our arsenal that we do not even own - data provided by third parties. Examples of this are:

  • Economic confidence data
  • GDP data
  • Unemployment data
  • Stock market data
  • Building permit data (if, for example, we ship primarily construction products)
The point of all of the above is not to prescribe exactly what data we should be throwing into the mix, but rather to encourage executives to think about which data may have a correlation to their volumes, and to illustrate that they have a lot more to work with than they may have imagined at first glance.
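
One quick way to check whether a candidate piece of data moves with your volumes is a simple correlation coefficient. This is only a rough first screen, not a substitute for a proper model, and the monthly figures below are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Invented monthly figures: building permits issued vs. packages shipped.
building_permits = [120, 135, 150, 110, 160, 170]
monthly_packages = [98_000, 104_000, 112_000, 95_000, 118_000, 121_000]

r = pearson(building_permits, monthly_packages)
print(f"correlation: {r:.2f}")  # close to 1 means the two tend to move together
```
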

Okay - I concede we may have the data. But how could we possibly analyze that data and exploit it without a team of PhDs in statistics or mathematics, or whatever it is?!

Relax. There are companies like 11Ants Analytics that do this sort of thing - and things infinitely more complicated - every day. They will walk you through the whole process, analyze the data, and even customize their software solution for you so that you can integrate it into your business operations.

But we could never do our scheduling based upon what a black box program told us to do - what if it was wrong?!

The same could be said for the speedometer on your car - sometimes you just want to trust that the system is doing its job. That being said, there may be events that crop up on the day that render the prediction made 24 hours prior obsolete. This is not a big deal: you create your best scientific guess 24 hours out, and then you use it as an important frame of reference, which can always be overridden if the evidence requires it. You are still likely to have more resource scheduled correctly than you would have if forecasting less scientifically.

Another consideration is that there may be trends that occur during the day that can be modeled in a similar fashion and serve as an 'early warning system'. Effectively, when we see volumes during the day spiking above what we forecast, we can model what this means for the rest of the day. Even inputs that arrive a few hours ahead are more useful to us than those that arrive at the very last minute.
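
As a rough sketch of such an early warning, suppose we know from history what share of a day's volume has normally arrived by a given hour. We can then scale up what we have seen so far and compare it with the forecast. The 25% share, the 10% tolerance and the volumes below are all assumptions for illustration.

```python
def early_warning(actual_so_far, typical_share_so_far, forecast, tolerance=0.10):
    """Project the end-of-day volume and flag it if it exceeds the forecast.

    typical_share_so_far: fraction of a normal day's volume usually
    received by this time of day (e.g. 0.25 by mid-morning).
    Returns (warning_flag, projected_volume).
    """
    projected = actual_so_far / typical_share_so_far
    return projected > forecast * (1 + tolerance), projected


# Forecast was 6,000 packages; by mid-morning we normally have 25% of the day.
print(early_warning(1_800, 0.25, 6_000))  # (True, 7200.0): call for extra capacity now
print(early_warning(1_500, 0.25, 6_000))  # (False, 6000.0): tracking the forecast
```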

It is absolutely impossible that we can forecast volumes 100% accurately using predictive analytics.


Correct. You certainly will not - these are statistical predictions. However, keep in mind that all we are looking for is improvement. The question to ask yourself is - by using analytics techniques like this, could we possibly get 1% better at forecasting? 5% better? 10% better? The answer will be different for everyone, but as you've seen above, you don't need to experience a huge improvement in forecasting to start saving some serious money - and you certainly don't need to be predicting at 100% accuracy, simply predicting better than you are now.
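
To know whether you are 'predicting better than you are now', you need a yardstick. One common choice, and only one of several, is the mean absolute percentage error (MAPE). The sketch below compares two sets of forecasts against the same actual volumes; all figures are invented.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, returned as a fraction (0.10 = 10%)."""
    errors = [abs(a - f) / a for a, f in zip(actuals, forecasts)]
    return sum(errors) / len(errors)


# Invented figures for three days.
actuals = [5_000, 6_000, 4_000]
current_forecasts = [5_500, 5_400, 4_400]  # today's planning method
model_forecasts = [5_250, 5_700, 4_200]    # a candidate predictive model

print(f"current method error: {mape(actuals, current_forecasts):.1%}")
print(f"model error:          {mape(actuals, model_forecasts):.1%}")
```

The model does not need to reach zero error to pay for itself; it only needs to beat the current method consistently.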

It just sounds like a lot of work on our end, extracting the data, preparing it, etc!

Probably not as much as you would imagine. Regardless, this decision needs to be evaluated on a straightforward ROI basis. As one would expect, savings of this magnitude are going to require some investment, but the justification for the effort should not be too hard to demonstrate. We are probably going to have to look pretty hard elsewhere to find savings that equate to pulling 1% - 5% out of our variable cost base.

If you have any questions about any of this, feel free to drop me an email.
