Prediction – Don’t Give Up

I previously described a dataset that was stubbornly refusing to yield any predictability. Using a larger set of the same data, I realised that rather than trying to predict the outcome of an individual sales opportunity, the question actually of interest was whether a particular lead is likely to become a customer (most people don’t buy multiple products of this type, and there are often a number of interactions with each customer before a sale). By focussing on the customer, it became clear that there were some noticeable patterns in the distributions of the parameters. The dataset was very nicely split 50/50 between successful and unsuccessful opportunities, and the best predictors were able to correctly classify almost 70% of cases. The most accurate predictor (highlighted) was the time between the first and final opportunity with that customer. This is probably not the most useful predictor, as it could be argued that a customer coming back after a reasonable period has thought about the product and decided to proceed.
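To make the shift from opportunity level to customer level concrete, here is a minimal sketch of that aggregation in Python. The record layout, the field names and the sample rows are all hypothetical (the real data was explored in Weka and is not shown here); the sketch only illustrates collapsing several opportunities into one row per customer, including the "time between first and final opportunity" attribute.

```python
from collections import defaultdict
from datetime import date

# Hypothetical opportunity records: (customer_id, opportunity_date, sold).
# These rows are made up purely for illustration.
opportunities = [
    ("A", date(2012, 1, 10), False),
    ("A", date(2012, 3, 2), True),
    ("B", date(2012, 2, 1), False),
    ("C", date(2012, 1, 5), False),
    ("C", date(2012, 1, 20), False),
]

def summarise_by_customer(rows):
    """Collapse opportunity rows into one summary record per customer."""
    grouped = defaultdict(list)
    for customer_id, when, sold in rows:
        grouped[customer_id].append((when, sold))

    summary = {}
    for customer_id, events in grouped.items():
        dates = [when for when, _ in events]
        summary[customer_id] = {
            "n_opportunities": len(events),
            # Days between the first and the final opportunity.
            "days_first_to_last": (max(dates) - min(dates)).days,
            # A customer counts as converted if any opportunity was sold.
            "converted": any(sold for _, sold in events),
        }
    return summary

customers = summarise_by_customer(opportunities)
```

Each value in `customers` is then one instance for a classifier, with `converted` as the class attribute.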

Weka Parameters 2012 Set

Whilst the mathematically best predictor might not be that useful in a business setting, some other parameters also show noticeably useful predictive capability.

Prediction

Leaving email data behind for a while, I have turned to another set of data my organisation collects, which represents potential business. Using Weka, I have started to investigate whether conversion of potential business into actual business can be predicted. I have a sample of 11,000 recent cases and know whether each led to business (‘converted’) or not: 41% converted and 59% did not. Using some summary-level data, I have tried a number of classification algorithms. Many of these reported approximately 60% correctly classified instances, which is not very good considering that always predicting ‘No’ is 59% correct. The best gave 70%-75%, but on further investigation (of those where enough information was provided) this was due to over-fitting. This was particularly noticeable for rule-based algorithms, which were generating up to a few hundred rules.
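The comparison against "always predict No" can be written down as a small sketch. The 41%/59% class balance and the 60% figure come from the numbers above; everything else, including the toy "memoriser" that stands in for an over-fitted rule set, is a hypothetical illustration rather than the Weka setup actually used.

```python
from collections import Counter

def majority_baseline(labels):
    """Return (most common label, its share): the always-guess-the-majority rule."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Class balance taken from the post: 41% converted, 59% did not.
labels = ["yes"] * 41 + ["no"] * 59
majority_label, baseline = majority_baseline(labels)

# A classifier reporting 60% is only about one point above the baseline.
reported = 0.60
gain = reported - baseline

# Hypothetical over-fitted model: one memorised rule per training case.
train = {("case", i): ("yes" if i % 2 else "no") for i in range(10)}
unseen = {("case", i): ("yes" if i % 3 == 0 else "no") for i in range(10, 20)}

def memoriser(key):
    # Falls back to the majority label for any case it has not seen.
    return train.get(key, majority_label)

train_acc = accuracy([memoriser(k) for k in train], list(train.values()))
unseen_acc = accuracy([memoriser(k) for k in unseen], list(unseen.values()))
```

A large gap between `train_acc` and `unseen_acc` is the warning sign: the rules describe the training cases rather than anything general. In Weka, the majority-class baseline is available as the ZeroR classifier, which makes it a convenient first row in any comparison.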