Leaving email data behind for a while, I have turned to another set of data my organisation collects, which represents potential business. Using Weka, I have started to investigate whether the conversion of potential business into actual business can be predicted. I have a sample of 11,000 recent cases and know whether each one led to business (‘converted’) or not: 41% converted and 59% did not. Using some summary-level data, I have tried a number of classification algorithms. Many of these reported approximately 60% correctly classified instances, which is not very good considering that always predicting ‘No’ is 59% correct. The best gave 70%-75%, but on further investigation (of those algorithms that provided enough information) this was due to over-fitting. This was particularly noticeable for the rule-based algorithms, which were generating up to a few hundred rules.
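
To make the comparison with the 'always No' baseline concrete, here is a minimal Python sketch (not the Weka workflow itself). The label list is synthetic, built only to match the 41%/59% split in the 11,000 cases, and the function names `majority_baseline` and `gain_over_baseline` are my own illustrative choices. Always predicting the most common class is what Weka's ZeroR classifier does.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most common class and the accuracy of always predicting it."""
    counts = Counter(labels)
    majority_class, majority_count = counts.most_common(1)[0]
    return majority_class, majority_count / len(labels)


def gain_over_baseline(accuracy, baseline):
    """Fraction of the baseline's errors that a classifier removes."""
    return (accuracy - baseline) / (1.0 - baseline)


# Synthetic labels (an assumption): 41% of 11,000 cases converted.
labels = ["yes"] * 4510 + ["no"] * 6490

majority_class, baseline = majority_baseline(labels)
print(f"Always predict '{majority_class}': {baseline:.0%} correct")

# Reported accuracies from the text, compared against the baseline.
for accuracy in (0.60, 0.70, 0.75):
    gain = gain_over_baseline(accuracy, baseline)
    print(f"accuracy {accuracy:.0%} removes {gain:.1%} of baseline errors")
```

Seen this way, a classifier at 60% removes only about 2% of the errors the baseline makes, so it has learned almost nothing. The 70%-75% figures would be a real gain, but only if they hold up on data the model was not trained on.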

