When working with email, a few things can trip you up. Here are some tips for avoiding them:
- Always convert email addresses to a consistent case; I prefer lower case, but the choice is yours. Also strip any leading or trailing spaces: you shouldn’t get any from the Exchange message tracking logs, but trim anyway to be sure.
- Use an integer ID as the primary key, NOT the email address. Email addresses can change over time, and the same person will often turn up under more than one address; using a key other than the address makes it easier to merge addresses when duplicates are detected (through an email aliases table).
- Ignoring broadcast emails: sometimes you may see an email sent from the CEO to everyone in the organisation. Is this really indicative of a relationship? Probably not. In fact, any email sent to more than a small group probably gives little indication of a social tie. There are a couple of options:
  - Ignore emails sent to more than n people; the value of n is up to you, but I’d say around 10.
  - Use a formula to exponentially reduce the social network significance assigned to an email as the number of recipients increases. I’ll say more about this approach when I discuss some other sources of data.
- Ignoring system/technical accounts: you might see emails sent from non-personal accounts, e.g. “[email protected]”, and these should probably be ignored, as they are usually just broadcasts of information that reveal no social ties. How do you spot them? If you are lucky, they will not conform to the same pattern as personal addresses (e.g. “[email protected] .com” versus “[email protected]”); otherwise you’ll have to construct a list. In my experience both were needed: the pattern match caught most, but a number of exceptions had to go into a list, so you just have to keep an eye out for them.
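
To make the tips above a little more concrete, here is a minimal Python sketch of how they might fit together. Everything named in it is an assumption for illustration only: the helper names (`normalize_address`, `AddressBook`, `cutoff_weight`, `decay_weight`, `is_system_account`), the `first.last@example.com` naming convention, the entry in the exceptions list, and the halving decay factor are all hypothetical. In particular, the decay function is just one simple exponential form, not the formula I’ll describe later.

```python
import re


def normalize_address(raw: str) -> str:
    """Trim surrounding whitespace and lower-case an address."""
    return raw.strip().lower()


class AddressBook:
    """Maps addresses to integer person IDs; doubles as a simple alias table."""

    def __init__(self) -> None:
        self._person_by_address = {}  # normalized address -> integer person ID
        self._next_id = 1

    def person_id(self, raw: str) -> int:
        """Return the ID for an address, allocating a new one if unseen."""
        address = normalize_address(raw)
        if address not in self._person_by_address:
            self._person_by_address[address] = self._next_id
            self._next_id += 1
        return self._person_by_address[address]

    def add_alias(self, alias: str, existing: str) -> None:
        """Point `alias` at the person who owns `existing`.

        If `alias` already had its own ID, anything stored under that old ID
        would also need remapping; that step is left out of this sketch.
        """
        self._person_by_address[normalize_address(alias)] = self.person_id(existing)


MAX_RECIPIENTS = 10  # the cutoff "n"; around 10 is only a suggestion


def cutoff_weight(recipient_count: int, limit: int = MAX_RECIPIENTS) -> float:
    """Option 1: an email counts fully up to `limit` recipients, else not at all."""
    return 1.0 if 1 <= recipient_count <= limit else 0.0


def decay_weight(recipient_count: int, decay: float = 0.5) -> float:
    """Option 2: weight shrinks exponentially with each extra recipient.

    Assumed form: decay ** (recipient_count - 1), so one recipient weighs 1.0.
    """
    if recipient_count < 1:
        return 0.0
    return decay ** (recipient_count - 1)


# Hypothetical convention: personal addresses look like first.last@example.com.
PERSONAL_PATTERN = re.compile(r"[a-z]+\.[a-z]+@example\.com")

# Hypothetical exceptions: system accounts that happen to match the pattern.
KNOWN_SYSTEM_ACCOUNTS = {"build.server@example.com"}


def is_system_account(raw: str) -> bool:
    """Flag addresses on the exceptions list or not matching the personal pattern."""
    address = normalize_address(raw)
    if address in KNOWN_SYSTEM_ACCOUNTS:
        return True
    return PERSONAL_PATTERN.fullmatch(address) is None
```

With something like this in place, each logged email could be skipped when `is_system_account(sender)` is true, and otherwise contribute `cutoff_weight(n)` or `decay_weight(n)` to the tie between the sender’s and each recipient’s `person_id`, where `n` is the number of recipients.
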
The Exchange server logs contain a message size. I have not yet found any use for this in understanding the social network, but it’s useful to have when making friends with the Exchange server administrator; see my next article!