Dealing with Sensitive Data (e.g. email Message Subject)

In previous entries about analysing email logs I mentioned the message subject, which can optionally be included. This can be considered sensitive information, and how you deal with it will depend on the organisation concerned. The organisation I’ve previously described decided to allow the message subject to be extracted but not stored as-is; instead it was agreed that the message subject would be hashed (a one-way transformation, so the original text cannot be read back from the stored value) and then stored. This is useful because identical subjects produce identical hashes, which allows conversations to be tracked so that metrics like the average response time can be collected. A few additional steps help make the best of this approach:

  1. Before doing anything with the message subject, turn the whole string into a consistent case (upper or lower, your choice); otherwise “Hello” and “hello” give different hash values.
  2. Strip off the subject prefix (“RE:”, “FW:”) and do this repeatedly until none are left. Store the outermost prefix as-is (no hashing) and then hash and store the remainder of the message subject. In Social (Organisational) Network Analysis using email – Practical Architecture Part 1 the email table contains the fields ‘subject_prefix’ and ‘subject_hash’; this is what these fields store.
  3. Base64-encode the hashed value before storing it. The raw hash is arbitrary binary data, so without encoding you’ll run into trouble with escape characters.
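
The three steps above can be sketched in a few lines of Python. This is only an illustration under some assumptions of mine, not the code used in the project: the function name `split_and_hash_subject` is hypothetical, SHA-256 is my choice (the hash algorithm isn't specified above), "FWD:" is added to the prefix list alongside "RE:" and "FW:", and the stored prefix is upper-cased for consistency.

```python
import base64
import hashlib
import re

# Assumed prefix set: the post names "RE:" and "FW:"; "FWD:" is an extra guess.
# The pattern is lower-case because the subject is lower-cased before matching.
_PREFIX = re.compile(r"^\s*(re|fw|fwd)\s*:\s*")


def split_and_hash_subject(subject):
    """Return (subject_prefix, subject_hash) for a raw message subject."""
    # Step 1: consistent case, so "Hello" and "hello" hash identically.
    remainder = subject.strip().lower()

    # Step 2: strip prefixes repeatedly, remembering only the outermost one.
    outermost = ""
    while True:
        match = _PREFIX.match(remainder)
        if match is None:
            break
        if not outermost:
            outermost = match.group(1).upper() + ":"
        remainder = remainder[match.end():]

    # Hash what is left (SHA-256 is an assumption, see above).
    digest = hashlib.sha256(remainder.strip().encode("utf-8")).digest()

    # Step 3: Base64-encode the binary digest so it is safe to store as text.
    subject_hash = base64.b64encode(digest).decode("ascii")
    return outermost, subject_hash


if __name__ == "__main__":
    print(split_and_hash_subject("RE: FW: Quarterly figures"))
    print(split_and_hash_subject("quarterly figures"))
```

Both calls should yield the same `subject_hash`, with `subject_prefix` set to "RE:" for the first and left empty for the second, which is what lets a reply be matched to the message that started the conversation.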