Category Archives: Social Network Analysis

Combining Data to Weight Social Connections in an Organisation

In the above diagram, red nodes are from the division of the organisation under study, green and blue nodes are from two other divisions, and grey nodes are uncategorised or central functions.

I’ve previously described how to determine a ‘score’ for social connections using data from email, meetings, the corporate directory and timesheets. The question is how to combine these scores to produce as complete a picture as possible from the data at hand. I’m fairly sure the best answer is not simply to add the scores together, but I’ve not found any guidance that suggests anything better, so that’s exactly what I have done: add them up. To summarise, the score (or, more correctly, weight) given to each edge is made up of:

1 point for each email exchanged (where there are a maximum of 10 recipients)
1 point for each minute in a one-to-one meeting, reducing rapidly as the number of attendees in the meeting increases
300 points for being in a manages/managed by relationship
1 point for every hour spent on a project divided by the number of people on the project
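
As a rough illustration of the "add them up" approach, the sketch below sums per-source scores into one weight per edge. It assumes each source has already been reduced to a dictionary of pair scores using the rules listed above (for example, emails with more than 10 recipients already excluded) and that edges are undirected. The function names, the sample people and the sample numbers are all hypothetical.

```python
from collections import Counter

MANAGER_POINTS = 300  # points for a manages/managed-by relationship (from the list above)


def edge_key(a, b):
    """Assumption: edges are undirected, so (a, b) and (b, a) are the same edge."""
    return tuple(sorted((a, b)))


def combine_weights(*sources):
    """Sum the per-source scores into a single weight for each edge."""
    total = Counter()
    for source in sources:
        for (a, b), score in source.items():
            total[edge_key(a, b)] += score
    return dict(total)


# Hypothetical, already-scored inputs for three people.
email = {("ann", "bob"): 42, ("bob", "cat"): 5}
meetings = {("bob", "ann"): 60}
directory = {("ann", "cat"): MANAGER_POINTS}
timesheets = {("ann", "bob"): 12.5}

combined = combine_weights(email, meetings, directory, timesheets)
```

Keeping each source as its own dictionary until the final step makes it easy to experiment with a different combination rule later, such as scaling one source before adding it.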

I’ve pulled this data together over the following periods:

Email: 6 months
Meetings: 2 years, 3 months
Corporate Directory: 6 months (but this is very slow to change so probably reflects the vast majority of the last 2 years)
Projects: 1 year

The coverage of the data also varies:

Email: for the core of the organisation being studied this is excellent, as the data comes from Exchange Server logs; for the periphery there is limited coverage, as only emails exchanged with the core are captured
Meetings: probably less than 50% as not all rooms are visible and not all meetings are booked in rooms; also teleconference information is not captured
Corporate Directory: very good, 90%+ but data is limited to corporate hierarchy
Timesheets: good but the system is not used universally as not everyone works on projects.

Some Observations using this approach:

Email dominates the structure of the network and the other sources add very little for those in the core; however, for those outside the core, the other sources provide additional insight into the structure.
There is overlap in these sources: for example, we expect a manager will exchange emails with their reports and that people on a project will have meetings together. But, as the coverage of each source is not complete, this is a small price to pay for seeing the whole network.

Despite the rather simplistic approach the results appear to work quite well but I’d love to hear from anyone who has implemented, or read about, a smarter way to combine these types of SNA sources.

Value in IIS Logs

Dependency Discovery

For organisations using Web Services on Microsoft servers, the IIS logs can prove a useful resource. Firstly, it’s possible to build a dependency map showing which servers depend on services hosted on a given server. Using the Gephi timeline feature it’s also possible to show how the traffic changes over the course of a day, or any other period. The Gephi graph below shows data collected from a number of servers over an 18 day period. The edges have been weighted with a logarithm of the number of calls received per minute. The colours represent clusters detected by Gephi and are not derived from any information about the servers. Now you might think an IT department will know all the dependencies between its servers. Maybe it should, but this exercise did reveal a few surprises, and even if it had not, validating the existing dependency information would still have been worthwhile.
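
A minimal sketch of how such edge weights might be derived is shown below. It assumes the logs are in the W3C extended format with a `#Fields:` header and that both the `c-ip` and `s-ip` fields are being logged, which depends on how IIS is configured. The use of log(1 + calls per minute) is also an assumption, as is the sample data; the exact logarithm used for the graph is not stated here.

```python
import math
from collections import Counter


def count_calls(log_lines):
    """Count calls per (client IP, server IP) pair from W3C-format log lines."""
    fields = []
    calls = Counter()
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns used by the rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other directives
        row = dict(zip(fields, line.split()))
        if "c-ip" in row and "s-ip" in row:
            calls[(row["c-ip"], row["s-ip"])] += 1
    return calls


def edge_weights(calls, minutes_observed):
    """Assumed weighting: log(1 + average calls per minute) for each edge."""
    return {
        edge: math.log1p(count / minutes_observed)
        for edge, count in calls.items()
    }


# Hypothetical log extract covering one minute.
sample = [
    "#Fields: date time s-ip cs-method cs-uri-stem c-ip sc-status",
    "2013-01-01 09:00:01 10.0.0.5 GET /orders.asmx 10.0.0.9 200",
    "2013-01-01 09:00:02 10.0.0.5 GET /orders.asmx 10.0.0.9 200",
    "2013-01-01 09:00:07 10.0.0.5 POST /stock.asmx 10.0.0.7 200",
]
calls = count_calls(sample)
weights = edge_weights(calls, minutes_observed=1)
```

The resulting (client, server, weight) triples could then be written out as an edge list for import into Gephi.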

Deviance Detection

Log files can be used to automatically create a baseline of ‘normal’ behaviour. This can then be compared with current behaviour to identify anomalies. A simplistic approach is to calculate the historical average number of calls to a web server and compare it with the current number of calls. The chart below shows this for one server: the blue line is the average number of calls for each minute of the day over days 1 to 17; the red line is the number of calls received in each minute on day 18.
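
The baseline comparison described above could be sketched roughly as follows. The averaging per minute of the day follows the description; the rule for flagging an anomaly (a factor-of-two difference, ignoring very quiet minutes) is purely an assumption for illustration, as are the function names and sample counts.

```python
from statistics import mean


def baseline_per_minute(history):
    """Average calls for each minute of the day across the historical days.

    `history` is a list of days, each a list of call counts per minute.
    """
    return [mean(counts) for counts in zip(*history)]


def find_anomalies(baseline, current, ratio=2.0, min_calls=10):
    """Return (minute, expected, actual) where actual is off by more than `ratio`.

    Both thresholds are assumed values, not taken from the analysis above.
    """
    anomalies = []
    for minute, (expected, actual) in enumerate(zip(baseline, current)):
        too_high = actual > expected * ratio
        too_low = actual < expected / ratio
        if max(expected, actual) >= min_calls and (too_high or too_low):
            anomalies.append((minute, expected, actual))
    return anomalies


# Tiny hypothetical example: three "minutes" per day instead of 1,440.
history = [[100, 120, 80], [110, 130, 90], [90, 110, 70]]
today = [105, 400, 85]

baseline = baseline_per_minute(history)
anomalies = find_anomalies(baseline, today)
```

A plain average is sensitive to one-off spikes in the history, so a median or a separate baseline per weekday may be worth trying if the simple version raises too many false alarms.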

Social Network Detection

All very interesting, but can IIS logs help build a picture of Social Networks? Well, I’m not sure as I’ve not tried, but the logs do let you see who used what and when, certainly for internal apps. People who use the same app around the same time, or with similar usage patterns, are probably doing a similar job so may know each other and, if they don’t, maybe they should.

Beyond Email: Meetings

The next data source you may have in your organisation also comes from Microsoft Exchange Server. If you use Exchange Server to book meeting rooms then this can be mined. As always, what can be accessed will depend on your organisation’s privacy policies. In the organisation I describe here I have access to the calendars for well over half of the meeting rooms using my standard authentication credentials, because I am allowed to book meetings in these rooms. Through the room calendar I can also see when other people have booked meetings; it’s not possible to see the meeting subject but it is possible to see a list of attendees. Unlike email, I have accessed the meeting room calendars through the Exchange Server API. This is described by a number of others so I won’t reproduce it here; search for ‘Microsoft.Exchange.WebServices’ and ‘GetRoomLists’.

Meetings differ from email in that they are a many-to-many event rather than one-to-many. There will be a meeting organiser, but this is often a PA so I do not give the organiser any special meaning. Just as with email, I prefer to load the data into a relational database first; the table structure is shown below.

You’ll notice that the table attend_meeting has a field ‘score’. This table has an entry for every pair of attendees at a meeting, but how should each pair be scored? Starting from the premise that a two-person meeting means each person receives the full attention of the other, I needed a way to reduce the score as the number of attendees increases, and I found the following seemed to be a good fit:

score = minutes / ( n * ( n -1 ) / 2 ) where n = number of attendees

The table below shows the scores for a 60 minute meeting:

Attendees (n)   x = n * (n - 1) / 2   minutes / x (rounded)
2               1                     60
3               3                     20
4               6                     10
5               10                    6
6               15                    4
7               21                    3
8               28                    2
9               36                    2
10              45                    1

After 10 attendees the score is always set to 1
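
The formula and table above can be sketched in a few lines. This is an illustration rather than the code actually used to populate attend_meeting: the function names and the row layout are hypothetical, and the handling of meetings with more than 10 attendees is a literal reading of the rule above (every pair gets a score of 1 regardless of length).

```python
from itertools import combinations


def pair_score(minutes, attendees):
    """Score given to each pair of attendees at a single meeting."""
    if attendees < 2:
        return 0.0  # no pairs to score
    if attendees > 10:
        return 1.0  # assumed reading of "after 10 attendees the score is 1"
    pairs = attendees * (attendees - 1) / 2
    return minutes / pairs


def attendee_pair_rows(meeting_id, minutes, attendees):
    """One (meeting_id, person_a, person_b, score) row per pair of attendees."""
    people = sorted(set(attendees))
    score = pair_score(minutes, len(people))
    return [(meeting_id, a, b, score) for a, b in combinations(people, 2)]


# Reproduce the 60 minute column of the table, rounded to whole points.
table_scores = [round(pair_score(60, n)) for n in range(2, 11)]

# Hypothetical three-person, 60 minute meeting.
rows = attendee_pair_rows("m1", 60, ["ann", "bob", "cat"])
```

Note that the scores for all pairs in a meeting of up to 10 people always sum to the meeting length in minutes, so under this scheme a meeting contributes the same total weight to the network however many people attend.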

I found an interesting discussion of Dunbar’s Number in ‘Connected: The Amazing Power of Social Networks’ by Nicholas Christakis and James Fowler, which suggests the maximum effective meeting size is 3.8 (OK, let’s call it 4). This seems to support the fairly rapid degradation of the importance of a meeting (as a social network building tool) as the number of attendees increases. If you check out the book on Amazon (http://www.amazon.co.uk/dp/0007303602/) and look at the preview, you’ll see the discussion on page 249.

Side Effects: Making Friends with your Exchange Server Administrator

Robert Gimeno's Adventures in Data Science

Data everywhere but what can it tell us?
