Thursday, February 3, 2011

Toolkit: Visual Event Timeline

Skype suffered a major outage in December 2010.  The company published a postmortem from its CIO on its blog, explaining the cause of the failure, describing how it was addressed in the short term, and discussing plans to prevent it from happening again.  The company was forthcoming and provided sufficient information to give its customers an understanding of what occurred.

When a negative event like this happens, it is important to communicate well both during the event and after the failure is resolved.  Communications during the event keep those impacted abreast of the situation so they can plan and make decisions.  Communications after the event explain what happened, why it happened, and what has been done or will be done to prevent it from happening again (or to lower the risk of it occurring again to a more acceptable level).

I have been asked to lead postmortems or to step into situations where a problem has occurred and needs to be fixed.  One technique in my toolkit is a visual event timeline.  I learned this technique from a colleague a decade ago when he gave me an update on a problem that occurred in his area.  A combination of a visual event timeline and a detailed narrative can be an effective way to communicate.  I believe the addition of a visual event timeline in the Skype blog entry would have enhanced the communication of the outage event.

This post introduces you to the visual event timeline technique by:
  • Presenting the concepts of the technique.
  • Providing real-life examples.
  • Sharing some tips on creating a visual event timeline.
  • Applying it to the Skype outage event.

The examples have been altered to avoid disclosure of any confidential information.


The visual event timeline has a simple purpose:  visually depict along a timeline the series of key facts uncovered, actions taken, and observations noted to effectively communicate what occurred during an event.

A visual event timeline contains the following elements:
  • A timeline.  The timeline is presented as a horizontal or vertical line, bar, or rectangle.  The unit of time (for example, days, hours, minutes) displayed on the timeline is dependent on the event.  "Days" is appropriate when several days elapse between the time the event started and all issues were resolved.  For the Skype incident, "hours" is the more appropriate unit of time.
  • Key messages.  These are the most important messages that help communicate what occurred.  They can be a description of the situation ("the cluster of Skype support servers for offline instant messaging became overloaded"), an observation ("Windows clients running Skype version 5.0.0152 were not properly processing"), an action performed ("disabled the overloaded Skype servers and eliminated client requests to them"), or other facts considered important in explaining the event at a point in time.  A line or arrow aligns each key message with its appropriate point in time on the timeline.
  • Statistics.  These are data points, such as numbers or percentages, that provide additional information on the severity, magnitude, or status at a point in time.  For the Skype incident, the percentage of supernodes functioning properly or the number of customers affected would add relevant information.  This is optional and is dependent on the event and availability of statistics.
  • Icons.  These visually highlight missed opportunities, points of failure, or points of success.  Coloring the icons red can also emphasize missed opportunities and failures.
  • Important lessons learned and action items.  For postmortems, the most important lessons learned and what has been done (or will be done) should be briefly noted.
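
For readers who keep their event notes in a script before drawing the picture, the elements above could be modeled roughly as follows.  This is a minimal sketch, not a drawing tool:  the names KeyMessage, EventTimeline, and render, the icon codes ("X", "!", "+"), and the year in the sample dates are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class KeyMessage:
    """One key message aligned to a point in time on the timeline."""
    when: datetime
    text: str
    # Hypothetical icon codes: "X" = missed opportunity,
    # "!" = point of failure, "+" = point of success, "" = no icon.
    icon: str = ""
    statistic: Optional[str] = None  # optional data point for this moment


@dataclass
class EventTimeline:
    """The elements of a visual event timeline, held as plain data."""
    title: str
    unit: str  # "days", "hours", or "minutes", depending on the event
    messages: List[KeyMessage] = field(default_factory=list)
    lessons: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Return a plain-text, vertical version of the timeline."""
        formats = {
            "days": "%Y-%m-%d",
            "hours": "%Y-%m-%d %H:00",
            "minutes": "%H:%M",
        }
        fmt = formats[self.unit]
        lines = [self.title]
        for msg in sorted(self.messages, key=lambda m: m.when):
            marker = f"[{msg.icon}]" if msg.icon else "[ ]"
            stat = f" ({msg.statistic})" if msg.statistic else ""
            lines.append(f"{msg.when.strftime(fmt)} | {marker} {msg.text}{stat}")
        if self.lessons:
            lines.append("Lessons learned and action items:")
            lines.extend(f"  - {item}" for item in self.lessons)
        return "\n".join(lines)


# Illustrative data only; the year is an assumption.
example = EventTimeline(
    title="Postmortem (illustrative)",
    unit="days",
    messages=[
        KeyMessage(datetime(2010, 7, 6), "Error detected", icon="X"),
        KeyMessage(datetime(2010, 6, 30), "Human error starts the event", icon="!"),
        KeyMessage(datetime(2010, 7, 21), "Issue fully resolved", icon="+"),
    ],
    lessons=["Detect errors sooner"],
)
print(example.render())
```

Holding the elements as data first makes it easier to decide what belongs on the one-page picture and what belongs in the accompanying narrative.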


I created this visual event timeline as part of a postmortem.

In this example, the timeline is displayed horizontally in the middle of the page.  The units are in days as the event began on June 30 and was not fully resolved until July 21.

The key messages are listed above and below the timeline.  The items above the timeline tend to be actions or observations of a negative nature, and the items below the timeline cover the interactions with a vendor (SAP), although space limitations kept me from following this layout strictly.  The action that started the event (human error) and the discrepancies are highlighted in red.

No statistics are provided for this event.

Icons used include an "X" for missed opportunities and darkened circles for points of failure.  The visual event timeline clearly highlights two missed opportunities:  it took six days before the error was detected, and five days were lost due to misdiagnosis of the error.  The happy face icons show points of progress and resolution.

Because this was a postmortem, key learnings and action items are highlighted in the lower left.

This visual event timeline was used in both short debriefings and a long debriefing with the AR teams and management affected by the event.

Below is another visual event timeline I created when asked to help resolve an issue that had existed for a while.  Creating a visual event timeline helped me get up to speed on the history.

In this example, the issue was still open and was being worked on.  Statistics show the number of employees or contractors impacted during this time period.  The missed opportunities and points of failure are clearly noted.


Keep the following points in mind when preparing a visual event timeline.

Be clear.  The purpose is to effectively communicate.  Do not be vague.  Avoid jargon, acronyms, and abbreviations unless they are known and obvious to the intended audience.

Be honest.  Nothing on the visual event timeline should be inaccurate, misleading, a half-truth, or mere opinion.

Keep it to one page.  The visual event timeline is a communication tool to highlight the key points.  It is designed to communicate well in a 5- to 10-minute debriefing as well as an hour-long conversation.  Details of the event should be contained in accompanying written documentation (suggestion:  use Information Mapping to prepare the written document).

In some cases, a portion of a timeline can be exploded out onto a separate visual event timeline if it requires a different unit of time.  If a critical 30-minute period contained many failures and missed opportunities but the overall event lasted days or weeks, a separate timeline for that period is helpful.  For example, the BP oil spill in the Gulf of Mexico could have an overall timeline and another timeline exploded out for the hours when the incident occurred.
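
If the event notes are kept as dated entries, pulling out the exploded portion can be sketched as below.  This is only an illustration under stated assumptions:  the function names pick_unit and explode are hypothetical, the thresholds for choosing a unit (two days, two hours) are my own rule of thumb, and the sample entries are invented.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Each entry is (point in time, key message).
Entry = Tuple[datetime, str]


def pick_unit(start: datetime, end: datetime) -> str:
    """Choose a unit of time for a timeline covering start..end.

    The thresholds are assumptions, not fixed rules.
    """
    span = end - start
    if span >= timedelta(days=2):
        return "days"
    if span >= timedelta(hours=2):
        return "hours"
    return "minutes"


def explode(entries: List[Entry], start: datetime, end: datetime) -> Tuple[str, List[Entry]]:
    """Return the unit and the entries for a separate, zoomed-in timeline."""
    window = sorted(e for e in entries if start <= e[0] <= end)
    return pick_unit(start, end), window


# Invented sample: an event lasting weeks with a critical half hour.
entries = [
    (datetime(2011, 1, 10, 14, 5), "First alarm ignored"),
    (datetime(2011, 1, 10, 14, 20), "Backup system fails"),
    (datetime(2011, 1, 24, 9, 0), "Service fully restored"),
]
unit, critical = explode(
    entries, datetime(2011, 1, 10, 14, 0), datetime(2011, 1, 10, 14, 30)
)
print(unit, [text for _, text in critical])
```

The overall timeline keeps all of the entries in days, while the exploded one carries only the entries from the critical window in minutes.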

Include only the key messages and statistics that matter.  Part of the skill of communicating effectively is knowing what to include and what not to include.  A lot of actions and observations occur during an event, but many of them are secondary to what is important.  Do not leave out important information, but do not include tangential information.  Remember that a detailed document should accompany the visual event timeline and can be referenced when discussing the event.


Below is the start of a visual event timeline for the December 2010 Skype outage based on information Skype provided on its blog.  It is not complete, and I made some assumptions simply for illustrative purposes.


In creating the illustrative example from the blog entry and reading the details, I determined that the blog did not provide enough statistical information to show the number of impacted customers or the number of supernodes available.  I also learned that it took more than 24 hours to restore all services and return to normal, so a four-day time period would have been more accurate.  I do feel this type of visual event timeline, nicely formatted and worded for a general audience, would have added value and enhanced the communication.

Happy timelining!
Toolkit: Visual Event Timeline ~ DANIEL SKLAR