Microsoft Azure Incident Management System

Incident management (IcM) describes the activities an organization undertakes to identify, analyze, and correct hazards in order to prevent recurrence. If not managed, an incident can escalate into an emergency, a crisis, or a disaster.

The first goal of the incident management process is to restore normal service operation as quickly as possible and to minimize the impact on business operations, so that the best possible levels of service quality and availability are maintained.

Client: Microsoft
Services: ITSM, ITIL, Incident Management
Year: 2017, 2018

THE PROBLEM
Microsoft Azure has a few thousand services that are linked in a complex hierarchy within its organization across the globe. These services are used both internally and externally and their outages can cause major disruptions to its customers.

The current IcM solutions have a few issues:
1. Lack of context about the incident.
2. No suggestions for fixing an issue based on similar past incidents.
3. No prioritization or severity levels.
4. No way to escalate the incident to a crisis.
5. No visibility for leadership teams.

THE SOLUTION
1. Improving communication between teams and using machine learning to make the information needed to resolve major incidents discoverable.
2. Managing on-call solutions across multiple teams with ease.
3. Enhancing analytics to identify bottlenecks in mitigating incidents and to build a reusable pattern of solutions for fixing outages.
4. Setting up operational rules that monitor for potential outages and notify users in real time, giving them a way to pre-empt outages before they occur.

MY ROLE
Senior UX Designer
IcM incident creation, transfer, mitigation, post-mortem, on-call, smart assistant, rules management, outages.

I was the lead designer for this product and employed the following methods:

  • Contextual walkthroughs to learn how Directly Responsible Individuals (DRIs) resolve incidents.
  • Evaluation of ITIL methodologies to identify best practices and UX opportunities.
  • Collaboration with key stakeholders and input from SMEs to identify Microsoft service processes and third-party integration points.
  • Creation of complex interactive prototypes to simulate system interactions, incident resolution, and configuration.
  • Work toward a new visual language (Fluent), informed by existing design patterns.
  • Rigorous user research and testing to validate visual metaphors and conceptual frameworks.

TOOLS
Sketch (Wireframes and Visual Design), InVision (Prototype)

Smart Assistant (ML Feature)
