Background

As cloud services become increasingly sophisticated, organizations monitor them closely to ensure performance and reliability. As the mantra goes, ‘If it can be measured, it can be managed.’ They instrument their services with metrics, logs, and traces to keep the products they provide to their customers healthy.

Customer Pain

Monitoring costs are high, and the signal-to-noise ratio can be low. When an incident affects the services they provide to their customers, users struggle to find the needle in the haystack. Both the complexity of the data they observe and the overall cost of streaming and storing that data are significant challenges. There is a clear recognition that much of this spend is wasted, and a strong desire to drive efficiencies.

Proposal

At its core, the UX addressed two key aspects. First, it had to build as much trust as possible in the recommendations the service presented to the user. Second, it had to give users enough confidence that the recommendations they selected would help them achieve their goals without breaking their own users’ workflows.

Research

Initial research showed that an acceptable user mental model aligned with the backend structure: an analysis step generates recommendations, the user reviews and curates them, and the curated set is fed back into the system.

Design System Components

A significant part of the work on the Adaptive Metrics project called for a more sophisticated workflow than the existing Design System components could accommodate. The core of the user interaction in the GUI revolves around parsing a tabular version of the recommendations JSON file, making informed decisions, and then batch-editing the table based on conclusions drawn from the presented data. That work is now formalized in the Design System and Storybook components.

Validation

We started a collaborative design and testing process with Driver Customers, i.e., organizations representative of the broader market we were targeting. In parallel, the backend squad built out the underlying service to support those workflows.

I identified a gap in the process: the frontend and backend development teams were not collaborating as effectively as they could. To close it, we invested more effort and thought into the backend API so that it aligned more closely with the interactions users expected from the frontend.

Monitor

With advanced customers providing initial requirements through their usage of the API and CLI, and new customer usage monitored in the GUI, we were able to derive prioritized requirements for continuous improvement of the product.

Much of the feedback confirmed our initial research findings. As the product and its usage evolved, we increasingly heard nebulous qualitative feedback centered on one question: how do we get users to fully trust the recommendations? In response, we emphasized calculations that answer ‘tell me what this might do before I press the button’, along with a healthy undo stack.