Coralogix

Service Health & Investigation

Coralogix is an observability platform that helps engineering teams monitor the health of their applications in real time.
The goal of this project was to give users an immediate, intuitive understanding of their system’s health, reducing the time needed to identify issues.

My Role

Solo:

Product Designer

User Research

UX

UI

Engineers face high cognitive load when diagnosing issues across fragmented views. They struggle to determine whether a failure originates in their own service or elsewhere, which slows decisions and reduces confidence.

Understanding investigation patterns

I observed 12 engineers during active incidents and mapped their navigation patterns. The analysis revealed three critical friction points:

Signal clarity
Teams monitored only 15-20% of available metrics but still couldn't filter out noise effectively.

Ownership clarity
40% of investigation time was spent identifying which team owned each failing service.

Faster decisions
Engineers switched context many times per investigation and had to rebuild their understanding after each switch.

Introducing the health concept

We created a unified health model that ranks services from worst to best health, so engineers no longer need to scan multiple metrics manually. This single view surfaces critical issues immediately.

Dependency visualization

The dependency map shows upstream and downstream relationships directly within the service context. Engineers see which services could be causing failures without switching to a separate diagram view.
"Who is responsible for these errors?"

Timeline as the investigation entry point, with no tab switching

Tabs were eliminated entirely. The timeline became the primary entry point, showing event history and health changes over time. All investigation layers are accessible through contextual navigation, preserving state as engineers move between views.

Investigation time reduced from hours to minutes

Engineers reported faster identification of failing services and clearer understanding of system health.
The unified context reduced cognitive load during incidents and improved confidence in decision-making under pressure.

"I can finally trace a problem end-to-end without losing track of where I was. The interface guides me to the answer instead of making me hunt for it."