So what is distributed tracing?

Distributed tracing is a way to monitor applications, especially those built with microservices. By mapping the requests that a transaction makes to other services, distributed tracing helps you pinpoint where failures occur and which calls contribute most to slow transactions.

Traces and spans

A trace is a single transaction request generated by a user or API call, which in turn uses other services and resources. The trace maps the execution path the request takes through the system. Each trace is composed of a varying number of child requests to other components within the system, all of which must complete for the transaction to succeed.

A span is the primary building block of a distributed trace. Each span represents an individual unit of work done in a distributed system. Each request to a component within the distributed system contributes a span: a named, timed operation representing a piece of the workflow. The tracing instrumentation collects this transaction data (the timing, metadata, and other details) and assigns a unique trace ID to each trace and a unique span ID to each span within the trace.
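
As a rough illustration of a "named, timed operation" that carries a trace ID and a span ID, here is a minimal Python sketch. It is not how any particular tracing library works: the `timed_span` helper, the dictionary layout, and the ID lengths are all assumptions made for this example.

```python
import secrets
import time
from contextlib import contextmanager


def new_trace_id() -> str:
    # 16 random bytes rendered as hex (length chosen for illustration only)
    return secrets.token_hex(16)


def new_span_id() -> str:
    # 8 random bytes rendered as hex (length chosen for illustration only)
    return secrets.token_hex(8)


@contextmanager
def timed_span(trace_id: str, name: str, collected: list):
    """Record one named, timed operation as a plain dict (hypothetical helper)."""
    span = {"trace_id": trace_id, "span_id": new_span_id(), "name": name}
    span["start_time"] = time.time()      # wall-clock start timestamp
    started = time.perf_counter()         # monotonic clock for the duration
    try:
        yield span
    finally:
        span["duration_s"] = time.perf_counter() - started
        collected.append(span)


collected = []
trace_id = new_trace_id()

# Two units of work that belong to the same trace
with timed_span(trace_id, "handle request", collected):
    time.sleep(0.01)  # stand-in for real work
with timed_span(trace_id, "query inventory", collected):
    time.sleep(0.01)

for span in collected:
    print(span["name"], span["span_id"], round(span["duration_s"], 4))
```

Both spans share one trace ID, while each gets its own span ID and its own timing.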

Depending on the instrumentation tools and methods chosen, spans may contain “references” to other spans, which allow multiple spans to be assembled into one complete trace: a visualization of the life of a request as it moves through the various microservices in a distributed system. The edges between the spans that make up a trace indicate parent/child relationships.
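
The way references let spans be assembled into a trace can be sketched as follows. This assumes each span stores a single `parent_id` reference; the field names, span IDs, and service names are hypothetical, and real instrumentation may model references differently.

```python
from collections import defaultdict


def build_trace_tree(spans):
    """Group spans by parent reference; return (roots, children_by_parent)."""
    known_ids = {s["span_id"] for s in spans}
    children = defaultdict(list)
    roots = []
    for s in spans:
        parent = s.get("parent_id")
        if parent is None or parent not in known_ids:
            roots.append(s)            # no (known) parent: start of the trace
        else:
            children[parent].append(s)  # edge: parent -> child
    return roots, children


def render(span, children, depth=0):
    """Return the span and its descendants as indented lines."""
    lines = ["  " * depth + span["name"]]
    for child in children[span["span_id"]]:
        lines.extend(render(child, children, depth + 1))
    return lines


# Spans may arrive in any order; the references restore the structure.
spans = [
    {"span_id": "s2", "parent_id": "s1", "name": "auth-service"},
    {"span_id": "s1", "parent_id": None, "name": "api-gateway"},
    {"span_id": "s3", "parent_id": "s1", "name": "orders-service"},
    {"span_id": "s4", "parent_id": "s3", "name": "orders-db"},
]

roots, children = build_trace_tree(spans)
tree_lines = render(roots[0], children)
print("\n".join(tree_lines))
```

Each level of indentation in the printed tree corresponds to one parent/child edge.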

DevOps teams can use distributed tracing to monitor the requests in their system at a high level, and then dig into specific requests as issues arise.

Each span is defined by its:

  • Operation name
  • Start timestamp and span duration
  • Set of span tags: These key:value pair tags enable user-defined span annotations. You can use these tags to query, filter, and understand trace data
  • Span logs, which include information about the exceptions or errors that occurred in the request
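
The fields listed above map naturally onto a small record type. The sketch below is only an illustration: the `Span` class, the `filter_by_tag` helper, the tag keys, and the sample values are all assumptions, not a Logz.io data format.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    operation_name: str
    start_timestamp: float                     # seconds since the Unix epoch
    duration_ms: float
    tags: dict = field(default_factory=dict)   # key:value annotations
    logs: list = field(default_factory=list)   # exception or error details


def filter_by_tag(spans, key, value):
    """Return the spans whose tag `key` equals `value`."""
    return [s for s in spans if s.tags.get(key) == value]


# Invented sample data for illustration
spans = [
    Span("GET /cart", 1700000000.000, 42.0,
         tags={"http.status_code": 200}),
    Span("POST /pay", 1700000000.050, 310.5,
         tags={"http.status_code": 500, "error": True},
         logs=[{"event": "error", "message": "payment gateway timeout"}]),
]

failed = filter_by_tag(spans, "error", True)
for span in failed:
    print(span.operation_name, span.duration_ms, span.logs)
```

Filtering on a tag such as `error` is one way tags help narrow a large set of spans down to the ones worth inspecting.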

Example

A service with a unique ID executes within the system and calls microservices A through E. Each microservice request returns a response.

Microservice A is the edge service at which the transaction execution begins. This service first calls microservice B, and then calls microservice E after microservice B begins to execute. Microservice B calls microservices C and D.

Logz.io visualizes and maps application requests as they execute across microservices. Each microservice call is represented as a timed span: each bar in the image below is a span.

All of these spans together make up the full trace.

[Image: services to spans]
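
To make the example concrete, the calls among microservices A through E can be drawn as text bars. This sketch only mimics the idea of one bar per span; it is not how Logz.io renders traces, and every start time and duration below is invented for illustration.

```python
# (name, parent, start_ms, duration_ms): all timings are hypothetical
SPANS = [
    ("A", None, 0, 100),   # edge service, where the transaction begins
    ("B", "A", 10, 50),    # called first by A
    ("C", "B", 15, 20),    # called by B
    ("D", "B", 35, 20),    # called by B
    ("E", "A", 20, 70),    # called by A after B has begun to execute
]


def depth(name, parents):
    """Count how many ancestors a span has."""
    d = 0
    while parents[name] is not None:
        name = parents[name]
        d += 1
    return d


def render_timeline(spans, ms_per_char=5):
    """Return one text row per span: indented label, then a bar on a time axis."""
    parents = {name: parent for name, parent, _, _ in spans}
    rows = []
    for name, _, start, duration in spans:
        label = "  " * depth(name, parents) + name
        bar = " " * (start // ms_per_char) + "#" * (duration // ms_per_char)
        rows.append(f"{label:<6}|{bar}")
    return rows


rows = render_timeline(SPANS)
print("\n".join(rows))
```

The indentation of each label shows the parent/child relationship, and the horizontal offset and length of each bar show when the span started and how long it ran.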

Further reading