Overview

logzio-otel-k8s-metrics allows you to ship metrics from your Kubernetes cluster to Logz.io with the OpenTelemetry collector. For AKS clusters, this chart also allows you to ship Windows node metrics.

This chart is a fork of the opentelemetry-collector Helm chart. The main repository for Logz.io Helm charts is logzio-helm.

It also depends on the kube-state-metrics, prometheus-node-exporter, and prometheus-pushgateway charts, which are installed by default.

To disable any of these dependencies during installation, set kubeStateMetrics.enabled, nodeExporter.enabled, or pushGateway.enabled to false.
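
For example, the same switches can be set in a values file instead of with --set flags. A minimal sketch using only the keys named above:

kubeStateMetrics:
  enabled: false
nodeExporter:
  enabled: false
pushGateway:
  enabled: false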

For applications that run on Kubernetes, enable Prometheus scraping by setting the following annotation:

prometheus.io/scrape: true
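
For example, on a Service that exposes metrics, the annotation goes in its metadata. This is a minimal sketch: the service name, port, and the optional prometheus.io/port and prometheus.io/path annotations are illustrative and not part of this chart.

apiVersion: v1
kind: Service
metadata:
  name: my-app                    # hypothetical service name
  annotations:
    prometheus.io/scrape: "true"
    # Optional, common Prometheus conventions for a non-default port/path:
    # prometheus.io/port: "9100"
    # prometheus.io/path: "/metrics"
spec:
  selector:
    app: my-app
  ports:
    - name: metrics
      port: 9100                  # hypothetical metrics port
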
Sending metrics from nodes with taints

If you want to ship metrics from any of the nodes that have a taint, make sure that the taint key values are listed in your DaemonSet/Deployment configuration as follows:

tolerations:
- key: <<TAINT-KEY>>
  operator: <<TAINT-OPERATOR>>
  value: <<TAINT-VALUE>>
  effect: <<TAINT-EFFECT>>

To determine if a node uses taints as well as to display the taint keys, run:

kubectl get nodes -o json | jq ".items[]|{name:.metadata.name, taints:.spec.taints}"
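
For example, if the command reports a taint such as dedicated=monitoring:NoSchedule (a hypothetical key and value), the matching toleration would be:

tolerations:
- key: dedicated          # hypothetical taint key
  operator: Equal
  value: monitoring       # hypothetical taint value
  effect: NoSchedule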

Standard configuration for Linux nodes

Add logzio-helm repo to your helm repo list
  helm repo add logzio-helm https://logzio.github.io/logzio-helm
  helm repo update
Deploy the Helm chart
  1. Configure the relevant parameters in the following code:

    helm install --namespace <<YOUR-NAMESPACE>>  \
    --set secrets.MetricsToken=<<METRICS-SHIPPING-TOKEN>> \
    --set secrets.ListenerHost="https://<<LISTENER-HOST>>:8053" \
    --set secrets.p8s_logzio_name=<<ENV-TAG>> \
    logzio-otel-k8s-metrics logzio-helm/logzio-otel-k8s-metrics
    
    • Replace <<YOUR-NAMESPACE>> with the required namespace.

    • Replace <<METRICS-SHIPPING-TOKEN>> with a token for the Metrics account you want to ship to.
      Look up your Metrics token.

    • Replace <<LISTENER-HOST>> with the host for your region. For example, listener.logz.io if your account is hosted on AWS US East, or listener-nl.logz.io if hosted on Azure West Europe.

    • Replace <<ENV-TAG>> with the name for the environment’s metrics, to easily identify the metrics for each environment.

  2. Run the code.

Check Logz.io for your metrics

Give your metrics some time to get from your system to ours.

Log in to your Logz.io account and navigate to the current instructions page inside the Logz.io app. Install the pre-built dashboard to enhance the observability of your metrics.

To view the metrics on the main dashboard, log in to your Logz.io Metrics account, and open the Logz.io Metrics tab.

Standard configuration for Windows nodes

Add logzio-helm repo to your helm repo list
  helm repo add logzio-helm https://logzio.github.io/logzio-helm
  helm repo update
Update your Windows node pool credentials (if needed)

To extract and scrape metrics from Windows nodes, you need to install a Windows Exporter service on the node host. To do this, you need to establish an SSH connection to the node by authenticating with a username and password. The windows-exporter-installer job performs the installation on each Windows node using the provided Windows credentials. The default username for a Windows node pool is azureuser.

If your Windows node pools do not all share the same username and password, you will need to run the windows-exporter-installer job for each node pool using the relevant credentials. You can change your Windows node pool password in your AKS cluster with the following command:

   az aks update \
   --resource-group $RESOURCE_GROUP \
   --name $CLUSTER_NAME \
   --windows-admin-password $NEW_PW
  • Replace RESOURCE_GROUP with the resource group name.
  • Replace CLUSTER_NAME with the cluster name.
  • Replace NEW_PW with the password selected for this pool.
Deploy the Helm chart
  1. Configure the relevant parameters in the following code:

    helm install --namespace <<YOUR-NAMESPACE>>  \
    --set secrets.MetricsToken=<<METRICS-SHIPPING-TOKEN>> \
    --set secrets.ListenerHost="https://<<LISTENER-HOST>>:8053" \
    --set secrets.p8s_logzio_name=<<ENV-TAG>> \
    --set secrets.windowsNodeUsername=<<WINDOWS-NODE-USERNAME>> \
    --set secrets.windowsNodePassword=<<WINDOWS-NODE-PASSWORD>> \
    logzio-otel-k8s-metrics logzio-helm/logzio-otel-k8s-metrics
    
    • Replace <<YOUR-NAMESPACE>> with the required namespace.

    • Replace <<METRICS-SHIPPING-TOKEN>> with a token for the Metrics account you want to ship to.
      Look up your Metrics token.

    • Replace <<LISTENER-HOST>> with the host for your region. For example, listener.logz.io if your account is hosted on AWS US East, or listener-nl.logz.io if hosted on Azure West Europe.

    • Replace <<ENV-TAG>> with the name for the environment’s metrics, to easily identify the metrics for each environment.

    • Replace <<WINDOWS-NODE-USERNAME>> with the username for the node pool you want the Windows Exporter to be installed on.

    • Replace <<WINDOWS-NODE-PASSWORD>> with the password for the node pool you want the Windows Exporter to be installed on.

  2. Run the code.

Check Logz.io for your metrics

Give your metrics some time to get from your system to ours.

Log in to your Logz.io account and navigate to the current instructions page inside the Logz.io app. Install the pre-built dashboard to enhance the observability of your metrics.

To view the metrics on the main dashboard, log in to your Logz.io Metrics account, and open the Logz.io Metrics tab.

Customizing Helm chart parameters

Configure customization options

You can use the following options to update the Helm chart parameters:

  • Specify parameters using the --set key=value[,key=value] argument to helm install

  • Edit the values.yaml

  • Override default values with your own my_values.yaml and apply it in the helm install command, as in the example below.

Example:
helm install logzio-otel-k8s-metrics logzio-helm/logzio-otel-k8s-metrics -f my_values.yaml 
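
For example, a minimal my_values.yaml might set the required secrets and switch off one of the bundled sub-charts. This is a sketch using only keys shown in this guide; check the chart's values.yaml for the full list:

secrets:
  MetricsToken: <<METRICS-SHIPPING-TOKEN>>
  ListenerHost: "https://<<LISTENER-HOST>>:8053"
  p8s_logzio_name: <<ENV-TAG>>

# Optional: disable the bundled pushgateway sub-chart
pushGateway:
  enabled: false
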
Customize the metrics collected by the Helm chart

The default configuration uses the Prometheus receiver with the following scrape jobs:

  • cAdvisor: Scrapes container metrics.
  • Kubernetes service endpoints: These jobs scrape metrics from the node exporters, from kube-state-metrics, from any other service for which the prometheus.io/scrape: true annotation is set, and from services that expose Prometheus metrics at the /metrics endpoint.

To customize your configuration, edit the config section in the values.yaml file.
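
For example, an additional scrape job can be added under the existing Prometheus receiver configuration. This is a sketch: the job name and target are hypothetical, and the nesting follows the config path referenced later in this guide (config > receivers > prometheus > config):

config:
  receivers:
    prometheus:
      config:
        scrape_configs:
          - job_name: my-custom-app                    # hypothetical job name
            scrape_interval: 30s
            static_configs:
              - targets: ["my-app.default.svc:9100"]   # hypothetical target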

Uninstalling the Chart

The uninstall command is used to remove all the Kubernetes components associated with the chart and to delete the release.

To uninstall the logzio-otel-k8s-metrics deployment, use the following command:

helm uninstall logzio-otel-k8s-metrics

Troubleshooting

This section contains guidelines for handling errors that you may encounter when trying to collect Kubernetes metrics.

Problem: Permanent error - context deadline exceeded

The following error appears:

Permanent error: Post \"https://<<LISTENER-HOST>>:8053\": context deadline exceeded
This means that the POST request timed out.

Possible cause - Connectivity issue

A connectivity issue may be causing this error.

Suggested remedy

Check your shipper’s connectivity as follows.

For macOS and Linux, use telnet to make sure your shipper can connect to Logz.io listeners.

As of macOS High Sierra (10.13), telnet is not installed by default. You can install telnet with Homebrew by running brew install telnet.

Run this command from the environment you’re shipping from, after adding the appropriate port number:

telnet listener.logz.io {port-number}

For Windows servers running Windows 8/Server 2012 and later, run the following command in PowerShell:

Test-NetConnection listener.logz.io -Port {port-number}

The port numbers are 8052 and 8053.

Possible cause - Service exposing the metrics needs more time

A service exposing the metrics may need more time to send the response to the OpenTelemetry collector.

Suggested remedy

Increase the OpenTelemetry collector timeout as follows.

In values.yaml, increase the scrape timeout under the Prometheus receiver configuration:

config:
  receivers:
    prometheus:
      config:
        global:
          scrape_timeout: <<timeout time>>

Problem: Incorrect listener and/or token

You may be using an incorrect listener and/or token.

You will need to look in the logs of a pod whose name contains otel-collector.

Possible cause - The token is not valid

In the logs, an invalid token produces an error such as: "error": "Permanent error: remote write returned HTTP status 401 Unauthorized; err = <nil>: Shipping token is not valid".

Possible cause - The listener is not valid

An invalid listener URL produces an error such as: "error": "Permanent error: Post \"https://liener.logz.io:8053\": dial tcp: lookup <<provided listener>> on <<ip>>: no such host".

Suggested remedy

Check that the listener and token of your account are correct. You can view them in the Manage tokens section.

Problem: Windows nodes error

Possible cause - Incorrect username and/or password for Windows nodes

You may be using an incorrect username and/or password for Windows nodes.

You will need to look in the logs of the windows-exporter-installer pod. The error will look like this:

INFO:paramiko.transport:Authentication (password) failed.
ERROR:root:SSH connection to node aksnpwin000002 failed, please check username and password.

Suggested remedy

Ensure the username and password for the Windows nodes are correct.

Problem: Invalid helm chart version

Possible cause - The version of the helm chart is not up to date

The Helm chart version that you are using may be outdated.

Suggested remedy

Update the helm chart by running:

helm repo update

Problem: The prometheusremotewrite exporter times out

You don’t see any metrics in the Logz.io app, or you see only some of them, but the logs of your otel-collector pod show no errors. This might indicate a timeout in the prometheusremotewrite exporter.

Possible cause - The timeout in the prometheusremotewrite exporter is too short

The timeout setting in the prometheusremotewrite exporter is too short.

Suggested remedy

Increase the timeout setting in the prometheusremotewrite exporter.

For example, if your timeout setting is 5s:

endpoint: ${LISTENER_URL}
timeout: 5s
external_labels:
  p8s_logzio_name: ${P8S_LOGZIO_NAME}
headers:
  Authorization: "Bearer ${METRICS_TOKEN}"

You can increase it to 20s:

endpoint: ${LISTENER_URL}
timeout: 20s
external_labels:
  p8s_logzio_name: ${P8S_LOGZIO_NAME}
headers:
  Authorization: "Bearer ${METRICS_TOKEN}"

Problem: Pod status shows Waiting with reason CrashLoopBackOff

The pod status shows the following:

State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: OOMKilled
Exit Code: 137

Possible cause - Insufficient memory

Insufficient memory allocated to the pod.

Suggested remedy

In values.yaml, increase the memory of the standaloneCollector resources by approximately 100Mi.

For example, if you are using 512Mi:

standaloneCollector:
  enabled: true

  containerLogs:
    enabled: false

  resources:
    limits:
      cpu: 256m
      memory: 512Mi

You can increase it as much as needed. In this example, it’s increased to 612Mi:

standaloneCollector:
  enabled: true

  containerLogs:
    enabled: false

  resources:
    limits:
      cpu: 256m
      memory: 612Mi

When running apps on Kubernetes

You need to make sure that the prometheus.io/scrape annotation is set to true:

prometheus.io/scrape: true