Deploying a Monitoring stack with HA

Prerequisites

  • A fully functional working Kubernetes cluster
  • Two object storage buckets: one for Logs data, used by Loki, and one for Metrics data, used by Thanos

Configuring Loki for HA

Launchpad uses the loki-distributed chart for its Loki release, which can be configured through that chart's values interface.

Note: The example setups we'll show will be based on an architecture that makes use of the following components: querier, distributor, ingester, queryFrontend, gateway, compactor, ruler, indexGateway. Different architectures are possible so adjust to your needs as necessary.

For an HA setup that deploys several components with multiple replicas each, the loki-distributed values can be set as in the following example snippet:

querier:
  replicas: 2
  maxUnavailable: 1
distributor:
  replicas: 3
  maxUnavailable: 2
ingester:
  replicas: 3
  maxUnavailable: 2
queryFrontend:
  replicas: 2
  maxUnavailable: 1
gateway:
  replicas: 2
  maxUnavailable: 1
compactor:
  kind: Deployment
  replicas: 1
  enabled: true
ruler:
  enabled: true
  replicas: 2
  maxUnavailable: 1
indexGateway:
  enabled: true
  replicas: 2
  maxUnavailable: 1
loki:
  structuredConfig:
    ruler:
      ring:
        kvstore:
          store: memberlist
    ingester:
      lifecycler:
        ring:
          replication_factor: 2

Note: If you use a compactor, only one instance runs at a time, and it is not a critical-path component, so a single instance is sufficient.

Besides increasing the number of replicas, the ingester replication_factor is of particular relevance: the distributor spreads the write load across multiple ingesters and requires a quorum of them (floor(replication_factor / 2) + 1) to acknowledge each write. To lower the chances of losing logs, a replication_factor of at least two should be used (Loki's default is 3).
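
As a minimal sketch of how that quorum arithmetic plays out (the values here are illustrative, not a recommendation):

loki:
  structuredConfig:
    ingester:
      lifecycler:
        ring:
          # write quorum = floor(replication_factor / 2) + 1
          # replication_factor: 2 -> quorum 2, both replicas of a stream must acknowledge each write
          # replication_factor: 3 -> quorum 2, one ingester can be down without failing writes
          replication_factor: 3

Note that the replication_factor cannot exceed the number of ingesters in the ring, so replication_factor: 3 requires at least three ingester replicas.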

Loki fundamentally requires object storage regardless of HA or replica counts, because multiple components need to share the same storage backend.

Object storage can be set up as shown in the following snippet:

loki:
  structuredConfig:
    storage_config:
      tsdb_shipper:
        active_index_directory: /var/loki/data/tsdb-index
        cache_location: /var/loki/data/tsdb-cache
        index_gateway_client:
          # only applicable if using microservices where index-gateways are independently deployed.
          # This example is using kubernetes-style naming.
          server_address: dns:///loki-loki-distributed-index-gateway.monitoring.svc.cluster.local:9095
        shared_store: s3
      aws:
        bucketnames: <<bucket>>
        endpoint: <<endpoint>>
        region: <<region>>
        access_key_id: "${S3_ACCESS_KEY_ID}"
        secret_access_key: "${S3_SECRET_ACCESS_KEY}"
        insecure: false
        sse_encryption: false
        s3forcepathstyle: true
    schema_config:
      configs:
        # New TSDB schema below
        - from: "2024-01-01"
          index:
            period: 24h
            prefix: index_
          object_store: s3
          schema: v12
          store: tsdb
    query_scheduler:
      # the TSDB index dispatches many more, but each individually smaller, requests.
      # We increase the pending request queue sizes to compensate.
      max_outstanding_requests_per_tenant: 32768
    querier:
      max_concurrent: 16
    compactor:
      shared_store: s3
querier:
  extraArgs:
    - -config.expand-env=true
  extraEnv:
    - name: S3_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: <<bucket-secret>>
          key: S3_ACCESS_KEY_ID
    - name: S3_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: <<bucket-secret>>
          key: S3_SECRET_ACCESS_KEY
distributor:
  extraArgs:
    - -config.expand-env=true
  extraEnv:
    - name: S3_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: <<bucket-secret>>
          key: S3_ACCESS_KEY_ID
    - name: S3_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: <<bucket-secret>>
          key: S3_SECRET_ACCESS_KEY
ingester:
  extraArgs:
    - -config.expand-env=true
  extraEnv:
    - name: S3_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: <<bucket-secret>>
          key: S3_ACCESS_KEY_ID
    - name: S3_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: <<bucket-secret>>
          key: S3_SECRET_ACCESS_KEY
compactor:
  extraArgs:
    - -config.expand-env=true
  extraEnv:
    - name: S3_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: <<bucket-secret>>
          key: S3_ACCESS_KEY_ID
    - name: S3_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: <<bucket-secret>>
          key: S3_SECRET_ACCESS_KEY

Here we are setting up object storage in the structuredConfig section. To keep the credentials secret, we add environment variables sourced from Kubernetes Secrets to several components, together with the extra command-line argument -config.expand-env=true, which allows environment variables to be used inside structuredConfig. With that argument, each component replaces placeholders such as ${S3_ACCESS_KEY_ID} with the corresponding environment variable value when loading its config.
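
For reference, the <<bucket-secret>> referenced above is just a regular Kubernetes Secret holding those two keys. A minimal sketch (the secret name and values are placeholders; in practice you would provision it with Sealed Secrets or another secret-management mechanism rather than committing plaintext credentials):

apiVersion: v1
kind: Secret
metadata:
  name: <<bucket-secret>>
  namespace: monitoring
type: Opaque
stringData:
  S3_ACCESS_KEY_ID: <<access key id>>
  S3_SECRET_ACCESS_KEY: <<secret access key>>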

Besides setting up object storage, we are also configuring the TSDB index schema in place of the default boltdb-shipper; TSDB is a more recent and more efficient alternative. Doing so is not mandatory, but it is recommended.
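
If you are migrating an existing Loki deployment rather than starting fresh, schema changes are made by appending a new schema_config period that starts at a future date, keeping the previous period so older data remains readable. A minimal sketch, with illustrative dates:

loki:
  structuredConfig:
    schema_config:
      configs:
        # existing boltdb-shipper period, kept so data written under it stays queryable
        - from: "2022-01-01"
          store: boltdb-shipper
          object_store: s3
          schema: v12
          index:
            prefix: index_
            period: 24h
        # new TSDB period, starting at a date that is still in the future when you apply the change
        - from: "2024-01-01"
          store: tsdb
          object_store: s3
          schema: v12
          index:
            prefix: index_
            period: 24h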

Putting it all together, and adding a few more standard options such as persistence (PVCs) for some components and enabling ServiceMonitors and Prometheus rules, a Launchpad Monitoring Namespace helmfile.yaml Loki config could look like this:

helmfiles:
  - path: git::https://github.com/graphops/launchpad-namespaces.git@monitoring/helmfile.yaml?ref=monitoring-stable/latest
    selectorsInherited: true
    values:
      - features: [ metrics, logs ]
        loki:
          values:
            loki:
              structuredConfig:
                ingester:
                  # Disable chunk transfer which is not possible with statefulsets
                  # and unnecessary for boltdb-shipper
                  max_transfer_retries: 0
                  chunk_idle_period: 1h
                  chunk_target_size: 1536000
                  max_chunk_age: 1h
                storage_config:
                  tsdb_shipper:
                    active_index_directory: /var/loki/data/tsdb-index
                    cache_location: /var/loki/data/tsdb-cache
                    index_gateway_client:
                      # only applicable if using microservices where index-gateways are independently deployed.
                      # This example is using kubernetes-style naming.
                      server_address: dns:///loki-loki-distributed-index-gateway.monitoring.svc.cluster.local:9095
                    shared_store: s3
                  aws:
                    bucketnames: <<bucket>>
                    endpoint: <<endpoint>>
                    region: <<region>>
                    access_key_id: "${S3_ACCESS_KEY_ID}"
                    secret_access_key: "${S3_SECRET_ACCESS_KEY}"
                    insecure: false
                    sse_encryption: false
                    s3forcepathstyle: true
                schema_config:
                  configs:
                    # New TSDB schema below
                    - from: "2024-01-01"
                      index:
                        period: 24h
                        prefix: index_
                      object_store: s3
                      schema: v12
                      store: tsdb
                query_scheduler:
                  # the TSDB index dispatches many more, but each individually smaller, requests.
                  # We increase the pending request queue sizes to compensate.
                  max_outstanding_requests_per_tenant: 32768
                querier:
                  # Each `querier` component process runs a number of parallel workers to process queries simultaneously.
                  # You may want to adjust this up or down depending on your resource usage
                  # (more available cpu and memory can tolerate higher values and vice versa),
                  # but we find the most success running at around `16` with tsdb
                  max_concurrent: 16
                compactor:
                  shared_store: s3
                ruler:
                  ring:
                    kvstore:
                      store: memberlist
                  rule_path: /tmp/loki/scratch
                  alertmanager_url: http://kube-prometheus-stack-alertmanager:9093
                  external_url: <<your alertmanager external URL>>

            querier:
              replicas: 2
              maxUnavailable: 1
              extraArgs:
                - -config.expand-env=true
              extraEnv:
                - name: S3_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: <<bucket-secret>>
                      key: S3_ACCESS_KEY_ID
                - name: S3_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: <<bucket-secret>>
                      key: S3_SECRET_ACCESS_KEY
            distributor:
              replicas: 3
              maxUnavailable: 2
              extraArgs:
                - -config.expand-env=true
              extraEnv:
                - name: S3_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: <<bucket-secret>>
                      key: S3_ACCESS_KEY_ID
                - name: S3_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: <<bucket-secret>>
                      key: S3_SECRET_ACCESS_KEY
            ingester:
              replicas: 3
              maxUnavailable: 2
              persistence:
                enabled: true
                inMemory: false
                claims:
                  - name: data
                    size: 10Gi
              extraArgs:
                - -config.expand-env=true
              extraEnv:
                - name: S3_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: <<bucket-secret>>
                      key: S3_ACCESS_KEY_ID
                - name: S3_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: <<bucket-secret>>
                      key: S3_SECRET_ACCESS_KEY
            queryFrontend:
              replicas: 2
              maxUnavailable: 1
            gateway:
              replicas: 2
              maxUnavailable: 1
            compactor:
              kind: Deployment
              replicas: 1
              enabled: true
              extraArgs:
                - -config.expand-env=true
              extraEnv:
                - name: S3_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: <<bucket-secret>>
                      key: S3_ACCESS_KEY_ID
                - name: S3_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: <<bucket-secret>>
                      key: S3_SECRET_ACCESS_KEY
            ruler:
              enabled: true
              replicas: 2
              maxUnavailable: 1
            indexGateway:
              enabled: true
              replicas: 2
              maxUnavailable: 1
            serviceMonitor:
              enabled: true
            prometheusRule:
              enabled: true
              namespace: monitoring

We have also configured the Ruler with Alertmanager's endpoint (Alertmanager can be deployed by the Monitoring Namespace as well, as shown in the Metrics section below).
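
With that endpoint in place, the Ruler evaluates Loki alerting rules, which use the usual Prometheus-style rule format with LogQL expressions, and sends the resulting alerts to Alertmanager. A minimal sketch of such a rule group (the selector and threshold are illustrative, and how rule files are loaded depends on your ruler storage configuration, which is not covered here):

groups:
  - name: log-alerts
    rules:
      - alert: HighErrorLogRate
        # fires when more than 10 log lines per second containing "error"
        # are seen in the monitoring namespace, sustained for 10 minutes
        expr: sum(rate({namespace="monitoring"} |= "error" [5m])) > 10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Elevated error log rate in the monitoring namespace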

Setting up a Prometheus Stack with HA

Thanos

For an HA Prometheus Stack we'll need Thanos, which is not yet part of the Monitoring Namespace, so we'll start by going over how to deploy it with Launchpad.

Thanos requires object storage, so a bucket (and credentials) will be needed. To deploy Thanos we're going to use Bitnami's thanos chart, deployed with Launchpad as in the following example helmfile:

repositories:
  - name: bitnami
    url: https://charts.bitnami.com/bitnami

releases:
  - name: thanos
    namespace: monitoring
    createNamespace: true
    chart: bitnami/thanos
    version: ~12.20
    missingFileHandler: Warn
    values:
      - existingObjstoreSecret: <<thanos-objstore-secret>>
        query:
          replicaCount: 2
          dnsDiscovery:
            sidecarsService: "prometheus-operated"
            sidecarsNamespace: "monitoring"
          replicaLabel:
            - prometheus_replica
        queryFrontend:
          enabled: true
          replicaCount: 2
        compactor:
          enabled: true
          persistence:
            enabled: true
          retentionResolutionRaw: 30d
          retentionResolution5m: 30d
          retentionResolution1h: 10y
        storegateway:
          enabled: true
          replicaCount: 2
          persistence:
            enabled: true
        metrics:
          enabled: true
          serviceMonitor:
            enabled: true
          prometheusRule:
            enabled: true

Warning: Never try to run more than one instance of compactor. If your object storage does not support locking, it will lead to error states.

Here we added the bitnami repository and a release that deploys the Thanos chart from it. Among the values used in this example, notice the query.dnsDiscovery and query.replicaLabel keys, as those values need to match the ones used by the Thanos Prometheus sidecar, which is deployed in the kube-prometheus-stack release of the Monitoring Namespace.

There is one extra thing needed for Thanos: a secret with the bucket credentials, referenced previously as <<thanos-objstore-secret>>. That secret needs to have a key called objstore.yml, whose value should be YAML with keys like these:

type: S3
config:
  endpoint: <<endpoint>>
  bucket: <<bucket name>>
  bucket_lookup_type: path
  insecure: false
  access_key: <<access_key>>
  secret_key: <<secret_key>>

bucket_lookup_type can be auto, path or virtual_host; you would want path for Ceph Object Storage. All the available options are listed in the Thanos object storage configuration documentation.

Adding such a secret with Launchpad and Sealed Secrets comes down to adding a release like the following:

releases:
  - name: thanos-objstore-secret
    namespace: monitoring
    chart: graphops/resource-injector
    values:
      - resources:
          thanos-objstore-secret:
            apiVersion: bitnami.com/v1alpha1
            kind: SealedSecret
            metadata:
              name: thanos-objstore-secret
              namespace: monitoring
            spec:
              encryptedData:
                objstore.yml: <<SealedSecrets Encrypted Data>>
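
If you are not using Sealed Secrets, any mechanism that results in a Secret with an objstore.yml key works. As a minimal (unencrypted) sketch, with placeholder values:

apiVersion: v1
kind: Secret
metadata:
  name: thanos-objstore-secret
  namespace: monitoring
type: Opaque
stringData:
  objstore.yml: |
    type: S3
    config:
      endpoint: <<endpoint>>
      bucket: <<bucket name>>
      bucket_lookup_type: path
      insecure: false
      access_key: <<access_key>>
      secret_key: <<secret_key>>

Avoid committing a plaintext Secret like this to a repository; that is precisely what Sealed Secrets (or an equivalent tool) is for.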

The last remaining required Thanos component, the Prometheus sidecar, will be deployed with kube-prometheus-stack, covered next.

Prometheus Stack

There are three components we want to focus on: Prometheus, Grafana and Alertmanager. We'll start with adjusting Alertmanager's config for HA, which is the simplest:

alertmanager:
  alertmanagerSpec:
    replicas: 3

This changes our setup to a 3-replica Alertmanager cluster, and that's all that's required; the Prometheus Operator configures clustering between the replicas automatically.
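
Optionally, you can also spread those replicas across nodes with standard Kubernetes pod anti-affinity, which the operator passes through from alertmanagerSpec.affinity. A sketch (the pod labels to match depend on your operator version, so verify them against your running pods):

alertmanager:
  alertmanagerSpec:
    replicas: 3
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  app.kubernetes.io/name: alertmanager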

For Grafana, there is the added requirement of switching from the default embedded SQLite database to a shared database such as Postgres, so that all replicas read and write the same state.

So let's start by adding a release that creates a Postgres database with the Postgres Operator (from the Launchpad Postgres-Operator Namespace). As an example:

releases:
  - name: grafana-database
    namespace: monitoring
    createNamespace: true
    chart: graphops/resource-injector
    version: 0.2.0
    missingFileHandler: Warn
    values:
      - resources:
          grafana-database:
            apiVersion: "acid.zalan.do/v1"
            kind: postgresql
            metadata:
              name: grafana-database
            spec:
              teamId: "pg"
              numberOfInstances: 2
              users:
                grafana:
                  - superuser
                  - createdb
              enableMasterLoadBalancer: false
              enableReplicaLoadBalancer: false
              enableConnectionPooler: false
              enableReplicaConnectionPooler: false
              databases:
                grafana: grafana
              postgresql:
                version: "15"
                parameters: {}
              volume:
                size: 1Gi
                storageClass: <<your_storage_class>>
              resources:
                requests:
                  cpu: 250m
                  memory: 1Gi
                limits:
                  cpu: 1000m
                  memory: 4Gi
              patroni:
                initdb:
                  encoding: "UTF8"
                  locale: "C"
                pg_hba:
                  - local all all trust
                  - hostssl all +zalandos 127.0.0.1/32 pam
                  - host all all 127.0.0.1/32 md5
                  - hostssl all +zalandos ::1/128 pam
                  - host all all ::1/128 md5
                  - local replication standby trust
                  - hostssl replication standby all md5
                  - hostnossl all all all md5
                  - hostssl all +zalandos all pam
                  - hostssl all all all md5
              podAnnotations:
                coa.zalan.do/auto-create-database: "true"

With that database in place, the Grafana values can be adjusted like so:

grafana:
  replicas: 2
  envValueFrom:
    DATABASE_PASSWORD:
      secretKeyRef:
        name: grafana.grafana-database.credentials.postgresql.acid.zalan.do
        key: password
  sidecar:
    datasources:
      url: http://thanos-query-frontend:9090
      createPrometheusReplicasDatasources: false
  grafana.ini:
    database:
      type: postgres
      host: grafana-database.monitoring.svc:5432
      name: grafana
      user: grafana
      password: "$__env{DATABASE_PASSWORD}"

Finally, we need to adjust Prometheus to increase the number of replicas and enable the Thanos sidecar.

A workable set of values for accomplishing that looks like:

prometheus:
  prometheusSpec:
    replicas: 2
    shards: 1
    thanos:
      objectStorageConfig:
        existingSecret:
          name: <<thanos-objstore-secret>>
          key: objstore.yml
    replicaExternalLabelName: prometheus_replica
  thanosService:
    enabled: true
  thanosServiceMonitor:
    enabled: true

Notice that the sidecar is configured to use the same secret provisioned earlier for Thanos, and that replicaExternalLabelName matches the replicaLabel value used in the Thanos query configuration.

Putting all of this together, here's an example of a helmfile that deploys Thanos and kube-prometheus-stack, setting the most important values for HA:

repositories:
  - name: bitnami
    url: https://charts.bitnami.com/bitnami

helmfiles:
  - path: git::https://github.com/graphops/launchpad-namespaces.git@monitoring/helmfile.yaml?ref=monitoring-stable/latest
    selectorsInherited: true
    values:
      - helmDefaults:
          <<: *helmDefaults
        features: [ metrics, logs ]
        kube-prometheus-stack:
          values:
            kube-prometheus-stack:
              alertmanager:
                alertmanagerSpec:
                  replicas: 3
              grafana:
                replicas: 2
                envValueFrom:
                  DATABASE_PASSWORD:
                    secretKeyRef:
                      name: grafana.grafana-database.credentials.postgresql.acid.zalan.do
                      key: password
                sidecar:
                  datasources:
                    url: http://thanos-query-frontend:9090
                    createPrometheusReplicasDatasources: false
                grafana.ini:
                  database:
                    type: postgres
                    host: grafana-database.monitoring.svc:5432
                    name: grafana
                    user: grafana
                    password: "$__env{DATABASE_PASSWORD}"
              prometheus:
                prometheusSpec:
                  replicas: 2
                  shards: 1
                  thanos:
                    objectStorageConfig:
                      existingSecret:
                        name: <<thanos-objstore-secret>>
                        key: objstore.yml
                  replicaExternalLabelName: prometheus_replica
                thanosService:
                  enabled: true
                thanosServiceMonitor:
                  enabled: true

releases:
  - name: thanos-objstore-secret
    namespace: monitoring
    chart: graphops/resource-injector
    values:
      - resources:
          thanos-objstore-secret:
            apiVersion: bitnami.com/v1alpha1
            kind: SealedSecret
            metadata:
              name: thanos-objstore-secret
              namespace: monitoring
            spec:
              encryptedData:
                objstore.yml: <<SealedSecrets Encrypted Data>>

  - name: thanos
    namespace: monitoring
    createNamespace: true
    chart: bitnami/thanos
    version: ~12.20
    missingFileHandler: Warn
    values:
      - existingObjstoreSecret: <<thanos-objstore-secret>>
        query:
          replicaCount: 2
          dnsDiscovery:
            sidecarsService: "prometheus-operated"
            sidecarsNamespace: "monitoring"
          replicaLabel:
            - prometheus_replica
        queryFrontend:
          enabled: true
          replicaCount: 2
        compactor:
          enabled: true
          persistence:
            enabled: true
          retentionResolutionRaw: 30d
          retentionResolution5m: 30d
          retentionResolution1h: 10y
        storegateway:
          enabled: true
          replicaCount: 2
          persistence:
            enabled: true
        metrics:
          enabled: true
          serviceMonitor:
            enabled: true
          prometheusRule:
            enabled: true

  - name: grafana-database
    namespace: monitoring
    createNamespace: true
    chart: graphops/resource-injector
    version: 0.2.0
    missingFileHandler: Warn
    values:
      - resources:
          grafana-database:
            apiVersion: "acid.zalan.do/v1"
            kind: postgresql
            metadata:
              name: grafana-database
            spec:
              teamId: "pg"
              numberOfInstances: 2
              users:
                grafana:
                  - superuser
                  - createdb
              enableMasterLoadBalancer: false
              enableReplicaLoadBalancer: false
              enableConnectionPooler: false
              enableReplicaConnectionPooler: false
              databases:
                grafana: grafana
              postgresql:
                version: "15"
                parameters: {}
              volume:
                size: 1Gi
                storageClass: <<your_storage_class>>
              resources:
                requests:
                  cpu: 250m
                  memory: 1Gi
                limits:
                  cpu: 1000m
                  memory: 4Gi
              patroni:
                initdb:
                  encoding: "UTF8"
                  locale: "C"
                pg_hba:
                  - local all all trust
                  - hostssl all +zalandos 127.0.0.1/32 pam
                  - host all all 127.0.0.1/32 md5
                  - hostssl all +zalandos ::1/128 pam
                  - host all all ::1/128 md5
                  - local replication standby trust
                  - hostssl replication standby all md5
                  - hostnossl all all all md5
                  - hostssl all +zalandos all pam
                  - hostssl all all all md5
              podAnnotations:
                coa.zalan.do/auto-create-database: "true"