---
# Source: postiz/charts/postgres-17-cluster/templates/prometheus-rule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: postiz-postgresql-17-alert-rules
  namespace: postiz
  labels:
    helm.sh/chart: postgres-17-cluster-6.16.1
    app.kubernetes.io/name: postiz-postgresql-17
    app.kubernetes.io/instance: postiz
    app.kubernetes.io/part-of: postiz
    app.kubernetes.io/version: "6.16.1"
    app.kubernetes.io/managed-by: Helm
spec:
  groups:
    - name: cloudnative-pg/postiz-postgresql-17
      rules:
        - alert: CNPGClusterBackendsWaitingWarning
          annotations:
            summary: CNPG Cluster has a backend waiting for longer than 5 minutes.
            description: |-
              Pod {{ $labels.pod }} has been waiting for longer than 5 minutes
          expr: |
            cnpg_backends_waiting_total > 300
          for: 1m
          labels:
            severity: warning
            namespace: postiz
            cnpg_cluster: postiz-postgresql-17-cluster
        - alert: CNPGClusterDatabaseDeadlockConflictsWarning
          annotations:
            summary: CNPG Cluster has over 10 deadlock conflicts.
            description: |-
              There are over 10 deadlock conflicts in {{ $labels.pod }}
          expr: |
            cnpg_pg_stat_database_deadlocks > 10
          for: 1m
          labels:
            severity: warning
            namespace: postiz
            cnpg_cluster: postiz-postgresql-17-cluster
        - alert: CNPGClusterHACritical
          annotations:
            summary: CNPG Cluster has no standby replicas!
            description: |-
              CloudNativePG Cluster "{{ $labels.job }}" has no ready standby replicas. Your cluster is at severe risk of data loss and downtime if the primary instance fails.

              The primary instance is still online and able to serve queries, although connections to the `-ro` endpoint will fail. The `-r` endpoint is operating at reduced capacity and all traffic is being served by the main.

              This can happen during a normal failover or automated minor version upgrades in a cluster with 2 or fewer instances. The replaced instance may need some time to catch up with the cluster primary instance.

              This alarm will always trigger if your cluster is configured to run with only 1 instance.
              In this case you may want to silence it.
            runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterHACritical.md
          expr: |
            max by (job) (cnpg_pg_replication_streaming_replicas{namespace="postiz"} - cnpg_pg_replication_is_wal_receiver_up{namespace="postiz"}) < 1
          for: 5m
          labels:
            severity: critical
            namespace: postiz
            cnpg_cluster: postiz-postgresql-17-cluster
        - alert: CNPGClusterHAWarning
          annotations:
            summary: CNPG Cluster has fewer than 2 standby replicas.
            description: |-
              CloudNativePG Cluster "{{ $labels.job }}" has only {{ $value }} standby replicas, putting your cluster at risk if another instance fails. The cluster is still able to operate normally, although the `-ro` and `-r` endpoints operate at reduced capacity.

              This can happen during a normal failover or automated minor version upgrades. The replaced instance may need some time to catch up with the cluster primary instance.

              This alarm will be constantly triggered if your cluster is configured to run with fewer than 3 instances. In this case you may want to silence it.
            runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterHAWarning.md
          expr: |
            max by (job) (cnpg_pg_replication_streaming_replicas{namespace="postiz"} - cnpg_pg_replication_is_wal_receiver_up{namespace="postiz"}) < 2
          for: 5m
          labels:
            severity: warning
            namespace: postiz
            cnpg_cluster: postiz-postgresql-17-cluster
        - alert: CNPGClusterHighConnectionsCritical
          annotations:
            summary: CNPG Instance maximum number of connections critical!
            description: |-
              CloudNativePG Cluster "postiz/postiz-postgresql-17-cluster" instance {{ $labels.pod }} is using {{ $value }}% of the maximum number of connections.
            runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterHighConnectionsCritical.md
          expr: |
            sum by (pod) (cnpg_backends_total{namespace="postiz", pod=~"postiz-postgresql-17-cluster-([1-9][0-9]*)$"}) / max by (pod) (cnpg_pg_settings_setting{name="max_connections", namespace="postiz", pod=~"postiz-postgresql-17-cluster-([1-9][0-9]*)$"}) * 100 > 95
          for: 5m
          labels:
            severity: critical
            namespace: postiz
            cnpg_cluster: postiz-postgresql-17-cluster
        - alert: CNPGClusterHighConnectionsWarning
          annotations:
            summary: CNPG Instance is approaching the maximum number of connections.
            description: |-
              CloudNativePG Cluster "postiz/postiz-postgresql-17-cluster" instance {{ $labels.pod }} is using {{ $value }}% of the maximum number of connections.
            runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterHighConnectionsWarning.md
          expr: |
            sum by (pod) (cnpg_backends_total{namespace="postiz", pod=~"postiz-postgresql-17-cluster-([1-9][0-9]*)$"}) / max by (pod) (cnpg_pg_settings_setting{name="max_connections", namespace="postiz", pod=~"postiz-postgresql-17-cluster-([1-9][0-9]*)$"}) * 100 > 80
          for: 5m
          labels:
            severity: warning
            namespace: postiz
            cnpg_cluster: postiz-postgresql-17-cluster
        - alert: CNPGClusterHighReplicationLag
          annotations:
            summary: CNPG Cluster high replication lag
            description: |-
              CloudNativePG Cluster "postiz/postiz-postgresql-17-cluster" is experiencing a high replication lag of {{ $value }}ms.

              High replication lag indicates network issues, busy instances, slow queries or suboptimal configuration.
            runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterHighReplicationLag.md
          expr: |
            max(cnpg_pg_replication_lag{namespace="postiz",pod=~"postiz-postgresql-17-cluster-([1-9][0-9]*)$"}) * 1000 > 1000
          for: 5m
          labels:
            severity: warning
            namespace: postiz
            cnpg_cluster: postiz-postgresql-17-cluster
        - alert: CNPGClusterInstancesOnSameNode
          annotations:
            summary: CNPG Cluster instances are located on the same node.
            description: |-
              CloudNativePG Cluster "postiz/postiz-postgresql-17-cluster" has {{ $value }} instances on the same node {{ $labels.node }}.

              A failure or scheduled downtime of a single node will lead to a potential service disruption and/or data loss.
            runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterInstancesOnSameNode.md
          expr: |
            count by (node) (kube_pod_info{namespace="postiz", pod=~"postiz-postgresql-17-cluster-([1-9][0-9]*)$"}) > 1
          for: 5m
          labels:
            severity: warning
            namespace: postiz
            cnpg_cluster: postiz-postgresql-17-cluster
        - alert: CNPGClusterLongRunningTransactionWarning
          annotations:
            summary: CNPG Cluster query is taking longer than 5 minutes.
            description: |-
              CloudNativePG Cluster Pod {{ $labels.pod }} is taking more than 5 minutes (300 seconds) for a query.
          expr: |-
            cnpg_backends_max_tx_duration_seconds > 300
          for: 1m
          labels:
            severity: warning
            namespace: postiz
            cnpg_cluster: postiz-postgresql-17-cluster
        - alert: CNPGClusterLowDiskSpaceCritical
          annotations:
            summary: CNPG Instance is running out of disk space!
            description: |-
              CloudNativePG Cluster "postiz/postiz-postgresql-17-cluster" is running extremely low on disk space. Check attached PVCs!
            runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterLowDiskSpaceCritical.md
          expr: |
            max(max by(persistentvolumeclaim) (1 - kubelet_volume_stats_available_bytes{namespace="postiz", persistentvolumeclaim=~"postiz-postgresql-17-cluster-([1-9][0-9]*)$"} / kubelet_volume_stats_capacity_bytes{namespace="postiz", persistentvolumeclaim=~"postiz-postgresql-17-cluster-([1-9][0-9]*)$"})) > 0.9 OR
            max(max by(persistentvolumeclaim) (1 - kubelet_volume_stats_available_bytes{namespace="postiz", persistentvolumeclaim=~"postiz-postgresql-17-cluster-([1-9][0-9]*)-wal"} / kubelet_volume_stats_capacity_bytes{namespace="postiz", persistentvolumeclaim=~"postiz-postgresql-17-cluster-([1-9][0-9]*)-wal"})) > 0.9 OR
            max(sum by (namespace,persistentvolumeclaim) (kubelet_volume_stats_used_bytes{namespace="postiz", persistentvolumeclaim=~"postiz-postgresql-17-cluster-([1-9][0-9]*)-tbs.*"})
                /
                sum by (namespace,persistentvolumeclaim) (kubelet_volume_stats_capacity_bytes{namespace="postiz", persistentvolumeclaim=~"postiz-postgresql-17-cluster-([1-9][0-9]*)-tbs.*"})
                *
                on(namespace, persistentvolumeclaim) group_left(volume)
                kube_pod_spec_volumes_persistentvolumeclaims_info{pod=~"postiz-postgresql-17-cluster-([1-9][0-9]*)$"}
            ) > 0.9
          for: 5m
          labels:
            severity: critical
            namespace: postiz
            cnpg_cluster: postiz-postgresql-17-cluster
        - alert: CNPGClusterLowDiskSpaceWarning
          annotations:
            summary: CNPG Instance is running out of disk space.
            description: |-
              CloudNativePG Cluster "postiz/postiz-postgresql-17-cluster" is running low on disk space. Check attached PVCs.
            runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterLowDiskSpaceWarning.md
          expr: |
            max(max by(persistentvolumeclaim) (1 - kubelet_volume_stats_available_bytes{namespace="postiz", persistentvolumeclaim=~"postiz-postgresql-17-cluster-([1-9][0-9]*)$"} / kubelet_volume_stats_capacity_bytes{namespace="postiz", persistentvolumeclaim=~"postiz-postgresql-17-cluster-([1-9][0-9]*)$"})) > 0.7 OR
            max(max by(persistentvolumeclaim) (1 - kubelet_volume_stats_available_bytes{namespace="postiz", persistentvolumeclaim=~"postiz-postgresql-17-cluster-([1-9][0-9]*)-wal"} / kubelet_volume_stats_capacity_bytes{namespace="postiz", persistentvolumeclaim=~"postiz-postgresql-17-cluster-([1-9][0-9]*)-wal"})) > 0.7 OR
            max(sum by (namespace,persistentvolumeclaim) (kubelet_volume_stats_used_bytes{namespace="postiz", persistentvolumeclaim=~"postiz-postgresql-17-cluster-([1-9][0-9]*)-tbs.*"})
                /
                sum by (namespace,persistentvolumeclaim) (kubelet_volume_stats_capacity_bytes{namespace="postiz", persistentvolumeclaim=~"postiz-postgresql-17-cluster-([1-9][0-9]*)-tbs.*"})
                *
                on(namespace, persistentvolumeclaim) group_left(volume)
                kube_pod_spec_volumes_persistentvolumeclaims_info{pod=~"postiz-postgresql-17-cluster-([1-9][0-9]*)$"}
            ) > 0.7
          for: 5m
          labels:
            severity: warning
            namespace: postiz
            cnpg_cluster: postiz-postgresql-17-cluster
        - alert: CNPGClusterOffline
          annotations:
            summary: CNPG Cluster has no running instances!
            description: |-
              CloudNativePG Cluster "postiz/postiz-postgresql-17-cluster" has no ready instances.

              Having an offline cluster means your applications will not be able to access the database, leading to potential service disruption and/or data loss.
            runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterOffline.md
          expr: |
            (count(cnpg_collector_up{namespace="postiz",pod=~"postiz-postgresql-17-cluster-([1-9][0-9]*)$"}) OR on() vector(0)) == 0
          for: 5m
          labels:
            severity: critical
            namespace: postiz
            cnpg_cluster: postiz-postgresql-17-cluster
        - alert: CNPGClusterPGDatabaseXidAgeWarning
          annotations:
            summary: CNPG Cluster has a number of transactions from the frozen XID to the current one.
            description: |-
              Over 300,000,000 transactions from frozen xid on pod {{ $labels.pod }}
          expr: |
            cnpg_pg_database_xid_age > 300000000
          for: 1m
          labels:
            severity: warning
            namespace: postiz
            cnpg_cluster: postiz-postgresql-17-cluster
        - alert: CNPGClusterPGReplicationWarning
          annotations:
            summary: CNPG Cluster standby is lagging behind the primary.
            description: |-
              Standby is lagging behind by over 300 seconds (5 minutes)
          expr: |
            cnpg_pg_replication_lag > 300
          for: 1m
          labels:
            severity: warning
            namespace: postiz
            cnpg_cluster: postiz-postgresql-17-cluster
        - alert: CNPGClusterReplicaFailingReplicationWarning
          annotations:
            summary: CNPG Cluster has a replica that is failing to replicate.
            description: |-
              Replica {{ $labels.pod }} is failing to replicate
          expr: |
            cnpg_pg_replication_in_recovery > cnpg_pg_replication_is_wal_receiver_up
          for: 1m
          labels:
            severity: warning
            namespace: postiz
            cnpg_cluster: postiz-postgresql-17-cluster
        - alert: CNPGClusterZoneSpreadWarning
          annotations:
            summary: CNPG Cluster instances in the same zone.
            description: |-
              CloudNativePG Cluster "postiz/postiz-postgresql-17-cluster" has instances in the same availability zone.

              A disaster in one availability zone will lead to a potential service disruption and/or data loss.
            runbook_url: https://github.com/cloudnative-pg/charts/blob/main/charts/cluster/docs/runbooks/CNPGClusterZoneSpreadWarning.md
          expr: |
            3 > count(count by (label_topology_kubernetes_io_zone) (kube_pod_info{namespace="postiz", pod=~"postiz-postgresql-17-cluster-([1-9][0-9]*)$"} * on(node,instance) group_left(label_topology_kubernetes_io_zone) kube_node_labels)) < 3
          for: 5m
          labels:
            severity: warning
            namespace: postiz
            cnpg_cluster: postiz-postgresql-17-cluster