Rules

The following Prometheus alerting rule groups are configured. Each group name below is followed by its rules in YAML form.

container_cpu_usage_is_high

- alert: POD_CPU_IS_HIGH
  expr: sum by(container, pod, namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m])) * 100 > 90
  for: 1m
  labels:
    severity: critical
  annotations:
    description: Container {{ $labels.container }} CPU usage inside pod {{ $labels.pod }} is high in {{ $labels.namespace }}
    summary: Pod {{ $labels.pod }} CPU usage is high in {{ $labels.namespace }}
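
On disk, each rule shown on this page sits inside a rule group in a Prometheus rules file, which is referenced from prometheus.yml under rule_files and can be checked with `promtool check rules <file>` before loading. A minimal sketch of how the rule above could be packaged (the file name and layout are illustrative, not taken from the running configuration):

# container-rules.yaml (hypothetical file name)
groups:
  - name: container_cpu_usage_is_high
    rules:
      - alert: POD_CPU_IS_HIGH
        expr: sum by(container, pod, namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m])) * 100 > 90
        for: 1m
        labels:
          severity: critical
        # annotations omitted here; see the full rule above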

container_memory_usage_is_high

- alert: POD_MEMORY_USAGE_IS_HIGH
  expr: (sum by(container, pod, namespace) (container_memory_working_set_bytes{container!=""}) / sum by(container, pod, namespace) (container_spec_memory_limit_bytes > 0) * 100) > 80
  for: 1m
  labels:
    severity: critical
  annotations:
    description: |-
      Container memory usage is above 80%
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
    summary: Container {{ $labels.container }} memory usage inside pod {{ $labels.pod }} is high in {{ $labels.namespace }}
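
When tuning the 80% threshold, the rule's own query can be run in the Prometheus expression browser with the final comparison dropped, to see current per-container utilisation before anything fires:

sum by(container, pod, namespace) (container_memory_working_set_bytes{container!=""})
  / sum by(container, pod, namespace) (container_spec_memory_limit_bytes > 0) * 100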

node_cpu_greater_than_80

- alert: NODE_CPU_IS_HIGH
  expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
  for: 1m
  labels:
    severity: critical
  annotations:
    description: Node {{ $labels.kubernetes_node }} CPU usage is high
    summary: Node CPU usage is greater than 90 percent

node_disk_space_too_low

- alert: NODE_DISK_SPACE_IS_LOW
  expr: (100 * ((node_filesystem_avail_bytes{fstype!="rootfs",mountpoint="/"}) / (node_filesystem_size_bytes{fstype!="rootfs",mountpoint="/"}))) < 10
  for: 1m
  labels:
    severity: critical
  annotations:
    description: Node {{ $labels.node }} disk space is only {{ printf "%0.2f" $value }}% free.
    summary: Node disk space remaining is less than 10 percent

node_down

- alert: NODE_DOWN
  expr: up{component="node-exporter"} == 0
  for: 3m
  labels:
    severity: warning
  annotations:
    description: '{{ $labels.job }} job failed to scrape instance {{ $labels.instance }} for more than 3 minutes. Node seems to be down.'
    summary: Node {{ $labels.kubernetes_node }} is down
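
Whether this (or any) rule is currently pending or firing can also be checked from the expression browser via Prometheus' built-in ALERTS series, whose alertstate label is either "pending" or "firing":

ALERTS{alertname="NODE_DOWN"}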

node_memory_left_lessser_than_10

- alert: NODE_MEMORY_LESS_THAN_10%
  expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
  for: 1m
  labels:
    severity: critical
  annotations:
    description: Node {{ $labels.kubernetes_node }} memory left is low
    summary: Node memory left is less than 10 percent

Front50-cache

- alert: front50:storageServiceSupport:cacheAge__value
  expr: front50:storageServiceSupport:cacheAge__value > 300000
  for: 2m
  labels:
    severity: warning
  annotations:
    description: front50 cacheAge for {{$labels.pod}} in namespace {{$labels.namespace}} has value = {{$value}}
    summary: front50 cacheAge too high

autopilot-component-jvm-errors

- alert: jvm-memory-filling-up-for-oes-audit-client
  expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="auditclient"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="auditclient"})) * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    description: |-
      JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
      VALUE = {{ $value }}
    summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}

- alert: jvm-memory-filling-up-for-oes-autopilot
  expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="autopilot"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="autopilot"})) * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    description: |-
      JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
      VALUE = {{ $value }}
    summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}

- alert: jvm-memory-filling-up-for-oes-dashboard
  expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="dashboard"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="dashboard"})) * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    description: |-
      JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
      VALUE = {{ $value }}
    summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}

- alert: jvm-memory-filling-up-for-oes-platform
  expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="platform"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="platform"})) * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    description: |-
      JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
      VALUE = {{ $value }}
    summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}

- alert: jvm-memory-filling-up-for-oes-sapor
  expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="sapor"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="sapor"})) * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    description: |-
      JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
      VALUE = {{ $value }}
    summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}

- alert: jvm-memory-filling-up-for-oes-visibility
  expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="visibility"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="visibility"})) * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    description: |-
      JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
      VALUE = {{ $value }}
    summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}
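
The six rules above differ only in the component matcher. One possible consolidation is a single rule that aggregates by component; this is a sketch rather than a drop-in replacement (the alert name is illustrative, and it assumes every OES component exposes both jvm_memory_used_bytes and jvm_memory_max_bytes with the same component label):

- alert: jvm-memory-filling-up-for-oes-component
  expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap"})) * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    description: |-
      JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
      VALUE = {{ $value }}
    summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}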

autopilot-component-latency-too-high

- alert: oes-audit-client-latency-too-high
  expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="auditclient"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="auditclient"}[2m])) > 0.5
  for: 2m
  labels:
    severity: warning
  annotations:
    description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }}
    summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high

- alert: oes-autopilot-latency-too-high
  expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="autopilot"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="autopilot"}[2m])) > 0.5
  for: 2m
  labels:
    severity: warning
  annotations:
    description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }}
    summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high

- alert: oes-dashboard-latency-too-high
  expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="dashboard"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="dashboard"}[2m])) > 0.5
  for: 2m
  labels:
    severity: warning
  annotations:
    description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }}
    summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high

- alert: oes-platform-latency-too-high
  expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="platform"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="platform"}[2m])) > 0.5
  for: 2m
  labels:
    severity: warning
  annotations:
    description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }}
    summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high

- alert: oes-sapor-latency-too-high
  expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="sapor"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="sapor"}[2m])) > 0.5
  for: 2m
  labels:
    severity: warning
  annotations:
    description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }}
    summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high

- alert: oes-visibility-latency-too-high
  expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="visibility"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="visibility"}[2m])) > 0.5
  for: 2m
  labels:
    severity: warning
  annotations:
    description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }}
    summary: Latency of the component {{ $labels.component }} in namespace {{$labels.kubernetes_namespace}} is high

autopilot-scrape-target-is-down

- alert: oes-audit-client-scrape-target-is-down
  expr: up{component="auditclient"} == 0
  labels:
    severity: critical
  annotations:
    description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down
    summary: oes-audit-client scrape target is down

- alert: oes-autopilot-scrape-target-is-down
  expr: up{component="autopilot"} == 0
  labels:
    severity: critical
  annotations:
    description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down
    summary: oes-autopilot scrape target is down

- alert: oes-dashboard-scrape-target-is-down
  expr: up{component="dashboard"} == 0
  labels:
    severity: critical
  annotations:
    description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down
    summary: oes-dashboard scrape target is down

- alert: oes-platform-scrape-target-is-down
  expr: up{component="platform"} == 0
  labels:
    severity: critical
  annotations:
    description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down
    summary: oes-platform scrape target is down

- alert: oes-sapor-scrape-target-is-down
  expr: up{component="sapor"} == 0
  labels:
    severity: critical
  annotations:
    description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down
    summary: oes-sapor scrape target is down

- alert: oes-visibility-scrape-target-is-down
  expr: up{component="visibility"} == 0
  labels:
    severity: critical
  annotations:
    description: The scrape target endpoint of component {{$labels.component}} in namespace {{$labels.kubernetes_namespace}} is down
    summary: oes-visibility scrape target is down

igor-needs-attention

- alert: igor-needs-attention
  expr: igor:pollingMonitor:itemsOverThreshold__value > 0
  labels:
    severity: critical
  annotations:
    description: Igor in namespace {{$labels.namespace}} needs human help
    summary: Igor needs attention

jvm-too-high

- alert: clouddriver-rw-pod-may-be-evicted-soon
  expr: (sum by(instance, area) (clouddriver_rw:jvm:memory:used__value) / sum by(instance, area) (clouddriver_rw:jvm:memory:max__value)) > 0.9
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon
    summary: Clouddriver-rw JVM memory too high

- alert: clouddriver-ro-pod-may-be-evicted-soon
  expr: (sum by(instance, area) (clouddriver_ro:jvm:memory:used__value) / sum by(instance, area) (clouddriver_ro:jvm:memory:max__value)) > 0.9
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon
    summary: Clouddriver-ro JVM memory too high

- alert: clouddriver-caching-pod-may-be-evicted-soon
  expr: (sum by(instance, area) (clouddriver_caching:jvm:memory:used__value) / sum by(instance, area) (clouddriver_caching:jvm:memory:max__value)) > 0.9
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon
    summary: Clouddriver-caching JVM memory too high

- alert: gate-pod-may-be-evicted-soon
  expr: (sum by(instance, area) (gate:jvm:memory:used__value) / sum by(instance, area) (gate:jvm:memory:max__value)) > 0.9
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon
    summary: Gate JVM memory too high

- alert: orca-pod-may-be-evicted-soon
  expr: (sum by(instance, area) (orca:jvm:gc:liveDataSize__value) / sum by(instance, area) (orca:jvm:gc:maxDataSize__value)) > 0.9
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon
    summary: Orca JVM memory too high

- alert: igor-pod-may-be-evicted-soon
  expr: (sum by(instance, area) (igor:jvm:memory:used__value) / sum by(instance, area) (igor:jvm:memory:max__value)) > 0.9
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon
    summary: Igor JVM memory too high

- alert: echo-scheduler-pod-may-be-evicted-soon
  expr: (sum by(instance, area) (echo_scheduler:jvm:memory:used__value) / sum by(instance, area) (echo_scheduler:jvm:memory:max__value)) > 0.9
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon
    summary: Echo-scheduler JVM memory too high

- alert: echo-worker-pod-may-be-evicted-soon
  expr: (sum by(instance, area) (echo_worker:jvm:memory:used__value) / sum by(instance, area) (echo_worker:jvm:memory:max__value)) > 0.9
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon
    summary: Echo-worker JVM memory too high

- alert: front50-pod-may-be-evicted-soon
  expr: (sum by(instance, area) (front50:jvm:memory:used__value) / sum by(instance, area) (front50:jvm:memory:max__value)) > 0.9
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} may be evicted soon
    summary: Front50 JVM memory too high

kube-api-server-is-down

- alert: kube-api-server-down
  expr: up{job="kubernetes-apiservers"} == 0
  for: 2m
  labels:
    severity: critical
  annotations:
    description: Kubernetes API server service went down LABELS = {{ $labels }}
    summary: Kube API Server job {{ $labels.job }} is down

kubernetes-api-server-experiencing-high-error-rate

- alert: kube-api-server-errors
  expr: sum(rate(apiserver_request_total{code=~"^(?:5..)$",job="kubernetes-apiservers"}[2m])) / sum(rate(apiserver_request_total{job="kubernetes-apiservers"}[2m])) * 100 > 3
  for: 2m
  labels:
    severity: critical
  annotations:
    description: |-
      Kubernetes API server is experiencing a high error rate
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
    summary: Kubernetes API server errors (instance {{ $labels.instance }})

latency-too-high

- alert: clouddriver-ro-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro:controller:invocations__total{service="spin-clouddriver-ro"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro:controller:invocations__count_total{service="spin-clouddriver-ro"}[5m])) > 1
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high

- alert: clouddriver-rw-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_rw:controller:invocations__total{service="spin-clouddriver-rw"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_rw:controller:invocations__count_total{service="spin-clouddriver-rw"}[5m])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high

- alert: clouddriver-caching-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_caching:controller:invocations__total{service="spin-clouddriver-caching"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_caching:controller:invocations__count_total{service="spin-clouddriver-caching"}[5m])) > 5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high

- alert: clouddriver_ro_deck-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro_deck:controller:invocations__total{service="spin-clouddriver-ro-deck"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro_deck:controller:invocations__count_total{service="spin-clouddriver-ro-deck"}[5m])) > 5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high

- alert: gate-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(gate:controller:invocations__total{service="spin-gate"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(gate:controller:invocations__count_total{service="spin-gate"}[5m])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high

- alert: orca-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(orca:controller:invocations__total{service="spin-orca"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(orca:controller:invocations__count_total{service="spin-orca"}[5m])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high

- alert: igor-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(igor:controller:invocations__total{service="spin-igor"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(igor:controller:invocations__count_total{service="spin-igor"}[5m])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high

- alert: echo_scheduler-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_scheduler:controller:invocations__total{service="spin-echo-scheduler"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_scheduler:controller:invocations__count_total{service="spin-echo-scheduler"}[5m])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high

- alert: echo_worker-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_worker:controller:invocations__total{service="spin-echo-worker"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_worker:controller:invocations__count_total{service="spin-echo-worker"}[5m])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high

- alert: front50-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(front50:controller:invocations__total{service="spin-front50"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(front50:controller:invocations__count_total{service="spin-front50"}[5m])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high

- alert: fiat-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(fiat:controller:invocations__total{service="spin-fiat"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(fiat:controller:invocations__count_total{service="spin-fiat"}[5m])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high

- alert: rosco-latency-too-high
  expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(rosco:controller:invocations__total{service="spin-rosco"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(rosco:controller:invocations__count_total{service="spin-rosco"}[5m])) > 0.5
  for: 15m
  labels:
    severity: warning
  annotations:
    description: Latency of the service {{$labels.service}} is {{$value}} seconds for {{ $labels }}
    summary: Latency of the service {{ $labels.service }} in namespace {{$labels.namespace}} is high

orca-queue-issue

- alert: orca-queue-depth-high
  expr: (sum by(instance) (orca:queue:ready:depth__value{namespace!=""})) > 10
  labels:
    severity: warning
  annotations:
    description: Orca queue depth for {{$labels.instance}} is {{$value}}
    summary: Orca queue depth is high

- alert: orca-queue-lag-high
  expr: sum by(instance, service, namespace) (rate(orca:controller:invocations__total[2m])) / sum by(instance, service, namespace) (rate(orca:controller:invocations__count_total[2m])) > 0.5
  labels:
    severity: warning
  annotations:
    description: Service {{$labels.service}} in namespace {{$labels.namespace}} has a lag value of {{$value}}
    summary: Orca queue lag is high

prometheus-job-down

- alert: prometheus-job-is-down
  expr: up{job="prometheus"} == 0
  for: 5m
  labels:
    severity: warning
  annotations:
    description: Default Prometheus job is down LABELS = {{ $labels }}
    summary: The default Prometheus job is down (job {{ $labels.job }})

spinnaker-service-is-down

- alert: clouddriver-rw-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-rw"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Clouddriver-rw Spinnaker service is down

- alert: clouddriver-ro-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-ro"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Clouddriver-ro Spinnaker service is down

- alert: clouddriver-caching-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-caching"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Clouddriver-caching Spinnaker service is down

- alert: clouddriver-ro-deck-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-ro-deck"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Clouddriver-ro-deck Spinnaker service is down

- alert: gate-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-gate"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Gate Spinnaker service is down

- alert: orca-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-orca"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Orca Spinnaker service is down

- alert: igor-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-igor"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Igor Spinnaker service is down

- alert: echo-scheduler-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-echo-scheduler"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Echo-scheduler Spinnaker service is down

- alert: echo-worker-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-echo-worker"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Echo-worker Spinnaker service is down

- alert: front50-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-front50"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Front50 Spinnaker service is down

- alert: fiat-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-fiat"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Fiat Spinnaker service is down

- alert: rosco-is-down
  expr: up{job="opsmx_spinnaker_metrics",service="spin-rosco"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Rosco Spinnaker service is down
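
Because all twelve rules above share the same job and an identical alert body, they could also be expressed as a single rule with a regular-expression matcher on the service label. This is a sketch only (the alert name is illustrative), and the per-service summaries above would collapse into one templated summary:

- alert: spinnaker-service-is-down
  expr: up{job="opsmx_spinnaker_metrics",service=~"spin-.*"} == 0
  labels:
    severity: critical
  annotations:
    description: Service {{$labels.service}} with pod name {{$labels.pod}} in namespace {{$labels.namespace}} is not responding
    summary: Spinnaker service {{$labels.service}} is down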

volume-is-almost-full (< 10% left)

- alert: pvc-storage-full
  expr: kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes * 100 < 10
  for: 2m
  labels:
    severity: warning
  annotations:
    description: |-
      Volume is almost full (< 10% left)
      VALUE = {{ $value }}
      LABELS = {{ $labels }}
    summary: Kubernetes volume running out of disk space (persistentvolumeclaim {{ $labels.persistentvolumeclaim }} in namespace {{$labels.namespace}})