Playbook Catalog

Reference for every action you can attach to an event playbook. Each action runs when its event fires, and its output is attached to the event as an evidence card the LLM uses for root-cause analysis. For the conceptual model, see Event Playbooks vs Workflows.

The catalog applies to events from any source — Prometheus / AlertManager rules, Datadog monitors, New Relic alert policies, Signoz, Chronosphere, AWS CloudWatch / GCP Cloud Monitoring / Azure Monitor alarms, PagerDuty incidents, generic webhooks, and workflows that mint events via events.store.

Conventions

Actions are grouped by category. The category is matched against the event subject — Pod actions surface in the alert UI when the subject is a Kubernetes pod, Service actions when the subject is a cloud resource or service. Actions in category All always surface.

Common conventions:

  • Every action accepts an optional title to override the evidence card title. Setting a title also adds it as an alias key in outputs / extracted_labels, which makes referencing the action's output from a later step much more readable.
  • Action parameters accept template expressions ({{ alert.labels.namespace }}, {{ extracted_labels['logs_0']['_series'] }}) so values can be derived from the event payload or earlier actions. See Templating & Best Practices for the full context, filters, and patterns.
  • Every action also supports the control parameters if, for_each, for_each_limit, for_each_on_limit_exceeded.
  • Source nudgebee = action runs in the Nudgebee server (no agent required). Source prometheus = action runs in the in-cluster Kubernetes agent.
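As a rough illustration of these conventions, the sketch here gives one action a title and uses a template expression in another. It assumes the same array-of-actions JSON shape as the for_each and if examples at the end of this page; the title text and message wording are made up for illustration.

```json
[
  {
    "logs_enricher": {
      "title": "Recent pod logs",
      "tail_lines": 200
    }
  },
  {
    "text_enricher": {
      "text": "Investigating namespace {{ alert.labels.namespace }}"
    }
  }
]
```

With the title set, a later step should be able to address the first action's output as outputs['Recent pod logs'] rather than by a positional key.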

Pod

Surfaced when the event subject is a Kubernetes pod.

pod_enricher

Adds structured pod metadata (containers, restarts, status). Useful for Jinja templating in conditional actions. Takes only the standard title.

logs_enricher

Streams logs from the alerting pod.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| container_name | string | No | | Specific container; defaults to the alerting one. |
| tail_lines | int | No | 1000 | Lines to tail from the end of the log. |
| previous | bool | No | false | Fetch logs from the previous container instance (useful after a restart). |
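For a pod that has just restarted, the current container's log is often nearly empty, so a configuration along these lines may be more useful. This is a minimal sketch in the array-of-actions shape used by the examples at the end of this page; the title and line count are arbitrary choices.

```json
[
  {
    "logs_enricher": {
      "title": "Logs before restart",
      "previous": true,
      "tail_lines": 500
    }
  }
]
```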

pod_events_enricher

Adds Kubernetes events scoped to the pod.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| max_events | int | No | 8 | Cap on number of events. |
| included_types | string[] | No | ["Normal","Warning"] | Event types to include. |

report_crash_loop

Reports pods in CrashLoopBackOff. Takes only title.

pod_issue_investigator

Built-in heuristic investigator covering common pod failure modes. Takes only title.

pod_profiler

CPU or memory profile of a container.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| profile_type | string | Yes | cpu | One of cpu, memory. |
| duration | int | Yes | 60 | Profile duration in seconds. |

pod_bash_enricher

Runs a bash command in the alerting pod.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| bash_command | string | Yes | | The command to execute. |

pod_graph_enricher_cpu / pod_graph_enricher_memory

Pod-level resource graph.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| display_limits | bool | No | false | Overlay the pod's resource limit on the graph. |
| graph_duration_minutes | int | Yes | 60 | Window length in minutes. |

pod_metric_enricher_cpu / pod_metric_enricher_memory

Same as the graph enrichers above, plus metric-derived insights (peaks, throttling). Same parameters.

oom_killer_enricher

Recent OOMKills with surrounding memory metrics.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| new_oom_kills_duration_in_sec | int | Yes | 1200 | Lookback window for OOMKills (seconds). |
| metrics_duration_in_secs | int | Yes | 1200 | Window for memory metrics (seconds). |

impacted_services_enricher

Identifies downstream services impacted by a crashing pod.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| delay_s | int | No | 30 | Wait this many seconds before analysis to allow propagation. |

Deployment

deployment_events_enricher

Events for the Deployment, optionally for its pods.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| dependent_pod_mode | bool | No | false | When true, fetch events for the deployment's pods instead of the deployment itself. |
| max_pods | int | No | 1 | When dependent_pod_mode is true, cap on pods inspected. |
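The two parameters work together: max_pods only matters once dependent_pod_mode is switched on. A minimal sketch, again in the array-of-actions shape used by the examples at the end of this page, with an arbitrary cap of three pods:

```json
[
  {
    "deployment_events_enricher": {
      "title": "Events from deployment pods",
      "dependent_pod_mode": true,
      "max_pods": 3
    }
  }
]
```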

Node

node_cpu_enricher

Per-pod CPU breakdown for the node. Takes only title.

node_disk_analyzer

Disk usage across the node, sorted by pod.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| show_pods | bool | No | true | Include pod-level breakdown. |
| show_containers | bool | No | false | Include container-level breakdown. |

node_running_pods_enricher

Pods running on the node and their Ready status. Takes only title.

node_allocatable_resources_enricher

Allocatable CPU / memory / pods on the node. Takes only title.

node_status_enricher

Node conditions and overall status. Takes only title.

node_pods_capacity_enricher

Node pod-capacity and scheduling-constraint analysis. Takes only title.

cpu_overcommited_enricher / memory_overcommited_enricher

CPU / memory overcommit analysis on the node.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| default_query_duration | int | No | 600 | Lookback window for overcommit calculation, in seconds. |

cluster_cpu_requests_enricher

Cluster-wide CPU requests over a duration.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| default_query_duration | int | No | 600 | Lookback window in seconds. |

target_down_dns_silencer

Silences DNS-related target-down alerts. Takes only title.

node_semantic_version_mismatch_enricher

Detects mismatched Kubernetes versions across nodes. Takes only title. (Category: All — included here for grouping.)


Cluster

cluster_memory_requests_enricher

Cluster-wide memory requests over a duration.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| default_query_duration | int | No | 600 | Lookback window in seconds. |

Job

job_events_enricher

Events for the Job.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| max_events | int | No | 8 | Cap on number of events. |
| included_types | string[] | No | ["Normal","Warning"] | Event types to include. |

job_pod_enricher

Adds the Job's pod, optionally with logs and events.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| events | bool | No | true | Include the pod's events. |
| logs | bool | No | true | Include the pod's logs. |

job_info_enricher

Detailed Job execution information. Takes only title.


DaemonSet / StatefulSet / PVC

daemonset_status_enricher

Pod distribution and health for the DaemonSet. Takes only title.

daemonset_misscheduled_analysis_enricher

Reports DaemonSet scheduling failures. Takes only title.

statefulset_replicas_enricher

Replica count and status for the StatefulSet. Takes only title.

prometheus_pvc_event_enricher

Recent PVC events from Prometheus. Takes only title.


Service (Cloud / APM)

Surfaced for events whose subject is a service or cloud resource. Use these for AWS / GCP / Azure alerts as well as APM-based alerts.

cloud_resource

Look up a resource on AWS / GCP / Azure.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| service_name | string | Yes | | e.g. EC2, RDS, Lambda, S3. |
| resource_type | string | Yes | | Resource type to look up. |
| region | string | Yes | | Cloud region. |
| resource_id / resource_ids[] | string / string[] | No | | Specific resource(s) to fetch. |
| instance_id / instance_ids[] | string / string[] | No | | For instance-scoped lookups. |
| account_id | string | No | | Cloud account override. |

cloud_metrics

Query CloudWatch / Cloud Monitoring / Azure Monitor metrics.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| service_name | string | Yes | | Provider service (e.g. EC2, RDS). |
| region | string | Yes | | Cloud region. |
| metric_name / metric_names[] | string / string[] | No | | Specific metric(s). |
| metric_namespace | string | No | | e.g. AWS/RDS. |
| resource_id / resource_ids[] | string / string[] | No | | Filter to specific resource(s). |
| query | string | No | | Provider-native query expression. |
| statistic | string | No | Average | Average, Sum, Maximum, Minimum. |
| statistics[] | string[] | No | | Multiple statistics. |
| dimension / dimensions[] | object / object[] | No | | Dimension filter(s). |
| step | string | No | | Resolution (e.g. 60s, 5m). |
| start_time / end_time | string | No | event window | RFC3339 timestamps. |
| account_id | string | No | | Cloud account override. |
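To make the parameter combinations concrete, here is a sketch that asks for peak RDS CPU at five-minute resolution over the default event window. CPUUtilization in the AWS/RDS namespace is a standard CloudWatch metric; the resource identifier orders-db and the region are hypothetical, and whether resource_id expects a name or an ARN for this action is not stated here.

```json
[
  {
    "cloud_metrics": {
      "title": "RDS CPU",
      "service_name": "RDS",
      "region": "us-east-1",
      "metric_namespace": "AWS/RDS",
      "metric_name": "CPUUtilization",
      "resource_id": "orders-db",
      "statistic": "Maximum",
      "step": "5m"
    }
  }
]
```

Leaving start_time and end_time unset keeps the query aligned with the event window, which is usually what a root-cause card needs.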

cloud_logs

Query cloud-provider logs (CloudWatch Logs / Cloud Logging / Azure Logs). Alert type: log.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| query | string | No | | Provider-native log query. |
| service_name | string | No | | e.g. RDS. |
| resource_id | string | No | | ARN or resource name. |
| log_group_name | string | No | | Log group / log scope. |
| region | string | No | | Cloud region. |
| start_time / end_time | string | No | event window | RFC3339 timestamps. |
| account_id | string | No | | Cloud account override. |

cloud_service_map

Service map for a cloud resource.

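| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| service_name | string | Yes | | Provider service. |
| resource_id | string | No | | ARN or resource name. |
| region | string | No | | Cloud region. |
| account_id | string | No | | Cloud account override. |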

cloud_cli

Run an AWS / GCP / Azure CLI command on a configured cloud account.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| account_id | string | Yes | | Cloud account (rendered as a dropdown of configured accounts). |
| command | string | Yes | | The CLI invocation, e.g. aws ec2 describe-instances --filters Name=instance-state-name,Values=running. |

ssh

Run a CLI command over SSH using a configured SSH integration.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| command | string | Yes | | Shell command to execute. |
| host_name | string | Yes | | Target host. |
| integration_name | string | Yes | | Configured SSH integration. |
| user_name | string | No | integration default | Override the integration's default user. |
| account_id | string | No | | Account override. |

metric_anomaly_enricher

Detects anomalies in a metric by comparing current values to a historical baseline.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| namespace | string | Yes | | Workload namespace. |
| deployment | string | Yes | | Workload name. |
| query | string | Yes | | PromQL query. |
| historical_window_hours | int | No | 168 | Baseline window (default = 7 days). |
| analysis_start_time / analysis_end_time | string | No | event window | Analysis window in RFC3339. |

traces_dependency_map

Builds a service-dependency map from traces.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| service_name | string | Yes | | Target service. |
| namespace | string | No | | Kubernetes namespace. |
| duration | string | No | | e.g. 30m, 1h. Takes priority over start/end. |
| start_time / end_time | string | No | event window | RFC3339 timestamps. |
| label_filter[] | object[] | No | | {key, value, operator} filters to apply. |
| exclude_filters[] | object[] | No | | Same shape, but excludes matches. |
| upstream_only | bool | No | false | When true, only show callers of the target service. |
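A sketch of a caller-only map for the last 30 minutes follows. The service name checkout and the namespace production are hypothetical; the filter parameters are left out because the accepted operator values are not listed here.

```json
[
  {
    "traces_dependency_map": {
      "title": "Callers of checkout",
      "service_name": "checkout",
      "namespace": "production",
      "duration": "30m",
      "upstream_only": true
    }
  }
]
```

Because duration takes priority over start_time / end_time, setting it detaches the map from the event window; omit it to fall back to that window.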

Account-Level Logs / Metrics / Traces

These actions resolve the configured observability provider for the event's account and execute against it. Use them when you don't care which provider is wired up — the same config will work whether logs go to Loki, Datadog, CloudWatch, etc.

logs

Query logs from the configured provider. Supports regex / label extraction so extracted values can be used as for_each arrays in subsequent actions. Alert type: log.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| query | string | No | | Provider-native log query. |
| duration | int | No | -1 | Window in minutes (default uses the event window). |
| query_options | object | No | {} | Provider-specific extra parameters. |
| regex_extractors[] | object[] | No | | {pattern, label_name}: pull values out of log bodies. |
| label_extractors[] | object[] | No | | {label_name, placeholder_name}: promote a log attribute to a label. |
| account_id | string | No | | Account override. |

metrics

Query metrics from the configured provider.

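| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| query | string | Yes | | Provider-native metrics query. |
| duration | number | No | 15 | Window in minutes. |
| query_options | object | No | {} | Provider-specific extras. |
| account_id | string | No | | Account override. |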
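Since the query is provider-native, its syntax depends on what the account is wired to. The sketch here assumes a PromQL-speaking provider and a hypothetical counter named http_requests_total; with a different provider the query string would need to be rewritten in that provider's language.

```json
[
  {
    "metrics": {
      "title": "Request rate",
      "query": "sum(rate(http_requests_total[5m]))",
      "duration": 30
    }
  }
]
```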

traces

Query traces from the configured provider.

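| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| query | string | Yes | | Trace query. |
| duration | number | No | 15 | Window in minutes. |
| query_options | object | No | {} | Provider-specific extras. |
| account_id | string | No | | Account override. |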

signoz_logs_enricher

Signoz logs with optional regex extraction. Alert type: log.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| query | object | Yes | | Signoz log query (autocomplete-driven in the UI). |
| duration | int | No | 15 | Window in minutes. |
| regex_extractors[] | object[] | No | | {pattern, label_name} extractors. |

chronosphere_traces_enricher

Chronosphere traces with tag filters.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| query_type | string | Yes | | TRACE_IDS or SERVICE_OPERATION. |
| service | string | No | | Required when query_type=SERVICE_OPERATION. |
| trace_ids | string | No | | Comma-separated. Required when query_type=TRACE_IDS. |
| tag_filters[] | object[] | Yes | | List of tag filters. |
| start_time / end_time | string | No | event window | RFC3339 timestamps. |

Proxy Agent (Custom Data Collection)

These actions execute through a Nudgebee proxy agent running inside your network. Use them to query private databases, hit internal HTTP endpoints, or run shell commands on hosts the cloud-side server cannot reach directly. The datasource_id parameter is rendered as a dropdown of configured proxy integrations of the matching type (tenant-wide).

proxy_db_query

SQL query against any DB integrated via the proxy agent (PostgreSQL, MySQL, MSSQL, ClickHouse, Oracle). Intended for read-only investigation; register the datasource with a database user that only has SELECT privileges to enforce that. The proxy also accepts a per-datasource read_only flag that blocks separate db_execute calls.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| datasource_id | string | Yes | | Proxy DB integration to run against. |
| query | string | Yes | | SQL to execute. |
| database | string | No | datasource default | Override the default database. |
| max_rows | int | No | 1000 | Cap on rows returned. |
| timeout_ms | int | No | 30000 | Query timeout in milliseconds (clamped to 120000 by the proxy). |

Tip. Attach proxy_db_query with a pg_stat_activity snapshot to a HighDBCPU alert and the running-query state lands on the event before the LLM analyses it. See Custom Data Collection.
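The pg_stat_activity snapshot mentioned in the tip could look roughly like this. The datasource_id value orders-postgres is hypothetical (in practice it comes from the dropdown of configured proxy integrations), and the column selection is just one reasonable choice; pid, state, wait_event_type, query_start, and query are standard pg_stat_activity columns.

```json
[
  {
    "proxy_db_query": {
      "title": "Active queries",
      "datasource_id": "orders-postgres",
      "query": "SELECT pid, state, wait_event_type, query_start, left(query, 200) AS query_text FROM pg_stat_activity WHERE state <> 'idle' ORDER BY query_start LIMIT 20",
      "max_rows": 20,
      "timeout_ms": 10000
    }
  }
]
```

Truncating the query text and limiting rows keeps the evidence card small, which should help the LLM focus on the longest-running statements.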

proxy_http_request

HTTP request to an internal API reachable by the proxy (Grafana, Jenkins, custom health endpoints).

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| datasource_id | string | Yes | | Proxy HTTP integration. |
| url | string | Yes | | Path relative to the datasource's base URL. |
| method | string | No | GET | One of GET, POST, PUT, PATCH, DELETE. |
| headers | object | No | {} | Extra request headers. |
| body | string | No | | Request body (for POST/PUT/PATCH). |
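As an example of the relative-URL convention, this sketch probes Grafana's /api/health endpoint. It assumes a proxy HTTP integration whose base URL points at a Grafana instance; the identifier internal-grafana is hypothetical.

```json
[
  {
    "proxy_http_request": {
      "title": "Grafana health",
      "datasource_id": "internal-grafana",
      "url": "/api/health",
      "method": "GET"
    }
  }
]
```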

proxy_ssh_command

Shell command on a remote server via SSH through the proxy.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| datasource_id | string | Yes | | Proxy SSH integration. |
| command | string | Yes | | Command to run. |
| timeout_ms | int | No | 30000 | Command timeout in milliseconds. |

Notifications

notification_channel_join

Join an incident channel on Slack / Teams / Google Chat.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| platform | string | Yes | slack | One of slack, ms_teams, google_chat. |
| channel_id | string | Yes | | Channel to join. |
| incident_id | string | Yes | | UUID of the Nudgebee incident. |
| team_id | string | No | integration default | Workspace / team override. |
| text | string | No | | Optional join message. |

notification_channel_message

Post a message to a Slack / Teams / Google Chat channel.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| platform | string | Yes | slack | One of slack, ms_teams, google_chat. |
| channel_id | string | Yes | | Channel to post to. |
| incident_id | string | Yes | | UUID of the Nudgebee incident. |
| text | string | Yes | | Message body. |
| team_id | string | No | integration default | Workspace / team override. |

Monitoring Rules / Integrations

prometheus_rules_enricher

Information about the Prometheus rule that fired the alert. Takes only title.

prometheus_enricher

Run one or more PromQL queries against the cluster's Prometheus.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| instant | bool | No | false | Instant query instead of range. |
| promql_query | string | No | | A single PromQL expression. |
| promql_queries[] | object[] | No | | Multiple named queries, each {key, query}. Use this or promql_query. |
| step | string | No | | Resolution (15s, 1m, …). |
| duration | object | Yes | | {duration_minutes: <n>} window. |
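The object-shaped parameters are easier to read in context. This sketch runs two named range queries over a 30-minute window, scoped to the alert's namespace through a template expression. The metric names are the usual cAdvisor ones and are assumed to be scraped by the cluster's Prometheus; the keys cpu and memory are arbitrary labels.

```json
[
  {
    "prometheus_enricher": {
      "title": "Namespace CPU and memory",
      "promql_queries": [
        {
          "key": "cpu",
          "query": "sum(rate(container_cpu_usage_seconds_total{namespace=\"{{ alert.labels.namespace }}\"}[5m]))"
        },
        {
          "key": "memory",
          "query": "sum(container_memory_working_set_bytes{namespace=\"{{ alert.labels.namespace }}\"})"
        }
      ],
      "step": "1m",
      "duration": { "duration_minutes": 30 }
    }
  }
]
```

Note the escaped double quotes inside the PromQL label matchers; they are required for the JSON to stay valid.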

Look up triggered Datadog monitors.

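| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| status | string | No | alert | One of alert, warn, no data. |
| env | string | No | | Environment tag filter. |
| service | string | No | | Service tag filter. |
| query | string | No | | Custom Datadog monitor query. |
| limit | int | No | 30 | Page size (max 100). |
| duration | int | No | 1 | Hours to look back. |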

alert_explanation_enricher

Pins a human-readable explanation and a suggested resolution to the event.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| alert_explanation | string | Yes | | Plain-English explanation of what triggered the alert. |
| recommended_resolution | string | No | | Suggested mitigation. |

Alert Resource Graphs

alert_graph_enricher_cpu / alert_graph_enricher_memory / alert_graph_enricher_disk

Resource-usage graph for the alerting Pod or Node.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| item_type | string | Yes | Pod | Pod or Node. |
| graph_duration_minutes | int | Yes | 60 | Window length in minutes. |

pod_node_metrics_enricher_memory

Node-level memory metrics for the alerting pod.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| graph_duration_minutes | int | Yes | 60 | Window length in minutes. |

Performance Analysis

cpu_throttling_analysis_enricher

CPU throttling events for pods. Takes only title.


Custom Execution / Utility

These actions run arbitrary code or fetch arbitrary resources.

kubectl_command_executor

Run any kubectl command in the alert's cluster.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| command | string | Yes | | Full command, e.g. kubectl describe pod foo -n bar. |

pod_script_run_enricher

Run a script against a pod (the alerting one or an ephemeral one).

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| command | string | Yes | | Script to execute. |
| name | string | No | | Container name. |
| image | string | No | | Image to use. |
| secret | string | No | | Image-pull secret. |
| use_side_car | bool | No | false | Run as a side-car container. |
| ephemeral | bool | No | false | Run as an ephemeral pod. |
| pod_name / namespace | string | No | alerting pod | Override pod target. |

custom_image_run_enricher

Run any container image as a one-shot enrichment job.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| image | string | Yes | | Container image. |
| command[] | string[] | No | image default | Command override. |
| args[] | string[] | No | | Arguments. |
| env_variables | object | No | {} | Environment variables map. |
| secret | string | No | | Image-pull secret. |
| config_map | string | No | | ConfigMap to mount. |
| image_pull_policy | string | No | IfNotPresent | Always or IfNotPresent. |
| service_account | string | No | | Service account to run under. |
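A one-shot DNS check is a typical use for this action. In the sketch here the keys are written without the [] suffix, on the assumption that the suffix only marks array types in the table (the for_each example at the end of this page writes regex_extractors the same way). The image tag and the lookup target are illustrative choices.

```json
[
  {
    "custom_image_run_enricher": {
      "title": "DNS check",
      "image": "busybox:1.36",
      "command": ["sh", "-c"],
      "args": ["nslookup kubernetes.default.svc.cluster.local"],
      "image_pull_policy": "IfNotPresent"
    }
  }
]
```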

pg_health_enricher

Run predefined PostgreSQL health queries.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| secret_name | string | Yes | | Kubernetes secret with DB credentials. |
| secret_namespace | string | Yes | | Namespace of the secret. |

pg_run_queries

Run a list of user-defined PostgreSQL queries.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| queries[] | string[] | Yes | | SQL statements to execute. |
| secret_name | string | Yes | | Kubernetes secret with DB credentials. |
| secret_namespace | string | Yes | | Namespace of the secret. |
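A sketch with two read-only statistics queries is shown here. The secret name and namespace are hypothetical, and the layout of credentials inside the secret is not covered on this page; pg_stat_activity and pg_stat_database are standard PostgreSQL statistics views.

```json
[
  {
    "pg_run_queries": {
      "title": "Connection counts",
      "queries": [
        "SELECT state, count(*) FROM pg_stat_activity GROUP BY state",
        "SELECT datname, numbackends FROM pg_stat_database"
      ],
      "secret_name": "pg-readonly-credentials",
      "secret_namespace": "production"
    }
  }
]
```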

get_resource_yaml

Fetch a Kubernetes resource YAML for debugging.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | Yes | | Resource name. |
| kind | string | Yes | | One of Pod, Deployment, StatefulSet, DaemonSet, Job, CronJob, ReplicaSet, Service, ConfigMap, Secret, PersistentVolumeClaim. |
| namespace | string | No | | Namespace. |

get_kubernetes_resource

Fetch deployment(s) by name or across all namespaces.

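| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name[] | string[] | Yes | | Names to fetch. |
| resource_type | string | No | deployment | Resource type. |
| namespace[] | string[] | No | | Specific namespaces. |
| all_namespaces | bool | No | false | Fetch across all namespaces. |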

get_pod_resource

Fetch pod(s) by name, namespace, or owner.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name[] | string[] | No | | Pod names. |
| owner | string | No | | Pods by owner (Deployment / ReplicaSet / Job). |
| resource_type | string | No | pod | Resource type. |
| namespace[] | string[] | No | | Specific namespaces. |
| all_namespaces | bool | No | false | Fetch across all namespaces. |

resource_events_enricher

Nearby Kubernetes events for the alert resource.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| runbook_name | string | Yes | | Source runbook (dropdown). |
| max_pods | int | No | 1 | Cap on pods inspected when dependent_pod_mode is set. |
| dependent_pod_mode | bool | No | false | Fetch events for the resource's pods rather than the resource. |

resource_logs_enricher

Logs from a specific (parameterised) pod.

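| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| pod_name | string | Yes | | Pod to fetch logs from. |
| namespace | string | Yes | | Pod namespace. |
| container_name | string | No | | Specific container. |
| since_seconds | int | No | | Lookback in seconds. |
| tail_lines | int | No | 100 | Lines from the end. |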

Pods related to the alerting subject.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| output_format | string | No | table | table or json. |

text_enricher

Pin a free-form text message on the event.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| text | string | Yes | | Message body. |
| severity | string | No | Info | Info or Critical. |

status_enricher

Resource status conditions, optionally with messages.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| show_details | bool | No | false | Include condition messages. |

api_service_status_enricher / api_failure_enricher

API service status and API-failure analysis. Take only title.

api_traces_enricher

Traces of API requests around the alert.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| duration | int | No | 15 | Lookback in minutes. |

hpa_mismatch_enricher

HPA configuration / scaling-policy mismatch analysis.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| check_for_metrics_server | bool | Yes | true | Verify metrics-server is installed and report otherwise. |

nudgebee_playbook_trigger_enricher

Trigger another Nudgebee runbook on this event.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| runbook_id | string | Yes | | Runbook to invoke (dropdown). |

nubi_enricher

Trigger an LLM investigation with a custom prompt.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| prompt | string | Yes | | Prompt for the investigation. |
| title | string | Yes | | Title for the LLM analysis card. |

Conditional & Iterative Control

Every action above accepts the following control parameters in addition to its own. They are evaluated against the event payload, previous action outputs, and extracted_labels.

| Field | Type | Description |
| --- | --- | --- |
| if | template / bool | Run the action only when the value is the string "true" (case-insensitive, so "True" and "TRUE" also work) or boolean true. Anything else skips the action: "false", "False", the empty string, "yes", "1", plain text, etc. Use the gt / lt / gte / lte filters when you want a clean lowercase "true" / "false" you don't have to think about. |
| for_each | template / array | Run the action once per item; inside the action, {{ item }} is the current iteration value. |
| for_each_limit | int | Cap on iterations (default 10). |
| for_each_on_limit_exceeded | string | What to do when the array is longer than for_each_limit. "warn" (default) logs a warning and truncates to the limit. "error" fails the action. |
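The control parameters can also be driven by a fixed list rather than extracted labels. The sketch here assumes a literal JSON array is accepted for for_each (the table lists the type as template / array) and that a resource named checkout exists for each kind; both the name and the namespace are hypothetical.

```json
[
  {
    "get_resource_yaml": {
      "for_each": ["Deployment", "Service", "ConfigMap"],
      "for_each_limit": 3,
      "for_each_on_limit_exceeded": "error",
      "name": "checkout",
      "kind": "{{ item }}",
      "namespace": "production"
    }
  }
]
```

Choosing "error" over the default "warn" makes the action fail if someone later extends the list past the limit, instead of truncating it with only a logged warning.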

Example — data-driven evidence with for_each

A logs action with a regex extractor produces an array of distinct values; the next action runs once per value. The extracted_labels key follows the action's name + position — here logs_0 because the logs action is at index 0.

[
  {
    "logs": {
      "title": "Error logs",
      "query": "level=error",
      "duration": 30,
      "regex_extractors": [
        { "pattern": "service=(\\S+)", "label_name": "service" }
      ]
    }
  },
  {
    "kubectl_command_executor": {
      "for_each": "{{ extracted_labels['logs_0']['_series'] }}",
      "for_each_limit": 5,
      "command": "kubectl describe deploy {{ item.service }} -n production"
    }
  }
]

Example — if to gate expensive evidence

The first action gets a title so we can reference it cleanly, and the gt filter produces the lowercase "true" / "false" string the if: field expects.

[
  {
    "pod_enricher": {
      "title": "Pod details"
    }
  },
  {
    "pod_profiler": {
      "if": "{{ outputs['Pod details'].data.containers[0].restarts | gt(5) }}",
      "profile_type": "cpu",
      "duration": 60
    }
  }
]

See Also