n8n Project for My Production Server

Production Monitoring & Security Automation Runbook

Purpose

This runbook describes how to operate, monitor, and respond to events generated by the company’s production automation stack:

  • n8n (orchestration)
  • AdGuard DNS (Primary & Secondary Raspberry Pi)
  • Fail2Ban
  • Uptime Kuma
  • Slack (alerting)
  • Omada Controller (network devices)

It is written so that any on-call engineer can safely respond to alerts without deep system knowledge.


System Overview

What this system does

  • Monitors DNS behavior on two AdGuard servers (Pi1 = Primary, Pi2 = Secondary)
  • Detects possible DNS abuse / attacks using query heuristics
  • Automatically blocks malicious IPs in AdGuard (when enabled)
  • Monitors uptime of both DNS servers
  • Pushes health heartbeats to Uptime Kuma
  • Receives Fail2Ban ban/unban events from multiple hosts
  • Receives Omada controller events (AP, gateway, switch up/down)
  • Sends actionable alerts to Slack

What it does NOT do

  • It does not permanently blacklist IPs without review
  • It does not modify firewall rules (DNS-layer only)
  • It does not auto-restart servers

Normal Operation (Healthy State)

Expected behavior

  • Cron runs every minute
  • Slack is quiet most of the time
  • Uptime Kuma shows:
    • Pi1 Uptime: UP
    • Pi2 Uptime: UP
    • DNS Status: NORMAL

Normal Slack messages

  • ✅ DNS NORMAL (baseline)
  • ✅ DNS OK / RECOVERED
  • ✅ Fail2Ban UNBANNED
  • ℹ️ Omada informational events

No action is required in these cases.


Alert Types & Response Actions

🚨 POSSIBLE DNS ATTACK

Meaning

  • A single client accounts for an abnormally high share of DNS queries
  • Triggered when that client accounts for either:
    • ≥ 80% of recent queries, OR
    • ≥ 400 queries in the sample window
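
The two trigger conditions above can be sketched as a small function. This is a minimal illustration, not the actual n8n workflow logic: the function name, the constants, and the input shape (a list of client IPs, one entry per query in the sample window) are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Thresholds taken from the runbook; the constant names are hypothetical.
SHARE_THRESHOLD = 0.80   # >= 80% of recent queries
COUNT_THRESHOLD = 400    # >= 400 queries in the sample window


def find_dominant_client(client_ips: Iterable[str]) -> Optional[Tuple[str, int, float]]:
    """Return (ip, count, share) for the busiest client if it trips either
    threshold, otherwise None."""
    counts = Counter(client_ips)
    total = sum(counts.values())
    if total == 0:
        return None  # empty sample: nothing to evaluate
    ip, count = counts.most_common(1)[0]
    share = count / total
    if share >= SHARE_THRESHOLD or count >= COUNT_THRESHOLD:
        return ip, count, share
    return None


if __name__ == "__main__":
    sample = ["10.0.0.5"] * 9 + ["10.0.0.6"]
    print(find_dominant_client(sample))
```

The sketch applies the thresholds literally, so a very small sample can trip the percentage rule. Whether the real workflow enforces a minimum sample size is not stated here.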

Automatic actions

  • AdGuard auto-block may already be applied
  • IP reputation (IPinfo) is attached to the alert

Required response (step-by-step)

  1. Open the Slack alert
  2. Review:
    • Attacker IP
    • Client name (if known)
    • Organization / ASN
  3. Log into the affected AdGuard server
  4. Open Query Log
  5. Confirm traffic pattern matches alert
  6. If legitimate client:
    • Remove IP from disallowed_clients
    • Add client to DNS whitelist in n8n
  7. If malicious:
    • Confirm the IP is in disallowed_clients (auto-block applies only when enabled)
    • If it is, no further action is needed
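
Step 6 (unblocking a legitimate client) amounts to removing one entry from a list. The sketch below shows only that list manipulation. It assumes the access settings are available as a dictionary containing a disallowed_clients list; the other keys shown and the function name are hypothetical. How the lists are read from and written back to AdGuard (web UI, configuration file, or API) is not covered here.

```python
from typing import Dict, List


def remove_from_disallowed(access: Dict[str, List[str]], ip: str) -> Dict[str, List[str]]:
    """Return a copy of the access lists with `ip` removed from
    disallowed_clients. The input dictionary is left unchanged."""
    updated = {key: list(values) for key, values in access.items()}
    updated["disallowed_clients"] = [
        entry for entry in updated.get("disallowed_clients", []) if entry != ip
    ]
    return updated


if __name__ == "__main__":
    # Example data only; the key names besides disallowed_clients are assumed.
    current = {
        "allowed_clients": [],
        "disallowed_clients": ["192.168.1.50", "8.8.4.4"],
        "blocked_hosts": [],
    }
    print(remove_from_disallowed(current, "192.168.1.50"))
```

Returning a copy rather than editing in place makes it easy to compare the before and after lists before saving anything.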

Escalation

  • Repeated attacks from different IPs → notify network/security team

✅ DNS OK / RECOVERED

Meaning

  • DNS traffic has returned to normal

Action

  • None required

🔴 / 🚨 UPTIME DOWN

Meaning

  • A DNS server is unreachable or is returning an error HTTP status

Response steps

  1. Check Uptime Kuma for confirmation
  2. Attempt to reach host:
    • Ping
    • HTTPS access
  3. If unreachable:
    • Check power
    • Check network connectivity
  4. Review system logs if accessible
  5. Restart service/server if required
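
For step 2, a quick HTTP(S) probe from a workstation can confirm what the alert reports. The following is a rough sketch using only the Python standard library. The runbook does not define which status codes count as "bad", so the rule used here (2xx and 3xx are healthy, anything else is down) is an assumption, as are the function names and the placeholder URL.

```python
import urllib.error
import urllib.request
from typing import Optional


def classify_status(status: Optional[int]) -> str:
    """Map an HTTP status code (or None for no response) to a verdict.
    Assumption: 2xx/3xx means healthy; the real check may differ."""
    if status is None:
        return "DOWN (unreachable)"
    if 200 <= status < 400:
        return "UP"
    return f"DOWN (HTTP {status})"


def probe(url: str, timeout: float = 5.0) -> str:
    """Request `url` once and classify the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, TLS error, and so on.
        return classify_status(None)


if __name__ == "__main__":
    # Placeholder address; replace with the real Pi1 or Pi2 URL.
    print(probe("https://pi1.example.invalid/"))
```

A host with a self-signed certificate will show as unreachable in this sketch, because certificate verification failures are reported as connection errors.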

Escalation

  • If downtime exceeds the SLA threshold, notify management

🚫 Fail2Ban BANNED

Meaning

  • Fail2Ban blocked an IP due to repeated authentication failures

Automatic actions

  • The IP is already blocked at the service level
  • Geo/IP data is attached to the alert automatically

Response steps

  1. Review IP reputation in Slack
  2. Confirm jail name (sshd, nginx, etc.)
  3. If internal or known IP:
    • Manually unban
    • Adjust Fail2Ban rules if needed
  4. If external/malicious:
    • No action required
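
The "internal or known IP" decision in step 3 can be checked quickly with Python's ipaddress module. This is an illustrative helper only: the function name and the idea of passing a set of known addresses are assumptions, and "internal" is interpreted here as private, loopback, or link-local.

```python
import ipaddress
from typing import Iterable


def is_internal_or_known(ip: str, known_ips: Iterable[str] = ()) -> bool:
    """True if `ip` is private/loopback/link-local or appears in `known_ips`.
    Raises ValueError if `ip` is not a valid IPv4 or IPv6 address."""
    addr = ipaddress.ip_address(ip)
    if addr.is_private or addr.is_loopback or addr.is_link_local:
        return True
    known = {ipaddress.ip_address(entry) for entry in known_ips}
    return addr in known


if __name__ == "__main__":
    print(is_internal_or_known("192.168.1.10"))             # LAN address
    print(is_internal_or_known("8.8.8.8"))                  # public address
    print(is_internal_or_known("8.8.8.8", ["8.8.8.8"]))     # public but known
```

An address that passes this check is a candidate for a manual unban; anything else falls under step 4.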

🚨 Omada Device DOWN

Meaning

  • AP, gateway, or switch disconnected

Response steps

  1. Identify device and site in Slack alert
  2. Check Omada Controller status
  3. Verify power and uplink
  4. If multiple devices affected:
    • Suspect upstream outage

Environment & Configuration

Required environment variables (n8n)

  • F2B_TOKEN
  • IPINFO_TOKEN
  • KUMA_PI1_UPTIME_URL
  • KUMA_PI1_DNS_URL
  • KUMA_PI2_UPTIME_URL
  • KUMA_PI2_DNS_URL
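
A small pre-flight check can confirm that all six variables are set before the workflow is enabled. The sketch below is one way to do it; the function name is hypothetical, and it treats an empty value as missing.

```python
import os
from typing import List, Mapping

# Variable names as listed in this runbook.
REQUIRED_VARS = [
    "F2B_TOKEN",
    "IPINFO_TOKEN",
    "KUMA_PI1_UPTIME_URL",
    "KUMA_PI1_DNS_URL",
    "KUMA_PI2_UPTIME_URL",
    "KUMA_PI2_DNS_URL",
]


def missing_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Run it in the same environment (container or service account) that n8n uses, since variables set in an interactive shell may not be visible to the n8n process.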

Webhook endpoints

  • /fail2ban-pi1
  • /fail2ban-pi2
  • /OmadaController
  • /JoeOmadaTPlink

Maintenance & Safe Changes

Before making changes

  • Disable auto-block if testing
  • Clone workflow for testing
  • Verify Slack output formatting

After changes

  • Manually trigger workflow
  • Confirm:
    • No duplicate Slack alerts
    • Kuma heartbeats still flow
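
To confirm that heartbeats still flow, one can be sent by hand. The sketch below assumes that the KUMA_* variables hold Uptime Kuma push-monitor URLs and that the push endpoint accepts status and msg query parameters; verify both against the actual monitor configuration. The function names and the example URL are hypothetical.

```python
import urllib.request
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_heartbeat_url(push_url: str, status: str = "up", msg: str = "OK") -> str:
    """Replace any existing query string on `push_url` with status and msg."""
    parts = urlsplit(push_url)
    query = urlencode({"status": status, "msg": msg})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def send_heartbeat(push_url: str, msg: str = "manual test") -> int:
    """Send one heartbeat and return the HTTP status code of the response."""
    url = build_heartbeat_url(push_url, msg=msg)
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    # Example URL only; the real value comes from e.g. KUMA_PI1_UPTIME_URL.
    print(build_heartbeat_url("https://kuma.example.invalid/api/push/abc123"))
```

After sending, the corresponding monitor in Uptime Kuma should show a fresh heartbeat carrying the test message.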

Break-Glass (Emergency)

If automation behaves incorrectly:

  1. Disable the n8n workflow
  2. Remove wrongly blocked IPs from the AdGuard block list (disallowed_clients)
  3. Notify security/network team
  4. Document incident

Ownership

  • System owner: IT / Network Team
  • Primary contact: IT Manager
  • Slack channel: Monitoring / Security Alerts
