Category: Server

  • n8n Project for My Production Server

    Production Monitoring & Security Automation Runbook

    Purpose

    This runbook describes how to operate, monitor, and respond to events generated by the company’s production automation stack:

    • n8n (orchestration)
    • AdGuard DNS (Primary & Secondary Raspberry Pi)
    • Fail2Ban
    • Uptime Kuma
    • Slack (alerting)
    • Omada Controller (network devices)

    It is written so that any on-call engineer can safely respond to alerts without deep system knowledge.


    System Overview

    What this system does

    • Monitors DNS behavior on two AdGuard servers (Pi1 = Primary, Pi2 = Secondary)
    • Detects possible DNS abuse / attacks using query heuristics
    • Automatically blocks malicious IPs in AdGuard (when enabled)
    • Monitors uptime of both DNS servers
    • Pushes health heartbeats to Uptime Kuma
    • Receives Fail2Ban ban/unban events from multiple hosts
    • Receives Omada controller events (AP, gateway, switch up/down)
    • Sends actionable alerts to Slack

    What it does NOT do

    • It does not permanently blacklist IPs without review
    • It does not modify firewall rules (DNS-layer only)
    • It does not auto-restart servers

    Normal Operation (Healthy State)

    Expected behavior

    • The workflow's Cron trigger runs every minute
    • Slack is quiet most of the time
    • Uptime Kuma shows:
      • Pi1 Uptime: UP
      • Pi2 Uptime: UP
      • DNS Status: NORMAL

    Normal Slack messages

    • ✅ DNS NORMAL (baseline)
    • ✅ DNS OK / RECOVERED
    • ✅ Fail2Ban UNBANNED
    • ℹ️ Omada informational events

    No action is required in these cases.


    Alert Types & Response Actions

    🚨 POSSIBLE DNS ATTACK

    Meaning

    • A single client is responsible for an abnormally high share of DNS queries
    • Triggered when one client accounts for either:
      • ≥ 80% of recent queries, OR
      • ≥ 400 queries in the sample window
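
    The two trigger conditions above can be sketched as a small function. This is a minimal illustration, not the workflow's actual code: the function name and the input shape (one client IP per logged query) are assumptions, and only the two thresholds come from this runbook.

```python
from collections import Counter

SHARE_THRESHOLD = 0.80  # from the runbook: >= 80% of recent queries
COUNT_THRESHOLD = 400   # from the runbook: >= 400 queries in the sample window


def find_suspect_client(client_ips):
    """Return (ip, count, share) for the busiest client if it trips either
    threshold, otherwise None.

    `client_ips` is a hypothetical input: one client IP per query in the
    sample window, in any order.
    """
    if not client_ips:
        return None
    ip, count = Counter(client_ips).most_common(1)[0]
    share = count / len(client_ips)
    if share >= SHARE_THRESHOLD or count >= COUNT_THRESHOLD:
        return ip, count, share
    return None
```

    With a very small sample, a single client trivially reaches 100% of queries, so a real check would probably also require a minimum sample size. This runbook does not specify one.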

    Automatic actions

    • If auto-block is enabled, the IP may already be listed in AdGuard's disallowed_clients
    • IP reputation data (IPinfo) is attached to the alert

    Required response (step-by-step)

    1. Open the Slack alert
    2. Review:
      • Attacker IP
      • Client name (if known)
      • Organization / ASN
    3. Log into the affected AdGuard server
    4. Open Query Log
    5. Confirm traffic pattern matches alert
    6. If legitimate client:
      • Remove IP from disallowed_clients
      • Add client to DNS whitelist in n8n
    7. If malicious:
      • No action needed if auto-block already handled it
      • If auto-block is disabled, add the IP to disallowed_clients manually
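
    Step 6 (unblocking a legitimate client) can be sketched as a pure function over the access list. It assumes the payload shape used by AdGuard Home's HTTP API (keys allowed_clients, disallowed_clients, blocked_hosts, read from /control/access/list and written back to /control/access/set). Check this against the API documentation for the installed version. The function name is hypothetical.

```python
def without_blocked_client(access_list, ip):
    """Return a copy of an AdGuard access-list payload with `ip` removed
    from disallowed_clients. The input dict is not modified."""
    return {
        "allowed_clients": list(access_list.get("allowed_clients") or []),
        "disallowed_clients": [
            client
            for client in (access_list.get("disallowed_clients") or [])
            if client != ip
        ],
        "blocked_hosts": list(access_list.get("blocked_hosts") or []),
    }
```

    Returning the full payload rather than only the changed list reflects the assumption that the write endpoint replaces all three lists at once, so the other two must be sent back unchanged.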

    Escalation

    • Repeated attacks from different IPs → notify network/security team

    ✅ DNS OK / RECOVERED

    Meaning

    • DNS traffic has returned to normal

    Action

    • None required

    🔴 / 🚨 UPTIME DOWN

    Meaning

    • A DNS server is unreachable or is returning a non-success HTTP status

    Response steps

    1. Check Uptime Kuma for confirmation
    2. Attempt to reach host:
      • Ping
      • HTTPS access
    3. If unreachable:
      • Check power
      • Check network connectivity
    4. Review system logs if accessible
    5. Manually restart the service or server if required (the automation never restarts anything itself)
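
    The HTTPS check in step 2 can be run by hand with a short script. This is a sketch using only the Python standard library. Treating 2xx and 3xx as healthy is an assumption, since the runbook does not define "bad HTTP status", and the function names are hypothetical.

```python
import urllib.error
import urllib.request


def is_healthy_status(code):
    """Assumption: 2xx and 3xx count as healthy; anything else is 'bad'."""
    return 200 <= code < 400


def probe(url, timeout=5.0):
    """Return (reachable, status_code_or_None, detail) for one HTTP(S) GET."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return True, exc.code, "bad HTTP status"
    except OSError as exc:
        # Covers URLError, timeouts, refused connections, DNS and TLS failures.
        return False, None, f"unreachable: {exc}"
    detail = "ok" if is_healthy_status(status) else "bad HTTP status"
    return True, status, detail
```

    A "reachable but bad status" result points at the AdGuard service, while "unreachable" points at power or network, which matches the split between steps 3 and 4 above.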

    Escalation

    • If downtime exceeds the SLA threshold, notify management

    🚫 Fail2Ban BANNED

    Meaning

    • Fail2Ban blocked an IP due to repeated authentication failures

    Automatic actions

    • IP already blocked at service level
    • Geo/IP data added automatically

    Response steps

    1. Review IP reputation in Slack
    2. Confirm jail name (sshd, nginx, etc.)
    3. If internal or known IP:
      • Manually unban
      • Adjust Fail2Ban rules if needed
    4. If external/malicious:
      • No action required
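
    The internal-versus-external decision in steps 3 and 4 can be sketched with Python's ipaddress module. The function name and the known_networks parameter (extra ranges the team considers trusted) are assumptions for illustration.

```python
import ipaddress


def is_internal_ip(ip, known_networks=()):
    """Return True if `ip` is private, loopback, link-local, or inside one
    of the caller-supplied `known_networks` (CIDR strings)."""
    addr = ipaddress.ip_address(ip)
    if addr.is_private or addr.is_loopback or addr.is_link_local:
        return True
    # Membership is simply False when the address and network versions differ.
    return any(addr in ipaddress.ip_network(net) for net in known_networks)
```

    A True result suggests reviewing the ban and possibly unbanning by hand; a False result matches the "no action required" path. Note that is_private also covers reserved documentation ranges, not only RFC 1918 space.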

    🚨 Omada Device DOWN

    Meaning

    • AP, gateway, or switch disconnected

    Response steps

    1. Identify device and site in Slack alert
    2. Check Omada Controller status
    3. Verify power and uplink
    4. If multiple devices affected:
      • Suspect upstream outage
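
    Step 4 can be sketched as a grouping of DOWN events by site. The event fields (site, device) and the threshold of two devices are assumptions; the actual Omada webhook payload may use different names.

```python
from collections import defaultdict


def sites_with_possible_upstream_outage(down_events, min_devices=2):
    """Return the sorted site names where at least `min_devices` distinct
    devices are reported DOWN.

    `down_events` is a hypothetical list of dicts with "site" and "device".
    """
    devices_by_site = defaultdict(set)
    for event in down_events:
        devices_by_site[event["site"]].add(event["device"])
    return sorted(
        site
        for site, devices in devices_by_site.items()
        if len(devices) >= min_devices
    )
```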

    Environment & Configuration

    Required environment variables (n8n)

    • F2B_TOKEN
    • IPINFO_TOKEN
    • KUMA_PI1_UPTIME_URL
    • KUMA_PI1_DNS_URL
    • KUMA_PI2_UPTIME_URL
    • KUMA_PI2_DNS_URL
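
    A quick pre-flight check for the variables listed above might look like the following. The function name is hypothetical, and the sketch only verifies that each variable is set and non-empty, not that its value is valid.

```python
import os

REQUIRED_VARS = (
    "F2B_TOKEN",
    "IPINFO_TOKEN",
    "KUMA_PI1_UPTIME_URL",
    "KUMA_PI1_DNS_URL",
    "KUMA_PI2_UPTIME_URL",
    "KUMA_PI2_DNS_URL",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    Pass a dict as `environ` to check something other than os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```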

    Webhook endpoints

    • /fail2ban-pi1
    • /fail2ban-pi2
    • /OmadaController
    • /JoeOmadaTPlink

    Maintenance & Safe Changes

    Before making changes

    • Disable auto-block if testing
    • Clone workflow for testing
    • Verify Slack output formatting

    After changes

    • Manually trigger workflow
    • Confirm:
      • No duplicate Slack alerts
      • Kuma heartbeats still flow
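
    To confirm that heartbeats still flow after a change, a test heartbeat URL can be built by hand. This sketch assumes the KUMA_* variables hold Uptime Kuma push-monitor URLs, which accept status, msg, and ping query parameters. The function name and the example host are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def heartbeat_url(push_url, up, msg="OK", ping_ms=None):
    """Return `push_url` with its query string replaced by the given
    heartbeat status, message, and optional ping time in milliseconds."""
    parts = urlsplit(push_url)
    params = {"status": "up" if up else "down", "msg": msg}
    if ping_ms is not None:
        params["ping"] = str(ping_ms)
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), "")
    )
```

    Requesting the resulting URL (for example with a browser or curl) should register a heartbeat on the matching monitor. Replacing any existing query string avoids sending duplicate status parameters when the stored URL already contains defaults.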

    Break-Glass (Emergency)

    If automation behaves incorrectly:

    1. Disable the n8n workflow
    2. Remove wrongly blocked IPs from the AdGuard block list (disallowed_clients)
    3. Notify security/network team
    4. Document incident

    Ownership

    • System owner: IT / Network Team
    • Primary contact: IT Manager
    • Slack channel: Monitoring / Security Alerts
