A US-based managed security services provider (MSSP) integrated its guard tour platform with Splunk Enterprise in February. It configured webhooks to send every checkpoint scan as an individual event for a "full audit trail". By week 3, Splunk license consumption had grown from 12 GB/day to 64 GB/day, an overage worth roughly $180,000/year beyond the licensed indexing capacity. The CFO called an emergency meeting. The MSSP's solution architect spent the next two weeks redesigning the webhook integration: batching, severity filtering, structured payloads. Daily volume dropped from 64 GB to 4.8 GB without losing any audit value, and the Splunk license bill went back to baseline.
The mistake wasn't using Splunk. It was treating webhooks as "raw event streams" instead of designing them as a SIEM-aware integration. The same mistake shows up daily in Microsoft Sentinel ingestion costs, Slack rate limits and Teams Adaptive Card rendering. This post explains the four production patterns that make guard tour webhook integrations sustainable at scale, with payload examples for each of Splunk, Sentinel, Slack and Teams. Written for security engineers and platform integrators.
The four patterns
Pattern 1 — Batching by time window or count. Instead of one webhook per event, the platform aggregates events into batches (e.g., every 60 seconds or every 50 events, whichever comes first) and sends a single webhook with an array of events. Reduces HTTP overhead by 50-200x. Most SIEM ingestion endpoints accept batches natively (Splunk HEC, Sentinel Logs Ingestion API). Slack and Teams have message rate limits that batching circumvents — instead of 50 messages, you send 1 message with 50 lines.
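As a rough illustration of the "60 seconds or 50 events, whichever comes first" rule, here is a minimal in-memory batcher sketch. The class name `EventBatcher`, the `send` callable and the `tick()` method are hypothetical names chosen for this example, not part of any platform API; a real implementation would also need thread safety and persistence.

```python
import time
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]


class EventBatcher:
    """Buffers events and flushes on max_events or max_age_s, whichever comes first."""

    def __init__(
        self,
        send: Callable[[List[Event]], None],  # hypothetical delivery function
        max_events: int = 50,
        max_age_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._max_events = max_events
        self._max_age_s = max_age_s
        self._clock = clock
        self._buffer: List[Event] = []
        self._opened_at: Optional[float] = None

    def add(self, event: Event) -> None:
        # The time window starts when the first event enters an empty buffer.
        if not self._buffer:
            self._opened_at = self._clock()
        self._buffer.append(event)
        if len(self._buffer) >= self._max_events:
            self.flush()

    def tick(self) -> None:
        """Call periodically (e.g. once per second) to enforce the time window."""
        if self._buffer and self._clock() - self._opened_at >= self._max_age_s:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        self._opened_at = None
        self._send(batch)
```

The clock is injected so the time window can be tested without sleeping; in production the default `time.monotonic` avoids surprises from wall-clock adjustments.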
Pattern 2 — Severity filtering at source. Not every event needs to leave the platform. A "checkpoint scanned OK" event is audit material (stays in the platform's internal log) but should NOT hit your SIEM or chat. Only events of operational interest go out: alarms, anomalies, fraud detection signals, supervisor-flagged incidents, missed checkpoints. Severity tags (P1 critical, P2 important, P3 informational, P4 audit-only) filter at the source.
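A source-side filter for the P1 to P4 tags can be very small. The sketch below assumes each event carries a `severity` string such as "P2"; the field name, the `Severity` enum and the `should_forward` function are illustrative assumptions, and treating an untagged event as P4 is a design choice made for this example.

```python
from enum import IntEnum
from typing import Any, Dict


class Severity(IntEnum):
    P1 = 1  # critical
    P2 = 2  # important
    P3 = 3  # informational
    P4 = 4  # audit-only


def should_forward(event: Dict[str, Any], max_severity: Severity = Severity.P2) -> bool:
    """Return True if the event is severe enough to leave the platform.

    Events without a severity tag are treated as P4 (audit-only), so they
    stay in the internal log rather than reaching the SIEM or chat.
    """
    severity = Severity[event.get("severity", "P4")]
    return severity <= max_severity
```

With the default threshold, `should_forward({"severity": "P1"})` is true and a routine "checkpoint scanned OK" event tagged P4 is kept out of the outbound stream.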
Pattern 3 — Structured payload aligned to receiver schema. Each receiver has its own preferred schema. Splunk expects CIM (Common Information Model) fields. Sentinel expects ASIM (Advanced Security Information Model). Slack expects Block Kit. Teams expects Adaptive Cards. A platform that emits a single JSON schema and lets you reshape on the receiver side requires complex Splunk SPL or Logic Apps transformations. A platform that emits schema-aware payloads per target keeps the integration boilerplate to near zero.
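One way to picture schema-aware emission is a single internal event passed through one small mapper per receiver. The sketch below is an assumption-laden illustration: the internal field names (`occurred_at`, `officer_id`, `message`) and the functions `to_splunk_hec`, `to_sentinel_row` and `emit` are invented for this example, and only a handful of the target fields shown later in this post are mapped.

```python
from datetime import datetime
from typing import Any, Callable, Dict

Event = Dict[str, Any]


def _parse_utc(ts: str) -> datetime:
    # Accept a trailing "Z" on Python versions where fromisoformat rejects it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def to_splunk_hec(event: Event) -> Event:
    """Wrap the event in an HEC-style envelope with epoch time."""
    return {
        "time": int(_parse_utc(event["occurred_at"]).timestamp()),
        "sourcetype": "guardtour:incident",
        "event": {
            "user_id": event["officer_id"],
            "severity": event["severity"],
            "msg": event["message"],
        },
    }


def to_sentinel_row(event: Event) -> Event:
    """Flatten the event into a row with ASIM-style column names."""
    return {
        "TimeGenerated": event["occurred_at"],
        "ActorUserId": event["officer_id"],
        "EventSeverity": event["severity"].capitalize(),
        "EventMessage": event["message"],
    }


EMITTERS: Dict[str, Callable[[Event], Event]] = {
    "splunk": to_splunk_hec,
    "sentinel": to_sentinel_row,
}


def emit(target: str, event: Event) -> Event:
    """Shape one internal event for the named receiver."""
    return EMITTERS[target](event)
```

Keeping the mappers in a dictionary means adding a receiver is one new function and one new entry, with no change to the code that produces events.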
Pattern 4 — Dead-letter queue with exponential retry. Webhooks fail: the receiver is down or rate-limiting, the payload is too big, a certificate has expired. A naive integration drops the event silently. A production-grade integration retries with increasing backoff (the initial attempt plus three retries after 1s, 5s and 30s) and, on permanent failure, pushes the event to a dead-letter queue (an S3 bucket or internal API) that operators can review and replay manually. Lost events break audit trails.
Splunk integration pattern (HEC + CIM)
Splunk receives events via the HTTP Event Collector (HEC), authenticated with a token. Batching pattern: 50 events per HEC POST, gzip-compressed body. HEC takes a batch as event objects stacked one after another in the request body, not wrapped in a JSON array, so the payload below shows a single event object and a batch simply repeats it. CIM field mapping for security incidents:
POST https://splunk.client.example.com:8088/services/collector/event
Authorization: Splunk {hec_token}
Content-Encoding: gzip
{
  "time": 1779892245,
  "host": "guard-tour-prod",
  "source": "patroltech.guardtour.app",
  "sourcetype": "guardtour:incident",
  "index": "security_main",
  "event": {
    "src_user": "officer.j.smith",
    "user_id": "usr_3f8a2e",
    "src_ip": "10.20.30.40",
    "category": "Authentication",
    "subject": "checkpoint_scan_failed",
    "action": "denied",
    "result": "failure",
    "severity": "high",
    "site": "DC-East-Tier3",
    "checkpoint_id": "cp_a4f7c2",
    "geofence_match": false,
    "device_id": "device_b8e1d3",
    "msg": "Officer scan from device outside expected geofence",
    "_time_event": "2026-05-27T14:30:45Z"
  }
}
CIM Authentication data model is the right fit for "officer logged action": gets you correlation with other login events, anomaly detection on user_id, and out-of-the-box dashboards. The platform should emit only events that map cleanly to a CIM model; raw status pings go to the internal audit log, never to Splunk.
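To make the "50 events per POST, gzip-compressed" idea concrete, here is a small sketch that builds a request body from a list of HEC envelopes. It relies on HEC's documented batching convention of stacking event objects back to back in one body rather than sending a JSON array. The function names `build_hec_body`, `hec_headers` and `split_hec_body` are hypothetical, and the sketch stops short of the actual HTTP call.

```python
import gzip
import json
from typing import Any, Dict, Iterable, List

Envelope = Dict[str, Any]


def build_hec_body(envelopes: Iterable[Envelope]) -> bytes:
    """Serialize envelopes as stacked JSON objects and gzip the result."""
    stacked = "".join(json.dumps(env, separators=(",", ":")) for env in envelopes)
    return gzip.compress(stacked.encode("utf-8"))


def hec_headers(token: str) -> Dict[str, str]:
    """Headers matching the request shown above."""
    return {"Authorization": f"Splunk {token}", "Content-Encoding": "gzip"}


def split_hec_body(body: bytes) -> List[Envelope]:
    """Inverse of build_hec_body, useful for tests and DLQ inspection."""
    text = gzip.decompress(body).decode("utf-8")
    decoder = json.JSONDecoder()
    envelopes: List[Envelope] = []
    pos = 0
    while pos < len(text):
        obj, pos = decoder.raw_decode(text, pos)
        envelopes.append(obj)
    return envelopes
```

The round-trip helper is there mainly so a failed batch can be unpacked and replayed event by event if a single oversized or malformed event is what caused the rejection.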
Volume management: a 500-checkpoint operation with 3 patrol cycles per shift × 2 shifts/day produces 3,000 scan events/day. At roughly 2 KB per event, scans alone come to about 6 MB/day; a full firehose that also ships status pings and other raw telemetry can reach 100-300 MB/day. With severity filtering (only deny/anomaly/incident), volume drops to 50-200 events/day at 100-500 KB/day, sustainable on any Splunk license tier.
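The volume estimate is simple enough to keep as a helper and rerun per site. This is a back-of-the-envelope sketch: the function names are made up for this example, and the per-event size is an assumption you should replace with a measured average from your own payloads.

```python
def daily_scan_events(checkpoints: int, cycles_per_shift: int, shifts_per_day: int) -> int:
    """Every checkpoint is scanned once per patrol cycle."""
    return checkpoints * cycles_per_shift * shifts_per_day


def daily_kb(events_per_day: int, avg_event_kb: float) -> float:
    """Daily payload volume in KB for a given average event size (assumed)."""
    return events_per_day * avg_event_kb


# The 500-checkpoint example from the text.
scans = daily_scan_events(checkpoints=500, cycles_per_shift=3, shifts_per_day=2)

# Filtered stream: 50-200 events/day at an assumed 2-2.5 KB per event.
filtered_low = daily_kb(50, 2.0)
filtered_high = daily_kb(200, 2.5)
```

Here `scans` is 3,000 and the filtered range works out to 100 to 500 KB/day, matching the figures quoted for the severity-filtered stream.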
Microsoft Sentinel integration pattern (Logs Ingestion API + ASIM)
Sentinel ingests through the Logs Ingestion API (the older HTTP Data Collector API is deprecated and being retired). Pattern: batch 50 events to a custom table via a Data Collection Rule (DCR), posting to the ingestion URL of your Data Collection Endpoint (DCE). ASIM AuditEvent schema fits guard tour events:
POST https://{dce-endpoint}.ingest.monitor.azure.com/dataCollectionRules/{dcr-immutable-id}/streams/Custom-GuardTour_CL?api-version=2023-01-01
Authorization: Bearer {azure_ad_token}
Content-Type: application/json
[
  {
    "TimeGenerated": "2026-05-27T14:30:45Z",
    "EventType": "ResourceAccess",
    "EventResult": "Failure",
    "EventSeverity": "High",
    "ActorUsername": "officer.j.smith",
    "ActorUserId": "usr_3f8a2e",
    "ActorSessionId": "shift_a7b3c1",
    "TargetResourceName": "DC-East-Tier3/Cold-Aisle-7",
    "TargetResourceType": "PhysicalLocation",
    "EventMessage": "Checkpoint scan from device outside expected geofence",
    "SrcDeviceId": "device_b8e1d3",
    "SrcGeoCity": "Phoenix",
    "AdditionalFields": {
      "checkpoint_id": "cp_a4f7c2",
      "geofence_match": false,
      "distance_to_geofence_m": 87
    }
  }
]
Sentinel Analytics Rules can correlate guard tour denials with badge access denials, network anomalies and identity events out of the box if the ASIM mapping is correct. Sentinel ingestion cost depends on data volume (pay-as-you-go at roughly $2.30/GB ingested; check current pricing for your region). 200 events/day at an average of 1.5 KB is 300 KB/day, about 9 MB/month, or roughly $0.02/month per operation. Sustainable.
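The cost arithmetic is worth spelling out, since it is easy to slip a decimal place. The sketch below uses decimal units (1 GB = 1,000,000 KB) and takes the per-GB rate as a parameter; the function name is hypothetical and the rate is the approximate figure quoted above, not an official price.

```python
def monthly_ingest_cost_usd(
    events_per_day: int,
    avg_event_kb: float,
    usd_per_gb: float,
    days: int = 30,
) -> float:
    """Estimated monthly ingestion cost for a volume-priced SIEM."""
    kb_per_month = events_per_day * avg_event_kb * days
    gb_per_month = kb_per_month / 1_000_000  # decimal KB per GB
    return gb_per_month * usd_per_gb


# 200 events/day at 1.5 KB, priced at an assumed ~$2.30/GB.
cost = monthly_ingest_cost_usd(events_per_day=200, avg_event_kb=1.5, usd_per_gb=2.30)
```

For these inputs the result is about two cents per month: 9,000 KB is 0.009 GB, and 0.009 × 2.30 ≈ 0.0207.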
Authentication: a Microsoft Entra ID (formerly Azure AD) service principal with the Monitoring Metrics Publisher role on the DCR. Refresh the token every 50 minutes (tokens expire after about 1 hour). Token caching at the platform layer is critical to avoid throttling by the identity endpoint.
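A token cache with a refresh margin is one simple way to get the "refresh at 50 minutes, expire at 60" behavior. This sketch assumes a hypothetical `fetch` callable that returns a `(token, lifetime_seconds)` pair; how that callable talks to the identity provider is deliberately left out, and the class name `TokenCache` is invented for this example.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches a bearer token and refreshes it shortly before expiry."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],  # hypothetical: returns (token, lifetime_s)
        refresh_margin_s: float = 600.0,  # refresh 10 minutes before expiry
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = refresh_margin_s
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Fetch on first use, or once we are inside the refresh margin.
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime_s = self._fetch()
            self._token = token
            self._expires_at = now + lifetime_s
        return self._token
```

With a 3,600-second lifetime and the default 600-second margin, every webhook sent within the first 50 minutes reuses the same token, which is what keeps the request rate against the identity endpoint low.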
Slack integration pattern (Block Kit + thread grouping)
Slack is communication, not SIEM. Pattern: ONE message per incident, with subsequent updates as thread replies instead of new messages. Severity filtered to P1+P2 only (P3/P4 stay in audit log).
POST https://hooks.slack.com/services/{T}/{B}/{X}
Content-Type: application/json
{
  "channel": "#security-ops",
  "username": "guardtour.app",
  "icon_emoji": ":shield:",
  "text": "P1 incident at DC-East-Tier3",
  "blocks": [
    {
      "type": "header",
      "text": {"type": "plain_text", "text": ":rotating_light: P1: Geofence violation"}
    },
    {
      "type": "section",
      "fields": [
        {"type": "mrkdwn", "text": "*Site:*\nDC-East-Tier3 / Cold-Aisle-7"},
        {"type": "mrkdwn", "text": "*Officer:*\n<@U03XYZ123>"},
        {"type": "mrkdwn", "text": "*Time:*\n14:30:45 UTC"},
        {"type": "mrkdwn", "text": "*Severity:*\n:large_orange_circle: High"}
      ]
    },
    {
      "type": "section",
      "text": {
        "type": "mrkdwn",
        "text": "Officer scan registered from device 87m outside expected geofence. Possible fraud or GPS drift. Supervisor acknowledged at 14:31:12."
      }
    },
    {
      "type": "actions",
      "elements": [
        {"type": "button", "text": {"type": "plain_text", "text": "View in console"}, "url": "https://guardtour.app/incidents/inc_5f7a..."},
        {"type": "button", "text": {"type": "plain_text", "text": "Acknowledge"}, "value": "ack_inc_5f7a..."}
      ]
    }
  ],
  "thread_ts": null
}
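Subsequent updates (officer acknowledged, supervisor arrived, incident resolved) post as replies by setting thread_ts to the timestamp of the original message. This keeps the channel clean and groups related events together. An incoming webhook response does not include that timestamp, so threading in practice means posting the first message through the Web API (chat.postMessage), which returns it as ts. Rate limit: roughly 1 message/second per channel, 100 messages/minute per workspace. Batching to one initial message + thread updates keeps any operation well under the limit.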
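The thread-grouping rule boils down to remembering one message timestamp per incident. The sketch below only builds payloads and leaves the HTTP call out; `ThreadIndex`, `payload` and `remember` are hypothetical names, and it assumes the caller obtains the timestamp of the first message from the Slack Web API response (the `ts` field returned by `chat.postMessage`).

```python
from typing import Any, Dict


class ThreadIndex:
    """Maps incident IDs to the Slack timestamp of their first message."""

    def __init__(self) -> None:
        self._ts_by_incident: Dict[str, str] = {}

    def payload(self, incident_id: str, channel: str, text: str) -> Dict[str, Any]:
        """Build a message body; replies carry thread_ts, first posts do not."""
        message: Dict[str, Any] = {"channel": channel, "text": text}
        ts = self._ts_by_incident.get(incident_id)
        if ts is not None:
            message["thread_ts"] = ts
        return message

    def remember(self, incident_id: str, ts: str) -> None:
        """Record the ts of the first message; later calls do not overwrite it."""
        self._ts_by_incident.setdefault(incident_id, ts)
```

A typical flow would be: build the first payload, post it, call `remember()` with the returned timestamp, then build every later update for the same incident through `payload()` so it lands in the thread. In a multi-process deployment the mapping would need to live in shared storage rather than in memory.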
For Slack ChatOps: action buttons can call back to the platform via Slack interactive components. "Acknowledge" button hits a platform webhook that updates the incident status without leaving Slack.
Microsoft Teams integration pattern (Adaptive Cards + workflow connector)
Teams uses Workflows webhooks (the legacy "Incoming Webhook" connector is being retired; Microsoft has revised the timeline since the first announcement, so confirm the current cutoff date). Pattern: Adaptive Card 1.5 with severity-colored header. Same batching/filtering rules as Slack.
POST https://prod-12.westus.logic.azure.com/workflows/{flow-id}/triggers/manual/paths/invoke?...
Content-Type: application/json
{
  "type": "message",
  "attachments": [
    {
      "contentType": "application/vnd.microsoft.card.adaptive",
      "content": {
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "type": "AdaptiveCard",
        "version": "1.5",
        "body": [
          {
            "type": "Container",
            "style": "warning",
            "items": [
              {"type": "TextBlock", "size": "Large", "weight": "Bolder", "text": "P1: Geofence violation"}
            ]
          },
          {
            "type": "FactSet",
            "facts": [
              {"title": "Site", "value": "DC-East-Tier3 / Cold-Aisle-7"},
              {"title": "Officer", "value": "Jane Smith"},
              {"title": "Time", "value": "14:30:45 UTC"},
              {"title": "Distance", "value": "87m outside geofence"}
            ]
          },
          {
            "type": "TextBlock",
            "wrap": true,
            "text": "Officer scan registered from device outside expected geofence. Possible fraud or GPS drift."
          },
          {
            "type": "ActionSet",
            "actions": [
              {"type": "Action.OpenUrl", "title": "View in console", "url": "https://guardtour.app/incidents/inc_5f7a..."},
              {"type": "Action.Submit", "title": "Acknowledge", "data": {"action": "ack", "incident_id": "inc_5f7a..."}}
            ]
          }
        ]
      }
    }
  ]
}
Teams Adaptive Cards render correctly on mobile, desktop and web. Action.Submit returns to the workflow only if the workflow is built to wait for a card response; it can then call back to the platform's API for acknowledgment. Rate limit: roughly 4 messages/second per webhook, 30/minute average; verify against the current Teams and Power Automate limits. Same batching pattern as Slack applies.
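For chat receivers, a small client-side throttle is a cheap safeguard against rate-limit errors. The sketch below is a plain token bucket with an injected clock; the class name and parameters are illustrative, and the rate you configure should come from the receiver's current documentation rather than from the approximate figures quoted in this post.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate_per_s` tokens per second."""

    def __init__(
        self,
        rate_per_s: float,
        capacity: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rate = rate_per_s
        self._capacity = float(capacity)
        self._tokens = float(capacity)
        self._clock = clock
        self._last = clock()

    def try_acquire(self) -> bool:
        """Return True and consume a token if a message may be sent now."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill based on elapsed time, never exceeding capacity.
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

When `try_acquire()` returns False, the message can simply go back into the batch buffer for the next flush instead of being sent and rejected.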
Dead-letter queue and retry
Production retry policy with exponential backoff:
- Attempt 1: immediate.
- Attempt 2: 1 second later (transient network errors).
- Attempt 3: 5 seconds later (receiver throttling).
- Attempt 4: 30 seconds later (receiver brief outage).
- Permanent failure after attempt 4: push to DLQ.
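The schedule above can be sketched as a small delivery loop. This is a minimal illustration, not a complete client: `send` is a hypothetical callable that raises on failure, `sleep` is injectable so the loop can be tested without waiting, and every exception is treated as retryable, which a real integration would narrow (a 400 response, for example, will not succeed on retry).

```python
import time
from typing import Any, Callable, Optional, Sequence, Tuple

RETRY_DELAYS_S: Tuple[int, ...] = (1, 5, 30)  # waits before attempts 2, 3 and 4


def deliver_with_retry(
    send: Callable[[Any], None],  # hypothetical: raises an exception on failure
    payload: Any,
    delays: Sequence[float] = RETRY_DELAYS_S,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, int, Optional[str]]:
    """Try once immediately, then once after each delay.

    Returns (delivered, attempts, last_error). When delivered is False the
    caller is expected to push the payload to the dead-letter queue.
    """
    last_error: Optional[str] = None
    attempts = 0
    for delay in (0, *delays):
        if delay:
            sleep(delay)
        attempts += 1
        try:
            send(payload)
            return True, attempts, None
        except Exception as exc:  # simplification: everything is retryable here
            last_error = str(exc)
    return False, attempts, last_error
```

Returning the attempt count and last error, rather than raising, gives the caller exactly the fields a dead-letter record needs.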
The DLQ is critical for audit trail. A typical implementation:
{
  "dlq_id": "dlq_8f3a2c1",
  "original_target": "splunk.client.example.com",
  "original_endpoint": "/services/collector/event",
  "original_payload": { ... full event ... },
  "first_attempt_at": "2026-05-27T14:30:45Z",
  "last_attempt_at": "2026-05-27T14:31:22Z",
  "attempts": 4,
  "last_error": "HTTP 503 Service Unavailable",
  "status": "failed_permanent",
  "expires_at": "2026-06-26T14:30:45Z"
}
DLQ entries are retained for 30 days by default. Operators can replay them manually from the platform console once the receiver issue is resolved. After 30 days they are archived to cold storage for audit and kept for at least the applicable regulatory minimum (5 years here).
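A dead-letter record like the one above can be produced by a small helper that also derives the expiry from the retention period. The sketch below mirrors the field names of the example record; the function names `make_dlq_entry` and `is_expired` are assumptions made for this example, and computing the expiry from the first attempt is a design choice here, not a documented platform behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def _iso(dt: datetime) -> str:
    return dt.astimezone(timezone.utc).strftime(ISO_FORMAT)


def make_dlq_entry(
    dlq_id: str,
    target: str,
    endpoint: str,
    payload: Dict[str, Any],
    first_attempt_at: datetime,
    last_attempt_at: datetime,
    attempts: int,
    last_error: str,
    retention_days: int = 30,
) -> Dict[str, Any]:
    """Build a dead-letter record; expiry is first attempt plus retention."""
    return {
        "dlq_id": dlq_id,
        "original_target": target,
        "original_endpoint": endpoint,
        "original_payload": payload,
        "first_attempt_at": _iso(first_attempt_at),
        "last_attempt_at": _iso(last_attempt_at),
        "attempts": attempts,
        "last_error": last_error,
        "status": "failed_permanent",
        "expires_at": _iso(first_attempt_at + timedelta(days=retention_days)),
    }


def is_expired(entry: Dict[str, Any], now: datetime) -> bool:
    """True once the entry should move from the DLQ to cold storage."""
    expires = datetime.strptime(entry["expires_at"], ISO_FORMAT).replace(tzinfo=timezone.utc)
    return now >= expires
```

Storing the full original payload alongside the target and endpoint is what makes manual replay possible without reconstructing the event from other logs.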
Common integration mistakes
Sending every scan as a webhook. Saturates the receiver's ingestion, costs money, hides signal in noise. Filter at source: P1/P2 only, P3/P4 stay in audit log.
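No batching for high-volume receivers. On volume-priced SIEMs batching does not change the billed bytes, but 1 batched webhook with 50 events replaces 50 HTTP requests, which cuts connection and header overhead, keeps you under receiver rate limits, and is cheaper wherever pricing or quotas are per request.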
Mismatched schema between sender and receiver. Sending raw JSON to Splunk and requiring SPL transformations to map fields adds 50-200ms per query and clutters dashboards. Schema-aware emission saves both ingestion and query time.
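No retry on transient failures. The receiver returns 503 for 4 seconds during a deploy. Without retry, every event sent in that window is lost. With the retry schedule above, the third attempt lands about 6 seconds after the first, after the receiver is back, and nothing is lost.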
Authentication tokens hardcoded with no rotation. A HEC token leaked via a stack trace stays valid until someone revokes it, which can mean years. Best practice: rotate quarterly through an SRE-owned process, and use short-lived OAuth tokens where the receiver supports them.
No DLQ. When something fails permanently, you don't know. The integration "works" until an auditor asks for an event that's missing.
Sending PII without an explicit DPA mapping. Officer names, geolocation and device IDs are personal data. Review the receiver-side data processing agreement (DPA) and data residency (e.g., Splunk Cloud in the EU vs the US) before configuring the integration. Anonymization at source is an option for some use cases.
How guardtour.app implements these patterns
Native integrations with Splunk HEC (CIM-mapped), Microsoft Sentinel (Logs Ingestion API + ASIM-mapped), Slack (Block Kit + thread grouping with action buttons), Microsoft Teams (Adaptive Cards via workflow connector). Pattern defaults: batching by 50 events or 60 seconds, severity filter set to P1+P2 by default with operator override per integration, exponential retry 4 attempts, 30-day DLQ with manual replay. Receiver-specific schema emission instead of generic JSON. Authentication: HEC token in Splunk, Azure AD service principal for Sentinel, signed Slack webhook URL, workflow connector for Teams. Token rotation supported with zero-downtime swap. DLQ accessible via platform console and API. Available on Pro tier and above.
To go deeper
- /blog/guard-tour-pricing-per-officer-vs-per-site-tco — integration tier (Pro vs Complete) affects which webhook integrations are available.
- /blog/offline-guard-tour-app-no-internet — when the device is offline, events queue locally and webhook delivery is deferred until reconnect.
- /blog/eu-ai-act-guard-tour-software-2026 — webhook payloads carry the audit trail that EU AI Act needs.
- /blog/hot-cold-aisle-patrol-soc2-evidence — SOC 2 evidence depends on the webhook receiver being independent and tamper-resistant.