detecting-dns-exfiltration-with-dns-query-analysis

mukul975/Anthropic-Cybersecurity-Skills · updated May 25, 2026

MDX-style export adds YAML metadata + attribution linking explainx.ai and this canonical listing URL.

$npx skills install mukul975/Anthropic-Cybersecurity-Skills/detecting-dns-exfiltration-with-dns-query-analysis
0 commentsdiscussion
summary

Detect data exfiltration through DNS tunneling by analyzing query entropy, subdomain length, query volume, TXT record abuse, and response payload sizes using passive DNS monitoring.

skill.md
name
detecting-dns-exfiltration-with-dns-query-analysis
description
Detect data exfiltration through DNS tunneling by analyzing query entropy, subdomain length, query volume, TXT record abuse, and response payload sizes using passive DNS monitoring.
domain
cybersecurity
subdomain
network-security
tags
- dns-exfiltration - dns-tunneling - data-exfiltration - threat-detection - entropy-analysis - passive-dns - network-monitoring - iodine - dnscat2
version
'1.0'
author
mahipal
license
Apache-2.0
nist_csf
- PR.IR-01 - DE.CM-01 - ID.AM-03 - PR.DS-02

Detecting DNS Exfiltration with DNS Query Analysis

Overview

DNS exfiltration exploits the Domain Name System as a covert channel to extract data from compromised networks. Attackers encode stolen data into DNS query names (subdomains) or DNS response records (TXT, CNAME, NULL), bypassing traditional security controls that typically allow DNS traffic unrestricted. Tools like iodine, dnscat2, and dns2tcp enable full TCP tunneling over DNS. Detection requires analyzing DNS query patterns for anomalies including excessive query length, high entropy subdomain strings, abnormal query volumes to single domains, and oversized TXT record responses. This skill covers building a comprehensive DNS exfiltration detection capability using passive DNS analysis, statistical methods, and machine learning approaches.

When to Use

  • When investigating security incidents that require detecting dns exfiltration with dns query analysis
  • When building detection rules or threat hunting queries for this domain
  • When SOC analysts need structured procedures for this analysis type
  • When validating security monitoring coverage for related attack techniques

Prerequisites

  • Access to DNS query logs (passive DNS capture, DNS server logs, or PCAP)
  • Zeek, Suricata, or tcpdump for DNS traffic capture
  • Python 3.8+ with scipy, numpy, pandas, and scikit-learn
  • SIEM platform for alert correlation
  • Baseline of normal DNS traffic patterns for the environment

Core Concepts

DNS Tunneling Mechanics

DNS exfiltration encodes data in different parts of DNS messages:

Outbound (Query-based exfiltration):

Encoded data as subdomain labels:
dGhlIHNlY3JldCBkYXRh.exfil.attacker.com
[base64-encoded data].[tunnel domain]

Query types used: A, AAAA, CNAME, MX, TXT, NULL

Inbound (Response-based command channel):

TXT records carry encoded commands/data in responses
CNAME records chain encoded data through multiple labels
NULL records carry arbitrary binary data

Detection Indicators

IndicatorNormal DNSDNS Tunneling
Subdomain length5-20 chars40-253 chars
Label count2-4 labels5-10+ labels
Shannon entropy2.5-3.5 bits4.0-5.5 bits
Query volume (per domain)Variable100s-1000s/min
TXT response size< 100 bytes200-4000+ bytes
Unique subdomainsLowVery high
Query type distributionMostly A/AAAAHeavy TXT, NULL, CNAME

Common Tunneling Tools

ToolProtocolEncodingDetection Difficulty
iodineIP-over-DNSBase32/Base64/RawMedium
dnscat2TCP-over-DNSHex encodingMedium
dns2tcpTCP-over-DNSBase64Medium
DNSExfiltratorCustomBase64Low
Cobalt Strike DNSC2 over DNSCustom encodingHigh

Workflow

Step 1: Capture DNS Traffic

Using Zeek:

# Live capture
zeek -i eth0 -C base/protocols/dns

# Offline PCAP analysis
zeek -r traffic.pcap base/protocols/dns

# Output: dns.log with query, qtype, answers, TTL

Using tcpdump:

# Capture all DNS traffic
tcpdump -i eth0 -w dns_capture.pcap port 53

# Capture with size filter (large DNS packets)
tcpdump -i eth0 -w large_dns.pcap 'port 53 and greater 512'

Using Suricata:

# In suricata.yaml, enable DNS logging
outputs:
  - eve-log:
      types:
        - dns:
            query: yes
            answer: yes
            formats: [detailed]

Step 2: Analyze Query Characteristics

Python script for DNS exfiltration detection:

#!/usr/bin/env python3
"""DNS Exfiltration Detector - Analyzes DNS logs for tunneling indicators."""

import json
import math
import re
import sys
from collections import defaultdict
from datetime import datetime, timedelta

import pandas as pd


def calculate_entropy(domain: str) -> float:
    """Calculate Shannon entropy of a string."""
    if not domain:
        return 0.0
    freq = defaultdict(int)
    for char in domain:
        freq[char] += 1
    length = len(domain)
    entropy = -sum(
        (count / length) * math.log2(count / length)
        for count in freq.values()
    )
    return entropy


def extract_subdomain(query: str) -> str:
    """Extract subdomain portion from FQDN."""
    parts = query.rstrip('.').split('.')
    if len(parts) > 2:
        return '.'.join(parts[:-2])
    return ''


def get_base_domain(query: str) -> str:
    """Extract registered domain from FQDN."""
    parts = query.rstrip('.').split('.')
    if len(parts) >= 2:
        return '.'.join(parts[-2:])
    return query


def is_base64_like(s: str) -> bool:
    """Check if string resembles base64 encoding."""
    b64_chars = set('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=')
    if len(s) < 10:
        return False
    char_ratio = sum(1 for c in s if c in b64_chars) / len(s)
    return char_ratio > 0.9 and calculate_entropy(s) > 4.0


def is_hex_encoded(s: str) -> bool:
    """Check if string appears hex-encoded."""
    hex_chars = set('0123456789abcdefABCDEF')
    if len(s) < 16:
        return False
    clean = s.replace('.', '').replace('-', '')
    return all(c in hex_chars for c in clean) and len(clean) % 2 == 0


class DNSExfiltrationDetector:
    def __init__(self):
        self.domain_stats = defaultdict(lambda: {
            'query_count': 0,
            'unique_subdomains': set(),
            'total_subdomain_length': 0,
            'entropy_sum': 0.0,
            'query_types': defaultdict(int),
            'source_ips': set(),
            'first_seen': None,
            'last_seen': None,
            'txt_response_sizes': [],
        })

        # Detection thresholds
        self.thresholds = {
            'min_query_count': 50,
            'min_unique_subdomains': 30,
            'avg_subdomain_length': 30,
            'avg_entropy': 3.8,
            'unique_ratio': 0.7,
            'txt_query_ratio': 0.3,
            'max_label_length': 63,
            'max_subdomain_labels': 5,
        }

    def process_query(self, timestamp, src_ip, query, qtype, response_size=0):
        """Process a single DNS query and update statistics."""
        base_domain = get_base_domain(query)
        subdomain = extract_subdomain(query)

        stats = self.domain_stats[base_domain]
        stats['query_count'] += 1
        stats['unique_subdomains'].add(subdomain)
        stats['total_subdomain_length'] += len(subdomain)
        stats['entropy_sum'] += calculate_entropy(subdomain)
        stats['query_types'][qtype] += 1
        stats['source_ips'].add(src_ip)

        if stats['first_seen'] is None:
            stats['first_seen'] = timestamp
        stats['last_seen'] = timestamp

        if qtype in ('TXT', 'NULL') and response_size > 0:
            stats['txt_response_sizes'].append(response_size)

    def analyze(self):
        """Analyze accumulated statistics and return suspicious domains."""
        alerts = []

        for domain, stats in self.domain_stats.items():
            if stats['query_count'] < self.thresholds['min_query_count']:
                continue

            unique_count = len(stats['unique_subdomains'])
            avg_length = stats['total_subdomain_length'] / stats['query_count']
            avg_entropy = stats['entropy_sum'] / stats['query_count']
            unique_ratio = unique_count / stats['query_count']

            txt_queries = stats['query_types'].get('TXT', 0) + stats['query_types'].get('NULL', 0)
            txt_ratio = txt_queries / stats['query_count']

            score = 0
            indicators = []

            if avg_length > self.thresholds['avg_subdomain_length']:
                score += 25
                indicators.append(f"high_avg_subdomain_length={avg_length:.1f}")

            if avg_entropy > self.thresholds['avg_entropy']:
                score += 25
                indicators.append(f"high_entropy={avg_entropy:.2f}")

            if unique_ratio > self.thresholds['unique_ratio']:
                score += 20
                indicators.append(f"high_unique_ratio={unique_ratio:.2f}")

            if txt_ratio > self.thresholds['txt_query_ratio']:
                score += 15
                indicators.append(f"high_txt_ratio={txt_ratio:.2f}")

            if unique_count > self.thresholds['min_unique_subdomains']:
                score += 15
                indicators.append(f"unique_subdomains={unique_count}")

            # Check for encoding patterns
            encoded_count = sum(
                1 for sd in list(stats['unique_subdomains'])[:100]
                if is_base64_like(sd) or is_hex_encoded(sd)
            )
            if encoded_count > 20:
                score += 20
                indicators.append(f"encoded_subdomains={encoded_count}")

            if score >= 50:
                duration = (stats['last_seen'] - stats['first_seen']).total_seconds() if stats['first_seen'] and stats['last_seen'] else 0
                alerts.append({
                    'domain': domain,
                    'score': min(score, 100),
                    'query_count': stats['query_count'],
                    'unique_subdomains': unique_count,
                    'avg_subdomain_length': round(avg_length, 1),
                    'avg_entropy': round(avg_entropy, 2),
                    'unique_ratio': round(unique_ratio, 2),
                    'txt_ratio': round(txt_ratio, 2),
                    'source_ips': list(stats['source_ips']),
                    'duration_seconds': duration,
                    'indicators': indicators,
                })

        return sorted(alerts, key=lambda x: x['score'], reverse=True)

    def process_zeek_dns_log(self, log_path):
        """Process Zeek dns.log file."""
        with open(log_path, 'r') as f:
            for line in f:
                if line.startswith('#'):
                    continue
                fields = line.strip().split('\t')
                if len(fields) < 22:
                    continue
                try:
                    ts = datetime.fromtimestamp(float(fields[0]))
                    src_ip = fields[2]
                    query = fields[9]
                    qtype = fields[11]
                    self.process_query(ts, src_ip, query, qtype)
                except (ValueError, IndexError):
                    continue

    def process_eve_json(self, log_path):
        """Process Suricata EVE JSON DNS log."""
        with open(log_path, 'r') as f:
            for line in f:
                try:
                    event = json.loads(line)
                    if event.get('event_type') != 'dns':
                        continue
                    dns = event.get('dns', {})
                    ts = datetime.fromisoformat(event['timestamp'].replace('Z', '+00:00'))
                    src_ip = event.get('src_ip', '')
                    query = dns.get('rrname', '')
                    qtype = dns.get('rrtype', '')
                    self.process_query(ts, src_ip, query, qtype)
                except (json.JSONDecodeError, KeyError, ValueError):
                    continue


def main():
    detector = DNSExfiltrationDetector()

    log_file = sys.argv[1] if len(sys.argv) > 1 else '/opt/zeek/logs/current/dns.log'

    if log_file.endswith('.json'):
        detector.process_eve_json(log_file)
    else:
        detector.process_zeek_dns_log(log_file)

    alerts = detector.analyze()

    if alerts:
        print(f"\n{'='*80}")
        print(f"DNS EXFILTRATION DETECTION RESULTS - {len(alerts)} suspicious domains found")
        print(f"{'='*80}\n")

        for alert in alerts:
            severity = "CRITICAL" if alert['score'] >= 80 else "HIGH" if alert['score'] >= 60 else "MEDIUM"
            print(f"[{severity}] Domain: {alert['domain']}")
            print(f"  Score: {alert['score']}/100")
            print(f"  Queries: {alert['query_count']}, Unique Subdomains: {alert['unique_subdomains']}")
            print(f"  Avg Subdomain Length: {alert['avg_subdomain_length']}, Avg Entropy: {alert['avg_entropy']}")
            print(f"  Source IPs: {', '.join(alert['source_ips'][:5])}")
            print(f"  Indicators: {', '.join(alert['indicators'])}")
            print()
    else:
        print("No DNS exfiltration indicators detected.")


if __name__ == '__main__':
    main()

Step 3: Deploy Suricata Rules for DNS Exfiltration

# Detect long DNS queries (potential tunneling)
alert dns $HOME_NET any -> any 53 (msg:"DNS Exfiltration - Excessive query length"; dns.query; content:"."; pcre:"/^.{60,}/"; threshold:type both,track by_src,count 20,seconds 60; classtype:bad-unknown; sid:3000001; rev:1;)

# Detect high-entropy DNS subdomain
alert dns $HOME_NET any -> any 53 (msg:"DNS Exfiltration - High entropy subdomain"; dns.query; pcre:"/^[a-zA-Z0-9+\/=]{30,}\./"; threshold:type both,track by_src,count 10,seconds 60; classtype:bad-unknown; sid:3000002; rev:1;)

# Detect large TXT record responses
alert dns any 53 -> $HOME_NET any (msg:"DNS Exfiltration - Large TXT response"; content:"|00 10|"; byte_test:2,>,400,0,relative; classtype:bad-unknown; sid:3000003; rev:1;)

# Detect NULL record queries (used by iodine)
alert dns $HOME_NET any -> any 53 (msg:"DNS Exfiltration - NULL record query (iodine indicator)"; content:"|00 0a|"; classtype:bad-unknown; sid:3000004; rev:1;)

# Detect dnscat2 traffic pattern
alert dns $HOME_NET any -> any 53 (msg:"DNS Exfiltration - dnscat2 indicator"; dns.query; content:"dnscat"; nocase; classtype:trojan-activity; sid:3000005; rev:1;)

Step 4: SIEM Detection Rules

Splunk SPL query for DNS exfiltration:

index=dns sourcetype=zeek:dns
| eval subdomain=mvindex(split(query,"."),0)
| eval subdomain_len=len(subdomain)
| eval label_count=mvcount(split(query,"."))
| stats count as query_count,
        dc(subdomain) as unique_subdomains,
        avg(subdomain_len) as avg_sub_len,
        values(src_ip) as source_ips
        by query_domain
| where query_count > 100 AND avg_sub_len > 30 AND unique_subdomains > 50
| eval risk_score = case(
    avg_sub_len > 50 AND unique_subdomains > 200, "Critical",
    avg_sub_len > 40 AND unique_subdomains > 100, "High",
    avg_sub_len > 30 AND unique_subdomains > 50, "Medium",
    true(), "Low")
| sort -query_count
| table query_domain risk_score query_count unique_subdomains avg_sub_len source_ips

Response Actions

  1. Block the tunnel domain at DNS resolver and firewall level
  2. Isolate the source host from the network for forensic investigation
  3. Capture full PCAP of the DNS traffic for evidence preservation
  4. Identify exfiltrated data by decoding captured DNS queries
  5. Check for persistence mechanisms on the compromised host
  6. Update blocklists with identified C2 domains and infrastructure

Best Practices

  • DNS Logging - Enable full DNS query and response logging at resolvers and network level
  • Internal DNS Only - Force all DNS through internal resolvers; block direct external DNS (port 53)
  • Response Policy Zones - Deploy RPZ feeds to block known tunneling domains
  • Baseline First - Establish normal DNS query patterns before setting detection thresholds
  • TXT Record Monitoring - Pay special attention to TXT and NULL record queries
  • Encrypted DNS Awareness - Monitor for DoH/DoT usage that may bypass DNS inspection

References

how to use detecting-dns-exfiltration-with-dns-query-analysis

How to use detecting-dns-exfiltration-with-dns-query-analysis on Cursor

AI-first code editor with Composer

1

Prerequisites

Before installing skills in Cursor, ensure your development environment meets these requirements:

  • Cursor installed and configured on your development machine
  • Node.js version 16.0+ with npm package manager (verify with node --version)
  • Active project directory or workspace where you want to add detecting-dns-exfiltration-with-dns-query-analysis
2

Execute installation command

Execute the skills CLI command in your project's root directory to begin installation:

$npx skills install mukul975/Anthropic-Cybersecurity-Skills/detecting-dns-exfiltration-with-dns-query-analysis

The skills CLI fetches detecting-dns-exfiltration-with-dns-query-analysis from GitHub repository mukul975/Anthropic-Cybersecurity-Skills and configures it for Cursor.

3

Select Cursor when prompted

The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:

◆ Which agents do you want to install to?
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ●Cursor(selected)
│ • Cursor
│ • Windsurf
4

Verify installation

Confirm successful installation by checking the skill directory location:

.cursor/skills/detecting-dns-exfiltration-with-dns-query-analysis

Reload or restart Cursor to activate detecting-dns-exfiltration-with-dns-query-analysis. Access the skill through slash commands (e.g., /detecting-dns-exfiltration-with-dns-query-analysis) or your agent's skill management interface.

Security & Verification Notice

We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.

Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.

List & Monetize Your Skill

Submit your Claude Code skill and start earning

GET_STARTED →

Use Cases

Exploratory Data Analysis

Quickly understand datasets, identify patterns, and generate insights

Example

Analyze CSV with 100K rows, identify outliers, visualize correlations, suggest hypotheses

Reduce EDA time from hours to minutes, uncover insights faster

Data Cleaning & Transformation

Write scripts to clean messy data, handle missing values, normalize formats

Example

Generate Python/SQL to fix date formats, impute missing values, remove duplicates

Automate 80% of data preprocessing work

Statistical Analysis

Perform hypothesis testing, regression, and statistical modeling

Example

Run A/B test analysis, calculate confidence intervals, interpret p-values

Get statistically sound analysis without PhD in statistics

Data Visualization

Create charts, dashboards, and visual reports

Example

Generate matplotlib/seaborn code for time series plots, distribution charts, heatmaps

Build presentation-ready visualizations 3x faster

Implementation Guide

Prerequisites

  • Claude Desktop or compatible AI client
  • Python environment (pandas, numpy, matplotlib) or SQL database access
  • Basic understanding of data analysis concepts
  • Sample datasets for testing skill capabilities

Time Estimate

20-40 minutes to set up and run first analysis

Installation Steps

  1. 1.Install data analysis skill using provided command
  2. 2.Prepare a sample dataset (CSV, JSON, or database connection)
  3. 3.Start with descriptive statistics: 'Summarize this dataset'
  4. 4.Progress to visualization: 'Create a scatter plot of X vs Y'
  5. 5.Advanced analysis: 'Run linear regression and interpret results'
  6. 6.Validate outputs: check calculations, verify visualizations make sense
  7. 7.Document analysis workflow for reproducibility

Common Pitfalls

  • Not validating statistical assumptions before applying tests
  • Accepting visualizations without checking data accuracy
  • Overlooking data quality issues (missing values, outliers)
  • Misinterpreting correlation as causation
  • Using wrong statistical test for data distribution
  • Not considering sample size and statistical power

Best Practices

✓ Do

  • +Always validate data quality before analysis
  • +Check statistical assumptions (normality, independence, etc.)
  • +Visualize data before running statistical tests
  • +Document analysis steps for reproducibility
  • +Cross-validate findings with domain experts
  • +Use skill for initial exploration, then dive deeper manually
  • +Save generated code for reuse on similar datasets

✗ Don't

  • Don't trust analysis without verifying data quality
  • Don't apply statistical tests without checking assumptions
  • Don't make business decisions solely on AI-generated analysis
  • Don't ignore outliers without investigating cause
  • Don't skip data validation and sanity checks
  • Don't use for mission-critical financial or medical analysis without expert review

💡 Pro Tips

  • Describe data context: 'This is user behavior data from e-commerce site'
  • Ask for interpretation: 'What does this correlation mean for business?'
  • Request multiple approaches: 'Show 3 ways to handle missing data'
  • Combine AI analysis with domain expertise for best insights
  • Use for rapid prototyping, then refine analysis manually

When to Use This

✓ Use When

Use for exploratory data analysis, data cleaning, statistical testing, visualization prototyping, and learning new analysis techniques. Best for initial exploration and rapid insights.

✗ Avoid When

Avoid for mission-critical financial analysis, medical research requiring regulatory compliance, production ML models, or when deep statistical expertise is required for nuanced interpretation.

Learning Path

  1. 1Basic: descriptive statistics, data cleaning, simple visualizations
  2. 2Intermediate: hypothesis testing, regression, correlation analysis
  3. 3Advanced: time series analysis, clustering, predictive modeling
  4. 4Expert: causal inference, experimental design, advanced statistical methods

Discussion

Product Hunt–style comments (not star reviews)
  • No comments yet — start the thread.
general reviews

Ratings

4.668 reviews
  • Pratham Ware· Dec 28, 2024

    Keeps context tight: detecting-dns-exfiltration-with-dns-query-analysis is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Evelyn Harris· Dec 28, 2024

    detecting-dns-exfiltration-with-dns-query-analysis has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Kaira Zhang· Dec 28, 2024

    detecting-dns-exfiltration-with-dns-query-analysis has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Chaitanya Patil· Dec 20, 2024

    detecting-dns-exfiltration-with-dns-query-analysis reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Nia White· Dec 12, 2024

    Solid pick for teams standardizing on skills: detecting-dns-exfiltration-with-dns-query-analysis is focused, and the summary matches what you get after install.

  • Evelyn Liu· Dec 12, 2024

    detecting-dns-exfiltration-with-dns-query-analysis fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Evelyn Smith· Nov 19, 2024

    detecting-dns-exfiltration-with-dns-query-analysis fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Zaid Ghosh· Nov 19, 2024

    detecting-dns-exfiltration-with-dns-query-analysis fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Li Diallo· Nov 15, 2024

    Keeps context tight: detecting-dns-exfiltration-with-dns-query-analysis is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Piyush G· Nov 11, 2024

    I recommend detecting-dns-exfiltration-with-dns-query-analysis for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

showing 1-10 of 68

1 / 7