NetAttackClassifier#
A network traffic analysis and threat classification tool based on Snort rules and the ATT&CK framework
- 🕵️‍♂️ Snort-based anomaly traffic detection
- 🎯 ATT&CK tactic mapping
- 🧬 High granularity traffic classification
Repository: https://github.com/w0s1np/NetAttackClassifier
Project Introduction#
When training traffic classification models, most people use publicly available datasets, but these share a common drawback: the classification granularity is too coarse. For practical use, it would be ideal to distinguish the specific vulnerability IDs and exploit details behind the traffic. However, most datasets are constructed either by generating traffic from known vulnerabilities or malware families (e.g., https://github.com/yungshenglu/USTC-TFC2016, https://github.com/safest-place/ExploitPcapCollection), or through extensive manual labeling based on search engines, vulnerability ID databases, and the like (as I learned from some security companies).
So I wrote a tool that detects traffic with Snort and lets a large language model analyze the logs and map them to ATT&CK tactics, enabling high-granularity traffic classification.
Project Features#
- pcap file detection: Scan large batches of pcap files with Snort rules, quickly identifying the suspicious packets that match the rules and their corresponding network flows.
- ATT&CK tactic mapping: By calling the DeepSeek API, map the Snort rules triggered by detected suspicious packets to the tactics in the MITRE ATT&CK framework, clarifying the attack methods and stages that attackers may adopt.
- High granularity multi-classification: Further subdivide and classify detected suspicious pcap flows, providing detailed annotations and classifications from multiple dimensions (such as attack type, target system, attack tools, etc.) to provide more precise information for subsequent security analysis and response.
Environment Setup#
- Install Snort and configure the corresponding rule library (https://www.snort.org/downloads#rules). My setup is as follows:

Install Snort 3 Dependencies#
```bash
# Update system
sudo apt update && sudo apt upgrade -y

# Install compilation dependencies
sudo apt install -y build-essential autotools-dev libdumbnet-dev \
    libluajit-5.1-dev libpcap-dev libpcre3-dev zlib1g-dev pkg-config \
    libhwloc-dev libcmocka-dev liblzma-dev openssl libssl-dev cpputest \
    libsqlite3-dev uuid-dev libtool git autoconf bison flex libnetfilter-queue-dev \
    libmnl-dev libunwind-dev libfl-dev libsafec-dev

# Optional: Install Hyperscan (high-performance regex library)
sudo apt install -y libhyperscan-dev

# Install CMake (for building dependencies)
sudo apt install -y cmake
```
Install DAQ 3.x Dependencies#
```bash
# Install dependencies required to compile DAQ 3.x
sudo apt install -y libpcap-dev libdumbnet-dev libssl-dev liblzma-dev \
    libcurl4-openssl-dev libhwloc-dev libcmocka-dev libpcre3-dev bison flex
```
Download and Compile DAQ 3.x#
```bash
# Create the build directory and download the DAQ 3.x source code
mkdir -p ~/snort3_build && cd ~/snort3_build
git clone https://github.com/snort3/libdaq.git
cd libdaq

# Configure, compile, and install
./bootstrap
./configure --prefix=/usr/local
make -j$(nproc)
sudo make install

# Update dynamic library links
sudo ldconfig
```
Download and Compile Snort 3#
```bash
# Enter the build directory
cd ~/snort3_build

# Download the Snort 3 source code
git clone https://github.com/snort3/snort3.git
cd snort3

# Configure build options
./configure_cmake.sh --prefix=/usr/local --enable-tcmalloc
cd build

# Compile and install
make -j$(nproc) && sudo make install
```
Configure Environment Variables#
```bash
# Add Snort 3 to PATH
echo 'export PATH=/usr/local/bin:$PATH' >> ~/.bashrc
source ~/.bashrc

# Verify installation
snort -V
```
- Install the Python dependency used to call the DeepSeek API: `pip install openai`
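The `openai` package serves as the client here because the DeepSeek API exposes an OpenAI-compatible endpoint. A minimal configuration sketch (the environment variable name is my assumption, not taken from the project):

```python
import os
from openai import OpenAI

# DeepSeek's API is OpenAI-compatible, so the openai SDK works
# once the base URL is overridden.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # hypothetical env var name
    base_url="https://api.deepseek.com",
)
```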
Usage#
- Place the pcap files in the specified input directory.

- Run `run_snort.py` to batch-detect the pcap files and parse the log content, extracting the threat information into the specified JSON file.
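For illustration, here is a minimal sketch of this detection step, assuming Snort 3 is invoked per pcap with `alert_fast` output and that alerts follow the classic fast-alert line layout; the helper name and config path are hypothetical, not the project's actual code:

```python
import re
import subprocess

# Classic fast-alert layout, e.g.:
# 09/06-01:18:41.206357 [**] [1:45015:3] "..." [**] [Classification: ...] [Priority: 1] {TCP} ip:port -> ip:port
ALERT_RE = re.compile(
    r'(?P<timestamp>\S+)\s+\[\*\*\]\s+\[(?P<gid>\d+):(?P<sid>\d+):(?P<rev>\d+)\]\s+'
    r'"?(?P<description>.*?)"?\s+\[\*\*\]\s+'
    r'(?:\[Classification:\s*(?P<classification>[^\]]*)\]\s+)?'
    r'\[Priority:\s*(?P<priority>\d+)\]\s+'
    r'\{(?P<protocol>\w+)\}\s+'
    r'(?P<src_ip>[\d.]+):(?P<src_port>\d+)\s+->\s+(?P<dst_ip>[\d.]+):(?P<dst_port>\d+)'
)

def detect_pcap(pcap_path, snort_conf="/usr/local/etc/snort/snort.lua"):
    """Run Snort 3 over one pcap and parse fast alerts from its output."""
    proc = subprocess.run(
        ["snort", "-c", snort_conf, "-r", pcap_path, "-A", "alert_fast", "-q"],
        capture_output=True, text=True, check=False,
    )
    alerts = []
    for line in proc.stdout.splitlines():
        m = ALERT_RE.search(line)
        if m:
            alerts.append({**m.groupdict(), "pcap_path": pcap_path})
    return alerts
```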
- Run `extract_tcp_streams.py`. It first uses `SplitCap.exe` (https://www.netresec.com/?page=SplitCap) to split each pcap into TCP bidirectional streams, then retrieves the alerted packets from the JSON file and computes the `rule priority`, `trigger frequency`, and `protocol consistency` for each packet in the stream. Every packet in the same stream receives a `weighted average score`, and the alert of the highest-scoring packet is chosen as the alert for the whole stream. This aligns Snort's per-packet detection mechanism with the per-flow detection mechanism used by most models, yielding the split pcap files and their corresponding JSON files, for example:

```json
{
    "timestamp": "09/06-01:18:41.206357",
    "gid": 1,
    "sid": 45015,
    "rev": 3,
    "description": "FILE-OTHER Jackson databind deserialization remote code execution attempt",
    "classification": "Attempted User Privilege Gain",
    "priority": 1,
    "protocol": "TCP",
    "src_ip": "192.168.56.1",
    "dst_ip": "192.168.56.11",
    "src_port": 1056,
    "dst_port": 8090,
    "pcap_path": "/Users/lnhsec/Desktop/NetAttackClassifier/extracted_streams/fastjson1224ldap/192.168.56.1_1056_to_192.168.56.11_8090.pcap",
    "weight_score": 0.8031363764158987,
    "priority_score": 1.0,
    "frequency_score": 0.47712125471966244,
    "protocol_score": 0.8,
    "alert_count": 2
}
```
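For intuition, the sample values above are consistent with `frequency_score = log10(alert_count + 1)` combined as `weight_score = 0.5 * priority_score + 0.3 * frequency_score + 0.2 * protocol_score`. A sketch under that assumption (the weights are inferred from the example, not read from the project code):

```python
import math

# Inferred weights: 0.5 * 1.0 + 0.3 * log10(3) + 0.2 * 0.8 reproduces the
# weight_score 0.8031363764158987 from the sample JSON above.
WEIGHTS = {"priority": 0.5, "frequency": 0.3, "protocol": 0.2}

def weighted_score(priority_score, alert_count, protocol_score):
    frequency_score = math.log10(alert_count + 1)
    return (WEIGHTS["priority"] * priority_score
            + WEIGHTS["frequency"] * frequency_score
            + WEIGHTS["protocol"] * protocol_score)

print(weighted_score(1.0, 2, 0.8))  # ~0.8031363764158987
```

Per stream, this score is computed for every alerted packet, and the alert of the highest-scoring packet labels the whole stream.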
- Run `gpt_analysis.py`. It extracts the key traffic information from the JSON file above, constructs the corresponding prompt, and submits it to the model via the DeepSeek API to map the alert to tactics in the MITRE ATT&CK framework, producing a JSON result like:

```json
{
    "attack_tactics": {
        "TA0006": "T1110"
    },
    "rule_trigger_reason": "An attempted Oracle login with a suspicious username triggered a misparsed login response alert",
    "attack_type": "Credential Brute Force Attempt",
    "behavior_pattern": "Suspicious username used in login attempt to Oracle server via TCP/1521",
    "threat_level": "medium",
    "suggested_action": "1. Investigate source IP 192.168.1.20 for compromise 2. Review Oracle account activity 3. Enforce strong authentication policies",
    "confidence": 0.9,
    "dynamic_behavior": {
        "network": {
            "ip": "192.168.1.20",
            "dns": ""
        }
    },
    "pcap_path": "/Users/lnhsec/Desktop/NetAttackClassifier/extracted_streams/msf_oracle_sid_brute/192.168.1.20_1521_to_192.168.1.14_34055.pcap"
}
```
The prompt is constructed as follows:

```python
def construct_prompt(key_info):
    prompt = """Analyze network security events based on Snort alerts, requirements:
1. Strictly follow the MITRE ATT&CK tactic framework classification
2. The attack_tactics field must include relevant TA and T numbers
3. Each TA number must correspond to specific and correct T technique numbers
"""
    prompt += "Analyze the following Snort alert information and return in JSON format:\n"
    for entry in key_info:
        prompt += (
            f"Time: {entry['timestamp']}, Alert ID: {entry['sid']}:{entry['gid']}:{entry['rev']}, "
            f"Description: {entry['description']}, Classification: {entry['classification']}, "
            f"Priority: {entry['priority']}, Protocol: {entry['protocol']}, "
            f"Source IP: {entry['src_ip']}:{entry['src_port']}, Destination IP: {entry['dst_ip']}:{entry['dst_port']}\n"
        )
    prompt += '''\nPlease answer the following questions and return in JSON format:
{
    "attack_tactics": {"TAxxxx": "Txxxx"},
    "rule_trigger_reason": "xxx",
    "attack_type": "xxx",
    "behavior_pattern": "xxx",
    "threat_level": "low/medium/high/critical",
    "suggested_action": "xxx",
    "confidence": 0.9,
    "dynamic_behavior": {
        "network": {"ip": "xxx", "dns": "xxx"}
    }
}'''
    prompt += "\nPlease strictly return content that conforms to JSON format, do not include other explanatory text, only return the JSON string."
    print(f"Constructed DeepSeek API input content: {prompt}")
    return prompt
```
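A minimal sketch of submitting this prompt and parsing the reply, reusing the OpenAI-compatible DeepSeek client from the setup section (the function name and the defensive fence-stripping are illustrative, not the project's actual gpt_analysis.py):

```python
import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")

def analyze_alerts(key_info):
    """Send the prompt built by construct_prompt() above and parse the JSON reply."""
    response = client.chat.completions.create(
        model="deepseek-chat",  # deepseek-v3; "deepseek-reasoner" selects deepseek-r1
        messages=[{"role": "user", "content": construct_prompt(key_info)}],
        temperature=0,  # deterministic mapping is preferable here
    )
    raw = response.choices[0].message.content.strip()
    # The prompt requests bare JSON, but strip code fences defensively.
    if raw.startswith("```"):
        raw = raw.strip("`").removeprefix("json").strip()
    return json.loads(raw)
```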
- Finally, classify the resulting files by tactic or by `attack_type`.
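To close the loop, a sketch of that classification step, grouping the split pcaps into folders named after their `attack_type` (the directory layout and function name are hypothetical):

```python
import json
import shutil
from pathlib import Path

def classify_by_attack_type(results_dir, output_dir):
    """Copy each analyzed pcap into a folder named after its attack_type label."""
    for result_file in Path(results_dir).glob("*.json"):
        result = json.loads(result_file.read_text())
        label = result.get("attack_type", "unknown").replace(" ", "_")
        target = Path(output_dir) / label
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy(result["pcap_path"], target)
```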
Future Directions#
- Rule Library Optimization: Because rule libraries differ in type and version, many known-malicious pcaps are not detected by Snort, producing many missed detections and false positives.
- Detection Mechanism Improvement: The correspondence between Snort's per-packet detection and the per-flow detection used by most models needs further refinement; there may be better approaches than the current weighted scoring.
- Performance Optimization: Calling the LLM API for the mapping is relatively slow. I have tried the deepseek-v3 model (faster, but with some inaccuracies) and the deepseek-r1 model (very slow, but more accurate). In the future I may add a knowledge base and use other, faster models for the mapping.