Explore machine learning by creating a custom network security tool that ingests only PCAP files and PCAP-generated data.
As a network security analyst, I have found myself in remote locations attempting to perform analysis on either too little useful data or too much ambiguous data. In either case, the customer still expects fast, high-quality results. My goal is to create a program that relies only on PCAP data, and on data that can be generated from PCAP files, to make educated guesses about the "goodness" or "badness" of network traffic. This will include training the program on known good and known bad examples.
To date, I am working on an outline for the program. I intend to make the program modular, and I am thinking of having the modules communicate over the network stack from the beginning, so that modules can be moved from machine to machine as the load grows.
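As a rough illustration of modules talking over the network stack, here is a minimal sketch that frames each message as a 4-byte length followed by JSON. The framing scheme, the function names (`send_message`, `recv_message`), and the message fields are all assumptions made for this example; nothing about the wire format has been decided yet.

```python
import json
import socket
import struct


def send_message(sock: socket.socket, payload: dict) -> None:
    """Send one JSON message preceded by a 4-byte big-endian length."""
    body = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack("!I", len(body)) + body)


def _recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, since recv() may return fewer."""
    chunks = []
    remaining = size
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> dict:
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))
```

Because the messages are plain JSON over a socket, a module written in another language could in principle speak the same format. For a quick local check, `socket.socketpair()` gives two connected sockets: call `send_message` on one and `recv_message` on the other.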
- Input Module
  - PCAP data should be readable incrementally (packet by packet), in case the PCAP file is still being written by a live capture. It should also be readable from whole files, and the module should accept tags that label the data as known good or known bad.
  - A folder location and data group are created.
- PCAP Processing Module
  - Made to be modular.
  - Run the raw PCAP data through each sub-module to create data.
- Data Sanitation Module
  - Turn the data from the PCAP Processing Module into data that can be used for machine learning.
  - Tag the data.
- Machine Learning Module
  - This is where the real challenges will start. I have not reached this point in my studies yet.
- Stored Data
  - Known GOOD
  - Known BAD
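The first three stages of the outline can be sketched in a few lines of Python. This is only one possible shape, under stated assumptions: it reads the classic pcap format (microsecond timestamps, not pcapng) using the standard library, it assumes Ethernet framing, and the sub-module names (`packet_length`, `ethertype`) and the row layout are hypothetical placeholders rather than a settled design.

```python
import struct
from typing import BinaryIO, Callable, Dict, Iterator, List, Tuple

Packet = Tuple[float, bytes]


def read_packets(stream: BinaryIO) -> Iterator[Packet]:
    """Yield (timestamp, raw bytes) for each record of a classic pcap stream."""
    header = stream.read(24)  # the global header is 24 bytes
    if len(header) < 24:
        raise ValueError("not a pcap file: global header too short")
    magic = struct.unpack("<I", header[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"  # file written in little-endian byte order
    elif magic == 0xD4C3B2A1:
        endian = ">"  # file written in big-endian byte order
    else:
        raise ValueError("unsupported pcap magic number")
    while True:
        record = stream.read(16)  # each record header is 16 bytes
        if len(record) < 16:
            break  # end of file, or a record that is not fully written yet
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            break
        yield ts_sec + ts_usec / 1_000_000, data


# Hypothetical sub-modules: each turns one packet into a few named values.
def packet_length(ts: float, data: bytes) -> Dict[str, object]:
    return {"length": len(data)}


def ethertype(ts: float, data: bytes) -> Dict[str, object]:
    # Assumes Ethernet framing: the EtherType sits at bytes 12-13.
    if len(data) < 14:
        return {"ethertype": None}
    return {"ethertype": struct.unpack("!H", data[12:14])[0]}


SUB_MODULES: List[Callable[[float, bytes], Dict[str, object]]] = [
    packet_length,
    ethertype,
]


def process(stream: BinaryIO, label: str, sub_modules=SUB_MODULES) -> List[dict]:
    """Run every packet through each sub-module and tag the row with a label."""
    rows = []
    for ts, data in read_packets(stream):
        row = {"timestamp": ts, "label": label}
        for sub_module in sub_modules:
            row.update(sub_module(ts, data))
        rows.append(row)
    return rows
```

Calling `process(open("capture.pcap", "rb"), "good")` on a known good capture (the filename is a placeholder) would return one labeled dictionary per packet, which is roughly the shape of data a machine learning library expects. Following a live capture would additionally need logic to wait and retry when a record is only partly written; this sketch simply stops at that point.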
As I learn and begin to write the program, I will continue to post updates. I will be starting with Python as the base language because that is what I am familiar with, but the project I am starting also requires R. I also do not intend to limit the project to any fixed set of languages.