Following up on our quest for better data analysis techniques, I've spent a couple of weeks playing with a tool called Circos. Written by Martin Krzywinski, it is designed to visualize two-dimensional tabular data and was originally written to map genome relationships. However, it is very extensible, so I put it to use on the SensorNET data.
The following maps are some of the many I've built with Circos, and represent about 18 months' worth of our SensorNET data. This style of map can take some getting used to, but once you are used to reading it, it is quite easy to understand. If you don't 'get it' straight away, please stick with it and you will see the beautiful simplicity soon enough.
On the right-hand side of the circle are our Australian-based nepenthes sensors. These are the 'business end' of our SensorNET, as they collect network-borne malware and also log where it comes from. The sensors are spread throughout the Australian IP space. On the left-hand side is the attribute we are exploring, which in these examples is the attacking country, the ASN, or the MD5 of the malware. The colored ribbons between a sensor and an attribute show the strength of the relationship: the thicker the ribbon, the more prominent the relationship.
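To make the sensor/attribute relationship concrete, here is a minimal sketch of how logged events could be turned into the kind of two-dimensional table these maps are drawn from. The event list, the node names and the tab-separated layout are assumptions made for illustration; the exact input layout Circos expects is not covered here.

```python
from collections import Counter

def build_table(events):
    """Count events per (sensor, attribute) pair.

    `events` is an iterable of (sensor, attribute) tuples, for example
    (sensor name, attacking country). Returns the sorted sensors, the
    sorted attributes, and a Counter keyed by (sensor, attribute).
    """
    counts = Counter(events)
    sensors = sorted({sensor for sensor, _ in counts})
    attrs = sorted({attr for _, attr in counts})
    return sensors, attrs, counts

def to_tsv(sensors, attrs, counts):
    """Render the counts as a tab-separated table: one row per sensor,
    one column per attribute. A missing pair is written as 0."""
    lines = ["\t".join(["sensor"] + attrs)]
    for sensor in sensors:
        cells = [str(counts[(sensor, attr)]) for attr in attrs]
        lines.append("\t".join([sensor] + cells))
    return "\n".join(lines)

# Hypothetical sample data, not taken from the SensorNET.
events = [
    ("node01", "JP"), ("node01", "JP"), ("node01", "AU"),
    ("node02", "JP"), ("node02", "CN"),
]
sensors, attrs, counts = build_table(events)
table = to_tsv(sensors, attrs, counts)
print(table)
```

Each cell in the table corresponds to one ribbon: the larger the count, the thicker the ribbon between that sensor and that attribute.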
Top 15 attacking countries
The first thing that jumps out here is the large proportion of attacks coming out of the Japanese IP space. This backs up the results of our geographic heatmap analysis. Note that, almost without exception, the sensors showed the most activity from Japan, followed by local Australian IP space.
Top attackers from overseas
This shows the top countries (excluding Australia). Note the dominance of Asian countries. In fact, the top 6 countries are Japan, China, India, Taiwan, Cambodia and Mongolia, and only then comes the US. We think this may be attributable to geographic closeness and faster inter-country network links, and potentially to 'closeness' in terms of IP space, particularly in the first and second octets, since some bots may scan local or adjacent subnets first. We need to research this further.
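One way to start testing the IP-space 'closeness' idea is to count how many leading octets an attacker's address shares with the sensor it hit. The sketch below is only an assumption about how such a check might look; the function name is hypothetical and the addresses are documentation-range examples, not real sensors or attackers.

```python
import ipaddress
from collections import Counter

def shared_leading_octets(addr_a, addr_b):
    """Return how many leading octets (0 to 4) two IPv4 addresses share."""
    octets_a = ipaddress.IPv4Address(addr_a).packed
    octets_b = ipaddress.IPv4Address(addr_b).packed
    shared = 0
    for a, b in zip(octets_a, octets_b):
        if a != b:
            break
        shared += 1
    return shared

# Hypothetical (attacker, sensor) pairs using documentation address ranges.
pairs = [
    ("203.0.113.5", "203.0.113.77"),   # same first three octets
    ("203.0.200.9", "203.0.113.77"),   # same first two octets
    ("198.51.100.1", "203.0.113.77"),  # nothing in common
]

# Tally how many attacks fall into each 'closeness' bucket.
closeness = Counter(shared_leading_octets(a, s) for a, s in pairs)
print(closeness)
```

If attacks cluster heavily in the buckets for one or two shared octets, that would support the adjacent-subnet scanning explanation; an even spread would point back to other causes.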
Top 15-30 attacking countries
This map shows the attacking countries ranked 15 to 30.
Most attacking ASNs
This shows the ASNs (Autonomous System Numbers) of the networks which attacked this set of sensors. You'll notice that one of our nodes (node 19) had a very busy time being attacked repeatedly by one particular ASN. We considered this an anomaly in terms of its statistical relevance, and suspect the sensor may have been playing up, so this sensor/ASN pair was mostly dismissed from the rest of the analysis at this stage. I included this graph here simply to show how easily these anomalies show up with this visualization technique.
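The same kind of outlier can also be flagged numerically. The following is a rough sketch, not the method we used: it marks any sensor/ASN pair that accounts for more than a chosen share of all logged attacks. The 50% threshold, the counts and the AS numbers (taken from the range reserved for documentation) are all assumptions for illustration.

```python
def dominant_pairs(counts, threshold=0.5):
    """Return (pair, share) for every pair whose share of the total
    count exceeds `threshold`. `counts` maps (sensor, asn) to a count."""
    total = sum(counts.values())
    if total == 0:
        return []
    flagged = []
    for pair, n in counts.items():
        share = n / total
        if share > threshold:
            flagged.append((pair, share))
    return sorted(flagged)

# Hypothetical counts; node19 is deliberately made the outlier.
counts = {
    ("node19", "AS64500"): 900,
    ("node03", "AS64501"): 40,
    ("node07", "AS64502"): 60,
}
flagged = dominant_pairs(counts)
for (sensor, asn), share in flagged:
    print(f"{sensor} / {asn}: {share:.0%} of all attacks")
```

A flagged pair is only a prompt to look closer; whether it is a faulty sensor or a genuinely persistent attacker still needs a human decision.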
Most commonly seen malware
Here we see some of the more prevalent malware files being captured by the SensorNET. These are actually the files that will be infecting unprotected computers in Australia right now! I've abbreviated the MD5s to what I consider to be human readable for this graph.
For reference, here are some links to the VirusTotal results for the top 5 pieces of malware (nominal names taken from F-Secure/Kaspersky).
So there you have it. Although a lot of network-borne malware has great AV coverage, some does not. This means that a 'defense in depth' strategy is still required. Do not rely solely on any one of the main technical security controls (antivirus, patching, firewall); use them ALL for the best protection!
What can you see in these diagrams? Do you have any suggestions or observations? Let me know at [email protected]
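For anyone wanting to shorten hashes in a similar way, here is a minimal sketch. It assumes the abbreviation is simply a prefix of the hex digest and picks the shortest prefix (at least 6 characters) that still keeps every hash distinct. The function name, the minimum length and the sample inputs are hypothetical and are not the labels used in the graph.

```python
import hashlib

def abbreviate(digests, min_len=6):
    """Map each hex digest to the shortest common-length prefix
    (at least `min_len` characters) that keeps all digests distinct."""
    unique = set(digests)
    for length in range(min_len, 33):  # an MD5 hex digest has 32 characters
        prefixes = {d[:length] for d in unique}
        if len(prefixes) == len(unique):
            return {d: d[:length] for d in unique}
    # Only reached if min_len is larger than the digest length.
    return {d: d for d in unique}

# Hypothetical sample payloads standing in for captured files.
samples = [hashlib.md5(data).hexdigest()
           for data in (b"sample-a", b"sample-b", b"sample-c")]
short = abbreviate(samples)
for full, label in short.items():
    print(label, "<-", full)
```

Checking for collisions matters more as the number of samples grows, since two different files must never end up under the same label on the graph.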
One of our main focuses this year for the AHP is to work on how we present data efficiently and meaningfully.
I've turned to the visualization field to learn how to present data in ways that can be understood, so that trends can be spotted and outliers and anomalies identified. Armed with this, we can study these topics further, answer questions, or raise new questions.
We are starting to understand how we can use a few tools now, particularly after the KL workshop (thanks Raffy and Sebastian for your help).
One obvious tool is cartographic heat mapping. We are all very used to the concept of heat gradients when we look at weather maps.
It is very useful to display data in this form to answer the question "Where ARE these things I'm interested in? Is there a particular place where they are more concentrated?"
Well, our good friend (and in fact, newest contributor) David Z helped me understand and install the gheat infrastructure, which seems to suit some of our needs fairly nicely. I like this application. It allows you to zoom into an area (just as you do in Google Maps), and it then recalculates the heatmap for that perspective. It is quite interactive in that way, and can be used by non-geeks. Over the next few months we plan to make some data from this application available to interested parties in such an interactive way. Until then, I've got a few screenshots showing some early results.
This is a map of the locations of computers that are attacking our Australian SensorNET. One thing that stands out is that we seem to have a lot of activity from Japan. We are currently analyzing this, and if you attend Shaun's presentation at AusCERT2009 you'll learn more about this.
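The basic idea behind a heat map like this can be sketched in a few lines: drop each attacker location into a grid cell and count how many land in each one. This is a generic illustration only, not how gheat works internally; the 5-degree cell size and the approximate coordinates are assumptions.

```python
import math
from collections import Counter

def bin_points(points, cell_deg=5.0):
    """Count (lat, lon) points per grid cell of `cell_deg` degrees.
    Cells are identified by (row, col) = floor(lat / size), floor(lon / size)."""
    grid = Counter()
    for lat, lon in points:
        row = math.floor(lat / cell_deg)
        col = math.floor(lon / cell_deg)
        grid[(row, col)] += 1
    return grid

# Hypothetical attacker locations (approximate city coordinates).
points = [
    (35.68, 139.69),   # around Tokyo
    (35.0, 135.7),     # around Kyoto
    (-33.87, 151.21),  # around Sydney
]
grid = bin_points(points)
for cell, count in grid.most_common():
    print(cell, count)
```

The cells with the highest counts are the 'hot' areas; a real heat map then smooths and colors these counts, and recalculates them at each zoom level.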
We hope to make more posts involving data visualization techniques this year. It's an important area for us.
If you have any suggestions or viz tools that you can recommend, please let us know at [email protected]
Founded in Jan 2008, the Australian Honeynet Project supports the mission of understanding the tools, tactics and motives of those who represent a cyber threat. We share our findings with other security researchers and Law Enforcement authorities. We hope our activities will benefit Australian citizens who make legitimate use of the Internet. Our broad aim is to help make the Internet a safer place for end users.
Essentially, the Australian Honeynet Project is a community-based 'For Public Good' project. We currently have three core members who guide and maintain the project.
We also have 'contributors' who provide advice, intelligence, data and resources to the project. Contributors can participate for as long as they desire. We try to match contributors' skills and resources with the pieces of work we need done at the time.
Work on the project is done in our spare time and as private citizens. We do not receive wages or other payments for our work. We are not affiliated with, guided by, or indebted to any commercial or government entity, be they sponsors, contributors or employers. This is our most important ethos, because it allows us to maintain our independence to develop the project without outside pressures, and keep the public good as our primary concern.
Read more about us here.