Like a lot of people, we unfortunately get a LOT of spam. I thought it would be interesting to sort it into distinct groups and make some wordclouds, or more specifically spamclouds, from the content of the spam.
The idea behind these spamclouds is that a quick glance draws your eye to the more dominant words, and also gives a sense of the relative importance of words used in each spam type.
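As a rough illustration of where the word sizes in a spamcloud come from, here is a minimal Python sketch that counts words across a handful of messages and scales each count against the most common word. The stopword list, the sample messages and the function name are all made up for this example; the actual clouds were not necessarily produced this way.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real run would use a fuller one.
STOPWORDS = {"the", "and", "your", "you", "for", "our", "has", "been"}

def word_weights(messages, top_n=10):
    """Return the top_n words with weights scaled so the most common word is 1.0."""
    counts = Counter()
    for body in messages:
        for word in re.findall(r"[a-z']+", body.lower()):
            # Skip very short words and stopwords so content words dominate.
            if len(word) > 2 and word not in STOPWORDS:
                counts[word] += 1
    if not counts:
        return []
    top = counts.most_common(top_n)
    peak = top[0][1]
    return [(word, count / peak) for word, count in top]

# Invented sample messages in the style of phishing spam.
sample = [
    "Dear customer, your Commonwealth account has been suspended.",
    "Verify your Commonwealth account details to restore access.",
    "Commonwealth security notice: confirm your account now.",
]
weights = word_weights(sample, top_n=3)
for word, weight in weights:
    print(f"{word}: {weight:.2f}")
```

A wordcloud tool would then draw each word at a size proportional to its weight, which is why the dominant words catch the eye first.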
I sorted the spam from around 3-6 months' worth of data into 3 distinct groups as follows:
Phishing spam: These are emails claiming to be from a legitimate institution, such as the tax office, bank, ISP or credit union. They attempt to dupe victims into handing over their banking details and other data that could be used in many forms of identity theft. For the purpose of this exercise I concentrated on emails purporting to be from Australian-based institutions. The word Commonwealth stands out because the Commonwealth Bank of Australia has been the target of a large number of phishing attacks recently. Read more about this style of scam at the Scamwatch website here.
Money mule spam: These are emails that attempt to recruit people into becoming "money mules" for the purpose of laundering stolen funds. Often the victim believes they are partaking in legitimate activity, such as a new job as a transfer agent, where their pay is a small percentage of each 'transfer'. Read more about this style of scam here.

Advance Fee Fraud spam: Also known (quite unfairly) as "Nigerian Scams" or "419" scams. Read more about this style of scam here.

I was initially going to do medication/viagra spam as a category as well. However, the words typically used in most of these emails are so bizarre and nonsensical that the spamcloud would probably be quite humorous, but not really useful.
Now, obviously the results will vary with different datasets and time periods, so please don't read too much into this piece of work. It's not overly scientific, but hopefully it is still useful and instructive to the public.
We recommend that anyone who thinks they (or someone they know) may have fallen for one of these scams check out the Scamwatch website at http://www.scamwatch.gov.au. This is a very useful resource for the public to learn about many types of scam, and is run by the Australian Competition and Consumer Commission (ACCC).
Since the Annual workshop in KL earlier in the year, I've been learning a lot about VOIP from Sjur Usken from the Norwegian Honeynet Chapter, and Sandro Gauci from Enable Security. Both of these guys are experts in the field of VOIP security, and we thank them for their assistance to the Australian Honeynet Project.
We've been testing a couple of different styles of VOIP honeypots (yes, phoneypots..). Presently we have one sensor in operation in the AU IP space as a pilot. We plan to increase the number once the techniques have matured and the tools are released by their authors.
We've seen some very interesting scanning of our phoneypot sensor during the pilot, and the results will be posted shortly, so stay tuned for the following installments!
VOIP phoneynet : PART 2 "OBSERVATIONS OF THE VOIP PILOT THUS FAR"
VOIP phoneynet : PART 3 "WHAT WOULD CROOKS DO WITH A COMPROMISED VOIP GATEWAY ANYWAY?"
VOIP phoneynet : PART 4 "HOW BEST TO PROTECT AGAINST VOIP THREATS"
VOIP phoneynet : PART 5 "WHAT MIGHT THE FUTURE HOLD WITH VOIP SECURITY"
VOIP phoneynet : PART 6-n "TBA, what do you want? I'm taking requests at ben@honeynet.org.au"
This is an interesting area, and increasingly important as VOIP gets more popular, and is targeted by wrong-doers.
We feel that the general level of understanding of VOIP security, and even more so of malicious VOIP activity, is relatively low, and that we need to improve it by getting a better view of the sort of malicious VOIP activity out there. This has been the driver behind this project. If you have any experience in VOIP honeynets, or actual incidents (anecdotal, or specific), please feel free to contact us.
Following up on our quest for better data analysis techniques, I've been playing for a couple of weeks with a product called Circos. This useful tool (written by Martin Krzywinski) is designed to visualize two-dimensional tabular data, and was originally written to map genome relationships! However it is very extensible, so I put it to use on the SensorNET data.
The following maps are some of the many I've built with Circos, and represent about 18 months' worth of our data from the SensorNET. This style of map is quite easy to understand once you know how to read it, but that can take a little while. If you don't 'get it' straight away, please stick with it and you will see the beautiful simplicity soon enough.
On the right hand side of the circle are our Australian based nepenthes sensors. These are the 'business end' of our SensorNET, as they collect network-borne malware, and also log where it comes from. The sensors are spread throughout the Australian IP space. On the left hand side is the attribute we are exploring, which in these examples is the attacking country, ASN, or MD5 of the malware. The color ribbons between the sensor and the attribute relate to the strength of the relationship. The thicker the ribbon, the more prominent the relationship.
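To make the sensor/attribute relationship concrete, here is a minimal Python sketch of the kind of two-dimensional table that sits behind such a map: one row per sensor, one column per attribute value (attacking country here), and each cell holding the number of attacks, which is what the ribbon thickness reflects. The sensor names, the sample events and the tab-separated layout are assumptions for illustration only; check the Circos documentation for the exact input format it expects.

```python
from collections import Counter

def build_table(events):
    """Turn (sensor, attribute) pairs, one per logged attack, into a count table."""
    counts = Counter(events)
    sensors = sorted({sensor for sensor, _ in counts})
    attributes = sorted({attribute for _, attribute in counts})
    header = ["sensor"] + attributes
    # Counter returns 0 for pairs that never occurred, so every cell is filled.
    rows = [[s] + [counts[(s, a)] for a in attributes] for s in sensors]
    return header, rows

def to_tsv(header, rows):
    """Render the table as tab-separated text (assumed layout, not verified against Circos)."""
    lines = ["\t".join(header)]
    lines += ["\t".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)

# Invented sample log: each tuple is one attack seen by a sensor.
events = [
    ("node01", "JP"), ("node01", "JP"), ("node01", "AU"),
    ("node02", "JP"), ("node02", "CN"),
]
header, rows = build_table(events)
print(to_tsv(header, rows))
```

Swapping the country code for an ASN or an abbreviated MD5 gives the table for the other maps shown here.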
Top 15 attacking countries
The first thing that jumps out here is the large proportion of attacks coming out of the Japanese IP space. This backs up the results of our geographic heatmap analysis. Note that, almost without exception, the sensors showed the most activity from Japan, followed by the local Australian IP space.

Top attackers from overseas
This shows the top countries (excluding Australia). Note the dominance of Asian countries: the top 6 countries are Japan, China, India, Taiwan, Cambodia and Mongolia, and only then comes the US. We think this may be attributable to geographic closeness and faster inter-country network links, and potentially to 'closeness' in terms of IP space (particularly the first and second octet), as some bots may scan local/adjacent subnets first. We need to research this further.
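One simple way to start testing the IP-space 'closeness' idea would be to count how many leading octets an attacker's address shares with the sensor it hit. The Python sketch below does only that; the function name is hypothetical and the addresses are reserved documentation ranges, not real SensorNET data.

```python
import ipaddress

def shared_leading_octets(addr_a, addr_b):
    """Count how many leading octets two IPv4 addresses have in common (0 to 4)."""
    a = ipaddress.IPv4Address(addr_a).packed
    b = ipaddress.IPv4Address(addr_b).packed
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared

# Placeholder addresses from documentation ranges.
sensor = "203.0.113.10"
print(shared_leading_octets(sensor, "203.0.113.200"))  # same /24
print(shared_leading_octets(sensor, "198.51.100.1"))   # unrelated network
```

Tallying this value across all logged attacks would show whether attackers with two or more shared octets are over-represented.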

Top 15-30 attacking countries
These are the attacking countries ranked 15 to 30.

Most attacking ASNs
This shows the ASN (Autonomous System Number) networks which attacked this set of sensors. You'll notice that one of our nodes (node 19) had a very busy time being attacked repeatedly by one particular ASN. We actually considered this an anomaly in terms of its statistical relevance, and suspect the sensor may have been playing up, so this sensor/ASN pair was mostly dismissed from the rest of the analysis at this stage. I included this graph here simply to show how easily these anomalies show up with this visualization technique.
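The same kind of outlier can also be picked out numerically. The following Python sketch flags any sensor/ASN pair whose attack count is far above the median for all pairs. The threshold factor, the counts and the AS numbers (taken from the range reserved for documentation) are assumptions for illustration; this is not how the anomaly above was actually identified.

```python
from statistics import median

def flag_outliers(pair_counts, factor=10):
    """Return (sensor, asn) pairs whose count exceeds factor times the median count."""
    if not pair_counts:
        return []
    typical = median(pair_counts.values())
    return sorted(pair for pair, n in pair_counts.items() if n > factor * typical)

# Invented counts; only the idea of one very busy node comes from the text.
pair_counts = {
    ("node19", "AS64500"): 5000,
    ("node01", "AS64501"): 40,
    ("node02", "AS64502"): 55,
    ("node03", "AS64500"): 35,
}
suspicious = flag_outliers(pair_counts)
print(suspicious)
```

A flagged pair is only a prompt to look closer; it could be a real campaign or a misbehaving sensor.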

Most commonly seen malware
Here we see some of the more prevalent malware files being captured by the SensorNET. These are actually the files that will be infecting unprotected computers in Australia right now! I've abbreviated the MD5s to what I consider to be human readable for this graph.
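Abbreviating hashes for a graph label is safe only while the shortened forms stay distinct. Here is a minimal Python sketch that picks the shortest prefix length that keeps every hash in a set unique. The function name and the minimum length of 6 are assumptions, and the sample digests are computed from placeholder strings rather than real malware.

```python
import hashlib

def abbreviate(hashes, min_len=6):
    """Map each hash to the shortest common-length prefix (>= min_len) that keeps all distinct."""
    unique = set(hashes)
    if not unique:
        return {}
    max_len = max(len(h) for h in unique)
    for length in range(min_len, max_len + 1):
        prefixes = {h[:length] for h in unique}
        if len(prefixes) == len(unique):
            return {h: h[:length] for h in unique}
    # Fall back to the full hashes if no shorter prefix separates them.
    return {h: h for h in unique}

# Placeholder inputs standing in for captured binaries.
digests = [hashlib.md5(name.encode()).hexdigest() for name in ("sample-a", "sample-b", "sample-c")]
labels = abbreviate(digests)
for full, short in sorted(labels.items()):
    print(short, full)
```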

For reference, here are some links to the Virustotal results for the top 5 pieces of malware (nominal name taken from F-Secure/Kaspersky)
Detection Result: 39 out of 40 Antivirus vendors (97.50%)
Detection Result: 38 out of 39 Antivirus vendors (97.44%)
Detection Result: 6 out of 36 Antivirus vendors (16.67%) ***
Detection Result: 40 out of 40 Antivirus vendors (100.00%)
Detection Result: 40 out of 40 Antivirus vendors (100.00%)
So there you have it: although a lot of network-borne malware has great AV coverage, some does not ***. This means that a 'defense in depth' strategy is still required. Do not rely solely on any one of the main technical security controls (Antivirus, Patching, Firewall) - use them ALL for the best protection!
What can you see in these diagrams? Do you have any suggestions or observations? Let me know at ben@honeynet.org.au
One of our main focuses this year for the AHP is to work on how we present data efficiently and meaningfully.
I've turned to the Visualization field to learn how to present data in ways that can be understood, so that trends can be spotted and outliers and anomalies identified. Armed with this, we can study these topics further, answer questions, or raise new ones.
We are starting to understand how we can use a few tools now, particularly after the KL workshop (thanks Raffy and Sebastian for your help).
One obvious tool is cartographic heat mapping. We are all very used to the concept of heat gradients when we look at weather maps.
It is very useful to display data in this form to answer the questions "Where ARE these things I'm interested in? Is there a particular place where they are more concentrated?"
Well, our good friend (and in fact, newest contributor) David Z helped me understand and install the gheat infrastructure, which seems to suit some of our needs fairly nicely. I like this application: it allows you to zoom into an area (just as you do in Google Maps), and it then recalculates the heatmap for that perspective. It is quite interactive in that way, and can be used by non-geeks. Over the next few months we plan to make some data from this application available to interested parties in such an interactive way. Until then, I've got a few screenshots showing some early results.
This is a map of the locations of computers that are attacking our Australian SensorNET. One thing that stands out is that we seem to have a lot of activity from Japan. We are currently analyzing this, and if you attend Shaun's presentation at AusCERT2009 you'll learn more about this.
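The basic idea behind a heat map like this can be sketched in a few lines of Python: drop each attacker location into a grid cell and count the hits per cell, using a smaller cell size when zoomed in. This is only an illustration of the concept, not gheat's actual algorithm, and the coordinates are approximate city locations used as sample points.

```python
import math
from collections import Counter

def heat_grid(points, cell_degrees):
    """Count (lat, lon) points per grid cell; the cell index is the floor of degrees / cell size."""
    grid = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))
        grid[cell] += 1
    return grid

# Sample points: roughly Tokyo, Osaka and Sydney.
points = [(35.68, 139.69), (34.69, 135.50), (-33.87, 151.21)]

zoomed_out = heat_grid(points, cell_degrees=10.0)  # coarse cells: the two Japanese points merge
zoomed_in = heat_grid(points, cell_degrees=1.0)    # fine cells: every point stands alone
print(zoomed_out.most_common(1))
print(len(zoomed_in))
```

The hotter a cell's count, the warmer the colour it would be drawn in; recomputing with a new cell size is what makes zooming meaningful.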





We hope to make more posts involving data visualization techniques this year. It's an important area for us.
If you have any suggestions or viz tools that you can recommend, please let us know at contact@honeynet.org.au
Great news: the Honeynet Project is now a designated mentor in the upcoming "Google Summer of Code". Congratulations go to the chapters that worked on the submission.
Of course, it will be winter time in Australia soon - which is all the better to stay inside and code :)
The project is looking for skillful and enthusiastic coders who are interested in working on any of these projects.
This could be a great opportunity for an aspiring student. If this sounds like you, and you are up for a challenge, you need to get involved soon, as the application period closes on April 3 2009.
For full information, check out the main honeynet GSOC page.
If you need any additional information or want to ask questions, you can get in touch at project@honeynet.org or on IRC (#gsoc-honeynet on irc.freenode.net).
Best of luck.
The Annual Honeynet Workshop was held in Kuala Lumpur in February 2009. The event was sponsored and hosted at IMPACT's brand new facilities in the technology hub Cyberjaya, about 40 minutes' drive south of KL city.
For more info on IMPACT (International Multilateral Partnership Against Cyber Threats), please visit their site http://www.impact-alliance.org/.
This was our first annual conference, and it's fair to say we were extremely impressed with the breadth and depth of skill sets that exist in the wider Honeynet Project. I'm really not sure where else such skills exist in such a collaborative environment. We found the workshop invaluable for meeting other researchers face to face to discuss ideas and experiences. We learned some great things from the other chapters, and presented to the group some of the projects that we are working on in Australia.
Key topics for us were: visualization techniques, VOIP honeypots, high interaction honeypots, and the deployment of global honeynets.
Thanks go to IMPACT, the core organizers, and all the other volunteers who participated. Special thanks to the local Malaysian teams, whose hospitality and friendliness were really nice :)
The excellent people at Internet Service Providers have recently been kind enough to lend us a hand with a few ISP services. Thanks ISP!
Visit them here.
Today we are happy to announce the release of an automated spam processing tool.
It will extract all URLs from an email, try to pick the correct sender of the email and then link the two together in a database.
Information such as geolocation and ASN is also collected and stored for both the sender and the url.
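For readers who want a feel for the approach, here is a minimal Python sketch of the same idea: pull the URLs out of a message, take a sender, and store the pairs in a database. It is not the released tool. It simply trusts the From header (the real tool tries harder to pick the correct sender), it skips the geolocation and ASN lookups, and the table name, regular expression and sample email are all made up for this example.

```python
import re
import sqlite3
from email import message_from_string
from email.utils import parseaddr

# Simplistic URL pattern, good enough for a sketch.
URL_RE = re.compile(r"https?://[^\s<>\"']+")

def extract(raw_email):
    """Return (sender_address, sorted unique URLs) for one raw email."""
    msg = message_from_string(raw_email)
    sender = parseaddr(msg.get("From", ""))[1]  # assumption: trust the From header
    urls = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and attachments
        payload = part.get_payload(decode=True) or b""
        urls.extend(URL_RE.findall(payload.decode("utf-8", errors="replace")))
    return sender, sorted(set(urls))

def store(conn, sender, urls):
    """Link the sender to each URL in a hypothetical sender_url table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sender_url (sender TEXT, url TEXT, UNIQUE(sender, url))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO sender_url VALUES (?, ?)", [(sender, u) for u in urls]
    )
    conn.commit()

raw = (
    "From: Example Bank <alerts@example.com>\n"
    "Subject: Verify your account\n"
    "\n"
    "Please log in at http://phish.example.net/login now.\n"
    "Mirror: http://phish.example.org/a\n"
)
sender, urls = extract(raw)
conn = sqlite3.connect(":memory:")
store(conn, sender, urls)
print(conn.execute("SELECT sender, url FROM sender_url ORDER BY url").fetchall())
```

Storing the pair under a uniqueness constraint means the same spam run can be processed repeatedly without creating duplicate rows.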
It has been working for a few weeks now, but I'm sure someone will find something wrong with it. If so, please let us know.
The code can be found in the tools section http://honeynet.org.au/?q=node/10
-Vlashef
Today we are very proud to announce that labyrinthdata.net.au has donated a server to the Australian Honeynet Project.
If you are looking for good Australian VPS hosting, take a look at these guys!
http://honeynet.org.au/?q=node/2
-Vlashef
Hi all,
The latest version of the Tracking system has finally been released. The changes to version 1.0 are not too great.
This version allows us to decide which hostnames we want to track.
http://honeynet.org.au/?q=node/10
Enjoy.
-Vlashef