Deriving Malware Context Requires Human Analysis

Man versus machine is one of the oldest technology tropes. In the modern tech economy, it remains one of the largest driving forces across industries whose processes are being streamlined by robotics and automation. For the threat intelligence industry, the automated malware sandbox is the machine put in place to replace the work done by analysts. However, while some automation can enhance the production of high-quality threat intelligence, completely removing the human element greatly degrades the quality of your analysis.

The automated sandbox provides a snapshot of a malware’s behavior—what it does and how—but it often leaves out important context such as why. Put another way, much of what a sandbox collects is quantitative data that lacks qualitative explanation. Quantitative characteristics of indicators include facts such as the indicator’s type (URL, IPv4 address, etc.), while qualitative characteristics provide insight into the role the indicator plays in the malware’s lifecycle and botnet infrastructure. It is these qualitative characteristics that provide the most insight into how the malware operates and how organizations leveraging threat intelligence can mitigate the threat.

For example, even the longest-lived malware families and types can be subject to sudden change at the whim of a threat actor. The characteristics and traits that represent established indicators for a certain malware type can change overnight. When a change like this takes place, automated sandboxes will not produce the expected analysis results. If those results do not match existing rules, the machine may not recognize that running the application leads to anything bad, allowing new malware binaries to slip past automated defenses.

Human analysts, however, have a greater ability to identify unwanted behavior even when that behavior does not match any known rules. In these cases, an analyst can determine that an application is hostile and articulate what makes it hostile, even if the malware has never been previously defined.

Identifying these qualitative characteristics can be a complex task. The process by which this definition takes place must consider the unique context of every malware sample analyzed while at the same time providing a consistent framework for identifying the role each associated indicator plays in a malware’s lifecycle. PhishMe’s malware analysis is driven by human beings who manipulate the malware’s execution within a specialized environment. This human-driven analysis process gives PhishMe analysts an intimate and contextual understanding of the malware’s lifecycle.

Having analysts involved in this process means that communications between malware samples and their supporting infrastructure are subject to scrutiny in real-time. This in turn means that analysis results include a one-to-one parity between observations of a malware’s behavior and its use of supporting infrastructure. This has two implications. First, it allows for the detailed classification and qualification of a malware’s infrastructure. Second, it reduces the incidence of false positives, since each quantitative indicator is matched to an observed behavior, adding a vetting step to the malware analysis process.

Given the controlled nature of PhishMe’s analysis, it is easy to construct a distinct ontology for each malware sample based on the parity that can be drawn between infrastructure usage and resulting behavior. It is this understanding of cause-effect relationships that provides the context for categorizing the qualitative characteristics of malware indicators. Those characteristics, vetted by human analysts, form the core of the rich intelligence provided by PhishMe.

CERT Researchers Examine Domain Blacklists

After researching everything you want to know about domain blacklists, Jonathan Spring and Leigh Metcalf – two members of the technical staff at the CERT Division of Carnegie Mellon University’s Software Engineering Institute – performed an additional analysis and case study on the Domain Blacklist Ecosystem.

Their research supports a hypothesis that differences in the threat indicators available from a range of sources are related to sensor vantage and detection strategy. To test this, they required a source of intelligence that varied the detection strategy without changing the sensor vantage.

University research continues to play an important role in how we develop and deliver our threat intelligence services today. As such, we are very pleased to assist Jonathan and Leigh in their on-going analysis of the cyber threat landscape and the intelligence being leveraged to protect networks, employees, and data from threat actors.

An indicator detection process enables us to specify whether the network touchpoint is a mail sender, an initial infection vector, or a location derived during malware runtime. Our intelligence feed further specifies how IP addresses, domains, and URLs are being used in support of an attack. This provides insight into where overlap is occurring and if components are being used for multiple purposes, both of which were key aspects of the CERT analysis.

PhishMe’s Indicators

Compared to 26 domain-based lists and 53 IP-address-based lists provided by other threat intelligence providers, we reported unique threat indicators 50% – 77% of the time.

Payload server: 77% unique
C2 server: 59% unique
Infection URL: 58% unique
Spam sender: 50% unique

Table 1: Sub-list intersections with all other indicator sources. (From CERT blog)

These data demonstrate that our threat intelligence exposes significant unique indicators while adding context and validity to duplicate indicators being collected from other sources. If a threat provider’s data have little overlap with 79 other blacklists, one should consider the applicability of those data. Are they stale? Are they regional? Do they apply to my business? Conversely, if a threat provider offered nothing unique, it would have little additive value. We believe this analysis demonstrates the ideal blend of confirmation and uniqueness of our data.

Bad Intelligence Is Costly Intelligence

Based on the premise that more is better, there was a rush over the past few years to collect as much threat intelligence as possible. However, it’s costly to analyze data on the way into security appliances to ensure that unreliable indicators are removed. It is even more expensive to filter and chase false positives triggered as a result of mediocre data sets. Choosing reliable providers that facilitate an effective response is therefore critical. The Ponemon Institute recently calculated that it costs the companies they surveyed $1.2M per year in time wasted chasing false positives. The Ponemon chart below shows that companies don’t even respond to most of the alerts that are generated – information overload is another problem altogether.

Chart 1: 2015 Ponemon Institute Cost of Data Breach Study

Data Quality

We filter out benign domains, IP addresses, and URLs during our malware and phishing analysis. This is one reason you see less overlap between our intelligence and that of other sources. The high-signal nature of our intelligence service makes it a viable source for automated rules that block network communication and escalate events. Furthermore, while a spam sender’s IP is useful for forensics, we don’t recommend automating actions based on this indicator.

We use the MITRE STIX Campaign definition as the primary way of publishing threat intelligence in machine-readable format, including impact scores for each element. The full campaign file contains a rich set of vetted indicators collected using a combination of proprietary analytics and malware analysis. Portions of our threat intelligence service are published in formats optimized for SIEMs and other security appliances. We also provide the intelligence in JSON format for data scientists and the data-hungry among us.
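As a rough illustration of how quantitative facts and qualitative context come together in a machine-readable record, here is a minimal sketch in Python; the field names and values are hypothetical examples, not PhishMe’s actual schema or STIX itself:

```python
import json

# Hypothetical enriched indicator record: quantitative facts (type, value)
# paired with qualitative context (role, family, impact score).
indicator = {
    "type": "URL",
    "value": "http://example-phish.chickenkiller.com/redirect.php",
    "role": "payload_server",          # role in the malware's lifecycle
    "malware_family": "CryptoLocker",  # illustrative attribution
    "impact_score": "major",
    "first_observed": "2015-06-01T12:00:00Z",
}

serialized = json.dumps(indicator, indent=2)
print(serialized)
```

A consumer can key automated blocking rules off the qualitative fields (for example, acting on `payload_server` entries while keeping spam-sender IPs for forensics only).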

From Research to Production

The CERT analysis required a multi-faceted detection strategy with structured reporting of malware campaigns. This same approach is critical to deriving threat intelligence that is reliable, consumable, and contextual – all requirements for InfoSec teams relying on more automation to keep up with increasing volumes of incidents and alerts. It’s much easier to respond when you know what caused an alert or what’s at the other end of a network request. Similarly, finding value in threat intelligence is much easier after finding the right source of threat intelligence.

DNS Abuse by Cybercriminals – RATs, Phish, and ChickenKillers

This week in our malware intelligence meeting, our analysts brought up DNS abuse by cybercriminals. Two malware samples were seen this week which had the domain “chickenkiller.com” in their infrastructure.

I thought this sounded familiar, but my first guess was wrong. Chupacabra means “goat sucker,” not “chicken killer.” So we did a search in the PhishMe Intelligence database and were surprised to see not only that “chickenkiller.com” was used in two different malware samples in the past week, but that more than sixty phishing sites linked to that domain as well!

What we’re seeing here is a combination of “Free subdomains” and “Dynamic DNS.”

The Anti-Phishing Working Group reports on the use of subdomain services for phishing in its twice-yearly Global Phishing Survey. In its latest report, released on May 27, 2015, the APWG found that free subdomain services were used in approximately 6% of all phishing reports. About half (49.5%) of those occurrences involved a single provider: free “altervista.org” subdomains.

PhishMe’s Phishing Operations team would certainly agree that Altervista.org hosts a large quantity and variety of phishing subdomains!  Already in 2015, we’ve seen altervista.org used in eleven different malware campaigns delivered via spam email, the majority of which distributed fake antivirus software and CryptoLocker ransomware. Additionally, 724 phishing sites on 424 different hostnames have been identified. Those phishing sites spoof 42 different online brands, and all are freely provided by Altervista.org.

When a “Free subdomain” is provided, it just means that rather than registering your own domain name and having to pay for it, you can add a hostname to an existing domain name that the free subdomain provider is giving out.  Often the quid pro quo for the free subdomain is that advertising may appear on the website that offers the free service.

Dynamic DNS

“Dynamic DNS” is something else. For various reasons, people may want a name for their computer that follows them wherever they go. This is common, for instance, in the online gaming community. If I’d like fellow gamers to use a gaming server on my computer, and my IP address is assigned by DHCP, that address might change from time to time. I could therefore register my computer with a Dynamic DNS service, naming it something like “GaryGamingBox.hopto.org”. Each time my computer came online, it would reach out to the Dynamic DNS service at “hopto.org” and report its current IP address. The Dynamic DNS service would then publish a record so that anyone looking for “GaryGamingBox.hopto.org” would learn my current IP address and could join a game.
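The update-and-resolve loop described above can be sketched as a toy model. Here an in-memory dictionary stands in for the Dynamic DNS provider’s records; a real client would call the provider’s HTTP update API rather than a local function:

```python
# Toy model of Dynamic DNS. The registry dict stands in for the
# provider's record store (e.g. at hopto.org); no real API is used.
registry = {}

def ddns_update(hostname, current_ip):
    """Called by the client each time its IP address changes."""
    registry[hostname] = current_ip

def ddns_resolve(hostname):
    """Called by anyone who wants to reach the host by name."""
    return registry.get(hostname)

# The gaming box comes online with a DHCP-assigned address...
ddns_update("GaryGamingBox.hopto.org", "203.0.113.10")
# ...later the lease changes, and the client simply re-registers.
ddns_update("GaryGamingBox.hopto.org", "198.51.100.7")

print(ddns_resolve("GaryGamingBox.hopto.org"))  # 198.51.100.7
```

The key property, for both gamers and criminals, is that the name stays stable while the IP address behind it can change at will.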

While the service is valuable, it is open to DNS abuse by cybercriminals.  Rather than having to risk exposing their identity by purchasing a domain name, cybercriminals can set up a phishing site on a laptop computer, link that computer to a Dynamic DNS service, and visit a nearby Internet café or hack someone’s Wi-Fi and connect anonymously to the Internet.  The problem is also very common with cybercriminals who run a class of malware called Remote Administration Trojans or RATs.

In June of 2014, there was a great deal of controversy when the Microsoft Digital Crimes Unit disrupted two very large Remote Administration Trojan groups which they called Bladabindi (more commonly known as njRAT) and Jenxcus (better known as H-Worm).

In order to disrupt the RATs, the Microsoft Digital Crimes Unit obtained a court order allowing them to seize control of the Dynamic DNS service Vitalwerks Internet Solutions, d/b/a NO-IP.com.  While the seizure was quickly reversed due to public outcry, the truth remained that many hacking websites and documents on how to set up your own RAT begin with instructions on how to link your Botnet Controller to a Dynamic DNS service.

The “builder” that lets a malware author create his own customized RAT prompts the criminal for the hostname that an infected victim should “call back” to in order to provide the Botnet criminal with remote control of the targeted machine.  These RATs are used for a variety of purposes, including in many cases, controlling the webcam and microphone of the victim which can lead to “sextortion” and blackmail.

ChickenKiller?

While the Microsoft takedown and the APWG report identify many of the most popular domain names used for Dynamic DNS, ChickenKiller.com is a gateway to a much larger and more varied community. Visiting “ChickenKiller.com” brings up a page informing us that ChickenKiller.com is one of the 90,000 Free DNS domains operated by Afraid.org, currently serving 3.7 million subdomains and processing 2,000 DNS queries per second.

The Afraid.org domain list offers 91,647 domains on which users can host their free subdomains. Since they are ordered by popularity, we checked the most popular ones against our phishing database:

mooo.com = 21 phishing campaigns, the most recent of which was a Wells Fargo phish at wellsfargo.com-login-online.mooo.com. Others included Poste Italiane, PayPal, Carta Si, Bank of America, QuickBooks (malware), Netflix, and Banco de Reservas.

chickenkiller.com = 59 phishing campaigns for a variety of brands, most recently Poste Italiane and Taobao.

us.to = 311 phishing campaigns, most of which were Paypal related, including some PayPal phishing campaigns from today on info-limit.us.to. Others included Facebook (warnku.us.to) and National Australia Bank.

strangled.net = 10 phishing campaigns, most recently a PayPal phish on www.paypal.service.com.strangled.net, but also Apple, Sicredi, Visa, MasterCard, and Taobao.

crabdance.com = 8 phishing campaigns, most recently an Apple iTunes phish.

info.tm = 75 phishing campaigns, including a PayPal phish from this week, paypal-serviced.info.tm and paypal.verfield.info.tm.

While many of the phishers are taking advantage of Afraid.org’s offer of “Free subdomain AND domain hosting!” others are being more subtle with their use of the free services.  For example, a recent Paypal phisher used the host “pplitalyppl.chickenkiller.com” in order to avoid having the true location of his phishing site shared in the spam emails that he was sending.  The spam contained the ChickenKiller link, which had a simple PHP forwarder that redirected the user to the phisher’s hacked website in the Netherlands.  In other cases the phishing page is on a “normal” hacked website, but the ACTION script that processes the stolen credentials, usually emailing them to a criminal, is hosted on a Free or Dynamic DNS subdomain.

The bottom line is that business customers need to be aware of DNS abuse by cybercriminals. Free subdomain and dynamic DNS services are often used by criminals for their Trojans AND their phishing pages. These types of domains are also fairly unlikely to be used for legitimate B2B purposes, so their presence in your log files is likely to be highly suspect. Also, be aware that Afraid.org is a white hat hacking group. Josh Anderson, who runs a wide variety of interesting DNS services at that site, hates to have his domains abused as much as anyone else. If you see a suspicious subdomain address and the nameservers are set to “NS1.AFRAID.ORG”, be sure to report it by emailing “abuse@afraid.org”. It could be yet another case of DNS abuse by cybercriminals.
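One minimal way to flag such hostnames in log files is suffix matching against a list of known free-subdomain base domains. This sketch seeds the list with only the handful of domains named in this post; a production deny-list would be far larger (Afraid.org alone offers more than 90,000 domains):

```python
# Base domains mentioned in this post; illustrative, not exhaustive.
FREE_DNS_BASES = {
    "chickenkiller.com", "mooo.com", "us.to",
    "strangled.net", "crabdance.com", "info.tm", "afraid.org",
}

def is_free_dns_host(hostname):
    """Return True if hostname sits under a known free-DNS base domain.

    Matches on DNS label boundaries, so a lookalike such as
    'chickenkiller.com.evil.com' is NOT flagged.
    """
    labels = hostname.lower().rstrip(".").split(".")
    for i in range(len(labels) - 1):
        if ".".join(labels[i:]) in FREE_DNS_BASES:
            return True
    return False

print(is_free_dns_host("pplitalyppl.chickenkiller.com"))  # True
print(is_free_dns_host("chickenkiller.com.evil.com"))     # False
```

Hits from a check like this are worth escalating; as noted above, these names rarely appear in legitimate B2B traffic.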

Forget About IOCs… Start Thinking About IOPs!

For those who may have lost track of time, it’s 2015, and phishing is still a thing. Hackers are breaking into networks, stealing millions of dollars, and the current state of the Internet is pretty grim.

We are surrounded by large-scale attacks, and as incident responders we are often overwhelmed, which creates the perception that the attackers are one step ahead of us. This is how most folks see attackers: as super villains who only know evil, breathe evil, and only do new evil things to trump the last evil thing.

This perception leads to us receiving lots of questions about the latest attack methods. Portraying our adversaries as being extremely sophisticated, powerful foes makes for a juicy narrative, but the reality is that attackers are not as advanced as they are made out to be.

Disrupting an Adware-serving Skype Botnet

In the early days of malware, we all remember analyzing relatively simple IRC botnet samples, in which the malware would connect to an arbitrary port running IRC, join the botnet, and wait for commands from its leader. In this day and age, it’s slightly different. Whereas botnets previously had to run on systems that attackers owned or had compromised, bots can now run on Skype and other cloud-based chat programs, providing an even lower-cost alternative for attackers.

Surfing the Dark Web: How Attackers Piece Together Partial Data

The recent Carefirst breach is just the latest in a rash of large-scale healthcare breaches, but the prevailing notion in the aftermath of this breach is that it isn’t as severe as the Anthem or Premera breaches that preceded it. The thinking is that the victims of this breach dodged a bullet here, since attackers only accessed personal information such as member names and email addresses, not more sensitive information like medical information, social security numbers, and passwords. However, attackers may still be able to use this partial information in a variety of ways, and a partial breach should not be dismissed as trivial.