Often the question of "who used it?" is just as salient as "who made it?" when dealing with attribution of malicious activity. Researchers at the Institute for Internet Security & Privacy are working across several fronts of the attribution problem set -- spanning activities such as cyber espionage, cyberattack, and cyber influence.
For example, pieces of malicious code are often the most common forensic evidence available to the security research community. However, a standalone binary is of limited use in driving attribution statements because it lacks context: it does not reveal who wrote it, who controlled it, who the infected victims were, or what the operation that used it was meant to achieve. A standalone binary does not even confirm which operation it was used in.
Georgia Tech researchers are working on large-scale collections of malware samples to posit relationships between binaries. With a sufficient sample size, machine-learning techniques can be applied. Current research builds upon the community’s state-of-the-art approach to attribution, in which code stylometry looks at stylistic features (e.g., white space, operators, and literals) and author-created attributes (e.g., average number of characters per word, character count, use of special characters, and punctuation). Our aim is to produce credible links between a binary and a given set of binaries from the same cyber threat actor in a measurable way. We focus on the following domains to derive attribution inferences, and require multiple positive correlations between domains to produce results:
- String constants;
- Implementation traits;
- Custom features; and
- Infrastructure.
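The requirement for multiple positive correlations across domains can be illustrated with a minimal sketch. This is not the researchers' method: the Jaccard overlap measure, the 0.5 threshold, the two-domain minimum, and every name and feature value below (`Sample`, `matching_domains`, `is_linked`, `mutex_x`, `c2.example`) are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field

# The four domains named in the list above (short keys are hypothetical).
DOMAINS = ("strings", "implementation", "custom", "infrastructure")


@dataclass
class Sample:
    """A binary reduced to sets of observed features, keyed by domain."""
    name: str
    features: dict = field(default_factory=dict)  # domain -> set of feature labels


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as no evidence (0.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def matching_domains(sample, cluster, threshold=0.5):
    """Domains in which the sample overlaps the cluster's pooled features."""
    matched = []
    for domain in DOMAINS:
        # Pool the features seen in this domain across all known samples.
        pooled = frozenset().union(*(s.features.get(domain, set()) for s in cluster))
        if jaccard(sample.features.get(domain, set()), pooled) >= threshold:
            matched.append(domain)
    return matched


def is_linked(sample, cluster, min_domains=2, threshold=0.5):
    """Link only when at least `min_domains` domains correlate independently."""
    return len(matching_domains(sample, cluster, threshold)) >= min_domains


# Hypothetical cluster of binaries already tied to one threat actor.
known = [Sample("known-1", {
    "strings": {"mutex_x", "ua_y"},
    "implementation": {"custom_rc4_key_schedule"},
    "infrastructure": {"c2.example"},
})]

candidate = Sample("candidate", {
    "strings": {"mutex_x", "ua_y", "other"},
    "implementation": {"aes_cbc"},
    "infrastructure": {"c2.example"},
})
lookalike = Sample("lookalike", {"strings": {"mutex_x"}})

print(matching_domains(candidate, known), is_linked(candidate, known))
print(matching_domains(lookalike, known), is_linked(lookalike, known))
```

In this sketch the candidate correlates in two domains and is linked, while the lookalike shares only a string constant and is not. Reused strings alone are therefore not treated as sufficient evidence.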
Another emerging research area extends attribution beyond cyber espionage and cyberattack, in which information theft or damage to a network or physical environment is the goal, to the growing number of cyber-based information operations. Trolls are undertaking sophisticated operations to sway popular opinion, curb dissent, stir unrest, and instill fear. Whatever this type of activity is called (information operations, influence campaigns, cyber manipulation, "fake news," or irregular warfare), it is asymmetric, increasingly dangerous, and on the rise. Researchers at Georgia Tech are looking at ways to attribute this activity using Internet metadata and other fact-based log analysis techniques.