The alleged NSA malware developers are at risk of being identified (yousry.de)
This post seems to lack the data required to make such a claim; I do not understand how it has gained so much traction.
Where is the actual research, and where are the identified candidates? Did I miss a data-analysis section somewhere that explained the methodology and the attribution to actual people? This appears to be a basic string search of the code plus some simple syntax analysis.
There are learning algorithms for stylometry, and they can probably be adapted to code. This article appears to state that "it might be possible to use these anomalies as clues", but does not elaborate on how or why, or offer any hypothesis beyond that.
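To make that concrete, here is a toy sketch of what "adapting stylometry to code" might look like: extract crude lexical and layout features and attribute a sample to the most similar known author. The features, author names, and code snippets below are all made up for illustration; real work on this uses far richer AST-based features and proper classifiers.

```python
import re
from collections import Counter

def features(src):
    """Crude lexical style features: identifier/keyword frequencies
    plus a few layout habits (tabs vs. spaces, brace placement)."""
    feats = Counter(re.findall(r"[A-Za-z_]\w*", src))
    feats["__tabs__"] = src.count("\t")
    feats["__brace_same_line__"] = len(re.findall(r"\)[ \t]*\{", src))
    feats["__brace_own_line__"] = len(re.findall(r"\n[ \t]*\{", src))
    return feats

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def attribute(sample, corpus):
    """Attribute `sample` to the author whose known code is most similar."""
    f = features(sample)
    return max(corpus, key=lambda author: cosine(f, features(corpus[author])))

# Two hypothetical authors with different naming and indentation habits:
corpus = {
    "author_a": "int main(void)\n{\n\tint ret_val = 0;\n\treturn ret_val;\n}",
    "author_b": "int main(void) {\n    int rc = 0;\n    return rc;\n}",
}
sample = "void helper(void) {\n    int rc = do_work();\n}"
```

On this toy corpus, `attribute(sample, corpus)` picks `author_b`, since the sample shares the same-line braces, four-space indentation, and `rc` naming habit. The hard part the article skips is exactly this step at scale, with adversarial authors.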
I haven't analyzed the author's claims, but in general programmer identification is a solved problem:
https://www.youtube.com/watch?v=YMa04HovKfs [De-anonymizing programmers 32c3]
My first thoughts were about the demo you linked and about this one: https://www.youtube.com/watch?v=xipI-0HU010
Awesome. Thanks for this. I missed this one.
Looks to me like the author is posting initial findings (and if I am reading this right, withholding some).
It doesn't look like a crazy amount of time/resources have gone in, but it looks like a basic proof of concept to me. Perhaps it will get the ball rolling and someone else who reads this will figure it out.
> However, in contrast to 3.5 billion Internet users, only a few hundred experts have to be identified.
This is the sentence that lets you know the post can be safely ignored. Anyone who thinks there are only a few hundred people in the world capable of writing Linux exploits doesn't have a grip on the scale of the world at all.
The assertion was not that there are only a few hundred people, but that the organization responsible for this software employed at most a few hundred people to write it.
(There are other problems with the article's conclusions absent the data they withhold, but I don't agree that this is one of them.)
e: Actually, on further reflection, neither your interpretation of their statement nor mine is a reasonable conclusion, so I now agree with you that this is a flaw in their argument.
But that isn't the approach the article takes - it tries to narrow down the list of possible authors from public data, not identify employees of organisations that may have a few hundred hackers.
Perhaps it's possible to limit the search space by also looking only at experts likely (or at least possibly) to have worked with the US government or NSA in the past or present. Then maybe you could get the list down to a reasonable number? For example, any experts who have never been to the US for extended periods can probably be excluded.
How would you know? I've encountered at least one person who was without a doubt ex-GCHQ but didn't identify that anywhere.
Agreed. There are probably 5-25K people (yes, a large range, but still an order of magnitude higher) in the Bay Area alone who are capable of writing exploits.
Also, there's a huge difference between the number of people capable of secretly building exploits alone in their bedrooms at night (probably committing a crime) and those building them as a day job, where you can solicit feedback and advice from peers, reference well-organised documentation, study the original source code of previously successful exploits, and freely discuss ideas and approaches with colleagues over lunch.
Which of course partially challenges this assumption in the article:
> The developers of the malware [..] were discovered and not trained.
> people capable of secretly building exploits alone in their bedrooms at night (probably committing a crime)
No, that isn't how exploit research works. I don't understand why one would think that writing exploits is associated with being a criminal.
Research, no, but turning it into malware is.
Do you consider exploits to be malware? If so, then no, you couldn't be more wrong.
I'm currently working on anomaly detection algorithms and took the Shadow Brokers release as a good opportunity to analyze a number of malware applications at once.
I'd love to see your results once you're ready to share them!
The author appears to run "strings" on the binaries and then ventures a few theories in the dark:
> The developers of the malware are leading experts in the area of Linux, Network and Security development.
> They were discovered and not trained.
> Because the archive contains a collection of applications, the calculated result-set is reasonably small for further investigations.
Also:
> LinkedIn will show you the professional discipline, GitHub the shared libraries and their publicity.
I would guess that the NSA has a firm grasp of this sort of basic OSINT problem and of code-attribution techniques.
Retroactively scrubbing a programmer's published work and social media participation is a red flag in itself.
Indeed, from what we know, or at least suspect, this is a group external to the NSA.
It could consist of former NSA employees and military personnel, but it's not clear whether this is a fully sanctioned group or just really good hackers for hire.
Likely many NSA or GCHQ developers will have a public account on GitHub.
After seeing this post, the malware devs may have unfollowed/unstarred the repos used in order to evade discovery.
It would have been interesting to have GitHub's star/follow history...
GitHub has a comprehensive open dataset [1]. I'm not sure if it keeps historical data, but I'm sure there are people hitting the APIs and keeping the data archived :)
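For example, GH Archive publishes hourly dumps of public GitHub events in the Events API shape, and a star shows up there as a WatchEvent; unstarring later does not delete the archived event. A minimal sketch of rebuilding star history from such dumps (the event lines and usernames below are made up for illustration, though the field names follow the public events format):

```python
import json

def star_history(lines):
    """Map repo name -> list of (user, timestamp) star events,
    from newline-delimited GitHub event JSON."""
    history = {}
    for line in lines:
        ev = json.loads(line)
        if ev.get("type") != "WatchEvent":  # WatchEvent == a star
            continue
        history.setdefault(ev["repo"]["name"], []).append(
            (ev["actor"]["login"], ev["created_at"]))
    return history

# Hypothetical sample events (names invented for illustration):
events = [
    '{"type": "WatchEvent", "actor": {"login": "alice"},'
    ' "repo": {"name": "example/libfoo"}, "created_at": "2016-08-15T12:00:00Z"}',
    '{"type": "PushEvent", "actor": {"login": "bob"},'
    ' "repo": {"name": "example/libfoo"}, "created_at": "2016-08-15T12:01:00Z"}',
]
```

So even if the devs unstar now, anyone who mirrored the dumps can still see who starred what, and when.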
Nice forensic analysis and tutorial.
Note that parsing strings out of a binary and looking for names in them gives you mainly false positives, e.g. from glibc:
https://fossies.org/dox/glibc-2.24/C-identification_8c_sourc...
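To illustrate the failure mode: a `strings`-style pass pulls out every printable run, so credit strings baked into statically linked library code surface right next to genuinely interesting developer paths. The "binary" and the path below are fabricated for the example; only the extraction technique is real.

```python
import re

def extract_strings(blob, min_len=4):
    """Like the `strings` tool: runs of >= min_len printable ASCII bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, blob)]

# Fake "binary": a developer path next to an embedded library credit
# string (both invented). Statically linked library data is why string
# dumps surface names that have nothing to do with the malware author.
blob = (b"\x00\x01/home/jdoe/src/payload.c\x00\xff"
        b"GNU C Library stable release\x00\x7f\x45")

candidates = [s for s in extract_strings(blob)
              if "/home/" in s or "Library" in s]
```

Both strings come out of the dump looking equally "attributable", which is exactly the false-positive problem: without knowing which strings belong to linked-in library code, a name hit proves very little.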
TLDR: Assumptions: "The developers of the malware are leading experts in the area of Linux, Network and Security development." and "They were discovered and not trained."
Why is it a problem if they are identified? It is probably the only case where writing malware doesn't get you in trouble with the government, because they paid you to do it.
A naive question: wouldn't sending this code through an obfuscator mess up this methodology (other than library identification)?
It clearly hasn't happened here, but wouldn't that be a reasonable step to cover tracks given this kind of analysis?