The alleged NSA malware developers are at risk of being identified (yousry.de)
This post seems to lack the data required to make such a claim; I do not understand how it has gained so much traction.
Where is the actual research, and where are the identified candidates? Did I miss a data-analysis section somewhere that explained the methodology and the attribution to actual people? This appears to be a basic string search of the code plus some simple syntax analysis.
There are learning algorithms for stylometry, and they can probably be adapted to code. This article appears to state that "it might be possible to use these anomalies as clues", but does not elaborate on how or why, or offer any hypothesis beyond that.
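To make that concrete, here is a toy sketch of what "adapting stylometry to code" might look like: extract crude lexical and layout features and attribute a sample to the most similar known author. The features, author names, and code snippets below are all made up for illustration; real work on this uses far richer AST-based features and proper classifiers.

```python
import re
from collections import Counter

def features(src):
    """Crude lexical style features: identifier/keyword frequencies
    plus a few layout habits (tabs vs. spaces, brace placement)."""
    feats = Counter(re.findall(r"[A-Za-z_]\w*", src))
    feats["__tabs__"] = src.count("\t")
    feats["__brace_same_line__"] = len(re.findall(r"\)[ \t]*\{", src))
    feats["__brace_own_line__"] = len(re.findall(r"\n[ \t]*\{", src))
    return feats

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def attribute(sample, corpus):
    """Attribute `sample` to the author whose known code is most similar."""
    f = features(sample)
    return max(corpus, key=lambda author: cosine(f, features(corpus[author])))

# Two hypothetical authors with different naming and indentation habits:
corpus = {
    "author_a": "int main(void)\n{\n\tint ret_val = 0;\n\treturn ret_val;\n}",
    "author_b": "int main(void) {\n    int rc = 0;\n    return rc;\n}",
}
sample = "void helper(void) {\n    int rc = do_work();\n}"
```

On this toy corpus, `attribute(sample, corpus)` picks `author_b`, since the sample shares the same-line braces, four-space indentation, and `rc` naming habit. The hard part the article skips is exactly this step at scale, with adversarial authors.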
I haven't analyzed the author's claims, but in general programmer identification is a solved problem:
https://www.youtube.com/watch?v=YMa04HovKfs [De-anonymizing programmers 32c3]
My first thoughts were about the demo you linked and about this one: https://www.youtube.com/watch?v=xipI-0HU010
Awesome. Thanks for this. I missed this one.
Looks to me like the author is posting initial findings (and if I am reading this right, withholding some).
It doesn't look like a crazy amount of time/resources have gone in, but it looks like a basic proof of concept to me. Perhaps it will get the ball rolling and someone else who reads this will figure it out.
> However, in contrast to 3.5 billion Internet users, only a few hundred experts have to be identified.
This is the sentence that lets you know the post can be safely ignored. Anyone who thinks there are only a few hundred people in the world capable of writing Linux exploits doesn't have a grip on the scale of the world at all.
The assertion was not that there are only a few hundred people, but that the organization responsible for this software employed at most a few hundred people to write it.
(There are other problems with the article's conclusions absent the data they withhold, but I don't agree that this is one of them.)
e: Actually, on further reflection, neither your interpretation of their statement nor mine is a reasonable conclusion, so I now agree with you that this is a flaw in their argument.
But that isn't the approach the article takes - it tries to narrow down the list of possible authors from public data, not identify employees of organisations that may have a few hundred hackers.
Perhaps it's possible to limit the search space by also looking only at experts likely (or at least possibly) to have worked with the US government or NSA in the past or present. Then maybe you could get the list down to a reasonable number? For example, any experts who have never been to the US for extended periods can probably be excluded.
How would you know? I've encountered at least one person who was without a doubt ex-GCHQ but didn't identify that anywhere.
Agreed. There are probably 5-25K people (yes, a large range, but still an order of magnitude higher) in the Bay Area alone who are capable of writing exploits.
Also, there's a huge difference between the number of people capable of secretly building exploits alone in their bedrooms at night (probably committing a crime) and those building them as a day job, where you can solicit feedback and advice from peers, reference well-organised documentation, study the original source code of previously successful exploits, and freely discuss ideas and approaches with colleagues over lunch.
Which of course partially challenges this assumption in the article:
> The developers of the malware [..] were discovered and not trained.
> people capable of secretly building exploits alone in their bedrooms at night (probably committing a crime)
No, that isn't how exploit research works. I don't understand why one would think that writing exploits is associated with being a criminal.
Research, no, but turning it into malware is.
Do you consider exploits to be malware? If so, then no, you couldn't be more wrong.
I'm currently working on anomaly detection algorithms and took the Shadow Brokers release as a good opportunity to analyze a number of malware applications at once.
I'd love to see your results once you're ready to share them!
The author appears to run "strings" on the binaries and then ventures a few theories in the dark:
> The developers of the malware are leading experts in the area of Linux, Network and Security development.
> They were discovered and not trained.
> Because the archive contains a collection of applications, the calculated result-set is reasonably small for further investigations.
Also:
> LinkedIn will show you the professional discipline, GitHub the shared libraries and their publicity.
I would guess that the NSA has a firm grasp of this sort of basic OSINT problem and of code-attribution techniques.
Retroactively scrubbing a programmer's published work and social media participation is a red flag in itself.
Indeed, from what we know, or at least suspect, this is a group external to the NSA.
It could consist of former NSA employees and military personnel, but it's not clear whether this is a fully sanctioned group or just really good hackers for hire.
Likely many NSA or GCHQ developers will have a public account on GitHub.
After seeing this post, the malware devs may have unfollowed/unstarred the repos used in order to evade discovery.
It would have been interesting to have GitHub's star/follow history...
GitHub has a comprehensive open dataset [1]. I'm not sure if it keeps historical data, but I'm sure there are people hitting the APIs and keeping the data archived :)
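For example, GH Archive publishes hourly dumps of public GitHub events in the Events API shape, and a star shows up there as a WatchEvent; unstarring later does not delete the archived event. A minimal sketch of rebuilding star history from such dumps (the event lines and usernames below are made up for illustration, though the field names follow the public events format):

```python
import json

def star_history(lines):
    """Map repo name -> list of (user, timestamp) star events,
    from newline-delimited GitHub event JSON."""
    history = {}
    for line in lines:
        ev = json.loads(line)
        if ev.get("type") != "WatchEvent":  # WatchEvent == a star
            continue
        history.setdefault(ev["repo"]["name"], []).append(
            (ev["actor"]["login"], ev["created_at"]))
    return history

# Hypothetical sample events (names invented for illustration):
events = [
    '{"type": "WatchEvent", "actor": {"login": "alice"},'
    ' "repo": {"name": "example/libfoo"}, "created_at": "2016-08-15T12:00:00Z"}',
    '{"type": "PushEvent", "actor": {"login": "bob"},'
    ' "repo": {"name": "example/libfoo"}, "created_at": "2016-08-15T12:01:00Z"}',
]
```

So even if the devs unstar now, anyone who mirrored the dumps can still see who starred what, and when.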
Nice forensic analysis and tutorial.
Note that parsing strings out of a binary and looking for names in them gives you mainly false positives, e.g. from glibc:
https://fossies.org/dox/glibc-2.24/C-identification_8c_sourc...
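To illustrate the failure mode: a `strings`-style pass pulls out every printable run, so credit strings baked into statically linked library code surface right next to genuinely interesting developer paths. The "binary" and the path below are fabricated for the example; only the extraction technique is real.

```python
import re

def extract_strings(blob, min_len=4):
    """Like the `strings` tool: runs of >= min_len printable ASCII bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, blob)]

# Fake "binary": a developer path next to an embedded library credit
# string (both invented). Statically linked library data is why string
# dumps surface names that have nothing to do with the malware author.
blob = (b"\x00\x01/home/jdoe/src/payload.c\x00\xff"
        b"GNU C Library stable release\x00\x7f\x45")

candidates = [s for s in extract_strings(blob)
              if "/home/" in s or "Library" in s]
```

Both strings come out of the dump looking equally "attributable", which is exactly the false-positive problem: without knowing which strings belong to linked-in library code, a name hit proves very little.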
TLDR: Assumptions: "The developers of the malware are leading experts in the area of Linux, Network and Security development." and "They were discovered and not trained."
Why is it a problem if they are identified? It is probably the only case where writing malware doesn't get you in trouble with the government, because they paid you to do it.
A naive question: wouldn't sending this code through an obfuscator mess up this methodology (other than library identification)?
It clearly hasn't happened here, but wouldn't that be a reasonable step to cover tracks given this kind of analysis?