RFC: Rust Has Provenance

55 points by killcoder 2 years ago · 18 comments

Provenance in C and Rust essentially means that each object lives in its own memory space, and you can't travel between them. You can't take a pointer to one object, add an offset, and get a valid pointer to another object.

This is key to a lot of compiler optimizations. It is the de facto rule in C compilers, and therefore in a lot of compiler infrastructure used by other languages. There is an attempt to define it more clearly in the C standard:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3005.pdf
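
To make the per-object rule concrete, here is a minimal Rust sketch (not taken from the RFC or from N3005). The helper name `read_within` is hypothetical. It only ever offsets a raw pointer inside the array it was derived from, and refuses anything else.

```rust
/// Reads element `i` of `arr` through a raw pointer derived from `arr`.
/// Returns `None` when `i` is out of bounds, because stepping past the
/// array would leave the region this pointer's provenance covers.
/// (`read_within` is a hypothetical helper for illustration.)
fn read_within(arr: &[u32; 4], i: usize) -> Option<u32> {
    if i >= arr.len() {
        return None;
    }
    let base: *const u32 = arr.as_ptr();
    // SAFETY: `i` is in bounds, so `base.add(i)` stays inside `arr`.
    Some(unsafe { *base.add(i) })
}

fn main() {
    let a = [1u32, 2, 3, 4];
    // In-bounds offsets from a pointer into `a` are fine.
    println!("{:?}", read_within(&a, 2)); // prints "Some(3)"
    // Even if some other array happened to sit right after `a` in memory,
    // a pointer derived from `a` must not be used to reach it.
    assert_eq!(read_within(&a, 4), None);
}
```

The bounds check is the whole point: the address arithmetic alone would happily produce a number, but the pointer is not allowed to be used outside the object it came from.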

ralfj 2 years ago

It's not just per-object. When you consider things like `restrict` pointers in C, and the aliasing model in Rust (e.g. Stacked Borrows), you have provenance distinctions even within a single allocated object.

vaylian 2 years ago

> The pointer's "provenance" says where in memory the pointer is allowed to access when.

Is that the definition of provenance or is there a wider definition that we should know?

DougBTX 2 years ago

See the doc for a definition: https://github.com/RalfJung/rfcs/blob/provenance/text/0000-r...

aredox 2 years ago

"where in memory the pointer is allowed to access when." - ... when what?
Sorry, this sentence is difficult to understand.
- asplake 2 years ago
  
  ‘when’ here means “and under what circumstances”. An ‘and’ might help.
- _nalply 2 years ago
  
  Probably just a mistake. I just would assume that the word "when" is superfluous and respond in a way that allows the asker to correct their mistake but not invest too much in an answer because the assumption might be wrong.
  I am picky about language as well, so I understand where you are coming from. But I discovered that LLMs often answer anyway when my question is incomplete or contains mistakes, and they get it right fairly often.
  It's a useful trick I discovered late in my life (I am in my fifties). I started to use it with my children and my wife. Their utterances are often not mathematically and linguistically perfect, but very human and lovely. When I got stumped I just thought about what they might have asked and answered that. More often than not they were happy with my answer.
  - foldr 2 years ago
    
    It’s just a multiple-wh construction, as in
    “Who arrived when?”
    which you might answer with e.g. “John on Tuesday and Mary on Wednesday”. This one is a bit harder to parse because it’s longer, but the answers will be something like “it can access region A at time T, region B at time T’, …”
    
    tialaramex 2 years ago
    
    Right, in particular objects (in the general sense, not the OOP sense) have lifetimes, and so the provenance of a pointer to or into that object is restricted to that lifetime.
    If I make a `Box<Goose>`, some heap is allocated and my Goose is in there, and I can get myself a pointer to that Goose, but the pointer must not be used either before I made the `Box<Goose>` or after it's dropped.
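    A small sketch of that `Box<Goose>` case, assuming a made-up `Goose` struct with a `honks` field (both hypothetical). The raw pointer is only dereferenced while the box is alive.

    ```rust
    /// Hypothetical type for illustration.
    struct Goose {
        honks: u32,
    }

    /// Reads a field through a raw pointer while the box is alive,
    /// then drops the box. (`honks_while_alive` is a hypothetical helper.)
    fn honks_while_alive() -> u32 {
        let goose = Box::new(Goose { honks: 3 });
        let p: *const Goose = &*goose;
        // SAFETY: `goose` has not been dropped yet, so `p` may be read.
        let n = unsafe { (*p).honks };
        drop(goose);
        // `p` still holds the same address here, but the allocation is gone:
        // dereferencing it now would be undefined behavior.
        n
    }

    fn main() {
        println!("{}", honks_while_alive()); // prints "3"
    }
    ```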
    
    aredox 2 years ago
    
    Thanks! I really couldn't recognise that it was this construction here!

Buttons840 2 years ago

Isn't this a definition of lifetimes too?
A lifetime is some set of memory a reference may refer to. Consider:
`fn foo<'a, 'b>(x: &'a str, y: &'b str) -> &'a str`
Aren't `'a` and `'b` sets of memory a reference can refer to? As far as this function call is concerned, both `'a` and `'b` will live throughout, so it's not about life and death, it's about what memory the references may refer to.
Are a lifetime and a "provenance" the same thing?
- Rusky 2 years ago
  
  Provenance is a dynamic property of pointer values. The actual underlying rules that a program must follow, even when using raw pointers and `unsafe`, are written in terms of provenance. Miri (https://github.com/rust-lang/miri) represents provenance as an actual value stored alongside each pointer's address, so it can check for violations of these rules.
  Lifetimes are a static approximation of provenance. They are erased after being validated by the borrow checker, and do not exist in Miri or have any impact on what transformations the optimizer may perform. In other words, the provenance rules allow a superset of what the borrow checker allows.
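  As a rough illustration of the "superset" point (a sketch, with the hypothetical helper name `bump_twice`): two raw pointers that share the same provenance can take turns writing to one value. The borrow checker would reject the same pattern written with two live `&mut` references.

  ```rust
  /// Increments `*x` twice through two raw pointers that share provenance.
  /// (`bump_twice` is a hypothetical helper for illustration.)
  fn bump_twice(x: &mut i32) -> i32 {
      let p: *mut i32 = x;
      let q: *mut i32 = p; // copying a raw pointer copies its provenance
      // SAFETY: both pointers derive from the same `&mut i32`, which
      // outlives this block, and no reference is used in between.
      unsafe {
          *p += 1;
          *q += 1; // interleaved writes, impossible with two `&mut`
          *p
      }
  }

  fn main() {
      let mut v = 40;
      println!("{}", bump_twice(&mut v)); // prints "42"
  }
  ```

  I would expect Miri to accept this program, though I have not run it here. The lifetimes on `x` play no part at run time.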
  - Buttons840 2 years ago
    
    So this is about solidifying very specific semantics that are of more concern to projects like Miri than to regular users?
    Speaking of Miri, is the long term goal to say for certain whether or not a program execution encountered UB? (Which is, of course, different than verifying it before execution at compile time.)
    
    Rusky 2 years ago
    
    It is about solidifying semantics, but those semantics are still a concern to regular users who are writing `unsafe` code, not just Miri.
    As I understand it, Miri would like to be certain about the presence or absence of UB in any particular program execution, but there will probably always be some cases it can't catch.
    
    digama0 2 years ago
    
    > Speaking of Miri, is the long term goal to say for certain whether or not a program execution encountered UB? (Which is, of course, different than verifying it before execution at compile time.)
    That's more than a long term goal, that's the present behavior, to the extent that the UB rules are defined in the first place. Miri will tell you when it has to make approximating assumptions (e.g. accessing FFI or using nondeterministic operations), and it doesn't happen very often. This is very much the intent of the tool.
- Georgelemental 2 years ago
  
  > Aren't 'a and 'b sets of memory a reference can refer to?
  No, the set of memory the reference refers to is encoded by the reference's bits. Two references with the same lifetime can refer to different memory, and vice versa. Lifetimes determine when the reference is valid.
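  A quick sketch of the first half of that claim. The names `first` and `same_lifetime_distinct_memory` are hypothetical. Both arguments are forced to the same lifetime `'a`, yet they point at different allocations.

  ```rust
  /// Both parameters share the single lifetime `'a`.
  /// (`first` is a hypothetical helper for illustration.)
  fn first<'a>(x: &'a str, _y: &'a str) -> &'a str {
      x
  }

  /// Returns true when two references that satisfy the same lifetime
  /// hold different addresses.
  fn same_lifetime_distinct_memory() -> bool {
      let s = String::from("hello");
      let t = String::from("world");
      let (a, b) = (s.as_str(), t.as_str());
      // The call type-checks, so `a` and `b` can share one lifetime `'a`.
      let _ = first(a, b);
      // The memory each one refers to is given by its bits, not by `'a`.
      a.as_ptr() != b.as_ptr()
  }

  fn main() {
      println!("{}", same_lifetime_distinct_memory()); // prints "true"
  }
  ```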
  - Buttons840 2 years ago
    
    I was speaking more abstractly.
    Strictly speaking, a pointer points to 1 address, not a set.
    Abstractly speaking, a lifetime represents a set of memory that lives a certain length of time, and thus also represents the set of memory a reference with that lifetime can reference.
    It's just a different way of thinking about lifetimes which helped me understand them.
    
    Georgelemental 2 years ago
    
    > a pointer points to 1 address, not a set.
    Rust "fat pointers" like `&str` contain a length as well.
    > represents the set of memory a reference with that lifetime can reference.
    No, you can have memory that is alive for a particular lifetime but that you are not allowed to access via a pointer with said lifetime (because it's out of bounds).
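    Both points can be sketched in a few lines. `first_two` is a hypothetical helper, and the size comparison reflects how current rustc lays out slice references rather than a documented guarantee.

    ```rust
    use std::mem::size_of;

    /// Returns a narrower view of `buf` with the same lifetime `'a`.
    /// Panics if `buf` has fewer than two bytes.
    /// (`first_two` is a hypothetical helper for illustration.)
    fn first_two<'a>(buf: &'a [u8]) -> &'a [u8] {
        &buf[..2]
    }

    fn main() {
        // A `&str` carries an address and a length, so in practice it is
        // twice the size of a thin reference such as `&u8`.
        assert_eq!(size_of::<&str>(), 2 * size_of::<&u8>());

        let buf = [1u8, 2, 3, 4];
        let view = first_two(&buf);
        // `buf[2]` is alive for as long as `view` is, yet it is out of
        // bounds for `view`, so `view` gives no access to it.
        assert_eq!(view.len(), 2);
        assert_eq!(view.get(2), None);
    }
    ```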
