Native Reflection in Rust

jack.wrenn.fyi

277 points by jswrenn 3 years ago · 67 comments

davidhyde 3 years ago

Great writeup! The defmt logging crate uses a linker script to extract debug symbols so that you get nicely formatted stack traces on embedded systems. It works on Linux, macOS, and Windows. I wonder if the same technique could be applied to this project. It needs a runner, though, so it may not be the right approach.

https://github.com/knurling-rs/defmt

olvy0 3 years ago

I've used a very similar method at work to provide C++ "reflection" between my own system and a system from another team.

Basically, the other system is a dynamic library which sends and receives C structures from my application. Those structures are then mapped into a buffer that is supposed to have the same size and there are pointers with metadata pointing into the buffer that are supposed to be exactly like the struct elements. Those structures can have arbitrary complexity, and are passed around through type erasure (essentially char*).

I wrote some "reflection" code for the other team, which runs when they register the struct instance to be sent, checks whether there's a matching PDB [0] around, reads it, and outputs JSON containing the needed metadata, which can then be used to define the structures' metadata correctly on our side.

This is all in C/C++ since in some contexts we have soft real-time requirements, else I would have used any of the many RPC frameworks available.

This has been working for several years now.

This is not a generic solution but it's good enough for in-house communication between 2 systems that are maintained by different parts of the organization, where the API between them, that like I said is based on passing around char* buffers, has been more or less set in stone a long time ago. Conway's law [1] and all that. Sigh.

[0] We are a Windows shop, although the same thing should work with DWARF info, just as the OP's library does. In fact he says "It may never work on Windows, which does not use DWARF to encode debug info" but I can say that the same approach does work on Windows, for C++ at least. The PDB format might be a tad undocumented, but its documentation has improved in the last decade or so since I started working on my library. Writing some small test programs is enough to understand how to access it, if all you need is meta info on C-style structures. Other stuff is more... challenging. But it wasn't necessary for my use case.

[1] https://en.wikipedia.org/wiki/Conway%27s_law

  • jagged-chisel 3 years ago

    Was the other team completely unwilling to provide a header?

    • olvy0 3 years ago

      Yes, they are willing, that wasn't the problem. The problem was that the consuming app on my side is historically metadata driven, and historically tries to avoid having to recompile when the interface changes. We do that by keeping the code generic and by reading the interface from a database. This leads to faster iterations. The problem arises when we have to interface with any other system which is not generic and has its interface defined in H files.

      Yeah, I know, it's our problem, not theirs. It's something I cannot fix on my own without a huge effort. I've tried pushing for it for more than a decade, and at some point my wish was sort of abducted by my boss' boss as an excuse to create a DSL [0]. This did solve some huge problems but also created many others. It didn't solve that char* / h file problem since it doesn't really have an FFI.

      [0] Domain-specific language, custom-made for our own internal users. I've come to hate DSLs since I have to support that one, which I never wanted.

jeroenhd 3 years ago

Does using DWARF info imply that this will break when you strip the resulting executable? I often strip my Rust binaries because it practically halves the application size, which can become quite a lot in a language where you're statically linking everything.

Regardless, quite an ingenious use of standard ELF features, I didn't think this would be possible in Rust without adding some kind of VM around reflection code.

  • jswrennOP 3 years ago

    Yes, unfortunately that's a tradeoff here. Rust does support splitting debug info into other files, but Deflect doesn't support loading split debuginfo yet.

  • HideousKojima 3 years ago

    C# has similar issues where they have to be conservative about what they trim from binaries for AoT in case it is used for reflection, so I imagine you'd run into the same issues for almost any compiled language you want to implement reflection for.

Animats 3 years ago

"When you call .reflect on a dyn Reflect value, deflect figures out its concrete type in four steps:"

* invokes local_type_id to get the memory address of your value’s static implementation of local_type_id

* maps that memory address to an offset in your application’s binary

* searches your application’s debug info for the entry describing the function at that offset

* parses that debugging information entry (DIE) to determine the type of local_type_id’s &self parameter.
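
The first of these steps can be illustrated with a short, hedged sketch. This is not deflect's actual source: the trait and method names merely mirror the quoted description, and `address_of_impl` is a hypothetical helper. A blanket impl gives every type its own monomorphized copy of `local_type_id`, and each copy returns its own address. Distinct addresses per type are typical in debug builds but are not guaranteed, since an optimizer or linker may merge identical functions.

```rust
// Illustrative sketch only; not deflect's actual implementation.
trait Reflect {
    /// Returns the address of this type's own copy of `local_type_id`.
    fn local_type_id(&self) -> usize;
}

// Blanket impl: the compiler generates one copy of the method per concrete `T`.
impl<T> Reflect for T {
    #[inline(never)]
    fn local_type_id(&self) -> usize {
        <T as Reflect>::local_type_id as *const () as usize
    }
}

/// Hypothetical helper: calls through the vtable, so the concrete type's copy runs.
fn address_of_impl(value: &dyn Reflect) -> usize {
    value.local_type_id()
}

fn main() {
    let a = address_of_impl(&42i32);
    let b = address_of_impl(&"hello");
    println!("i32 impl at {a:#x}, &str impl at {b:#x}");
}
```

The remaining steps (mapping the address to an offset in the binary and looking up the matching DWARF entry) need the debug info itself and are not shown here.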

This is a rather strange thing to bolt onto a language. I could see this as an external tool. The use case seems to be programs which used "async" so much they can't figure out the resulting state machine. External debug tools to view and examine the async state machine might be helpful.

My experience with Rust has been that debugging of safe code is just not a problem. Print statements and logging are enough.

  • pcwalton 3 years ago

    > This is a rather strange thing to bolt onto a language. I could see this as an external tool.

    It is an external tool. This is a crate, not a part of the compiler.

  • slashdev 3 years ago

    Are you saying there aren’t any legitimate uses for runtime reflection? Because I think Java and .Net, even Go have proved that wrong over the years.

    This seems like a valuable library. It’s impressive that it can be so powerful in a compiled language. C and C++ are much older but don’t have anything quite like this.

    • saghm 3 years ago

      A lot of what you'd use reflection for in GC languages is done with macros/code generation at compile time in Rust. For example, rather than using reflection to map objects to something like JSON to serialize, Rust has a library called serde (https://serde.rs/) that lets you annotate structs and enums and generate conversions at compile time that you can use. I wouldn't go so far as saying that there's no possible legitimate use of reflection, but I do wonder how much could be happening in Java and C# and Go that's so dynamic that you wouldn't be able to reason about it in advance. I think most of what reflection is used for in those languages _could_ be done at compile time, but it would both require a way to express it (via macros, codegen, or something like that) and be worth the extra compile time in order to save runtime. Rust's ethos is to try to optimize as much as possible for runtime efficiency even at the expense of compile time, and while there can be (and often are!) ways to opt out of this for a given feature, it's almost never the default.

      • slashdev 3 years ago

        I’ve used Rust extensively the last couple years. I understand that. A lot of what people do with reflection in Go could be done more efficiently with code generation - but more easily with reflection. I’m sure the same is true in Rust, to a lesser extent. There are times when runtime reflection would be really nice to have.

  • loeg 3 years ago

    > This is a rather strange thing to bolt onto a language.

    It can just be an extremely fun and cute demo, without practical application.

    • jerf 3 years ago

      It can also be something that looks cool and doesn't necessarily ever get past "kinda works", but piques the interest of the core dev team and they take steps to make it work even better, resulting in the ultimate "deprecation" of this sort of thing by virtue of it being even better integrated into the core.

      I don't have the context to judge the probability of that in this specific case (lots of technical nitty-gritty comes in to this sort of thing), but I've certainly seen similar things happen in other communities.

    • More-nitors 3 years ago

      how about adding this to debuggers for better object-views? (could it be possible to provide near-js/python/java level of obj view?)

      • gpderetta 3 years ago

        This is already using DWARF debug info. Using this for debugging would be a long way around to arrive where you started.

        You can already script gdb to provide rich views of any data structure.

      • Deukhoofd 3 years ago

        DWARF is a standard for data to support debuggers, so this crate does effectively the opposite: it uses info normally only available during debugging to provide reflection.

8jy89hui 3 years ago

This is a beautiful (hacky) demo of something that I didn't think was possible in Rust (yet). I hope other applications don't accidentally start using it just to discover that it doesn't work in release mode.

Very impressive work!

  • jswrennOP 3 years ago

    Oh, I should add a note about that. Fortunately, it's quite easy to tell Rust to generate debuginfo even in release mode.
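
    For reference, one standard way to do this is the `debug` setting of Cargo's release profile. A minimal fragment, assuming an otherwise default `Cargo.toml`:

    ```toml
    # Cargo.toml
    [profile.release]
    debug = true   # emit debug info even for optimized release builds
    ```

    This only helps if the resulting binary is not stripped afterwards.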

kp995 3 years ago

Can’t we rely more on Rust’s pattern matching and its strong type system?

Reflection seems more helpful when the programming language is a little unsound.

  • jswrennOP 3 years ago

    Absolutely! That's the approach that frunk [0] takes. Frunk (and other reflection libraries like it) are suitable for most use cases, and make better use of Rust's affordances.

    My crate is suitable for cases where you cannot know (or control) the set of types you might need to reflect on in advance. Its primary use cases are related to debugging.

    [0]: https://docs.rs/frunk

Thaxll 3 years ago

Today I learned that Rust does not have reflection.

  • estebank 3 years ago

    Reflection is usually not available in AoT compiled languages. The prevalent Rust coding styles rely heavily on monomorphic data types and functions, meaning there's nothing left to reflect at runtime. But if you want to deal with trait objects and need to access the underlying type, you need to use Any::downcast or rely on annotations on every type you want to reflect on. Or now, leverage DWARF info on Linux with deflect.
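
    A minimal sketch of the `Any`-based route, using the standard library's `downcast_ref`. The `describe` function is a hypothetical helper for illustration. Note that the set of candidate types has to be spelled out by hand, which is the limitation the DWARF approach tries to avoid.

    ```rust
    use std::any::Any;

    /// Hypothetical helper: recovers the concrete type behind a `&dyn Any`,
    /// but only for the types explicitly listed here.
    fn describe(value: &dyn Any) -> String {
        if let Some(n) = value.downcast_ref::<i32>() {
            format!("i32: {n}")
        } else if let Some(s) = value.downcast_ref::<String>() {
            format!("String: {s}")
        } else {
            "unknown type".to_string()
        }
    }

    fn main() {
        println!("{}", describe(&42i32));
        println!("{}", describe(&String::from("hi")));
        println!("{}", describe(&1.5f64));
    }
    ```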

    • planede 3 years ago

      That's runtime reflection.

      Compile time reflection AFAIK is available in D and Zig, and is planned for C++.

      • lmm 3 years ago

        "Compile time reflection" is an inconsistent and nonstandard concept; originally it seemed to just mean typeclasses for people who hadn't heard of typeclasses.

        • anonymoushn 3 years ago

          for weirdos who only have ad-hoc constraints instead of knowing what typeclasses are, it means that you can first say "I only have ad-hoc constraints" then say "wait wait I need to make decisions based on the specifics of the type" which may be useful for e.g. generating serializers and deserializers at compile time instead of using code generators like protobuf

      • GrumpySloth 3 years ago

        Yup. I consider runtime reflection an antifeature, which has negative performance effects, is unsafe (see e.g. log4j) and leads to fragile code.

        I would however welcome static reflection with open arms. In Rust in particular, I’d prefer it if derive was implemented using static reflection, rather than proc macros.

        • lmm 3 years ago

          Derive or equivalent ought to be something you can implement on top of frunk (so you're ultimately still depending on a proc macro, but the whole ecosystem only needs to depend on that one macro, and tools etc. can build in support for it) - that's how it's usually done in Scala.

      • elcritch 3 years ago

        That's right. Nim does as well. It's amazing. Once you get used to having CTTI and being able to use it, it's hard to program without it. Bonus points if you can do basic dependent types too.

        With SFINAE you can effectively do CTTI-style programming in C++. C++ has long had runtime type reflection as well (RTTI), though it needs to be compiled in. Looks like there's a boost library for CTTI.

        • Conscat 3 years ago

          The C++ reflection improves a lot in C++20, but it's still very limited compared to that aspect of Nim, or even Zig. The std::meta::info and "splices" based on Haskell for C++26 are incredibly exciting to me. I have many use cases in mind. Splices in combination with std::embed will make C++ basically just a bad Racket (but one with inline assembly!).

    • omginternets 3 years ago

      What are monomorphic data types? What should be my first read on the subject?

      • estebank 3 years ago

        It's a fancy way of saying "every time this type is used, replace all the generic type params with what was used and generate code for it". It's how generics are implemented in Rust. If you have

            struct Foo<T>(T);
        
        And you create Foo(42i32) and Foo(0.0f64), the compiler will create the equivalent to

            struct Fooi32(i32);
            struct Foof64(f64);
        
        In other languages like Java, generics are implemented the way that Rust does "trait objects" (&dyn Trait).

        Rust is not the only language that does this, to be clear.

        If you're interested in a quick intro on the compiler side of this, you can read https://rustc-dev-guide.rust-lang.org/backend/monomorph.html

        • estebank 3 years ago

          Expanding on trait objects: these are implemented as "V-Tables", structs holding pointers to the trait's methods and to the underlying type. This means that if you need to know what the underlying type is, you have to do something fancy, usually referred to as "reflection". Also, invocation of generic functions that use V-Tables requires "chasing pointers", which makes cache locality worse (because data might not be in the same cache read as the v-table itself), but makes the generated binary smaller (because if you have something like Foo<T> used with 1000 types, with monomorphization you end up with 2000 generated types in the binary, instead of 1001 with trait objects).
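
          A small sketch of the two dispatch styles, with hypothetical `Shape`, `Square`, and `Rect` types. On current compilers a `&dyn Shape` reference is a pair of pointers (one to the data, one to the vtable), which is why it is twice the size of a plain reference; that layout is an implementation detail rather than a language guarantee.

          ```rust
          use std::mem::size_of;

          trait Shape {
              fn area(&self) -> f64;
          }

          struct Square(f64);
          struct Rect(f64, f64);

          impl Shape for Square {
              fn area(&self) -> f64 { self.0 * self.0 }
          }
          impl Shape for Rect {
              fn area(&self) -> f64 { self.0 * self.1 }
          }

          // Monomorphized: the compiler emits one copy per concrete `T` used.
          fn area_static<T: Shape>(shape: &T) -> f64 {
              shape.area()
          }

          // Trait object: a single copy; the call is looked up through the vtable.
          fn area_dynamic(shape: &dyn Shape) -> f64 {
              shape.area()
          }

          fn main() {
              let shapes: Vec<Box<dyn Shape>> =
                  vec![Box::new(Square(2.0)), Box::new(Rect(2.0, 3.0))];
              let total: f64 = shapes.iter().map(|s| area_dynamic(s.as_ref())).sum();
              println!("total area = {total}");
              println!("static call = {}", area_static(&Square(2.0)));
              println!(
                  "&Square is {} bytes, &dyn Shape is {} bytes",
                  size_of::<&Square>(),
                  size_of::<&dyn Shape>()
              );
          }
          ```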

        • codeflo 3 years ago

          To add to this, even the Foo-wrapper is gone, just the i32 remains. Rust values are amorphous data blobs at runtime.

          • estebank 3 years ago

            Yes, that's true but that is an implementation detail that only comes into play when dealing with ABI, and then you should be using #[repr(transparent)] to ensure that the compiler won't do something else :)
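
            A short sketch of what the attribute buys you, reusing the `Foo` name from the example above. With `#[repr(transparent)]` the wrapper is guaranteed to have the same size and alignment as its single field; without it, the same result is typical but left to the compiler.

            ```rust
            use std::mem::{align_of, size_of};

            // Guaranteed to be laid out exactly like its single field.
            #[repr(transparent)]
            struct Foo<T>(T);

            fn main() {
                assert_eq!(size_of::<Foo<i32>>(), size_of::<i32>());
                assert_eq!(align_of::<Foo<i32>>(), align_of::<i32>());
                assert_eq!(size_of::<Foo<f64>>(), size_of::<f64>());

                let wrapped = Foo(42i32);
                println!("wrapped value: {}", wrapped.0);
            }
            ```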

            • codeflo 3 years ago

              Sure, it’s good to point out the difference between “the behavior of a typical optimizing compiler” and “things actually guaranteed by the language”. The context of the discussion was the former, I think. I’m not even that certain that monomorphization is actually required in theory.

              • estebank 3 years ago

                Yes, monomorphization isn't needed in theory, as long as the user-visible behavior remains the same, and in practice the team is exploring options[1] to identify cases where the currently manual practice of writing

                    pub fn foo<T: AsRef<X>>(x: T) {
                        inner_foo(x.as_ref());
                    }
                    fn inner_foo(_: &X) { todo!() }
                
                can be instead done by the compiler automatically (turning monomorphized code back into polymorphic code, hence the polymorphization name).

                [1]: https://rustc-dev-guide.rust-lang.org/backend/monomorph.html...

          • CryZe 3 years ago

            ABI wise that is not true though. structs have struct ABI, even just a newtype struct around an integer will not use integer ABI unless annotated with #[repr(transparent)].

        • Joker_vD 3 years ago

          Pretty sure that some usage patterns of polymorphic types cannot be completely monomorphized. Here's an example in Go:

              package main
          
              import (
                  "fmt"
              )
          
              type wrapper[T any] struct {
                  Value T
              }
          
              func (w wrapper[T]) String() string {
                  return fmt.Sprintf("{%v}", w.Value)
              }
          
              func stringWrapped[T any](n int, v T) string {
                  if n == 0 {
                      return fmt.Sprintf("%v", v)
                  }
                  return stringWrapped(n-1, wrapper[T]{Value: v})
              }
          
              func main() {
                  n := 0
                  fmt.Scanf("%d", &n)
                  result := stringWrapped(n, "test")
                  fmt.Println(result)
              }
          
          Go refuses to compile because it can't possibly generate all instances of wrapper[T] that this program may use: wrapper[string], wrapper[wrapper[string]], wrapper[wrapper[wrapper[string]]], etc.

        • shpongled 3 years ago

          Nice examples - you can also have languages (like SML) where monomorphization is simply an implementation detail. Some compilers (e.g., MLton) perform monomorphization and others don't.

          • codeflo 3 years ago

            I only recently realized that certain type system features, like polymorphic recursion, make monomorphization impossible in the general case. In Haskell for example, it’s by necessity only an optimization that’s used where applicable, and not the general strategy.

          • yakubin 3 years ago

            That depends on what you mean. SML has "polymorphism" boiling down to being able to plug an arbitrary type in some places, which is denoted like 'a. But when people talk about generics, they more often talk about C++ templates, Java generics, Rust traits, etc. whose SML equivalents are signatures, structs and functors. Signatures are a bit like Rust traits, structs are a bit like Rust implementations of traits, whereas functors are like Rust's "templates", i.e. wherever you swap angle brackets to parametrise something with types constrained by traits, or values constrained by types. Except in Rust this parametrisation can be slapped on a bunch of things. It can be on structs, on functions, on traits, on implementations of traits etc. In SML you need to group all the "parametrised" things into a struct (and a corresponding signature), which is going to be returned by a functor.

            And now the thing is: with transparent signature ascriptions, functors are monomorphised in SML, instead of everything being hidden behind signatures (as is in the case of Rust with traits when you use dyn), which has semantic consequences. E.g. a struct returned by a functor may contain a type. You can't perform proper type-checking without monomorphising, because you don't know what the exact type is. E.g. in the following program, the final line couldn't be type-checked without monomorphisation:

               signature ITERABLE = sig
                   type ElemT
                   type SrcT
               
                   val new_iter: SrcT -> unit -> ElemT option
               end
               
               signature LIST_ELEM_TYPE = sig
                   type T
               end
               
               functor ListIterFun (ListElemType: LIST_ELEM_TYPE): ITERABLE = struct
                   type ElemT = ListElemType.T
                   type SrcT = ElemT list
               
                   fun new_iter l = let val lr = ref l
                                    in
                                      fn () => case !lr of
                                                 nil => NONE
                                               | (x::xs) => (lr := xs; SOME x)
                                    end
               
               end
               
               structure IntElemType: LIST_ELEM_TYPE = struct
                   type T = int
               end
               
               structure IntListIter = ListIterFun(IntElemType)
               
               val next = IntListIter.new_iter [1, 2, 3, 4, 5]
            
            If I change the signature ascription on ListIterFun to an opaque ascription (:> ITERABLE), the final line won't type-check, because it's not obvious from the signature, that ElemT is int. So transparent signature ascriptions require monomorphisation (Rust traits without dyn), and opaque signature ascriptions free the compiler from having to do monomorphisation (Rust traits with dyn*).

            There was a lot of discussion of this issue when Go was settling on a design for its generics, under the phrase "reified generics".

        • dgb23 3 years ago

          Not exactly the same thing but JITs can turn dynamic objects into structs if the structure is consistent. JS runtimes and Julia do this as far as I know.

          • mmis1000 3 years ago

            Firefox's JS runtime also does tricks like generating multiple copies of an optimized function when the function has multiple call sites, instead of making one copy with lots of if/else branches. So it no longer suffers from the problem that a function which frequently gets different types of parameters from different call sites performs poorly.

            It's probably exactly how templates work, except the details are invisible to users.

            https://hacks.mozilla.org/2020/11/warp-improved-js-performan...

          • estebank 3 years ago

            Yes! Java as well. And this is how those languages can show impressive benchmarks for consistent workloads. In theory they can even surpass AoT languages. In practice it depends on the specifics.

          • adgjlsfhk1 3 years ago

            Julia doesn't do this. It just has structs in the first place.

        • gloryjulio 3 years ago

          I think cpp does this too

          • estebank 3 years ago

            It indeed does. The only difference is that Rust has traits (similar to C++'s concepts) which require explicit mention of what interface the type parameters have inside the function, whereas C++'s templates will have a compile error after instantiation if you passed something that didn't meet the expected contract. This is closer to Rust's macros in operation.

            Given

                fn foo<T>(a: T, b: T) -> T { a + b }
            
            The compiler will complain that you should have been explicit on how T is going to be used:

                error[E0369]: cannot add `T` to `T`
                 --> src/lib.rs:1:32
                  |
                1 | fn foo<T>(a: T, b: T) -> T { a + b }
                  |                              - ^ - T
                  |                              |
                  |                              T
                  |
                help: consider restricting type parameter `T`
                  |
                1 | fn foo<T: Add<Output = T>>(a: T, b: T) -> T { a + b }
                  |         +++++++++++++++++
            
            whereas in C++ this would have been accepted until you called foo with two things that couldn't be added together, like a Rust macro[1].

            [1]: https://play.rust-lang.org/?version=nightly&mode=debug&editi...
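
            For completeness, a small sketch of both sides of that comparison. The bounded `foo` follows the compiler's suggestion above; `foo_macro` is a hypothetical macro added for illustration, which is only checked where it is expanded, much like a C++ template at instantiation.

            ```rust
            use std::ops::Add;

            // The bound states up front how `T` will be used inside the function.
            fn foo<T: Add<Output = T>>(a: T, b: T) -> T {
                a + b
            }

            // Hypothetical macro: no contract is declared, so misuse is only
            // reported at the expansion site.
            macro_rules! foo_macro {
                ($a:expr, $b:expr) => {
                    $a + $b
                };
            }

            fn main() {
                println!("{}", foo(1, 2));
                println!("{}", foo(1.5, 2.25));
                println!("{}", foo_macro!(1, 2));
                // foo_macro!((), ()) would be rejected here, at the point of use.
            }
            ```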

  • Tuna-Fish 3 years ago

    Reflection is typically provided by a runtime, and languages that don't have runtimes usually don't have it. You shouldn't expect a low-level systems language to have reflection. There is no zero-cost way of implementing it.

    • Joker_vD 3 years ago

      Except Rust has a runtime: [0]. And so, usually, does C (in hosted implementations).

      [0] https://doc.rust-lang.org/reference/runtime.html

      • pornel 3 years ago

        These are a couple of functions executables can call at run time, but they're more like an extra standard library. It's not a runtime in the same sense as a runtime in dynamic or GC languages that manages all objects and is able to know types of arbitrary objects and inspect/trace them.

        Rust has no run-time type information except limited downcasts via `dyn Any` or explicitly derived traits on per-type basis, and these features compile to type-specific monomorphic code rather than calling some run-time reflection.

        • throwaway894345 3 years ago

          Pretty sure you don’t need a runtime to track runtime type info. What we think of as a “runtime” in GC languages is usually several distinct things (a scheduler, a GC, and maybe some other stuff in the case of Java/.Net).

    • spacechild1 3 years ago

      This is of course only true for runtime reflection. And which language does not have a runtime?

  • snordgren 3 years ago

    Rust has very little influence from reflection-heavy languages like Java and C#. On their list of influences (https://doc.rust-lang.org/reference/influences.html), Java is not even mentioned, and C# is only mentioned for its attributes. There is very little overlap between the design philosophies that influenced Rust and Java/C#.

    Rust does not support inheritance either. But I have never missed either feature in a Rust program.

  • nestorD 3 years ago

    The usual argument is that between having macros and focusing on a strong type system, there are very few legitimate use cases for reflection left in Rust.

unconed 3 years ago

My version of Greenspun's Tenth [1] is that any sufficiently complex static language contains an ad hoc, informally specified, bug-ridden and slow version of a dynamic "any" type.

Thx OP for providing an example.

[1] https://en.wikipedia.org/wiki/Greenspun's_tenth_rule

  • kibwen 3 years ago

    Rust has a dynamic any type, `std::any::Any`.

    • unconed 3 years ago

      The entire purpose of OP's thing is to give you a semblance of workable reflection so you can actually operate on said type. It requires byzantine hacks to read debug info and doesn't work on macOS.

      I don't think you understand how people in dynamic languages use any types at all.

bouk 3 years ago

It would be really cool if it was possible to natively inspect the state of a Rust generator in a type-safe way

armchairhacker 3 years ago

Does this still work if the application is compiled in release mode or with optimizations?

Even if not, this is still very useful for debugging.
