Pstore: Ruby Built-In Hash Persistence

github.com

102 points by hstaab 3 years ago · 41 comments

fny 3 years ago

I do a lot of ML and AI work nowadays... I miss Ruby a lot, especially its culture around ergonomics.

  • prescriptivist 3 years ago

    I recently had the need to build an internal system that distributed workloads across many workers via a client/server model. I did the proof-of-concept using druby [1] and it turned out to be so simple and stable that we just ran with it. It'd been years since I had used that library and instinctively I assumed we'd get the prototype out and then rebuild it using some sort of web service and utilize a high concurrency web server but druby just worked!

    [1] https://github.com/ruby/drb

  • brightball 3 years ago

    It keeps me coming back.

    Ruby itself is just such an enabler.

  • cdiamand 3 years ago

    There have been some interesting ML gems rolled in the past few years:

    https://ankane.org/new-ml-gems

    Any thoughts on what the Ruby community would need to build in order for it to become an attractive tool for AI work?

    • mattnewton 3 years ago

      My guess is some kind of corporate sponsorship. Someone with deep pockets to maintain it, encourage new apis keeping up with the latest papers, and make sure it works out of the box with the accelerator people want to use this month.

    • fny 3 years ago

      A huge cultural shift. People in scientific computing speak Python and R.

      Something would need to happen that makes Ruby far more attractive. Say performance parity with Crystal or Nim.

      • mattnewton 3 years ago

        I think it’s more than that, Julia exists and adoption is still slow. Lua and torch were plenty fast and they were still replaced by pytorch. I think to compete with python you need at least a fraction of the de-facto corporate sponsorship for python in the ML space.

    • waffle_ss 3 years ago

      As a primarily Ruby dev I'd prefer the AI/ML ecosystem not be split-brained between two languages that are semantically 90% the same thing. Just learn Python and integrate the models into your Rails (or whatever) apps.

  • green_on_black 3 years ago

    Have you tried Scala?

    • fny 3 years ago

      It's more of a cultural thing. People tend to write Ruby in a literate fashion and think critically about their APIs. Scala devs get a little over their skis sometimes playing with language features.

3pt14159 3 years ago

Don't use this. Marshal has too many issues. If you really need persistence and can't use something like Postgres, use the Ox gem instead. It's more reliable between versions of Ruby and easier to parse from other languages if you ever have to.

  • e12e 3 years ago

    > use the Ox gem

    The main thing is that it's part of the standard library. If you import a gem anyway, often you'd be well off with sqlite.

    As for storage format, there's also:

    https://ruby-doc.org/stdlib-3.1.2/libdoc/yaml/rdoc/YAML/Stor...

    • brunno 3 years ago

      I love the simplicity of YAML::Store. It was introduced in Ruby 1.8, almost 20 years ago (https://github.com/ruby/ruby/commit/55f4dc4c9a5345c28d0da750...).

      I even created a little gem when I was starting with Ruby, 10 years ago, that was a very thin wrapper around it so that I could play around using an ActiveRecord like syntax (https://github.com/brunnogomes/active_yaml). I used in some pet projects so I could do stuff like:

        p = Post.new
        p.title = "Great post!"
        p.body = "Lorem ipsum..."
        p.save
      
        Post.all # => [#<Post:0x895bb38 @title="Great post!", @body="Lorem ipsum...", @id=1>]
      
        Post.find(1) # => #<Post:0x954bc69 @title="Great post!", @body="Lorem ipsum...", @id=1>
      
        Post.where(author: 'Brunno', visibility: 'public')
        # => [#<Post:0x895bb38 @author="Brunno", @visibility="public", @id=1>, #<Post:0x457pa36 @author="Brunno", @visibility="public", @id=2>]
      
      And have access to the data directly in the YAML files.

      Good times!
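For readers who have not used YAML::Store, here is a minimal sketch of the round trip discussed above. The helper name `yaml_store_roundtrip`, the `:posts` key and the file name are made up for illustration; only `YAML::Store.new`, `transaction` and `[]`/`[]=` come from the library.

```ruby
require "yaml/store"
require "tmpdir"

# Hypothetical helper: writes one post to a YAML::Store file and reads it back.
def yaml_store_roundtrip(path)
  store = YAML::Store.new(path)

  # Reads and writes must happen inside a transaction block.
  store.transaction do
    store[:posts] = [{ "title" => "Great post!", "body" => "Lorem ipsum..." }]
  end

  # Passing true opens a read-only transaction; the block's value is returned.
  store.transaction(true) { store[:posts] }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "posts.yml")
  posts = yaml_store_roundtrip(path)
  puts posts.first["title"]
  puts File.read(path) # the file on disk is plain YAML that can be read or edited by hand
end
```

The appeal is the one mentioned in the comment: the data stays directly accessible in the YAML file.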

      • 3pt14159 3 years ago

        The problem with YAML is that meaningful whitespace means that the size grows quickly for highly nested documents. I don't love XML, but there is a reason I recommended Ox. I've used it for real projects and it never fell over like so many of the alternatives I've tried where databases were not in the cards.

        • fny 3 years ago

          The problem with XML is that angle bracket expressions take up too much space because you need to duplicate element names. I don't love JSON, but there is a reason I recommend OJ.

          ...

          The problem with JSON is that the keys take up too much space because they are duplicated. I don't love BSON, but there's a reason why I recommend bson-ruby.

          And I could keep going... ;)

          The benefit of using YAML is precisely that there's meaningful whitespace. Different strokes for different folks.

    • Fire-Dragon-DoL 3 years ago

      I don't get the value of "it's in the standard library". Ruby has the amazing (for scripts) require "bundler/inline", which lets you use a single file for code and Gemfile and auto-installs the dependencies, so going for the standard library doesn't seem to provide any practical value except offline support.

      • e12e 3 years ago

        I used pstore for an ad-hoc monitoring service on an outdated windows server running an outdated ruby version - it was easy to set it up to run from task scheduler every five minutes and check resident memory of an old ruby service - logging the ram, and killing/restarting it if it was over 1 GB (this all on 32bit ruby with the limits of 4gb address space per process).

        Sure there are many things that "should" have been fixed above - but just having any old ruby version on hand was enough to help check for a memory leak and mitigate it - while taking the time to figure out if the leak could be plugged.

        And offline support (a server in dmz/locked down wrt new software) is big too!

  • vr46 3 years ago

    Is Marshal still tied to Ruby version? Boy was this fun about ten years ago for a system I inherited that Marshaled huge complex objects into TokyoTyrant and back. You try migrating or upgrading a system where the runtime version is tied to EVERY object in a database.

  • jrochkind1 3 years ago

    > too many issues

    Such as?

    • woodruffw 3 years ago

      Marshal is Ruby's version of pickle in Python: it serializes arbitrary objects, which means that correct deserialization requires arbitrary code execution.

      This is bad enough on its own, but it also makes pivoting a file read/write primitive into code execution much easier.
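A small sketch of the difference being described, not an exploit. `Session` and the two helper methods are hypothetical names used only for illustration. Marshal rebuilds an instance of whatever class the byte stream names, while `JSON.parse` with its default options only produces plain hashes, arrays, strings, numbers, booleans and nil.

```ruby
require "json"

# Hypothetical class, used only for illustration.
class Session
  attr_reader :user, :roles

  def initialize(user, roles)
    @user = user
    @roles = roles
  end
end

# Marshal: the stream names the class, and load re-creates an instance of it.
def marshal_roundtrip(obj)
  Marshal.load(Marshal.dump(obj))
end

# JSON: only plain data comes back; turning it into objects is an explicit step.
def json_roundtrip(hash)
  JSON.parse(JSON.generate(hash))
end

session = Session.new("alice", ["admin"])
puts marshal_roundtrip(session).class
puts json_roundtrip("user" => session.user, "roles" => session.roles).class
```

With a data-only format the application decides which classes get built from the input; with Marshal the input decides, which is why the Ruby documentation warns against loading untrusted data with it.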

      • nurettin 3 years ago

        Why the "don't use it"? Just say "use it with caution" or, since we are being rude telling people what to do whenever pickle or marshal comes up, just don't say anything and assume people know what they are doing.

        • woodruffw 3 years ago

          I don't think I phrased that in a particularly rude way, but I'm sorry if it came across as rude.

          The answer is that we have serialization techniques that are as good on all the dimensions that matter (speed, serialized size, etc.) and better in terms of security. Pickle and Marshal are, at best, footguns in otherwise very safe language ecosystems.

          • nurettin 3 years ago

            > The answer is that we have serialization techniques that are as good on all the dimensions that matter

            I'd look at that sentence with great skepticism. What could possibly surpass a conversion to raw object representation? Do you mean libraries which require you to use protocol languages like protobuf or inheritance?

      • Rafert 3 years ago

        https://github.com/ruby/psych defaults to only loading permitted classes since 4.0 so that seems less of a concern now?
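A rough sketch of what the permitted-classes behaviour looks like on the YAML side. It only concerns YAML loading via psych, not Marshal. `Point` and `safe_load_point` are hypothetical names; `YAML.safe_load` with the `permitted_classes:` keyword is used here because it behaves the same before and after psych 4.0.

```ruby
require "yaml"

# Hypothetical class, used only for illustration.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

# Returns the loaded object, or the exception when the class is not permitted.
def safe_load_point(yaml, permitted)
  YAML.safe_load(yaml, permitted_classes: permitted)
rescue Psych::DisallowedClass => e
  e
end

yaml = YAML.dump(Point.new(1, 2))   # tagged as a Ruby object of class Point
puts safe_load_point(yaml, []).class
puts safe_load_point(yaml, [Point]).class
```

The first call is refused because `Point` is not on the allow-list; the second revives the object because it was explicitly permitted.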

        • jrochkind1 3 years ago

          `psych`, used for YAML, is a different thing than Marshal. pstore uses Marshal. https://ruby-doc.org/core-2.6.3/Marshal.html. I don't believe psych will be involved with pstore.

          I'm honestly not sure, though, how much I should be worried about the fact that someone who has write access to my database can maybe escalate that to an arbitrary code execution if I use pstore. Literally not sure. Write access to my DB seems pretty disastrous already...
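One way to see the Marshal connection for yourself, as a sketch: write a value through PStore, then read the raw file back with `Marshal.load`. The helper name, the `:count` key and the file name are hypothetical, and this relies on the current on-disk format being a plain Marshal dump of the store's hash, which is an implementation detail rather than a documented guarantee.

```ruby
require "pstore"
require "tmpdir"

# Hypothetical helper: writes through PStore, then bypasses it and reads
# the same file directly with Marshal.
def read_pstore_file_with_marshal(path)
  store = PStore.new(path)
  store.transaction { store[:count] = 42 }

  # Trusted data only: this file was written a moment ago by this process.
  Marshal.load(File.binread(path))
end

Dir.mktmpdir do |dir|
  table = read_pstore_file_with_marshal(File.join(dir, "demo.pstore"))
  puts table[:count]
end
```

So anyone who can write to that file controls what `Marshal.load` is handed the next time a transaction opens the store.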

      • solarkraft 3 years ago

        Pickle is fine (in a pinch). It's not meant for untrusted data.

        • woodruffw 3 years ago

          Anything is fine when the data is trusted. The problem is that the data is almost never actually trusted :-)

why-el 3 years ago

Interesting. Transactionality is implemented via a regular thread lock, which means that in a concurrent Rails app where this library is used in a hot path you might suffer some contention. It's best used for marshaling data in non-hot paths such as standalone scripts or app startup. I only say this because it's quite different from what you'd expect of transactions in the SQL sense.
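To make the transaction semantics concrete, here is a minimal sketch. The helper name and the `:hits` key are hypothetical. The second argument to `PStore.new` is the library's `thread_safe` flag; a block either commits as a whole when it finishes or is discarded with `abort`, and a read-only transaction rejects writes.

```ruby
require "pstore"
require "tmpdir"

# Hypothetical helper exercising commit, abort and read-only transactions.
def pstore_transaction_demo(path)
  store = PStore.new(path, true) # true asks for thread-safe transactions

  # Committed when the block finishes normally.
  store.transaction { store[:hits] = 1 }

  # abort leaves the block early and discards everything written in it.
  store.transaction do
    store[:hits] = 999
    store.abort
  end

  # Writing inside a read-only transaction raises PStore::Error.
  read_only_error =
    begin
      store.transaction(true) { store[:hits] = 2 }
      nil
    rescue PStore::Error => e
      e
    end

  hits = store.transaction(true) { store[:hits] }
  [hits, read_only_error]
end

Dir.mktmpdir do |dir|
  hits, error = pstore_transaction_demo(File.join(dir, "hits.pstore"))
  puts hits
  puts error.class
end
```

Each transaction holds the lock for the whole block, so the contention concern above applies to anything slow done inside it.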

kayodelycaon 3 years ago

Note, this is a wrapper around Ruby’s Marshal class.

mrinterweb 3 years ago

I would think this would have limited usefulness for most web applications as the latest trend for web apps is to think of the deployed code as ephemeral, and local files are not something devs often rely on. I guess if you're mounting block storage or some other virtual file system that would be another thing. For non-web applications, this could be a simplistic replacement for what people often use sqlite for. The readme doesn't talk much about concurrent access to the store other than the transactions, so concurrent operations may also be a limitation.

aartav 3 years ago

pstore has been a built-in with Ruby stdlib for as long as ruby has existed, so _over_ 20 years.

  • mperham 3 years ago

    I'm assuming it pre-dates Rubygems because it really should be a gem. I can't speak for Japan but few people in the Western world seem to use it.

    • brunno 3 years ago

      There was a time when some stuff was being extracted (removed) from Ruby core and becoming gems, and I really thought PStore and YAML::Store were going to be among those, but no, they decided to keep them in core. So maybe there are some important enough use cases that justify it being there.

      Or maybe it would be a hard task that didn't justify the effort.

      • byroot 3 years ago

        Many parts of the stdlib are slowly being gemified. That's the case for `pstore` too, which is why it has its own repo.

        It's now no longer technically stdlib, but a "default gem", a gem that is installed by default with ruby, see: https://stdgems.org/

        For a few years now, every version has removed one or two rarely used default gems. The Ruby core team just doesn't like big breaking changes.

  • tyingq 3 years ago

    Pstore also uses Marshal behind the scenes, so I assume has similar caveats you see in other comments on this thread.
