Show HN: Kvass, a personal key-value store

github.com

227 points by maxmunzel 3 years ago · 128 comments (125 loaded)

jamesbfb 3 years ago

You stole my idea! I love it. As a dev who spends a big chunk of their day in the shell, this is the kind of tool that I was destined to create myself, but never did thanks to lack of time, laziness, life, etc.

tkindy 3 years ago

I’m wondering what sorts of use-cases people would use a personal key-value store for. Maybe it’s just a useful foundation for building other tools on top of, like a password manager.

  • maxmunzelOP 3 years ago

    The primary use case is for shuffling around files or clipboards between different computers. I also regularly use the url-sharing capability.

    Prior, I had to deal with ephemeral http servers, which I didn't like from an ergonomic perspective.

    Ergonomically, I find redis nice. The problem is, that it is in-memory and that encryption is cumbersome. Also, kvass is able to be used offline, as the kv-store is implemented as a CRDT.

  • nextaccountic 3 years ago

    For passwords specifically there's a similar tool, https://www.passwordstore.org/ - but it stores GPG-encrypted plain text files versioned with git, instead of managing a sqlite db

    More importantly, it has Firefox and Chrome extensions for auto-filling passwords on the web https://github.com/passff/passff https://github.com/browserpass/browserpass-extension

  • cyberge99 3 years ago

    I use a KV, Hashi vault, so my shell scripts get api keys, secrets, etc and they’re not stored plaintext or in SCM.

  • traviscj 3 years ago

    I use a similar setup to store code snippets (certain Java annotations for integration/unit tests, various things like that), vehicle license plate/vins, internal (but nonsensitive) ids for test accounts, tons of things like that.

    Honestly a password manager would probably be technically better—or a bunch of flat files lol—but there was a certain charm to having it displayed / function exactly as I like it, and lightning quick with nothing I didn’t need.

    IDE would be another natural place for a lot of my usages, but I kept finding I needed to leave it in a pull request review or slack conversation or similar, not necessarily programming myself.

  • resoluteteeth 3 years ago

    I use skate to store secrets used by some personal programs. I have scripts that pull out the secrets and set them as environment variables that are used by the programs. This way I don't have them sitting around in a configuration file in the source directory and can't accidentally commit them to git but they're easy to sync between computers.

  • tjpnz 3 years ago

    I already use my password manager for the problem this tool is trying to solve.

  • apavlo 3 years ago

    But it's just a wrapper around SQLite. Skip the middleman and just use SQLite.

    • pdpi 3 years ago

      Or don't skip the middleman and get a simple k/v interface instead of having to deal with a whole sqlite database.

    • capableweb 3 years ago

      It's clearly not "just a wrapper around SQLite", read through the README and it'll be evident why.

    • mosselman 3 years ago

      But you can’t access Sqlite over the web.

      • goodpoint 3 years ago

        ...and you shouldn't.

        • jonnycomputer 3 years ago

          Seems to me that for a personal tool like this, sqlite3 is non-problematic.

          https://www.sqlite.org/whentouse.html

          "Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic."

        • gkbrk 3 years ago

          Any concrete reasons? SQLite is probably good enough for 99% of websites / apps.

          • goodpoint 3 years ago

            It's not designed to be straight exposed as a web service.

            It's not hardened to handle malicious traffic.

dheera 3 years ago

In case anyone is wondering about the name, it's a Slavic fermented bread drink that's much less alcoholic than beer (and commercially canned versions are near zero alcohol). It's one of my favorite chilled summer drinks, and you should be able to find it in Slavic stores in the US as well.

Sujeto 3 years ago

What does it do better than Skate? Or what additional things does it do, url and qr codes?

  • maxmunzelOP 3 years ago

    I think it's fair to say that skate is the more mature tool. Kvass on the other hand is more focused and simpler.

    Especially self-hosting kvass is even simpler than skate, and I had issues linking/syncing skate in the past.

    It would probably be a nice weekend project to port the url/qr features to skate.

prussian 3 years ago

Cool. Curious why you chose sqlite instead of something like badger [https://github.com/dgraph-io/badger] given you expose it as a key value database, which badger is.

  • maxmunzelOP 3 years ago

    SQLite allows me to keep multiple versions of the same entry, which is convenient for state merging. Half the sync logic is actually implemented in SQL. Other than that, I’m already familiar with it and the storage backend is not very performance critical for the intended use case.
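
    A minimal sketch of what "multiple versions of the same entry" with a merge done in SQL could look like. This is an illustration under assumptions, not the actual kvass schema: the `entries` table, its columns, and the helper names are all made up here.

```python
import sqlite3

# Hypothetical schema: every write appends a new row instead of overwriting.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    key   TEXT    NOT NULL,
    ts    INTEGER NOT NULL,   -- wall-clock timestamp of the write
    pid   INTEGER NOT NULL,   -- id of the writing process (tie-breaker)
    value TEXT    NOT NULL,
    PRIMARY KEY (key, ts, pid)
)
"""

def open_store():
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    return db

def put(db, key, value, ts, pid):
    # INSERT OR IGNORE: a version that is already known is simply skipped.
    db.execute("INSERT OR IGNORE INTO entries VALUES (?, ?, ?, ?)",
               (key, ts, pid, value))

def merge(local, remote):
    # State merge = union of all known versions from both sides.
    rows = remote.execute("SELECT key, ts, pid, value FROM entries").fetchall()
    local.executemany("INSERT OR IGNORE INTO entries VALUES (?, ?, ?, ?)", rows)

def get(db, key):
    # The visible value is the newest version; older ones stay in the table.
    row = db.execute(
        "SELECT value FROM entries WHERE key = ? "
        "ORDER BY ts DESC, pid DESC LIMIT 1",
        (key,),
    ).fetchone()
    return None if row is None else row[0]
```

    Because the merge is a plain union keyed on (key, ts, pid), running it twice or in either direction leaves both sides with the same rows.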

losfair 3 years ago

Nice project!

I'm wondering why you chose to implement your own cryptography routines instead of using something standard like TLS. Apparently your `DecryptData` and `Encrypt` methods are vulnerable to replay attacks due to a lack of (EC)DH-style key exchange.

  • maxmunzelOP 3 years ago

    Thanks for the critique! I wanted to use symmetric crypto as it's trivial to use without domains and certificates. The possibility of replays is a non-issue, as the key-value store is implemented as a CRDT and therefore all operations are idempotent.

    On the other hand, I didn't anticipate replay attacks in the design and thanks to your comment, I'll keep them in mind should I ever find myself in a scenario where they are undesirable...

    • losfair 3 years ago

      TLS is available in pre-shared key (PSK) mode. Looks like there is ongoing work to add TLS-PSK to Go's standard library: https://github.com/golang/go/issues/6379#issuecomment-117006...

    • makeworld 3 years ago

      It doesn't matter if the operations are idempotent. The point is that an eavesdropper can replay a message that sets a key, for example, overwriting whatever was there previously.

      It would be better to use an established cryptography system. You could do self-signed certs with TLS, like Syncthing does. Or just use SSH.

      • DecoPerson 3 years ago

        If the CRDT part is done correctly, then replaying a message that sets a key will not change anything, ever.

        If the message is:

        Key: Foo

        Reference CRDT node ID: 7654321 (the last node that the clients knows of that updated the value of ‘Foo’)

        Operation: Update

        Value: Bar

        The ID of this new node: 1122112211

        (Omitted for simplicity: Timestamps, hashes, …)

        Replaying that message won’t do anything if the target already knows about the existence of that new node.

        If the target didn’t know about the node, then I guess you’re helping them sync their own data? Maybe they owe you a thanks? If you knew what each encrypted message contained, you might be able to do some split-state shenanigans; for example: replay the message that sets a “PasswordAuthEnabled” key to “Yes” but deliberately omit the message that changes the “Password” key from its default of “password” to a genuine password. It’s very hard to imagine an actual situation like this occurring, but I guess that’s what makes crypto (and designing secure systems in general) so damn tricky. That and the math. And end users. And…
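
        To make the "replay changes nothing" argument concrete, here is a toy sketch of the message format above. The `Update` and `Store` names, and the rule of picking the highest node ID among concurrent heads, are assumptions for illustration only, not how kvass does it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Update:
    node_id: int    # the ID of this new node
    parent_id: int  # last node the sender knew of for this key
    key: str
    value: str

class Store:
    def __init__(self):
        self.nodes = {}  # node_id -> Update; nodes are only ever added

    def apply(self, msg):
        """Return True if the message taught us something new."""
        if msg.node_id in self.nodes:
            return False  # already known: a replay is a no-op
        self.nodes[msg.node_id] = msg
        return True

    def get(self, key):
        mine = [n for n in self.nodes.values() if n.key == key]
        superseded = {n.parent_id for n in mine}
        # Heads are nodes that no other node names as its parent.
        heads = [n for n in mine if n.node_id not in superseded]
        if not heads:
            return None
        # Assumed tie-break between concurrent heads: highest node ID wins.
        return max(heads, key=lambda n: n.node_id).value
```

        Applying the same `Update` a second time returns False and leaves `get` unchanged, which is the property being claimed. It says nothing about the split-state case where an attacker withholds some messages.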

        • makeworld 3 years ago

          I see, thanks. I was focusing on the "idempotent" part but yeah a CRDT would protect against replays. Still not a great design though, still opens yourself up to issues, in case not all messages are part of the CRDT, or you have a buggy CRDT implementation.

          • tlb 3 years ago

            It's a shame that the meaning of 'idempotent' has gotten watered down by half-assed implementations. The original NFS paper from Sun [0] claims that write operations are idempotent, but they aren't really. Not if another operation has occurred. Like in:

              write '1' @ 0
              write '2' @ 0
              write '1' @ 0 (replayed through a duplicated packet)
            
            the duplicated write RPC reverts the second write. Duplicated link and rename RPCs are even worse. They added a replay detection cache in the server later to prevent some common error cases, but it fails if the server reboots in the middle.

            Anyway, CRDT correctness is hard enough that I'd be reluctant to trust it against an adversary who can inject replays.

            [0] https://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=75...
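
            The failure mode above is easy to reproduce with a toy model. This is only a sketch: `write` and `run` are hypothetical helpers standing in for a stateless write RPC, not NFS code.

```python
def write(disk: bytearray, offset: int, data: bytes) -> None:
    """Overwrite len(data) bytes at offset, like a stateless write RPC."""
    disk[offset:offset + len(data)] = data

def run(ops):
    """Apply a sequence of (offset, data) writes to a fresh one-byte disk."""
    disk = bytearray(b"0")
    for offset, data in ops:
        write(disk, offset, data)
    return bytes(disk)

# Duplicate arrives immediately: harmless, the write looks idempotent.
back_to_back = run([(0, b"1"), (0, b"1")])

# What the client intended: '1' then '2'.
intended = run([(0, b"1"), (0, b"2")])

# Duplicate of the first write arrives after the second one: '2' is reverted.
delayed = run([(0, b"1"), (0, b"2"), (0, b"1")])
```

            `back_to_back` matches a single write, but `delayed` ends up different from `intended`, which is exactly the interleaving case.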

greatNespresso 3 years ago

Cool project! Congrats on launching! What is the benefit compared to Redis or CF Workers KV?

  • maxmunzelOP 3 years ago

    Mainly self-hosting and generating share-able urls. If your keys end in ".html", the mime type is even set accordingly and you can use it for toy-websites ;)

    This is by no means meant to replace the backend of your app. It's more of an alternative to usb-sticks and google drive.

mordae 3 years ago

I like the idea. What surprised me was the custom network protocol. I expected it using ssh to work with the remote instance.

prezjordan 3 years ago

The built-in server and remote support is pretty nice! API seems solid, and I dig the QR codes.

  • vorticalbox 3 years ago

    QR codes are pretty cool, just a shame they never really took off.

    • ponyous 3 years ago

      What do you mean it didn't take off? QR code detection is implemented in native iOS camera and IIRC most android implementations too. Almost everyone can use it.

      In that sense it took off more than bitcoin.

      • vorticalbox 3 years ago

        That is true, it's just not as widely used as I would have hoped.

        • rscrawfo 3 years ago

          I feel like that has changed over the past few years. Many restaurants in my area started using them for menus, and I recently saw them used to set up wifi while on vacation.

          • vorticalbox 3 years ago

            Maybe it's just where I live?

            The only places I see QR codes are on my phone to share the WiFi password, on products to scan for competitions, and from time to time on advertising at bus stops.

          • Skunkleton 3 years ago

            I just paid my lunch tab by scanning a QR code on a receipt, and then tapping Apple Pay. It was rad.

            • swah 3 years ago

              In my country (BR) this transfer method (Pix) that can be initiated with a QR code has really picked up - I'm surprised a simple "scam" - replacing printed QR codes that are glued to restaurant tables - hasn't caught on yet.

sigmonsays 3 years ago

i just have a directory in git and store everything in files

can anyone help explain what i'd use this for?

  • staindk 3 years ago

    I'm late to reply here but I only now got around to setting Kvass up and testing it out.

    I got it running on a free GCP Compute VM and linked it through to my PC so that the VM hosts the Kvass server and my PC (and in future laptop) set/get stuff on there.

    I plan on using Kvass to pass things between my laptop and PC - links, files, images... etc. Will see how that goes - perhaps I don't end up using it at all.

    If it seems useful I'll try hook my web domain in so that I have a more static domain to use it with.

vlan121 3 years ago

What is the benefit compared to the private use of Redis? Redis is under BSD licence and continues to be very actively maintained and used.

  • maxmunzelOP 3 years ago

    Redis is in-memory, so it's prohibitive for big files. Also, kvass still works if it's disconnected from the server. This is important if you want to use it for config files.

    On the other hand, using redis (/skate) for storing files was the inspiration for creating kvass.

  • jbverschoor 3 years ago

    I thought it’s a password store. Also bc of the name “v” pronounced Spanish = b, so key-bass, sounds like pass

uwagar 3 years ago

uh oh leave that drink alone mate.

VMG 3 years ago

now all that is missing is a FUSE driver

mattmerc 3 years ago

I don't know much about the other solutions that people are mentioning in the comments, but I have to say... this looks elegant! Great job!

mbreese 3 years ago

I have so many questions about this. Much of the architecture seems off to me. I like the concept, but it doesn't seem as secure as it could be.

For the README, I'd hope to find a bit more information about the way data is stored and transmitted. For example, this seems to just be a SQLite database with values in fields? Is there a separate encryption key for the database itself? Otherwise anyone with access to the file would be able to see all data stored?

The encryption key is only used to encrypt data in transit, but not at rest? And then you're encrypting the full JSON blob instead of only the values? This seems risky to me.

What is the purpose of the ProcessID? It is randomly generated and stored in the database (thus used by all clients too). So, I'm not sure what this is for? I see it's used to resolve conflicts, but these should probably be given out by the server?

Do the clients cache data locally? It looks like you're basically syncing from the server for every request. You're already making a round trip to the server for a request anyway, so why not keep state only on the server? I can understand an offline-only mode, but this would require a significantly more robust sync mechanism. If this was the goal, I'd love to see this discussed more in the README too.

Finally, I don't understand why you're using plain HTTP (no TLS) for communication b/w client and server. I didn't see any authn/authz in the requests. You're also unmarshalling random data from the request w/o confirming that it is valid first. This seems risky to me and could potentially crash the server if I were to send it random data.

This would have been a great use-case for a simple (non-HTTP/JSON) TCP server:

    >>> AUTHTOKEN xxx
    >>> SET $KEY $LEN $SHA1
    >>> <bytes>
    <<< OK

    >>> AUTHTOKEN xxx
    >>> GET $KEY
    <<< $LEN $SHA1
    <<< <bytes>

Custom protocols have their own security issues, but it can also be easier to see where there are potential issues (like unmarshalling unvalidated blobs). If you wrap something like the above in TLS-PSK, you're set. If you want to use encryption for a session (after you authenticate), that's possible too, but you're at risk of effectively re-creating TLS.
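
A rough sketch of how the SET half of that line protocol might be framed and validated before anything is stored. Everything here is an assumption for illustration: the `encode_set` and `handle_set` names, the error strings, the `MAX_LEN` cap, and the restriction that keys contain no whitespace. The SHA-1 is only an integrity check on the payload, and transport security is still left to something like TLS-PSK.

```python
import hashlib
import hmac
import io

MAX_LEN = 1 << 20  # assumed upper bound so a bogus $LEN cannot exhaust memory

def encode_set(token: str, key: str, payload: bytes) -> bytes:
    """Client side: AUTHTOKEN line, SET header line, then the raw bytes."""
    digest = hashlib.sha1(payload).hexdigest()
    header = f"AUTHTOKEN {token}\nSET {key} {len(payload)} {digest}\n"
    return header.encode("ascii") + payload

def handle_set(stream, expected_token: str, store: dict) -> str:
    """Server side: validate every field before touching the store."""
    auth = stream.readline().split()
    if (len(auth) != 2 or auth[0] != b"AUTHTOKEN"
            or not hmac.compare_digest(auth[1], expected_token.encode("ascii"))):
        return "ERR auth"
    parts = stream.readline().split()
    if len(parts) != 4 or parts[0] != b"SET" or not parts[2].isdigit():
        return "ERR syntax"
    length = int(parts[2])
    if length > MAX_LEN:
        return "ERR too large"
    payload = stream.read(length)
    actual = hashlib.sha1(payload).hexdigest().encode("ascii")
    if len(payload) != length or not hmac.compare_digest(actual, parts[3]):
        return "ERR checksum"
    store[parts[1].decode("utf-8", errors="replace")] = payload
    return "OK"

store = {}
wire = encode_set("xxx", "greeting", b"hello")
print(handle_set(io.BytesIO(wire), "xxx", store))  # OK
```

The point of the shape is that the header is small and fully checked (token, field count, length bound, checksum) before the payload is accepted, so there is no unvalidated blob to unmarshal.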

  • maxmunzelOP 3 years ago

    Hi mbreese!

    > this seems to just be a SQLite database with values in fields?

    SQLite is used as a storage format ("SQLite competes with fopen()"). The key-value pairs are stored as a modified Append-Only CRDT. The LUB-Operation (to merge two states while syncing) is implemented here: https://github.com/maxmunzel/kvass/blob/e32fdabdc86b039f716c...

    > anyone with access to the file would be able to see all data stored?

    Yes, attackers with access to your fs are not part of my attacker model. I rely on disk encryption for that matter.

    > Do the clients cache data locally? It looks like you're basically syncing from the server for every request. You're already making a round trip to the server for a request anyway, so why not keep state only on the server? I can understand an offline-only mode, but this would require a significantly more robust sync mechanism. If this was the goal, I'd love to see this discussed more in the README too.

    The sync mechanism is actually pretty solid, as it's based on CRDTs. One of the applications of kvass is central management of config files, so automatic syncing and offline fallback are important.

    > What is the purpose of the ProcessID?

    The Counter variable is a rudimentary implementation of Lamport clocks. To get a total order from Lamport clocks, you need ordered, distinct process ids. The process ids don't really need to mean anything, and the Lamport clock is itself just a fallback for the case that the wall-clock timestamps collide (see the Max() function), so it's practical to just draw them randomly.
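
    Roughly, the ordering described here could be sketched like this. It is not taken from the kvass source: the `Entry` fields and the `newer` and `next_counter` helpers are made-up names, and the exact comparison order (timestamp, then counter, then process id) is an assumption based on the description above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    key: str
    value: str
    timestamp: int   # wall-clock time of the write
    counter: int     # Lamport counter, only matters when timestamps collide
    process_id: int  # randomly drawn, distinct per process

    def version(self):
        # Tuples compare field by field, which gives a total order
        # as long as process ids are distinct.
        return (self.timestamp, self.counter, self.process_id)

def newer(a: Entry, b: Entry) -> Entry:
    """Pick the winning entry; the result does not depend on argument order."""
    return a if a.version() >= b.version() else b

def next_counter(local_counter: int, seen_counter: int) -> int:
    """Lamport rule: move past anything observed from another process."""
    return max(local_counter, seen_counter) + 1
```

    With equal timestamps and counters, the process id alone decides, which is why the ids only need to be distinct and ordered, not meaningful.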

    > I didn't see any authn/authz in the requests. You're also unmarshalling random data from the request w/o confirming that it is valid first. This seems risky to me and could potentially crash the server if I were to send it random data.

    Authentication is provided by the GCM mode of AES. As I decrypt (and thereby verify) early, I can assume to work on trustworthy payloads. GCM is also non-malleable unlike for example CBC or CTR.

    As suggested by losfair, I'll switch to PSK TLS as soon as it's available or just put HTTPS in front of the end-points. But that's not high-priority right now.

  • koheripbal 3 years ago

    I just use WinSCP with remote file encryption turned on and have VeraCrypt for the local temp storage.

    That way my entire working file system is encrypted at rest, in transit, and while stored remotely - entirely with heavily mature off the shelf open source tools.

mike_hock 3 years ago

Can you also drink it?

mahebub 3 years ago

Hack Mama

markstos 3 years ago

For personal use, I’ve had good luck storing things in files. Then when I need those things, I read the files.

  • chrisseaton 3 years ago

    This seems unnecessarily snarky. You can make anything sound silly by reducing its functionality to the most basic level possible, ignoring all aspects of ergonomics and packaging. And you could make this comment about any storage engine. Like the infamous Dropbox comment here.

    • markstos 3 years ago

      Fair. The project could do a better job of explaining what benefit it has over the file system API.

      • chrisseaton 3 years ago

        For example sharing a public link to a value.

        And syncing between file systems across a network is hard. (Before you say it's easy you can just do X, Y, and Z... remember that infamous Dropbox comment.)

        • markstos 3 years ago

          It was easy to share public links to values hosted on the file system in 1995 with Apache. It remains easy today with Nginx and other web servers.

          Syncing filesystems across networks with rsync has worked well for years.

          If you are considering a personal key value store, you are probably already familiar with web servers and rsync. If not, they are two general purpose tools which are likely to be useful for other projects as well.

          I was absent the day of the infamous Dropbox comment.

          • chrisseaton 3 years ago

            > It remains easy

            You're just parroting the original comment which was proven to be so so wrong in practice. Most people aren't able to / don't want to duct-tape random systems together like this.

            I could snarkily ask you what's the point of Nginx? Why not just run a dial-in BBS? Don’t you have the skills to do that? Why do you need this fancy Nginx and why did anyone bother writing it? That’s what you sound like.

            There's value in building something that is integrated.

      • kaoD 3 years ago

        Mostly the "remote" command as seen in the README.

    • pydry 3 years ago

      Dropbox explained itself pretty well.

      A simple one-paragraph "why" at the top of this project's README wouldn't be amiss.

      • chrisseaton 3 years ago

        For example

        > Its trivial to set up and operate kvass across multiple devices

        > remember the file we stored earlier? Let's get a shareable url for it!

        • pydry 3 years ago

          I read it. I'm pretty clear on what it does. I'm still not feeling the why (or the differentiator from other things that store files and give you URLs).

          Remember when Dropbox explained itself by telling you you didn't need to carry around USB sticks in your jean pockets that get washed or lost? I thought that was pretty neat.

        • amelius 3 years ago

          > Its trivial to set up and operate kvass across multiple devices

          Still, using a distributed file system is so much better, as its API is supported by basically everything else (including Dropbox!).

          I feel that a key-value store goes against the Unix philosophy and is solving an imaginary problem.

          • vineyardmike 3 years ago

            A distributed file system seems like way more work to set up.

            Also not everything has to follow the Unix philosophy. Plenty of very useful things are better off less Unix-y eg ffmpeg. But this doesn’t seem to do a bad job - it’s a very dedicated tool to do one thing, it just doesn’t store everything as files.

    • dadoge 3 years ago

      Out of curiosity, what is the infamous Dropbox comment?

    • vbezhenar 3 years ago

      What's wrong with Dropbox comment? I still didn't find any use for this service, but rsync works for me almost every day. IMO Dropbox is useless.

      • chrisseaton 3 years ago

        > What's wrong with Dropbox comment? ... IMO Dropbox is useless.

        That's just repeating the original ignorant Dropbox comment. Over 15 million paying users don't think Dropbox is useless. And hundreds of millions of non-paying users don't either.

      • NoraCodes 3 years ago

        Many people do find it useful, and the people who created it have become very wealthy.

        I'm in the same boat as you, but there are more kinds of people and situations in the world than just us.

      • shreyshnaccount 3 years ago

        yc and Dropbox realised that people would like to pay for it. in the same logic, a toaster is useless, I can always just heat bread in a pan no?

      • shreyshnaccount 3 years ago

        a browser is useless, you can always send a request through curl and read the html.

  • VoodooJuJu 3 years ago

    Yeah but is your filesystem endorsed by a fun & quirky children's cartoon beaver? Can it do QR codes? Didn't think so.

  • nkrisc 3 years ago

    To your point, the very first examples don’t really demonstrate much value, even if they are the most basic examples of how it works.

    It’s a bit like selling a car by showing all the different things you can hold in the cup holders.

    • vineyardmike 3 years ago

      There are literally hundreds of distributed networked KV stores used by software developers for all sorts of projects. Showing how to store “hello world” seems like a pretty good intro.

      Why can’t people see a use case for this? It maybe doesn’t compare as unique against the hundred other KV stores but it’s also a toy project and a KV store seems to have an obvious use?

      Personally, I’m going to try this out since I was actually looking for a similar KV store. Only because I was looking and HN presented it to me tbh.

      My use case is that I have a few Raspberry Pis at home (aka low powered) that I wanted to have a distributed config on. I wanted something easy to manipulate with a command line that was lightweight (eg not redis or consul or a password manager). Since it’s for LAN use (or actual Tailscale) the security wasn’t really important.

  • christophilus 3 years ago

    Heh. Snarky but true. I store just about everything in a “notes” folder which is mostly markdown files. Easily searchable / editable with any tool you like.

  • goodpoint 3 years ago

    This is spot on. Filesystems are more powerful, fast and scalable than people think.

vander_elst 3 years ago

What's the use case for this (besides being a nice learning project)?

I didn't see this on the readme.

raydiatian 3 years ago

I hope this feeling is me catching onto the joke in the name rather than being a first responder

  • nathell 3 years ago

    The name can be read as an acronym of ‘Key-Value ASSociative store’, but also alludes to the beverage: https://en.wikipedia.org/wiki/Kvass

    • jve 3 years ago

      The picture in README.MD really tells you that the author is aware of kvass (the drink). This repo actually made me google up that wiki page to get an answer: "Is this drink really called kvass elsewhere, not only in my country?". Yes, it is, it seems.

mkoryak 3 years ago

Somewhat unrelated: Can one buy kvass starter in the United States, and if so, what is it called?

I'm not interested in bottled kvass, it never tastes like the real thing and you don't get to watch kvass explosions in the bottle as it is being made

deltasepsilon 3 years ago

how about:

  echo "value" > "${HOME}/.db/key"
  cat "${HOME}/.db/key" > value
  scp -r ...

d1l 3 years ago

Wait, this is just a toy project.

  • tuxie_ 3 years ago

    What do you mean? It's a project. It has a purpose and it achieves that purpose. If you don't need a lot of code to achieve it, what's the problem? What makes it "toy"?

izhak 3 years ago

Not trolling or trying to downplay anybody here, but honestly - how is “kvass” (read as “k-v-ass” given it is a “key-value” storage) a good name?..
