Tell HN: Upgrade your Metabase installation

208 points by zhoutong 3 years ago · 75 comments

One of the better decisions we took at my firm was to not allow direct access to any production DB to analytics visualization tools like Metabase and Redash.

Always write your analytics data to a separate DB in a periodically run job. Only store aggregated anonymized data in the analytics DB you expose to internal stakeholders via tools like Metabase.
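A minimal sketch of such a periodic job, using Python's built-in `sqlite3` as a stand-in for both databases. The `orders` table, its columns, and the `daily_order_stats` table are hypothetical names chosen for illustration. Note that aggregation alone is not a guarantee of anonymity when group counts are very small.

```python
import sqlite3


def refresh_daily_order_stats(prod, analytics):
    """Copy per-day order aggregates from prod into the analytics DB.

    Only counts and sums cross over; no customer identifiers are copied.
    Table and column names here are hypothetical.
    """
    rows = prod.execute(
        """
        SELECT date(created_at) AS day,
               COUNT(*)         AS order_count,
               SUM(total_cents) AS revenue_cents
        FROM orders
        GROUP BY day
        ORDER BY day
        """
    ).fetchall()

    analytics.execute(
        """
        CREATE TABLE IF NOT EXISTS daily_order_stats (
            day           TEXT PRIMARY KEY,
            order_count   INTEGER NOT NULL,
            revenue_cents INTEGER NOT NULL
        )
        """
    )
    # INSERT OR REPLACE keeps the job idempotent if it is re-run.
    analytics.executemany(
        "INSERT OR REPLACE INTO daily_order_stats VALUES (?, ?, ?)", rows
    )
    analytics.commit()
    return len(rows)
```

The BI tool would then be given credentials for the analytics database only, so a compromise of that tool never touches row-level production data.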

jimmytucson 3 years ago

Also your production database is optimized for different workloads than your analytics database.
Usually production is used for fetching and updating a small number of records at a time (think updating a shopping cart) and has strict latency requirements, whereas analytics involves reading a large amount of data in columns (think count group by one or two columns) and can be done in batches, where the results get more and more stale until the next batch runs.
- bingemaker 3 years ago
  
  How do you batch write the results (say updating shopping carts) when the frontend has to reflect what's in the database?
  - patmorgan23 3 years ago
    
    They're talking about moving data between two different back end databases. Your production database is optimized for your application/latency.
    Then you have your warehouse database that you update once a day with information from prod.
namaria 3 years ago

That's a great idea and it articulates something I have thought about the whole "use boring tech" thing (which I support). It doesn't preclude letting people use the shiny new thing. You can always let them plug it in and use it. But the core of the system should be as simple as possible and based on thoroughly understood tech (from the point of view of the team in question/accessible labor market).
- nucleardog 3 years ago
  
  I tend to discuss things in terms of the trunk, branch, and leaves.
  Mostly in that the leaves of your system (parts that nothing else connects to or builds on) are generally a low risk place to try new things sometimes. If you do run into any intractable issues, it’s also an easy spot to pluck it off and replace it.
nneonneo 3 years ago

Worth pointing out that we recently discovered an RCE in RestrictedPython that affects Redash: https://github.com/zopefoundation/RestrictedPython/security/...
This should further emphasize the need to isolate these tools and ensure they are only accessible to people who need them.
98codes 3 years ago

Exactly right -- we do all of that, and even then tightly control and audit who has access to the anonymized, aggregated, read-only data cube.
MattJ100 3 years ago

What kind of tooling do you/people use for that? Or just custom scripts?
- appplication 3 years ago
  
  Look up OLTP vs OLAP data stores to get an idea. There are a lot of common patterns for the specifics of implementing this. Usually you run a regularly scheduled job that dumps data representing some time period (e.g. daily jobs). There are some considerations for late arriving data, which is a classic DE interview question, but for the most part, big nightly dumps of the last day’s data/transactions/snapshots to date-partitioned columnar stores using an orchestration engine like Airflow is sufficient for 99% of use cases.
  - patmorgan23 3 years ago
    
    Tangent: I hate OLTP and OLAP as acronyms. They're only one letter/word off and completely obscure the relevant meaning with lots of semantic noise. Just say transactional vs analytical processing. (They are still good search key terms because lots of existing literature/resources use the terms)
- mh- 3 years ago
  
  (not the person you're replying to)
  I can't recommend any specific tools without knowing a lot about the environment, but if you're looking for terms to google: ELT (Extract, Load, Transform) and CDC (Change Data Capture) will give you a sense of the landscape.
  edit: the sibling comment that mentions Airflow is a good answer for an example of an ELT workflow.
- pbreit 3 years ago
  
  Don't Maria, Postgres, etc make replication pretty easy?
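The nightly-dump pattern appplication describes above (date-partitioned output plus some allowance for late-arriving data) might look roughly like this. It is a sketch under assumptions: the `dt=YYYY-MM-DD` directory layout, the function names, and the 3-day lookback are all illustrative, and CSV is used only to stay within the standard library where a real pipeline would more likely write a columnar format.

```python
import csv
from datetime import date, timedelta
from pathlib import Path


def partitions_to_rebuild(run_date, lookback_days=3):
    """Return the dates a nightly run should (re)write, oldest first.

    Rewriting a short trailing window is one simple way to absorb
    late-arriving rows; the 3-day default is an arbitrary assumption.
    """
    return [run_date - timedelta(days=n) for n in range(lookback_days, 0, -1)]


def write_partition(root, day, rows):
    """Overwrite the partition for `day` under root/dt=YYYY-MM-DD/."""
    part_dir = Path(root) / f"dt={day.isoformat()}"
    part_dir.mkdir(parents=True, exist_ok=True)
    out = part_dir / "part-0000.csv"
    # Opening with "w" replaces any earlier version of this partition.
    with out.open("w", newline="") as fh:
        csv.writer(fh).writerows(rows)
    return out
```

An orchestrator such as Airflow would call something like this once per night, querying production for each date returned by `partitions_to_rebuild` and overwriting that day's partition.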

lecha 3 years ago

How many of you received this notice via an official security advisory channel you're monitoring/acting on? If so, which advisory service do you use and how do you configure it? Learning about it on HN is useful, but far from a reliable solution.

swe_dima 3 years ago

I am subscribed to their GitHub releases and when I saw a release for every old version I knew what's up :-)
- rudasn 3 years ago
  
  Yeah I do the same for projects I use. I also received an email but don't remember if I also signed up to their newsletters or something like that.
Mandatum 3 years ago

Saw it on HN.
not_your_vase 3 years ago

It is definitely not announced on Full Disclosure nor on oss-security mailing lists.
- worthless-trash 3 years ago
  
  Doesn't look like there is a CVE either: https://www.cvedetails.com/vulnerability-list/vendor_id-1947...
  - capableweb 3 years ago
    
    > Will you release any information about the vulnerability?
    > Yes, we’ll be releasing the patch publicly, as well as a CVE and an explanation in two weeks. We’re delaying release to give our install base a bit of extra time before this is widely exploited.
    From their blog.
    
    ungamedplayer 3 years ago
    
    Oh absolutely, but it's trivial to get a CVE from the relevant CNAs. A webform or a phone call.
    It's a bit silly.
    
    capableweb 3 years ago
    
    Don't you have to share more details about the exploit then? That seems to be the thing they're trying to avoid for now.
    
    worthless-trash 3 years ago
    
    Negative, you can request a CVE without specific details; CNAs do this all the time until the embargo is lifted.
xctr94 3 years ago

I got an email directly from Metabase.

exabrial 3 years ago

I think it's important to review the term "Zero Trust" because so many companies are getting it wrong.

Zero Trust does not mean: "No more VPNs and private IP network ranges, everything is public. ::elitist hipster noises::"

Zero Trust simply means: "Just _because_ you're on a private network [or coming from a known IP], doesn't mean you're authenticated."

You should have every single one of your internal network services (like Metabase) behind a VPN like Wireguard or numerous other options. The sole purpose of this is to reduce your firewall log noise to a manageable level that can be reviewed by hand if necessary.

Obviously this isn't perfect security, but that's the _entire_ point: every security researcher says security should be an onion, not a glass sphere; many layers of independent security.
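As a rough illustration of the layering described above, here is a sketch in which network location and authentication are checked independently and both must pass. The VPN range and the token are hypothetical placeholders, and a real service would verify per-user credentials rather than a single shared token.

```python
import hmac
import ipaddress

# Hypothetical values for illustration only.
VPN_RANGE = ipaddress.ip_network("10.8.0.0/24")
API_TOKEN = "replace-me"


def on_private_network(client_ip):
    """Layer 1: the request arrived from inside the VPN range."""
    return ipaddress.ip_address(client_ip) in VPN_RANGE


def is_authenticated(token):
    """Layer 2: the caller presented the expected secret (constant-time compare)."""
    return hmac.compare_digest(token.encode(), API_TOKEN.encode())


def allow_request(client_ip, token):
    """Both layers must pass; network location alone grants nothing."""
    return on_private_network(client_ip) and is_authenticated(token)
```

A request from inside the VPN with a wrong token is rejected, and so is a request with the right token from outside the VPN.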

kevincox 3 years ago

This is why I try to put everything behind NGINX with basic auth. Unfortunately not everything works well that way but in this case I suspect that this is made unexploitable by anyone without the password.

tedeh 3 years ago

Ha, I was just about to go in here and say the same thing.
"Fortunately" some "white hat" hacker contacted us last year about another Metabase exploit. I gave him a 30 USD tip and ended up doing exactly what you are suggesting.
Now I'm glad that means I don't need to interrupt my vacation to fix this thing right now.
- fuomag9 3 years ago
  
  Here in Italy you get lucky if the company is not suing you :(
  - konschubert 3 years ago
    
    EDIT: I misunderstood.
    
    calessian 3 years ago
    
    That’s simply not true, sadly; you’re very much reliant on the company not attempting to sue you. Counter examples (not implying these have been successful, but it is also not unheard of to have the police show up at your door and collect all computers/phones etc. to investigate)
    - https://www.golem.de/news/connect-app-cdu-verklagt-offenbar-... - https://www.heise.de/news/Modern-Solution-Anklage-gegen-Aufd...
    
    selimco 3 years ago
    
    We have the same
    https://www.zeit.de/digital/datenschutz/2021-08/cdu-connect-...
    
    konschubert 3 years ago
    
    I thought gp was talking about their employer suing them for bugs they created.
vdfs 3 years ago

Hmm, I was thinking that's a standard thing, at least in the HN crowd. Basic setup: Cloudflare -> Nginx -> Docker -> 3rd party app, all on a dedicated VM
tlrobinson 3 years ago

You can also set up some reverse proxies to auth with SSO like Google. I use Traefik + https://github.com/thomseddon/traefik-forward-auth for personal projects, even on my local network.
square_usual 3 years ago

I like NGINX, but I prefer how simple it is to set up Caddy with basic auth. Caddy is already simpler to configure (and has automatic SSL via Let's Encrypt), but it's so simple to get its basic auth directive working compared to NGINX that I do it by default now.
lomereiter 3 years ago

Better yet, oauth2-proxy in case of an organization: only admins need to know the secrets, every user simply uses SSO to get access.
nullcipher 3 years ago

or vpn

riadsila 3 years ago

For more context: https://www.metabase.com/blog/security-advisory

vxNsr 3 years ago

They say they’ll be releasing the patch publicly, but isn’t this OSS, can’t anyone just do a diff and with a little “elbow grease” find the patch?
- JJJollyjim 3 years ago
  
  They haven't released the source, and the compiled versions are non-trivial to diff (e.g. there are nondeterministic numbers from the clojure compiler that seem to have changed from one to the other, and .clj files have been removed from the jar).
  The old version has `hash=1bb88f5`, which is a public commit: https://github.com/metabase/metabase/commit/1bb88f5
  Whereas the new version has `hash=c8912af`, which is not: https://github.com/metabase/metabase/commit/c8912af
  - batch12 3 years ago
    
    I could be wrong (and often am), but I am seeing updates related to Druid client authentication.
  - MuffinFlavored 3 years ago
    
    I didn't even know you could have a "private" commit on GitHub/an open source repo like that.
    
    JJJollyjim 3 years ago
    
    Oh, I didn't mean to imply you can, just that it's 404... presumably it exists in a repo checked out on someone's machine, and maybe in a separate private GitHub repo.
    
    MuffinFlavored 3 years ago
    
    This is silly on my end (I woke up early and have time to kill)...
    Also like, note: I would never publicly disclose whatever I find, I'm just curious
    I observed exactly what you said about the Clojure filenames not matching up, etc. etc.
      #!/bin/bash

      # Variables
      DIR1=~/metabase-v0.46.6.jar.src    # decompiled with jd-cli / jd-gui (java decompiler)
      DIR2=~/metabase-v0.46.6.1.jar.src  # decompiled with jd-cli / jd-gui (java decompiler)

      # Function to create fuzzy hash for each file in a directory
      create_fuzzy_hashes() {
        dir=$1
        for file in $(find $dir -type f)
        do
          ssdeep -b $file >> ${dir}/hashes.txt
        done
      }

      # Create fuzzy hashes for each file in the directories
      create_fuzzy_hashes $DIR1
      create_fuzzy_hashes $DIR2

      # Compare the hashes
      ssdeep -k $DIR1/hashes.txt $DIR2/hashes.txt
    How far do you think this gets us (fuzzy hashing)?
    I was thinking this, or binary diffing the .class (instead of the "decompiled" .java)?
    
    JJJollyjim 3 years ago
    
    I found something which is clearly a security fix, using the same idea but more naive: just diffing the lengths of the decompiled files. It's not at all clear how the issue I found would be triggered by an unauthenticated user though.
- hadrien01 3 years ago
  
  > Yes, we’ll be releasing the patch publicly, as well as a CVE and an explanation in two weeks. We’re delaying release to give our install base a bit of extra time before this is widely exploited.
  - bdonlan 3 years ago
    
    Unfortunately that means it's not possible to deploy this without violating the AGPL...
    
    jabart 3 years ago
    
    No one cares. It's a two week violation and no one is going to hunt anyone down who released this early internally.
    
    Nextgrid 3 years ago
    
    Even though this is technically a violation, licenses aren't black & white. The objective and intent of the AGPL is not being violated by delaying release by a couple weeks to give time for security patches to be applied.
- MuffinFlavored 3 years ago
  
  https://github.com/metabase/metabase/compare/v0.46.6...v0.46...
  I can't tell if that's it?
  edit: I've looked at it a few times, I don't think that's it?
  - panki27 3 years ago
    
    The only thing that seems remotely interesting is the "private key" part - I don't know Clojure but it doesn't seem like that's it.
    
    MuffinFlavored 3 years ago
    
    They backported it to v0.45x and those changes don't seem to be included: https://github.com/metabase/metabase/compare/v0.45.4...v0.45...
    aka, It isn't checked in to source control publicly yet. Interesting.
    I tried to "decompile" the jars and loop over the files but it didn't yield much/wasn't clean enough to be of help.
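The more naive approach JJJollyjim mentions above, comparing the lengths of decompiled files between two versions, could be sketched as follows. This is an illustration, not the script anyone in the thread actually ran; the function names are made up, and it assumes both jars have already been decompiled into two directory trees.

```python
from pathlib import Path


def file_sizes(root):
    """Map each file's path (relative to root) to its size in bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def size_diff(old_root, new_root):
    """Return sorted (path, old_size, new_size) for files whose size differs.

    A size of None means the file is missing on that side.
    """
    old, new = file_sizes(old_root), file_sizes(new_root)
    changed = []
    for path in sorted(old.keys() | new.keys()):
        if old.get(path) != new.get(path):
            changed.append((path, old.get(path), new.get(path)))
    return changed
```

Two caveats follow from the thread itself: compiler-generated names that differ between builds will show up as spurious added/removed files, and an edit that leaves a file's length unchanged is invisible to this check.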

thomasfromcdnjs 3 years ago

It would be nice to know if this vulnerability affects people who never made their Metabase installations publicly accessible.

Aka if I am running Metabase locally.

Mandatum 3 years ago

It’ll be an RCE. If you are network isolated or have a proxy in front of it, you can take the weekend off.
MuffinFlavored 3 years ago

How would an attacker exploit that?
- ac2u 3 years ago
  
  A vulnerability (not necessarily this one, just hypothesising) could be exploited via a payload result from an outbound request to the internet.
  - MuffinFlavored 3 years ago
    
    I thought when the OP of this comment thread said locally they meant like, it isn't exposed to the Internet
    
    ac2u 3 years ago
    
    "exposed" as a word does a lot of heavy lifting here. When someone is asking me casually "hey, is this server exposed to the public internet"?
    I take it to mean "can someone connect to it in an inbound manner from the public internet?"
    If the answer is no, it doesn't necessarily mean that packets don't have other ways of making their way to the server, for example, a service running locally could have a webhook mechanism that fires events to an internet-accessible server whenever certain events happen.
    You might trust the services you're sending requests to as part of that, but they could become compromised and send exploits as a response. Other vulnerabilities could be services running locally but that reach out to the internet to check for updates... more surface area to exploit.
    If the OP was asking "I'm running this locally and I've set up my machine and firewalls to disallow any packets outside of the loopback interface", then the risk of the unpatched server is certainly reduced, but they could still be affected by another piece of software running on the same machine with internet access that is compromised first.
    Anything beyond an isolated machine with 100% air-gapping is theoretically at risk.
    Doesn't mean that the OP's question was a bad question or anything, they can use the answer to know how quickly they should worry about patching based on their own situation and risk tolerance.
    
    thomasfromcdnjs 3 years ago
    
    Great answer btw.
    And yes, that is what I meant. curl hackmeplease.com 57 stack traces down.

not_your_vase 3 years ago

Emergency deployment late Friday afternoon (by EU time, at least), the best way to end a week :)

kmitz 3 years ago

Thanks for the heads up! Without your message I'd probably have found out in a couple months :)

smithcoin 3 years ago

If I have my metabase installation protected behind oauth with G suite am I protected from these kinds of vectors?

Dachande663 3 years ago

Perhaps a naive question, but if running metabase within a docker container, what permissions would this RCE have? AFAIK the container has network access and access to the mounted volumes and that's it right?

JJJollyjim 3 years ago

Presumably the metabase instance also has credentials to access some databases, some of which may have enough privileges to also get RCE on the database machines (as well as messing with the data they hold).
- Dachande663 3 years ago
  
  We issue separate read-only credentials for database access fortunately. Still doesn't remove the risk of all the data being exfiltrated though.
hiatus 3 years ago

The container has access to whatever database you connect metabase to for BI. If the db connection credentials are available to the container, it's possible a malicious actor could access your prod db.
throwaway6734 3 years ago

It depends on how the container is being run and if it has root access
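Dachande663's point about read-only credentials can be illustrated with SQLite's read-only URI mode as a stand-in. This is only an analogy: with Postgres or MySQL the equivalent would be a separate database role granted `SELECT` only, and the helper name below is hypothetical.

```python
import sqlite3


def open_read_only(path):
    """Open an existing SQLite file so that any write is rejected.

    Assumes `path` is a plain filesystem path with no characters that
    need URI escaping.
    """
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)
```

On a connection opened this way, a `SELECT` succeeds while a `DELETE` or `INSERT` raises `sqlite3.OperationalError`. That matches the caveat in the thread: the data cannot be modified through the BI tool, but everything readable can still be exfiltrated.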

formerly_proven 3 years ago

> Extremely severe. An unauthenticated attacker can run arbitrary commands with the same privileges as the Metabase server on the server you are running Metabase on.

Java deserialization strikes another one down, I assume?

theanonymousone 3 years ago

Will it still be (as) dangerous if Metabase is running inside a container?

Mandatum 3 years ago

To all the data inside of it? Sure.
To all of the auth tokens and user creds? Why not.

jacob_rezi 3 years ago

What would happen if a software's database was completely accessible via an open api end point?

exabrial 3 years ago

thank you!
