Meilisearch 1.0 – Open-source search engine built in Rust

blog.meilisearch.com

448 points by tpayet 3 years ago · 185 comments

We’ve used Meilisearch in production and it is the closest thing to self hosted Algolia you can get, which in itself is pretty amazing.

Unfortunately the performance of indexing (constantly changing records) wasn’t great and Meilisearch would fall behind on indexing records for hours.

Meilisearch has been amazingly great for projects where records don’t change all that much (eg docs, or even a customer database), but if you have for example a fast paced ecommerce system with 50k records constantly changing (eg product inventory), it falls over pretty quick. We had to transition over to Elastic for this aspect of our app.

The other issue we faced is their Rails gems falling out of step with the server, and when fixes came out, the Rails gem was incompatible for a while.

I really really hope 1.0 increases performance to the point where it becomes production ready, because the initial out of the box performance (before getting bogged down with indexing) was pretty amazing. Better than Elastic and on par with Algolia.

I recommend keeping Meilisearch on your radar. It is going to be great.

I wish the best for the Meili team and hope they succeed!

Kerollmops 3 years ago

Thank you very much for this amazing feedback, really appreciated.
We made a lot of improvements to the indexing part of the engine, and it can now auto-batch updates, which gave incredible gains. We will continue to work on this in 2023. Can I know the version you were using?
- nezirus 3 years ago
  
  My experience with indexing is similar. Up to, let's say, 1M docs it works fine, but after that it goes south. Even with auto-batch I had to manually prepare large bulk updates and wait for completion during inserts so as not to overload MS. (I am using the Rust client.)
  Other than that, it is simply great. Ranking stuff is great, simple, I only need custom weights there, some additional functions (not just asc/desc) and it would be perfect.
  - nacs 3 years ago
    
    I had the same experience.
    Pro: Meilisearch search speed and memory use was great compared to others (at the cost of large storage requirements but that's the cheapest thing to upgrade).
    Con: Indexing documents (even with recommended batch sizes) was extremely intensive on the system as the document count increased (upwards of 20 million docs to index).
    I had to modify the indexing script to completely pause indexing when system load average went too high to prevent the whole server crashing.
    Also, this 1.0 upgrade apparently requires a full export and import of data if you're upgrading from the previous release? I hope this isn't the case for >= v1.0 releases because I'm not looking forward to exporting/reimporting 200+GB of Meilisearch data files over and over again.
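
The load-based pause described above could be sketched roughly as follows. This is a minimal sketch, not the commenter's actual script: `send_batch` is a hypothetical callable standing in for whatever client call submits documents (ideally one that blocks until the indexing task finishes), and the `batch_size`, `max_load`, and `pause_s` defaults are illustrative assumptions. `os.getloadavg()` is only available on Unix-like systems.

```python
import os
import time
from typing import Callable, Iterator, List, Sequence


def batched(docs: Sequence[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of `docs` with at most `size` items each."""
    for start in range(0, len(docs), size):
        yield list(docs[start:start + size])


def index_with_backoff(
    docs: Sequence[dict],
    send_batch: Callable[[List[dict]], None],  # hypothetical: submits one batch
    batch_size: int = 1000,                    # assumed value, tune per dataset
    max_load: float = 4.0,                     # assumed 1-minute load threshold
    pause_s: float = 5.0,                      # assumed pause between load checks
    get_load: Callable[[], float] = lambda: os.getloadavg()[0],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send batches one at a time, pausing while the system load is too high.

    Returns the number of batches sent.
    """
    sent = 0
    for batch in batched(docs, batch_size):
        # Hold off entirely until the load average drops below the threshold.
        while get_load() > max_load:
            sleep(pause_s)
        send_batch(batch)
        sent += 1
    return sent
```

`get_load` and `sleep` are injectable only so the loop can be exercised without waiting on a real machine; in a real script the defaults would be used.
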
message 3 years ago

Same here: we ran into the same version inconsistencies with the Java library.

manigandham 3 years ago

Congrats to the team, it's been interesting to watch the development of Meilisearch (and its close competitor Typesense). Algolia has really paved the way here but it's nice to see the open-source options with more configurations and better default UX.

There are also many search libraries if you want to embed search more deeply into your app. I have a list of modern search systems and libraries here: https://manigandham.com/post/search-systems-libraries

naiv 3 years ago

I will never understand who the target group of Algolia is besides a website where the number of records coincidentally is in the range of the number of queries. At least they got rid of the pricing per indexing transaction which made it even more absurd.
If Algolia would offer an instance based pricing on cpu, ram and storage they would be the clear winner imho.
- manigandham 3 years ago
  
  Why do the number of records and searches have to be similar? The current pricing is simple - you pay per "search unit" which scales in both dimensions.
  The vast majority of small/medium customers would rather pay-as-you-go than maintain a fixed cost instance, and it allows Algolia to efficiently pack them into a multitenant architecture instead of wasting resource overhead.
  - naiv 3 years ago
    
    If you index, e.g., geonames, you have 4 million records but you might only have 50,000 queries a month. You pay $4,000 for minimal compute resources, 4GB of RAM and 3GB of storage space. It would be less, but Algolia requires you to create a separate replica for each sort option.
    With 4 million records and 4 million queries I would pay the same, but then I would at least have 4 million queries.
    The other way around: if we just indexed all 200+ countries in the world and had autocomplete with a lot of visitors, we would again pay $4,000 for, e.g., 50,000 users per day typing in 3 letters.
    Same for us: we offer 350,000 movies with 2 million scenes. With Typesense or even Elasticsearch Cloud we would pay 5% of what we would pay Algolia.
    
    manigandham 3 years ago
    
    Your usage seems to be in the "large" customer category where provisioned capacity is a better deal. Algolia does have volume discounts if you talk to them, but yes the other alternatives might be a better fit.
    
    Aeolun 3 years ago
    
    50000 people per day is ‘large customer’ territory? That’s less than a request per second.
    
    manigandham 3 years ago
    
    You're missing the point. Their scale (in number of records and queries) is a better fit for provisioned capacity than pay-as-you-go in the context of billing.
    The exact small/large label doesn't matter, nor does the requests-per-second.
    
    maxFlow 3 years ago
    
    Would you mind sharing your thought process to get from `daily_users` to `reqs_per_sec`? I'm playing with some estimations of `concurrent_users` for a basic website, and I'd be quite interested in the breakdown.
    
    swyx 3 years ago
    
    let's think step by step:
    24hrs * 60mins * 60secs = 86400
    86400 > 50000
    QED
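
The estimate above boils down to dividing daily requests by the 86,400 seconds in a day. A minimal sketch of that arithmetic follows; `requests_per_user` and `peak_factor` are hypothetical knobs added for illustration (traffic is rarely spread evenly over 24 hours), not figures from the thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def average_rps(daily_users: int, requests_per_user: float = 1.0) -> float:
    """Average requests per second if traffic were spread evenly over a day."""
    return daily_users * requests_per_user / SECONDS_PER_DAY


def peak_rps(daily_users: int, requests_per_user: float = 1.0,
             peak_factor: float = 5.0) -> float:
    """Rough peak estimate; `peak_factor` is an assumed busy-hour multiplier."""
    return average_rps(daily_users, requests_per_user) * peak_factor


if __name__ == "__main__":
    # 50,000 users per day, one request each: below one request per second.
    print(round(average_rps(50_000), 3))
    # The same users typing three letters into an autocomplete box.
    print(round(average_rps(50_000, requests_per_user=3), 3))
```
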
- luhn 3 years ago
  
  A while back I priced out what Algolia would cost us and it ended up being thousands of dollars per month for something that was currently running a t3.micro Elasticsearch instance. Our usage was just worst-case when it came to their pricing dimensions: A large number of very small documents with a low search volume.

kacy 3 years ago

We’ve been using Meilisearch for the last six months or so and have been delighted with its performance and usability. It uses a fraction of the resources of Elasticsearch, and the language support is extensive and very active.

That being said, our cluster is much smaller than other ones I’ve worked with in the past, so I can’t comment on its reliability at massive scale. I’ve also been very impressed with how active contributors are on GitHub and in their Discord. Everyone seems like good people, and it’s a project I’m excited to keep using.

tempest_ 3 years ago

This is the thing I find when people post "ElasticSearch Alternative".
80% of ElasticSearch's value add (wrt search anyway) is all the clustering and framework that allows you to span the search over tens or hundreds of machines "easily".
I think the same is true here. Probably the comparison should be with the underlying search libraries that ES sits on.
I suppose this comparison makes sense in a world where most people don't run their own servers much any more since the clustering etc would be a problem for the cloud offering and not the consumer.
- Semaphor 3 years ago
  
  > 80% of ElasticSearch's value add (wrt search anyway) is all the clustering
  Or configurability. I looked at this again now that 1.0 is out, but besides the .NET client still being in an alpha state, it’s also very zero-configuration. There seems to be no configurability regarding tokenization strategies, for example.
  Now, I certainly see the appeal, I barely understand my own ES code and meilisearch replicates probably 70% of it with no configuration at all, that’s impressive, but it also means that switching would mean giving up on those 30%.
- kmac_ 3 years ago
  
  Yeah, Elastic also brings advanced aggregates and filters, Kibana, nice UI where you can explore data and create dashboards easily and tons of bigger and smaller features. But in some areas both products are comparable.
- trilobyte 3 years ago
  
  This was the sense I got as well, though I have only started playing w/ Meilisearch. Clustering was one of the top 3 features that let Elasticsearch take over the market so quickly. In the playing around w/ Meilisearch I've done, it seems more like a replacement for something like Sphinx so far.
tpayetOP 3 years ago

Thank you very much! I'll share your comment with the team <3

sandstrom 3 years ago

Great news!

Been following along for a while and it's a great project. ElasticSearch needs some competition.

There are two things missing for us before we could make the switch:

1. Multi-index search; Standard use-case is searching across e.g. users and companies. Common in many SaaS-applications, where you want a single search field with type-ahead for e.g. contacts/organisations/tasks/events.

2. Decay functions; Basically to gradually phase out results for things based on age, distance or something similar. ElasticSearch has pretty good support for these. https://www.elastic.co/guide/en/elasticsearch/reference/curr...
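
To illustrate the second point, here is a minimal sketch of an exponential decay applied to a relevance score. It is loosely modeled on the general shape of such functions (the score multiplier is 1 at the origin and drops to `decay` at a distance of `scale`); the function name, parameter names, and defaults are assumptions for illustration, not an existing Meilisearch or Elasticsearch API.

```python
import math


def exp_decay(distance: float, scale: float, decay: float = 0.5,
              offset: float = 0.0) -> float:
    """Multiplier in (0, 1]: 1.0 within `offset`, `decay` at `offset + scale`."""
    if scale <= 0 or not 0 < decay < 1:
        raise ValueError("scale must be > 0 and decay must be in (0, 1)")
    lam = math.log(decay) / scale  # negative rate constant
    return math.exp(lam * max(0.0, abs(distance) - offset))


def rerank_by_age(results, scale_days: float = 30.0):
    """Sort (doc_id, relevance, age_days) tuples by decayed relevance."""
    scored = [
        (doc_id, relevance * exp_decay(age_days, scale_days))
        for doc_id, relevance, age_days in results
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    hits = [("old-but-relevant", 1.0, 90.0), ("fresh", 0.6, 0.0)]
    # The older document loses three half-lives, so the fresh one ranks first.
    print(rerank_by_age(hits))
```

The same function works for distance-based decay by passing a distance in kilometres instead of an age in days.
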

ferdi05 3 years ago

Thanks for your feedback! Multi-index search is planned, coded, and will be integrated in v1.1 (scheduled for April). The decay function is really interesting; the team will get back to you to learn more about this need :)
joking 3 years ago

Elastic already has a great competitor called solr, which I prefer on multiple aspects over elastic by the way.

chimen 3 years ago

Is Rust that important that you have to place "built in Rust" in the title? Is this like a cult following where we only bet on traffic and interest coming from other evangelists, for whom Rust is the only feature that matters?

4 months ago: "Meilisearch, open-source alternative to Algolia in Rust lands a $15M Series A"

It's not the first time I see, there are at least 2-3 daily submissions reaching the FP in this manner so I'm curious: "built in Rust" = marketing these days?

spoiler 3 years ago

As some people mentioned: I generally expect a higher standard of software when I see "built in Rust".
That expectation includes a few things such as stability and operational UX (ie how easy it is to run and maintain).
And these (in my experience as a Rust developer) stem from the fact that it's much easier to get the MVP and business logic taken care of because I'm not bogged down by the drudgery of menial tasks that C++ imposes.
There's also a much lower "devtime" cost to adding UX in Rust than C++
Of course, this all holds equally true when comparing Rust to a higher level language like TypeScript and its rich ecosystem, but it does come at higher resources utilisation for the same task too (on average, maybe not always, especially after the code gets JITed).
- wejick 3 years ago
  
  While I can agree with the argument that Rust offers good ergonomics and keeps us away from many classes of security and memory-management issues, the quality of UX/DX is defined more by product and design requirements and has little to do with the language of choice. It is similar with a RESTful API: whether it is good or bad is mostly in the designer's hands, not a result of being implemented in some esoteric language.
  - sodapopcan 3 years ago
    
    OP said "devtime cost", ie, time to implement. They made no mention or insinuation of quality.
  - spoiler 3 years ago
    
    Yeah, but there's usually more time to put an emphasis on UX, as well as the fact that the ergonomics of the language again make it easier to implement the UX.
thiht 3 years ago

An argument that I didn't see mentioned: when I see "search engine", I immediately think of ElasticSearch, which is Java, which requires a JVM.
Knowing Meilisearch is written in Rust makes me confident I can probably just run `./meilisearch` and get something working. I can also guess it'll be more resource efficient (CPU, memory) than ElasticSearch. I also *hate* ElasticSearch developer experience, and have had extremely good DX with Rust tools, so I can guess maybe their query language is saner. Maybe all this is wrong, but this is what I'm feeling when I see "written in Rust". So yeah, writing it conveys some meaning.
curquiza 3 years ago

Hello!
For me, "built in Rust" can be a real marketing argument. Indeed, Rust is a language that has proved its safety in the past. Building a technical product in Rust guarantees stability and safety (no memory issues in general) and performance (no garbage collector issue), so it brings more trust to the users.
- rockwotj 3 years ago
  
  FWIW milli does use unsafe in places.
  Also I would recommend not conflating no GC and performance. There are lots of reasons for Rust being fast and many have nothing to do with no GC. The main reasons a lot of languages with GC are slower is due to allocating on the heap as opposed to the stack, and in general Rust does a lot of static linking and the compiler has the full amount of information to optimize calls without needing to move stuff to the heap. That's the main perf win.
  Actually there are times when GC is more efficient than automatically freeing memory because GC can batch cleanup work.
  - Kerollmops 3 years ago
    
    I am the co-founder and maintainer of the engine, and I confirm we have some localized unsafe blocks for when we interface with the C library: LMDB.
    However, I prefer having a few unsafe blocks that I can review carefully than a single one encapsulating the primary function.
  - UltimateEdge 3 years ago
    
    > Actually there are times when GC is more efficient than automatically freeing memory because GC can batch cleanup work.
    Where can I read about these details of software performance? Can you recommend a book?
- groestl 3 years ago
  
  Safety yes, but stability? Its crash-early model wrt stack overflows and out-of-memory errors seems to trade off availability for safety. I'm not proficient in Rust as a user or as a developer, so this is an honest question.
  - nicoburns 3 years ago
    
    > Its crash-early model wrt stack overflows and out-of-memory errors seems to trade off availability for safety
    It's worth noting that while you can catch these kind of errors in C, very little software actually does so. The only software I'm aware of that does this is SQLite. Your C software will more than likely crash in OOM and StackOverflow situations too.
    
    groestl 3 years ago
    
    > while you can catch these kind of errors in C
    Tbh C would have never crossed my mind as an alternative for Rust in this case. I guess that says something.
  - cies 3 years ago
    
    So Zig then?
    
    groestl 3 years ago
    
    I'd have chosen JVM for safety, ease of deployment, stability and performance, and Java for maturity of the toolchain (fully aware that required memory would be a concern). With that in mind, can you pitch Zig to me?
pdimitar 3 years ago

It's important, yes. If it were written in C/C++ I'd fully expect to have my servers pwned due to a memory safety bug that these (and many other) languages don't protect against.
There's a number of technical people with decision-making powers that pay attention. And a part of them prioritize Rust-written projects.
It's a sound (literally) and safe investment.
I don't get the people getting ticked off by the "written in Rust" clarification. Can we finally stop pretending that all programming languages are equal? They absolutely are not.
- rockwotj 3 years ago
  
  Milli uses unsafe for what it's worth so it's not as safe as you may think.
  I do assume any post with "written in rust" does better on hacker news.
  - pdimitar 3 years ago
    
    Question is: do you realize that Rust's `unsafe` is still not as unsafe as C without bound checks, double `free`-s, and others?
    Point is, it's still an improvement. I am regularly amazed as to why that factually correct and (more and more) time-proven argument is always skipped when criticizing Rust.
    
    megous 3 years ago
    
    Yes, but do you realize you don't have to write C code without bounds checks? You can add your own.
    
    pdimitar 3 years ago
    
    No, I never would have imagined. /s
    Point is, many don't do it. And they don't tell anyone. And they deploy important software. And then we get unpleasantly surprised years (or decades) later.
    Same argument as with C++: proponents say that the modern C++ is almost like Rust which is cool and I am happy for them, but there are literal hundreds of millions of C++ coding lines out there that will never get upgraded. Having a cop-out like "yeah but the modern version is better" doesn't help legacy code.
    Rust on the other hand is super strict for a long time now. They did the right thing.
    
    megous 3 years ago
    
    Sorry, I don't know any important C codebase that omits necessary bounds checks casually. Often times it would not even make any sense. (If you want to iterate over an array, you don't just iterate until infinity, lol.) String manipulation is often wrapped in functions that handle length/storage size/reallocation/etc in the background.
    
    pdimitar 3 years ago
    
    Nobody is saying they do it casually. People genuinely believe they are without fault, which leads to stuff like Heartbleed (and many others; from 2017 to 2020 there was a number of HN submissions about various well-known pieces of software having buffer under/over-flows).
    I have heard the ideal theory you cite many, many times. Yet many people still make mistakes. How does that fit in your world-view?
    
    megous 3 years ago
    
    Not sure what ideal theory you mean. I didn't state any.
    Most of the code in my Linux distro is written in C, yet I don't see many segfaults or data corruption in my favorite tools, even those exposed to the internet. It just works. Supposed buffer overflows and double-frees don't affect me day to day despite 95%+ of the code I run being written in C, "catastrophic" issues like Heartbleed notwithstanding.
    People make mistakes, sure. They'll make them with "safe" languages, too. Rust programs are not immune from mistakes. They'll just be of a different kind.
    PHP is memory safe, and there were many easily exploited (not just exploitable) vulnerabilities in software written with PHP. (and it doesn't even have escape hatches out of its memory safety)
    
    pdimitar 3 years ago
    
    > If you want to iterate over an array, you don't just iterate until infinity
    And yet I am sure we all have witnessed code bases where this was done. At least I and no less than 40 other colleagues I knew have.
    The "ideal theory" refers to the old adage of "just be a good programmer, duh" which historically has been proven to be complete BS.
    > Rust programs are not immune from mistakes. They'll just be of a different kind.
    Glad we agree on something. I want memory safety problems out of the equation.
    Also please don't fight straw-men -- all Rust discussions seem to always spiral from the very reasonable premise of "Rust eliminates a class of bugs" to "But with Rust you can still make logical mistakes!!!!!", and nobody ever claimed the opposite anyway...
    
    Yoric 3 years ago
    
    For context, an obligatory reference (slash shameless self-promotion) to definitions of safety and safe languages.
    https://yoric.github.io/post/safety-and-security/
    
    Yoric 3 years ago
    
    Out of curiosity, do you know projects with C code and bounds checks?
    
    tialaramex 3 years ago
    
    Actually Unsafe Rust is arguably more fraught than C because the rules in Unsafe Rust are just as tough as they are in Safe Rust - much tougher than C - but in Safe Rust the language and libraries promise to take care of that, whereas in Unsafe that's on you. So choices which are in fact harmless in C will result in UB in Unsafe Rust.
    The benefit is that unsafe passages of Rust are rarer and should be safely abstracted from APIs for use by Safe Rust. Meilisearch seems to often (but not always) provide safety rationales for unsafe code, explaining why whatever is done is OK. I don't understand this domain in enough detail to comment on the quality of the rationales.
    
    pdimitar 3 years ago
    
    Yes and no, unsafe Rust still holds a number of invariants, whereas if you just go on your own writing C you have zero.
    I prefer the number of guarantees / invariants that's above zero.
    
    rockwotj 3 years ago
    
    I'm not criticizing Rust, I really like Rust and think it's great, along with the trend of more things written in it.
    At the same time assuming you'll never get a memory issue is over the top.
    
    pdimitar 3 years ago
    
    Dunno, I think it should be clear for all experienced devs that the "never" part is still not achieved.
    Personally though -- and in my work -- I'll take any improvement that I can. I am sick of reading about yet another important piece of software having yet another memory safety zero day pwnage bug.
  - serverholic 3 years ago
    
    A little unsafe is better than all unsafe.
rozgo 3 years ago

Built in Rust tells me a certain bar has been reached. Tells me the team went through a lot of effort to do their best. And the mature tooling makes it easy for me to evaluate and confirm my assumptions. So yes, positive marketing in my case.
Yoric 3 years ago

I believe that there is a good reason.
Rust is currently in the process of trying to eat some of C++'s cake (as well as that of Java, C# or Go). The usual response from C++ (Java, etc.) devotees is that Rust hasn't been tried on large projects so it cannot be compared. Which absolutely makes sense.
Each large scale project that demonstrates that Rust can be used successfully in a domain where C++ (Java, etc.) traditionally rules is a step forward for the Rust community.
Also, as with every language, there is a hype period. We're currently in the Rust honeymoon. My personal honeymoon has stopped a while ago, but Rust remains my favorite language for the foreseeable future.
- bayesian_horse 3 years ago
  
  People overreact to hype and anti-hype. Rust is already a useful tool and has momentum. But it's not trivial to get into and it won't replace C++ codebases overnight either.
  Based on historical data, a good lower bound for its future could be Ruby. According to TIOBE, Rust overtook Ruby in popularity, while Ruby has maintained roughly the same popularity for years. At worst, I expect Rust to stay about as relevant as Ruby on Rails. But it doesn't look that way...
jabo 3 years ago

I’ve come to see it this way:
If a set of users are using a product only because it is built in X, that user base is most likely the early adopter audience for X and it dangerously masks whether that product has product-market fit or not.
So if a product markets itself as built in X, it is appealing to early adopters of X.
The long-tail of users on the other hand, care more about what painful problem the product is solving for them.
Now, some of the features of X might provide benefits to end users, but the long tail of users care more about those benefits they get rather than the fact that X provides those benefits, and that the product uses X.
- xpe 3 years ago
  
  An open source project also has to attract contributors. Rust is a competitive advantage in terms of appeal relative to languages such as C, C++, Java, empirically speaking.
avinassh 3 years ago

This info helps me decide to check out a project. I am familiar with Rust, so any project in Rust gets my attention quickly. If the title didn’t mention the language, I would first check the language used!
It’s an open-source search engine which one could self-host. Language and tooling matter a lot to me and is often deciding factor.
potatochup 3 years ago

As someone who primarily writes rust for work, it appeals to me because I feel much more confident I can fix an issue myself if I need to. Same with python (which I'm equally familiar with), although I have much less confidence when modifying python code that I won't break something unrelated.
serverholic 3 years ago

I’ve found that there is a confluence of factors that makes “built in Rust” important to me.
First of all, Rust is relatively new so this tells me that the codebase is likely new.
Secondly, I think rust tends to attract smart people who like programs to be small and fast. Case-in-point meilisearch is a simple, single binary download.
Both of these together indicate that a project has a higher chance of being freshly written code, by smart people, that is small and fast.
Before I get a bunch of “well, actually”s, I’m not saying these things are true 100% of the time.
hota_mazi 3 years ago

"Built in Rust" carries with it a few positive connotations:
- It's fast.
- It's safe. Or more specifically, memory safe, which implies that it will be harder to compromise than similar products written in a different language.
Also, these two points are not hype.
orangepurple 3 years ago

When I see built in Rust I know the software is well built, easy to further extend if it is open source, and probably won't crash or bug out on me.
timeon 3 years ago

> cult following
Seems unnecessary to jump right to cult following. Not "cult" but RSS following is often the case, where a keyword in the title makes the difference. I wonder why it bothers you if it is not relevant to you. What is your problem? Why can't people let others be?
> It's not the first time I see
Obviously, and your comment is not the first to complain that a title contains "implemented in X".
drcongo 3 years ago

I'm not a Rust dev, but I am the target market for this product and I kinda care that it's written in Rust. That gives me some (possibly entirely wrong) confidence that it's likely to be a single binary, easily installed, fast and relatively safe.
More broadly, if there had been two headlines on the front page today and the other said "Open source search engine written in Node / JS" I would make assumptions about the 7 million dependencies and endless security updates in every single one of them that I'd have to monitor. Obviously I would also skip straight past that one. So yes, the technology choice is important.
- Yoric 3 years ago
  
  For context (and I say that as a Rust developer), please note that Rust has the same strength/weakness as Node in terms of dependencies.
  There is ongoing work to strengthen this. I do not know the status.
  - drcongo 3 years ago
    
    Thanks. Would I be correct in assuming that there should be less of a burden on me as an end user with Rust though as I only need to update the one binary that I installed?
    
    bogeholm 3 years ago
    
    Usually you would only need to update a single binary, as the dependencies are compiled in.
    You may get some libc-issues if you try to run a binary built for a newer Linux on an older Linux, unless it is built to target musl - don’t remember the details 100%
    
    Yoric 3 years ago
    
    How did you install the binary? If you installed it from source, you'll need to `cargo update && cargo build`.
    If you downloaded a binary or installed it from your distro's repo, generally you just need to update that one binary, yes.
    
    drcongo 3 years ago
    
    Thanks for clarifying. Yes, I'd typically install a Rust program from the distro's repos.
  - mcronce 3 years ago
    
    Yes and no - deploying a Node project, I need to install all its dependencies. Deploying a Rust project, I still typically only need to pull down the binary. The general attitude is still to pull in dependencies to do a job instead of inventing your own solution, which I consider a good thing, but not everybody agrees.
didip 3 years ago

Why would this not matter?
Built in Rust means no annoying GC pause, and that's important for a database.
And it also hopefully means less on-heap abuse.
paraboul 3 years ago

For some reason "built in Rust" resonates with "unbloated" to me, which can be quite appealing for this kind of software.
adamnemecek 3 years ago

It is important. Rust projects are infinitely easier to contribute to.
- dividedbyzero 3 years ago
  
  Compared to what?
  - bsnnkv 3 years ago
    
    For me: compared to projects in languages with less mature tooling and compilers less capable of preventing entire classes of errors by default.
  - adamnemecek 3 years ago
    
    Anything. Legit no language comes even close in terms of how easy it is to git clone something and get it to build.
    
    hu3 3 years ago
    
    I'd argue Go projects tend to be easier to build since they require nightly Go builds much less frequently (I don't even remember a project that ever required nightly Go tbh).
    https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
    
    earthling8118 3 years ago
    
    I've had the absolute opposite luck with Go. There are a lot of good things made in it but I prefer to not be involved in them. A nightly toolchain for Rust is trivial to acquire, to the point of it taking only a few seconds.
    
    Keats 3 years ago
    
    Maybe it's better now with Go modules but the last couple of repos I wanted to contribute a few years back just didn't build. They both were relying on the master branch of some other projects that had breaking changes.
    
    dividedbyzero 3 years ago
    
    But contributing takes a lot more than just a working build environment.
    
    adamnemecek 3 years ago
    
    Correct but getting a build environment easily is such a quality of life improvement.
  - freilanzer 3 years ago
    
    Python?
renewiltord 3 years ago

For those of us who like editing our source code, it's useful to know. And since this is HN, that's a lot of us.
wtetzner 3 years ago

This is Hacker News, where people are ostensibly interested in the technology used to build projects.

tmikaeld 3 years ago

My team tried to use Meilisearch for large datasets; unfortunately, it's impossible to plan the RAM usage. If you have very few searches, it consumes very little, but if you have a lot of search traffic, it may consume more than you could provision beforehand. This made it too unpredictable and too expensive, so we went with Manticore instead. I don't know if this has been addressed in 1.0; hopefully it has.

marban 3 years ago

Do you have any numeric definitions for few and lots?
- traverseda 3 years ago
  
  I think that they might have fixed it. I noted this as a problem with earlier meilisearch releases as well, but reading through the documentation it looks like they don't require the entire index to be in memory any more, allowing it to be a memory mapped file.
  https://docs.meilisearch.com/learn/advanced/storage.html#lmd...
  >For the best performance, it is recommended to provide the same amount of RAM as the size the database takes on disk, so all the data structures can fit in memory.
  > [...]
  >It is important to note that there is no reliable way to predict the final size of a database. This is true for just about any search engine on the market—we're just the only ones saying it out loud.
  Looks like a 10MB document is taking ~200MB, from their docs. I don't think that scales linearly though, since it's a reverse index it is going to scale based on the number of unique words it finds, with each document adding a bit on top of that. You'd expect it to have a pretty big index to cover common english words, and then each document adds a bit on top of that.
  Definitely seems like somewhere they could make some improvements though. Some transparent compression could probably help, and with zstd's dictionary feature it can be fine tuned to the data they're actually seeing.
  Not about to replace xapian in kiwix (offline wikipedia reader) any time soon, I think.
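  As a rough illustration of the scaling argument above, here is a toy inverted index in Python. It is not Meilisearch's actual data structure, and `build_inverted_index` is a hypothetical helper. The vocabulary grows only when a new word appears, while every document adds postings.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


docs = [
    "the quick brown fox",
    "the lazy dog",
    "the quick dog jumps",
]
index = build_inverted_index(docs)

# Number of distinct words: grows only when a document brings a new word.
vocabulary_size = len(index)
# Number of (word, document) pairs: grows with every document added.
postings = sum(len(ids) for ids in index.values())
```

  With a realistic corpus the vocabulary curve flattens while the postings keep growing, which is the "not linear" part of the estimate.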
- tmikaeld 3 years ago
  
  Our index was aimed at handling 20 000 documents at a total of 35MB of CSV. This would balloon into 0.7GB to 1GB of RAM, and we expected at least 1000 of these indexes, which would require dedicated servers with 1TB of RAM. This was when Meili was at version 0.27.
  With manticore, we've tried to run into these issues in benchmarks, but the only problem we got was temporary high IO load when indexes need to be re-indexed with new or changed documents. In total it's at 50-70% of the RAM usage compared to Meili.
  We'd be happy to re-visit, but looking at the docs - it seems to be about the same as it was back then (a year ago).
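  The arithmetic behind those figures, as a back-of-envelope sketch. The helper name is hypothetical; the inputs are the numbers quoted above.

```python
def total_ram_gb(index_count, ram_per_index_gb):
    """Total RAM needed if every index is resident at the same time."""
    return index_count * ram_per_index_gb


# 1000 indexes at 0.7GB to 1GB each.
low_gb = total_ram_gb(1000, 0.7)
high_gb = total_ram_gb(1000, 1.0)

# Blow-up factor from 35MB of CSV to 0.7GB-1GB of RAM per index.
blowup_low = 0.7 * 1024 / 35
blowup_high = 1.0 * 1024 / 35
```

  That works out to roughly 700GB to 1000GB in total, and a 20x to 29x expansion over the raw CSV.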
  - qdequelen 3 years ago
    
    You should definitely try Meilisearch again. We have reduced memory consumption and improved indexing performance a lot. Even with all the improvements, we think it's essential to continue focusing on it during 2023.
    And indeed, Meilisearch uses memory-mapping, which means that everything is on disk, and it will try to take as much memory as possible. For your information, we successfully ran a 115M documents dataset on a 1GB RAM machine.
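    For readers unfamiliar with memory mapping, here is a generic Python sketch of the idea. It is not Meilisearch's storage code, and the file is a throwaway stand-in. The file lives on disk, and the OS pages in only the parts that are actually touched.

```python
import mmap
import os
import tempfile

# Write a small stand-in "database" file to disk.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"A" * 4096 + b"needle" + b"B" * 4096)

# Map the file read-only. Nothing is copied into the process up front;
# the OS decides which pages to keep cached in RAM.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        offset = view.find(b"needle")
        chunk = view[offset:offset + 6]
```

    The process can address the whole file even when it is larger than RAM; what varies with available memory is how often a read has to go to disk.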
  - snikolaev 3 years ago
    
    BTW if you are using the default row-wise Manticore storage, you may try out the Manticore columnar storage [1]. It can decrease the RAM consumption further.
    [1] https://manual.manticoresearch.com/Introduction#Storage-opti...
    
    tmikaeld 3 years ago
    
    Thank you for that, I'll give it a go today!
    
    tmikaeld 3 years ago
    
    It seems to be too slow for our volume of updates; each update would need to rewrite the whole column.
k__ 3 years ago

How is the startup time?
Would be nice if you could check a query and then start the instance with an appropriate memory configuration.
- traverseda 3 years ago
  
  Pretty much instant, it loads data from a memory-mapped file so having a fast SSD for that is a must.
  - qdequelen 3 years ago
    
    Yes, indeed, it's crucial to have an SSD. With it, loading will be instant (a few ms).

amateurdev0_07 3 years ago

Thank you for making MeiliSearch. I use it for a personal project that gets a few hits a day, mostly from me and my friends.

https://pulpflakes.com/fmisearch/

It's a search over an index of fiction in the English language, first published in periodicals. Searchable by author, artist, magazine name and specific issue. Biggest index has about 200K documents, doc sizes are tiny.

Integrated with my WordPress site by handwritten PHP. Which was fun.

Performance is great. I didn't run into too many issues, and those I did I could resolve. What I remember:

1. The rules for text searches are too strict by default and if the order of words is different, will result in no matches. A, B will not return a result if B A is in the database.

2. Creating an index, uploading documents and changing settings required quite a bit of work. A week's worth of coding, almost. Would have loved to have a reasonably robust shell script that could take a JSON file with metadata on index and do the grunt work.

3. I have multiple types of documents, would have liked search to cover all of them so I don't have to change search type manually each time.

4. The default number of documents and max uploaded file size is too low. 200K and 200 MB or something. But it fails even on smaller file size.

The above sound like complaints. They're problems I ran into and others might. I love how productive Meilisearch made me. Thank you.
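
A minimal sketch of the kind of helper point 2 asks for, in Python rather than shell. `plan_requests` is a hypothetical function, and the HTTP paths are my reading of the Meilisearch API, so check them against the reference for your version. It only builds the ordered request plan from a JSON description and sends nothing.

```python
import json


def plan_requests(config):
    """Turn an index description into an ordered list of (method, path, body)."""
    uid = config["uid"]
    # Assumed endpoints: create index, update settings, add documents.
    steps = [("POST", "/indexes", {"uid": uid, "primaryKey": config["primaryKey"]})]
    if config.get("settings"):
        steps.append(("PATCH", f"/indexes/{uid}/settings", config["settings"]))
    if config.get("documents"):
        steps.append(("POST", f"/indexes/{uid}/documents", config["documents"]))
    return steps


# Example metadata file contents (made-up index and document).
config = json.loads("""
{
  "uid": "stories",
  "primaryKey": "id",
  "settings": {"searchableAttributes": ["title", "author"]},
  "documents": [{"id": 1, "title": "Example Story", "author": "A. Writer"}]
}
""")
steps = plan_requests(config)
```

A real script would send each step in order and wait for the returned task to finish before moving on, since these operations are processed asynchronously.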

networked 3 years ago

The most specific criticism I have read of Meilisearch is https://news.ycombinator.com/item?id=32940683. It has four points: (1) words beyond 65535 are silently ignored (this is documented in https://docs.meilisearch.com/learn/advanced/known_limitation... ); (2) the position of a matching word in a document non-optionally affects ranking; (3) to get the match information you must retrieve the entire attribute; (4) the meaning of PUT and POST is switched relative to RFC 7231.

Are points (2) through (4) true? Has any of the points been an issue for you in practice?

Kerollmops 3 years ago

What’s funny is that (1) doesn’t look like a real limit when you know that the first Harry Potter book is nearly 77000 words. The recommended way is to split your documents by paragraph to increase relevancy; this way you can see the exact part that matches.
About (2) we will work on exposing two new ranking rules to be able to control that.
For (3) I thought it was fixed.
We decided to implement (4) the PUT and POST this way after looking at how others were doing that.
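
A minimal sketch of the paragraph-splitting approach from (1), assuming paragraphs are separated by blank lines. The field names (`parent_id`, `paragraph`, `text`) are hypothetical and not something Meilisearch requires.

```python
def split_into_paragraphs(doc_id, text):
    """Turn one long document into one small record per paragraph."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        {"id": f"{doc_id}-{n}", "parent_id": doc_id, "paragraph": n, "text": p}
        for n, p in enumerate(paragraphs)
    ]


book = "First paragraph of the book.\n\nSecond paragraph.\n\n\nThird paragraph."
records = split_into_paragraphs("book1", book)
```

Each record stays far below the word limit, and a hit points at the paragraph that matched via `parent_id` and `paragraph`.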
- networked 3 years ago
  
  Thanks for your reply. I agree about (1). I have checked the datasets I have set up search for, and they either have no or under 1% of documents with more than 65535 words. (This is without any processing to break up the documents into sections.)
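  A check like this can be sketched in a few lines of Python. Splitting on whitespace only approximates the engine's tokenizer, so treat the result as an estimate; `share_over_limit` is a hypothetical helper.

```python
WORD_LIMIT = 65535


def share_over_limit(docs, limit=WORD_LIMIT):
    """Fraction of documents whose whitespace-separated word count exceeds limit."""
    if not docs:
        return 0.0
    over = sum(1 for text in docs if len(text.split()) > limit)
    return over / len(docs)


# Three short documents and one synthetic 70000-word document.
docs = ["short document", "word " * 70000, "another short one", "tiny"]
share = share_over_limit(docs)
```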

mmachatschek 3 years ago

This is awesome news! We've been using meilisearch in production for a few months now and we're more than happy with its reliability. Their work over the last few months really paid off, as the search speed and especially the indexing speed have increased a lot thanks to their efforts.

I'm excited to see all the things they'll build in the future.

tpayetOP 3 years ago

Thank you Markus <3

nop_slide 3 years ago

This looks really cool and I might try the self hosted option out on my small website as an upgrade from Postgres’ full text search.

I was hoping the cloud version would be more appealing, granted there seems to be a generous free tier but the next option is $1200 a month?!

ren_engineer 3 years ago

Their free tier looks like it has a "pay as you go" option once you exceed it that's identical to the paid option per 1K searches and 1K documents. You are basically paying for priority support, which is a pretty common strategy and seems fair to me.
I just noticed you don't get high availability on the free tier, which sucks, but I guess if search is mission critical to the point that you need it, you would be willing to pay. Most of these database type companies start off targeting enterprise and then roll out self-serve solutions as they scale.
tpayetOP 3 years ago

Sorry, it might not be obvious, but you can go over the free tier and pay for the usage at $0.25 for each 1000 searches/documents :)
- nop_slide 3 years ago
  
  Ah yep sorry I missed that! Good to know. I just saw the next option was $1200 and my eyes became fixated on the number.
  Maybe I will try out the cloud version then even though I expect my site would probably be well in the free tier limit, like I said it seems like a very generous tier.
koblas 3 years ago

Was excited to see a non-GC'ed search engine that looked solid. But not having the replicated, distributed version of it in the "free" tier makes it hard to really evaluate.
- tpayetOP 3 years ago
  
  Feel free to reach out to quentin@meilisearch.com, we'll find a way for you to evaluate the pro plan!

leeoniya 3 years ago

compared to https://typesense.org/ ?

curquiza 3 years ago

Meilisearch made a comparison
https://docs.meilisearch.com/learn/what_is_meilisearch/compa...
bduffany 3 years ago

typesense did their own comparison here:
https://typesense.org/typesense-vs-algolia-vs-elasticsearch-...
- curquiza 3 years ago
  
  Unfortunately, the comparison with Meilisearch is not up to date in this link.
  Also, we have to keep in mind that every comparison written by a company is always biased.
  - jabo 3 years ago
    
    I maintain that comparison page on the Typesense side. I just updated it as recently as yesterday, based on my observation.
    But let me know which ones need updating for Meilisearch. Happy to update.
    While we’re on the topic, reminder about some of the outdated information in your comparison pages: https://twitter.com/typesense/status/1620825236055932928?s=4...
    
    traverseda 3 years ago
    
    I'd say that the bit where typesense can only work with data that fits in ram is actually a pretty big problem for a lot of use cases, as an aside. That feature alone would discount typesense for basically all of my personal projects. Might be a trade off I'd be willing to make on a professional project given the other features but it seems really wasteful.
    Personally I find the meilisearch comparison to be more useful for the type of stuff I'm doing: https://docs.meilisearch.com/learn/what_is_meilisearch/compa...
    Of course I'm not a large enterprise e-commerce site. I'm doing personal projects like web archiving, (dataset probably won't be anywhere near fitting in ram) or I'm using search engines on embedded devices (search needs to play well with others, not use all my ram).
    
    remram 3 years ago
    
    That is also MeiliSearch's recommended setup though:
    > For the best performance, it is recommended to provide the same amount of RAM as the size the database takes on disk
    https://docs.meilisearch.com/learn/advanced/storage.html#lmd...
    
    traverseda 3 years ago
    
    There's a big difference between recommended and required. Of course things work better if your entire dataset fits in ram, and of course at the giant enterprise scale you can do that, but it's not something I'm going to do on my VPS alongside WordPress, you know? I don't really care about getting maximum possible performance when the data is only going to be accessed intermittently. I care about letting the OS maximize performance by choosing what gets cached in ram.
    
    jabo 3 years ago
    
    Typesense follows a memory model similar to Redis - you need sufficient RAM to hold the entire dataset.
    I don't want to speak for the Meilisearch team, but from observing user reports like this [1], it seems to me like you'd need at least X-2X RAM to run Meilisearch, if X is the size of your dataset, if you want it to not slow down as it swaps content from Disk to RAM.
    [1] https://news.ycombinator.com/item?id=34708658
    
    traverseda 3 years ago
    
    I mean that user report is from me, and was about a very very early meilisearch version. Maybe wrong link?
    > if you want it to not slow down as it swaps content from Disk to RAM.
    Obviously it's going to be fastest to run with your entire dataset in RAM, that's never in doubt. Part of why I find the whole typesense comparison page disingenuous is that you're making the ability to swap to disk sound like an anti-feature. The whole thing just sounds biased in a way that the meilisearch comparison doesn't.
    There are some killer features in typesense for sure, just my first impression of it is that it's very much aimed at someone other than me.
    >Typesense follows a memory model similar to Redis
    The difference is that redis is primarily being used as a cache, or for IPC, or as a task-queue. You're not loading a whole bunch of data into it, and you expect that the data you have in it will either be short-lived (IPC, queue) or can be evicted with no issues (caching).
    
    jabo 3 years ago
    
    > Part of why I find the whole typesense comparison page disingenuous is that you're making the ability to swap to disk sound like an anti-feature.
    Didn’t intend it that way. In fact, we recommend that users configure swap space even in Typesense as a safety mechanism.
    May I know which part of the comparison table makes it sound like that?
    The one under Index location says: “Disk with Memory Mapped files” for Meilisearch, which I updated based on the Meilisearch team’s feedback…
    Edit: To your first point, I meant to link to the parent comment: https://news.ycombinator.com/item?id=34708352
    I’ve also seen similar RAM recommendations from the Meilisearch team on GitHub to other users reporting similar performance issues.
    
    traverseda 3 years ago
    
    >May I know which part of the comparison table makes it sound like that?
    Well, a few things. Normally I'd try to couch these a bit more kindly and all that, but I hope you don't mind if I just come out and talk about the issues. Keep in mind that these are just my interpretations after a quick read through.
    # Bias
    > Instant Search-as-you-type Experiences for up to a few hundred thousand records, that don't require a production-grade highly-available setup.
    Seeing as meilisearch is your biggest competitor, saying that they're not "production grade" sounds biased. "Production grade" is subjective, and I understand why it's written that way: you define production grade to include high-availability multi-node configurations. I don't necessarily disagree, but I think you need to drop "production grade" from that sentence and just say "high availability". Maybe add a row to your overview talking about high availability, since it seems to be one of the factors you consider to be a significant differentiator.
    >Only supports a single-node setup, which creates a potential single point of failure and so is not production-ready, despite the v1.x versioning.
    This here is another spot where you seem biased. Remove the "despite v1.x versioning", it comes off as petty. I'd also remove the part where you say "is not production ready". You seem to have a very concrete idea of what production looks like, but for me one example of production looks like a raspberry pi in a school house in rural africa (internet in a box project). Under those constraints typesense isn't production ready.
    I get what you're saying about "production ready" but there must be another way to word it?
    The whole "production ready" line of reasoning comes off as arrogant and petty in general.
    >Runtime Dependencies [...] Recommends use of nginx, apache or the like as a reverse proxy in front
    Meilisearch is also a single self-contained binary with an embedded http server. I don't think either of you support https. Do you really not recommend the use of a reverse proxy? How do you route subdomains? I guess you're assuming it's running on a stand alone computer with a public facing IP and no SSL? Are you not providing a frontend/dashboard? You've made this sound like a drawback. If you had put "None. Self-contained binary" in front of it like you did for yourself, that would be fine, but for this you mention a feature that you have while ignoring what looks to me like the same feature in your competitor.
    >Language support
    This is also a bit confusing, and I can't help but think it's probably not completely honest. What makes meilisearch different so that it doesn't support "all languages", but elasticsearch does? Meilisearch certainly claims to support all languages where words are separated by spaces, do you support languages that don't have words separated by spaces?
    The implication of this line seems to be that meilisearch isn't indexing on unicode, or something. Just weird, needs more detail probably.
    This user claims that meilisearch has better multi-language support: https://news.ycombinator.com/item?id=34708802
    So what's the difference?
    >Number of Documents
    This is completely fine! Good job linking to the pertinent issue and everything. This is how you mention significant drawbacks without seeming biased or petty.
    # Target use case
    This also seems to be pretty firmly aimed at large enterprise clients. If that's not the impression you're going for, well change the memory model but there's some language in this comparison that can probably help.
    > CDN-like Geo-Distributed clusters
    Just sounds buzz-wordy to me. Might be fine if I didn't get the impression from the previous paragraphs that my use cases weren't "production ready (webscale?)" and that I'm using it wrong if it's not on a server with 24 TB of ram.
    This is more about who the intended customer is than about bias though, so I don't think it's really an issue. Your intended customer isn't some bloke running wordpress on a VPS, it's large scale enterprise and that's fine. If you want to soften that there's a few more things you'll need to change, but when combined with the above stuff about "production ready" it leaves a bit of a bad taste in my mouth, like you'd really rather I be paying you exorbitant rates to run this in your cloud than just using it.
    
    jabo 3 years ago
    
    I really appreciate that you took the time to write this detailed comment! Thank you!
    > but for me one example of production looks like a raspberry pi in a school house in rural africa (internet in a box project)
    This is an interesting perspective, one that I hadn't considered before. You're saying that software can be run in a variety of different environments and that the definition of what a "production" environment looks like is context-dependent.
    My definition of "production" in the context of server software is that you typically run this software on a server or set of servers in some datacenter (think Redis, Postgres, MySQL, MongoDB, etc). In this context, I've always defined "production-ready" as:
    1. Can it withstand infrastructure failures?
    2. Is the API stable?
    So when I say Meilisearch is not "production-ready", it's in this specific context - it can only be run on a single node, and it cannot handle infrastructure failures natively. So it could become a single point of failure.
    > This here is another spot where you seem biased. Remove the "despite v1.x versioning", it comes off as petty.
    Historically I've seen server software have fault tolerance built in by the time it reaches v1.0, and it's a common assumption that I've seen engineers make. So I wanted to call attention to it... The phrasing of it comes across as petty, now that you mention it. I'll remove that.
    > I get what you're saying about "production ready" but there must be another way to word it?
    I think "fault tolerance" is a better word to describe what I had in mind. I'll update this.
    > I don't think either of you support https.
    Typesense does support https natively.
    > Do you really not recommend the use of a reverse proxy? ... I guess you're assuming it's running on a stand alone computer with a public facing IP... ? Are you not providing a frontend/dashboard?
    Yes to all your questions, except that Typesense does support HTTPS natively.
    > You've made this sound like a drawback. If you had put "None. Self-contained binary" in front of it like you did for yourself, that would be fine, but for this you mention a feature that you have while ignoring what looks to me like the same feature in your competitor.
    I was actually going to add "None. Self-contained binary" for Meilisearch. But then their docs explicitly recommend using a reverse proxy in front: https://docs.meilisearch.com/learn/cookbooks/running_product...
    With Typesense, we use h2o as the http library, which for eg Fastly exposes directly to internet-bound traffic and it's specifically built for handling high-volume traffic. This is why we feel comfortable recommending not putting a reverse-proxy in front of Typesense.
    > Language support... This is also a bit confusing, and I can't help but think it's probably not completely honest. What makes meilisearch different so that it doesn't support "all languages", but elasticsearch does? Meilisearch certainly claims to support all languages where words are separated by spaces, do you support languages that don't have words separated by spaces?
    Yes, we support all languages that are space-separated. We also added support for CJK languages recently (which are not space-separated). I picked the phrasing you see under the Meilisearch column, from their docs: https://docs.meilisearch.com/learn/what_is_meilisearch/langu... (it used to read slightly different previously).
    > Meilisearch is multilingual, featuring optimized support for: > Any language that uses whitespace to separate words > Chinese > Japanese > Hebrew > Thai > We aim to provide global language support, and your feedback helps us move closer to that goal.
    > This user claims that meilisearch has better multi-language support: https://news.ycombinator.com/item?id=34708802
    We didn't support CJK languages in a GA release, until 2 weeks ago. So they are most likely talking about an earlier version of Typesense.
    
    traverseda 3 years ago
    
    Like I say, it's just my subjective gut reaction. I guess if there was one main take away it's "describe your competitors with the same language you'd use to describe yourself", where possible.
    None, completely stand-alone with built in http(s) server | None, recommends a reverse proxy
    As an example.
    >So when I say Meilisearch is not "production-ready", it's in this specific context - it can only be run on a single node, and it cannot handle infrastructure failures natively. So it could become a single point of failure.
    And I don't really disagree with that, but it really is a judgement call for whoever is setting it up. If search isn't a critical feature, then whoever is setting it up might prefer meilisearch for its memory model. For example I once worked on "the great canadian encyclopedia", which ran on a single VPS and needed search capability. It already had a single point of failure, so running search on the same VPS wasn't a big concern. There are also different roll-over policies, different uptime guarantees, different architectures, etc. If "production grade" was some kind of industry standard that would be one thing, but it really really does depend on the client.
    I think that the single point of failure thing is a very important consideration, and should probably be in your overview alongside the memory/data model, but I do honestly think typesense's memory-only model disqualifies it from a lot of production systems I've worked on, and that meilisearch's single point of failure hasn't. Fault tolerance and single point of failure deserve their own row in your overview; they shouldn't be thrown into the use-case column.
    Honestly it's only really an issue when taken all together, fix a few of those and you'll be in much better shape I think.
    
    curquiza 3 years ago
    
    We sent mails but we got no updates on them.
    
    jabo 3 years ago
    
    Hmmm, I remember those emails and I did reply to gmourier, and made almost all of the changes he pointed out, to our comparison page. Here's [1] the exact commit with the changes I made.
    The only one change I didn't make is the one about Meilisearch not being constrained by RAM, because of reports like this [2] I've seen in the past and because I saw this in your docs:
    https://docs.meilisearch.com/learn/advanced/storage.html#mem...
    >For the best performance, it is recommended to provide the same amount of RAM as the size the database takes on disk, so all the data structures can fit in memory.
    [1] https://github.com/typesense/typesense-website/commit/0103ff...
    [2] https://news.ycombinator.com/item?id=34708658
    Let me know which other ones need updating.
MrBuddyCasino 3 years ago

They do have a pretty good comparison table: https://typesense.org/typesense-vs-algolia-vs-elasticsearch-...
freewizard 3 years ago

They are very similar. I tested both intensively a few months ago and ended up with Typesense for performance reasons.
My test data set is 1.5M docs * 3-10 fields * 10-50 characters. Meilisearch has slightly better multi-language support, but typesense is much better on batch reindex speed and RAM usage while a bit shy on supporting Asian languages. The query speed is similar under light to medium load; I didn't stress-test queries.
- qdequelen 3 years ago
  
  You should try it again since we have significantly improved indexing performance. Most of our current users no longer have performance problems, even on hundreds of millions of documents.

rsstack 3 years ago

Is there a way to run it in WASM, to get something like Lunr[1]? We prefer to do our (small-index, <2MB) search client-side for a bunch of reasons, currently using Lunr.js, but it's a bit annoying and the typeahead search is something I improvised and not really official.

[1] https://lunrjs.com/

sandstrom 3 years ago

You could have a look at https://github.com/lucaong/minisearch/
- rsstack 3 years ago
  
  Wow, this might fit our needs much better! Thanks!
  - tmikaeld 3 years ago
    
    Hot tip: we experimented with running minisearch in RAM on Cloudflare Workers, and it works excellently for up to 5MB of index because it stays under the 50ms CPU time.
    This means 10M search requests for $5. The only drawback is that it's expensive to re-index, but if your use case doesn't require that, it's hard to beat!
  - nickreese 3 years ago
    
    Can vouch for minisearch. Amazing for relatively small data that fits in memory.
    The typeahead is great.

kristiandupont 3 years ago

I am using the core (called "Milli") in a local indexer that I run on my repositories and Obsidian files. It works like a charm and I am very happy with it. Obviously that's a use case with very little traffic but just indexing my repositories folder is quite a bit of work and it does it surprisingly fast.

The only real thing I am missing is a typeahead feature.

dureuill 3 years ago

Hello from a Meilisearch team member,
Wow, your project looks very interesting. How do you handle things like the filesystem changing while your indexer is offline? Do you reindex from scratch at startup?
Regarding typeahead, is this what we call "query suggestions"[1]? At the moment, we think that this is something that frontends and SDK can provide rather than the engine, so that means you wouldn't find it at the Milli level. We think you could maybe build an ancillary suggestion index and make two queries instead of one when typing, so as to get both results and suggestions at once.
Here's a chat link[2] to our latest discussions on the topic; feel free to come and weigh in if you're interested!
[1]: https://roadmap.meilisearch.com/c/31-query-suggestions
[2]: https://discord.com/channels/1006923006964154428/10685073658...
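
The "ancillary suggestion index plus two queries" idea can be sketched as follows. This is an in-process stand-in with hypothetical names (`SuggestionIndex`, `search_with_suggestions`); in a real setup both lookups would be search requests against two separate indexes.

```python
from collections import Counter


class SuggestionIndex:
    """Stores past queries and returns the most frequent ones for a prefix."""

    def __init__(self):
        self.counts = Counter()

    def record(self, query):
        self.counts[query.strip().lower()] += 1

    def suggest(self, prefix, limit=3):
        prefix = prefix.strip().lower()
        matches = [(q, c) for q, c in self.counts.items() if q.startswith(prefix)]
        # Most frequent first, ties broken alphabetically.
        matches.sort(key=lambda qc: (-qc[1], qc[0]))
        return [q for q, _ in matches[:limit]]


def search_with_suggestions(typed, documents, suggestions):
    """Run both lookups for one keystroke: results and query suggestions."""
    results = [d for d in documents if typed.lower() in d.lower()]
    return results, suggestions.suggest(typed)


suggestions = SuggestionIndex()
for past_query in ["rust search", "rust search", "rust book", "python"]:
    suggestions.record(past_query)

documents = ["Rust search engines", "Rust book review", "Python tips"]
results, hints = search_with_suggestions("rust", documents, suggestions)
```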
- kristiandupont 3 years ago
  
  Thank you! Yes, I reindex. I store the file timestamp along with the contents, so it's not quite as involved as it could seem but startup does take a bit. And, I don't have a good way of discovering deleted files at the moment. Not a big deal as it is, but something I will look into.
  And yes, query suggestions are exactly what I mean. Thank you for informing me, I guess I will have to look into how I can make it myself :-)
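  One way the timestamp comparison could also surface deleted files, as a sketch with hypothetical helper names: keep the stored path-to-mtime map, rescan the folder, and diff the two.

```python
import os


def scan(root):
    """Snapshot of every file under root as {path: modification time}."""
    snapshot = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            snapshot[path] = os.stat(path).st_mtime
    return snapshot


def diff_snapshot(indexed, on_disk):
    """Compare stored timestamps with the disk; return (to_reindex, to_delete)."""
    to_reindex = sorted(p for p, m in on_disk.items() if indexed.get(p) != m)
    to_delete = sorted(p for p in indexed if p not in on_disk)
    return to_reindex, to_delete


# Made-up snapshots: b.md changed, new.md appeared, gone.md was deleted.
indexed = {"a.md": 1.0, "b.md": 2.0, "gone.md": 3.0}
on_disk = {"a.md": 1.0, "b.md": 2.5, "new.md": 4.0}
to_reindex, to_delete = diff_snapshot(indexed, on_disk)
```

  At startup, `scan(root)` would supply `on_disk`, and everything in `to_delete` would be removed from the index.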
  - dureuill 3 years ago
    
    You could maybe use something equivalent to the "index hot swap"[1] feature we have at the Meilisearch level, so that you do the reindexing in another index at startup, and then atomically swap this fresh index with the old one when it is ready? That way, you have fast startup at the cost of having possibly out-of-date information for a while after startup.
    (you could even reindex from scratch completely in the background at startup, so no need to discover deleted files at all)
    [1]: https://blog.meilisearch.com/zero-downtime-index-deployment/
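    The swap pattern itself, sketched in-process with a hypothetical `SwappableIndex` class. This is not the Meilisearch feature, only the same idea: searches keep hitting the stale index until the fresh one is complete, then one reference is replaced.

```python
import threading


class SwappableIndex:
    """Serves searches from a live index that can be replaced in one step."""

    def __init__(self, initial):
        self._live = initial
        self._lock = threading.Lock()

    def search(self, term):
        live = self._live  # take one reference so a swap cannot change it mid-search
        return sorted(doc_id for doc_id, text in live.items() if term in text)

    def swap(self, fresh):
        """Replace the live index and hand back the old one for cleanup."""
        with self._lock:
            old, self._live = self._live, fresh
        return old


stale = {1: "old notes", 2: "deleted file"}
service = SwappableIndex(stale)
before = service.search("file")

# Rebuilt from scratch in the background, so the deleted file is simply absent.
fresh = {1: "old notes", 3: "new file"}
old = service.swap(fresh)
after = service.search("file")
```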
    
    kristiandupont 3 years ago
    
    That's a great idea, thank you!

scop 3 years ago

Congrats! Question for the team as I see a possible discrepancy on the website.

The "Comparisons" page says there is no limit for number of indices (https://docs.meilisearch.com/learn/what_is_meilisearch/compa...)

However, the "Limitations" page says there is a limit of ~180 indices (https://docs.meilisearch.com/learn/what_is_meilisearch/compa...)

Can you clarify what, if any, are the limitations of # indices?

ferdi05 3 years ago

Thanks! Indeed we now have a limit, but this limit depends on the OS you use. The limit is 200 on Linux. We found a way to remove this limit in the next version of Meilisearch (v1.1), which will be released in approximately two months.
I would like to know the use case for needing more than 200 indexes. We have handled multi-tenant with a single index and multi-tenant tokens. https://docs.meilisearch.com/learn/security/tenant_tokens.ht...
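
For a sense of what a tenant token looks like, here is a minimal sketch using only the Python standard library. It assumes, per my reading of the linked docs, that a tenant token is a JWT signed with HS256 using an API key and carrying `searchRules`, `apiKeyUid` and `exp` claims. The key, uid, index name and filter are made-up values; the official SDKs provide helpers for this, so treat it as an illustration of the shape only.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_tenant_token(api_key, api_key_uid, search_rules, expires_at):
    """Build an HS256-signed JWT that restricts searches to one tenant."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"searchRules": search_rules, "apiKeyUid": api_key_uid, "exp": expires_at}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(api_key.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


# Hypothetical values: every search on "products" is forced through this filter.
rules = {"products": {"filter": "tenant_id = 42"}}
token = make_tenant_token(
    api_key="example-search-key",
    api_key_uid="00000000-0000-0000-0000-000000000000",
    search_rules=rules,
    expires_at=int(time.time()) + 3600,
)
```

The single shared index then needs `tenant_id` (or whatever field you filter on) declared as filterable, and each customer only ever receives their own token.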
- scop 3 years ago
  
  Multi-tenancy is indeed the use case. Our current solution involves keeping each customer's data in a separate index. I'll review the link. Thanks!

zX41ZdbW 3 years ago

You can query Meilisearch directly from ClickHouse with the integrated table function: https://github.com/ClickHouse/ClickHouse/pull/33332

This feature was a student project, and I'm not sure if it will find its usage. If you are using Meilisearch with ClickHouse, or if you think this feature is worth something, please let me know.

survirtual 3 years ago

This looks like an effective piece for a project I have. It would be significantly more effective if it was published on crates.io and could be instantiated within Rust, and was able to operate in memory (or have a filesystem passed to it, so that can be simulated)

I found this issue which tracks crates.io publication: https://github.com/meilisearch/meilisearch/issues/3367

Would be nice to see that made a priority. Having a powerful search engine that can be embedded in a larger application and made portable (like being able to deploy to WASM) would be extremely novel and valuable. Given Rust is already in use, I think it may not necessarily demand too much effort. When search becomes a focus for what I’m working on, perhaps I will make that happen if not already done yet.

Thanks for making this available to people.

jvans 3 years ago

This looks very cool, nice work. Any plans to support ANN vector searches in the near future?

qdequelen 3 years ago

Yes, it's planned!

heybrendan 3 years ago

How would one begin to use this when data is stored in MySQL, MariaDB, or PostgreSQL?

xarope 3 years ago

I tried RTFM'ing, but can Meilisearch handle restricted documents (P&C) and integrate with LDAP/AD to pull security groups?

P.S. great to see your documentation search is powered by your own product (!)

wiradikusuma 3 years ago

I see comparisons against other search engines, but how does it compare to RDBMS full-text search, e.g. Postgres's? I know it's not apples-to-apples, but most people start with an RDBMS.

bayesian_horse 3 years ago

As far as I understand it a search engine like this is meant to perform well on "Human" queries that are hard to formalize.
SQL queries, asking for records based on something like field a has to contain b or something like that are easy to formalize and fulfill by an RDBMS. But the SQL queries get hairier and hairier when the query involves multiple fields or even multiple unrelated tables. Or free form text. And those queries are harder to index.
On top of all of that, Humans often want things sorted in an order that isn't straight-forward to express in SQL. What is "relevancy"? All of that can be done in SQL, but it's not what RDBMS engines shine at.

drcongo 3 years ago

Congrats team. Meilisearch is an absolute joy to work with.

ferdi05 3 years ago

Thanks!
nynapalm 3 years ago

Thanks! :)
tpayetOP 3 years ago

Thanks :D

dawnerd 3 years ago

Very early adopter of meilisearch and it’s pretty great. It was bumpy as the team found their footing, but overall I'm very impressed with it.

qdequelen 3 years ago

Thanks for your feedback!

hnaccountme 3 years ago

Anyone else having deja vu of when Java did this sort of 'X' build with Java?

fyzix 3 years ago

They need to structure their pricing page better. A quick glance had me thinking that $1200 was the minimum for production use. But the free tier is actually pay as you go.

msvan 3 years ago

How does Meilisearch compare to ElasticSearch from an operational point of view? I've experienced ElasticSearch to be quite painful to maintain, requiring lots of manual tweaking to balance shards and careful design of indices.

paraboul 3 years ago

I've been using Meilisearch in production for quite some time now, and TBH it has been one of the easiest services to maintain (I mean, it's just a single statically linked binary), with close to zero configuration.
- AlexAltea 3 years ago
  
  Is it really "just a single statically linked binary"?
  I'd love to use Meilisearch as you describe, but their so-called SDKs are just for clients, so you still need the Meilisearch server listening on localhost.
  I would love to see something like SQLite based on Meilisearch (i.e. a fully self-contained search library like https://github.com/mchaput/whoosh). Do you know if such a thing exists?
  - paraboul 3 years ago
    
    I was referring to the server daemon, not the client libraries.
    But ofc, it's a process not an embeddable library, so you can't just link it against your app like you would with SQLite or rocksdb.
    It does look like it's built around their core library "milli" though (https://github.com/meilisearch/milli/), so probably something doable in the future?
    
    AlexAltea 3 years ago
    
    Thank you very much, that's precisely what I was looking for!
tpayetOP 3 years ago

That's the point! We don't aim to compete with Elastic on everything (logs, analytics, etc). We are doing search for front-end users with a strong focus on relevancy, speed & developer experience. You can read a bit more in our documentation https://docs.meilisearch.com/learn/what_is_meilisearch/compa...
- sidmitra 3 years ago
  
  A quick question, are there any limits around number of separate indexes we can have with meilisearch? I'm thinking at least say 20-30K separate indexes to start with.
  My use case is that i want to start creating some indexes that are "per-user" and some "per-company" where a company(customer) might have many users. This is to do some sort of double tenant isolation. I will create different keys that have permission to specific indexes and deliver those to the user somehow. My current solution does hacky things with Elasticsearch like adding query filters by user/company-id attributes in the background automatically. But since meilisearch would be customer facing, i need stronger guarantees around permissions per index.
  I tried this out a year ago on Meilisearch locally, but haven't stress tested it by creating thousands of them like production.
  Or is there a better way to do this. This is also a reason why memory-only systems like Typesense didn't make sense to me. I'm fine with taking a performance hit by going to disk to pull the right index. Not every index will be used all the time. I might also look at sharding/partitioning features if present.
  - dureuill 3 years ago
    
    Hello!
    > A quick question, are there any limits around number of separate indexes we can have with meilisearch?
    Yes! In v1.0, about 180 indexes under Linux in the same instance[1]. The good news is that I'm personally working on lifting this limitation for v1.1 (planned for release at the beginning of April), which should be able to accommodate an unlimited number of indexes[2] (disk space permitting, of course).
    Note that having many indexes does have an impact on performance and will keep doing so even after v1.1.
    > Or is there a better way to do this.
    If it works for your use case, you can try using a single index (or a few indexes) with tenant tokens[3] for multitenancy.
    Hope this helps :-)
    [1]: https://docs.meilisearch.com/learn/advanced/known_limitation...
    [2]: https://github.com/meilisearch/meilisearch/issues/3382
    [3]: https://docs.meilisearch.com/learn/security/tenant_tokens.ht...
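For readers wondering what the tenant-token approach looks like in practice, here is a rough sketch using only the Python standard library. It assumes, based on my reading of the tenant token docs linked in [3], that a tenant token is a JWT signed with HS256 using a parent API key, and that its payload carries `searchRules`, `apiKeyUid`, and an optional `exp`. Check those field names against the docs for the version you run. The key, UID, and filter attributes below are placeholders.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_tenant_token(api_key, api_key_uid, search_rules, ttl_seconds=3600):
    """Build an HS256-signed JWT restricting searches to `search_rules`.

    Payload field names are an assumption taken from the tenant token docs;
    this helper is illustrative and not part of any Meilisearch SDK.
    """
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "searchRules": search_rules,
        "apiKeyUid": api_key_uid,
        "exp": int(time.time()) + ttl_seconds,
    }
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        api_key.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return signing_input + "." + b64url(signature)


# Placeholder values: one shared index, filtered per company and user.
rules = {"documents": {"filter": "company_id = 42 AND user_id = 7"}}
token = make_tenant_token("PLACEHOLDER_API_KEY", "PLACEHOLDER_KEY_UID", rules)
print(token.count("."))  # a JWT has three dot-separated segments
```

The idea is that the backend hands each user a token like this, and the engine then applies the embedded filter to every search made with it, so the isolation does not depend on the client adding the filter itself. As far as I know, the attributes used in the filter also have to be declared filterable on the index.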

cies 3 years ago

I think multi-lingual stemming is the point where I see this as a real ES competitor. Still they've come a long way, and burning too much RAM on ES is not the way fwd either.

drifteaur 3 years ago

I've had a great experience with Meilisearch, it was very easy to set up.
But I'm not sure what's behind the claim that "it supports all languages", aside from handling unicode? Does it support stemming at all? Does it have customized stop words per language?
- qdequelen 3 years ago
  
  To answer your question precisely, we handle all the space-separated languages and have specific tokenizers for Chinese, Japanese, Korean, Thai, and Hebrew. We plan to add more languages in the future.
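A tiny illustration of why space-separated languages are the easy case and why languages such as Chinese need a dedicated tokenizer. This is plain Python string splitting, not Meilisearch's actual tokenizer, and the sample strings are arbitrary.

```python
english = "open source search engine"
chinese = "开源搜索引擎"  # roughly the same phrase, written without spaces

# Whitespace splitting recovers the words of a space-separated language.
english_tokens = english.split()

# The same approach treats the whole Chinese phrase as a single token,
# so word boundaries have to come from a language-specific segmenter.
chinese_tokens = chinese.split()

print(len(english_tokens), len(chinese_tokens))
```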

ckok 3 years ago

Does it have any kind of master/slave or replication abilities? Couldn't find anything in the docs.

tpayetOP 3 years ago

Hello! No we don't yet, we are considering it though.

slig 3 years ago

Is there a way to somehow find related documents to a specific document?

snowpid 3 years ago

Can you rewrite it in Rust?

curquiza 3 years ago

We will think about it :D

muhammadusman 3 years ago

how does this compare to Typesense? I'd like to see which one uses fewer resources for similar performance

qdequelen 3 years ago

Hey muhammadusman, I'm Meilisearch's CEO. We have a complete comparison table. Note that it represents our point of view. https://docs.meilisearch.com/learn/what_is_meilisearch/compa...
Meilisearch and Typesense are really different regarding resource consumption and performance. I would say that where Typesense has better indexing performance (Meilisearch has recently improved indexing speed), Meilisearch will guarantee much faster search performance while keeping impressive relevancy. Regarding consumption, as Typesense is entirely in RAM and Meilisearch uses memory mapping, Meilisearch would take more disk space but less RAM.
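For anyone unfamiliar with memory mapping, here is a generic sketch of the idea using Python's mmap module. It is not how Meilisearch is implemented, and the file contents are made up. The data lives on disk, and the operating system loads only the pages a read actually touches instead of holding the whole file in RAM.

```python
import mmap
import os
import tempfile


def read_slice_mmap(path, offset, length):
    """Read `length` bytes at `offset` through a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing copies just these bytes; the OS pages them in on demand.
            return mapped[offset:offset + length]


# Stand-in for an on-disk index: ~1 MB of filler followed by a marker.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1_000_000 + b"needle")
    path = tmp.name

chunk = read_slice_mmap(path, 1_000_000, 6)
os.remove(path)
print(chunk)
```

This is the trade-off described above: the full data set has to sit on disk, but resident memory tends to track what is actually being read.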

garbagecoder 3 years ago

What language you write a program in is not a feature, definitely not a headline one.

pkolaczk 3 years ago

Languages are not tools. Languages are materials. If you buy a house you are quite likely interested in what materials were used to build it. There are different features you'd expect from a wooden house vs concrete house.
timeon 3 years ago

Maybe not for you. But for my RSS filter it definitely is.
groestl 3 years ago

Well, choice of language carries _a lot_ of implicit information for which you'd need many more words.
