Bleve – A modern text indexing library for Go

107 points by tuneladora 11 years ago · 24 comments


alrs 11 years ago

Wishful thinking: I hope this prefaces an Elasticsearch rewrite in Go. I'm happy to be rid of the JVM wherever I find it.

No hard feelings, but we just spent 20 years crowbarring Microsoft into irrelevance. It's time to get started on Larry.

threeseed 11 years ago

Why on earth would you rewrite ElasticSearch in Go?
You would have woeful monitoring support compared to the huge array of tooling for JVM, an average GC implementation and almost zero integration with most enterprise applications e.g. Hadoop. All so what, you can go from one multinational (Oracle) to another (Google).
- krakensden 11 years ago
  
  I could totally imagine wanting to rewrite ElasticSearch, and Go seems like a reasonable language to do it in. ES seems to handle downtime extremely poorly, and is very difficult to make "safe" for people to write arbitrary queries against it. And by difficult I mean there is a circuit breaker and nought else.
- throwawaykf05 11 years ago
  
  >All so what, you can go from one multinational (Oracle) to another (Google).
  Or from one Larry to another.
  - alrs 11 years ago
    
    Valid, though I never have to click through any kind of EULA crap to use Go.
    So long as Go is supported in gcc, I'm not particularly worried.
justinsb 11 years ago

Presumably you just want to replace the _Oracle_ JVM? Does OpenJDK meet your needs? How about Dalvik / ART or RoboVM?
I'd love to hear what you're trying to achieve here ... I have often bemoaned the JVM, but have also found that it's often better than many alternatives.
- threeseed 11 years ago
  
  There is a whole range of JVMs: http://en.wikipedia.org/wiki/List_of_Java_virtual_machines
- alrs 11 years ago
  
  Whenever you get in trouble with Elasticsearch, the first question a consultant will ask is, "why aren't you running the Oracle JVM?"
  Technically it supports OpenJDK. In practice, circumstance usually drags you to Oracle.
bkeroack 11 years ago

The problem with Elasticsearch is not the JVM.
A sample of the problems I have:
a) Broken, hand-rolled consensus/consistency model [1]
b) Designing cluster layout is a bit of a "black art" due to missing/incomplete documentation and complexity of cluster options.
c) Promiscuous network activity with default settings: spin up a new node and it will cluster with any existing ES nodes automatically.
That's just the beginning. ES is a bit of an operations nightmare to deal with--it's an example of enormous complexity just for a fairly simple service (text search).
1. http://aphyr.com/posts/317-call-me-maybe-elasticsearch
pjmlp 11 years ago

If only Go offered the same abstraction capabilities of any JVM language.
swah 11 years ago

Aren't we going to fight Google as well?

SaberTail 11 years ago

Is the name meant to bring to mind a large explosion?

http://en.wikipedia.org/wiki/Boiling_liquid_expanding_vapor_...

mschoch 11 years ago

Author here, will try to answer questions as best I can from my phone.
Yes, I think I was watching one of those amazing explosion shows on Netflix. I liked the way it sounded, and it suggested some simple logo ideas.
- drsintoma 11 years ago
  
  Have you run any benchmarks? Do you have any rough estimates of how it compares to Lucene in terms of speed and memory usage?
  - mschoch 11 years ago
    
    We haven't yet. Our philosophy has been to get the features and API right first.
    Though there is some low-hanging fruit I'd like to tackle this week. First there will be some Bleve-only indexing and querying benchmarks to validate those improvements. Then we should be in better shape to do comparisons against Lucene.
    
    arafalov 11 years ago
    
    Is the goal to compete with Lucene, or with the Solr/ElasticSearch layer as well? It was a little hard to tell from the samples whether, for example, configuration is coded in (like in Lucene) or has an externalized XML/REST interface (like in Solr/ES). Same with whether you are planning to scale beyond a single embedded instance and need to worry about scaling/cloud/etc.
    
    mschoch 11 years ago
    
    Thanks for taking an interest.
    I would say that we're most comparable to Lucene, given that we're a library, not a server. But we're also very much not a port of Lucene to Go. That already exists, and someone has posted that link elsewhere in this thread. I've worked a fair amount with Elasticsearch, so a lot of the higher-level API is inspired by it. And that's one of the biggest differences between Bleve and Lucene today. Lucene is pretty low-level. The Bleve top-level API works a lot more like Elasticsearch (specifically, there is a mapping, and it's serialized and stored in the index). There is a lower level that looks more like Lucene (everything is just documents and fields), and the top-level API builds on top of that.
    That is, I would say, the primary goal of Bleve: to be a great text indexing and search library for Go.
    Next you asked about a REST interface. Bleve ships with HTTP handlers in a sub-package. These are completely optional: you can use Bleve without them, or you could write your own if you don't like these. There is a not-quite-done-yet example app that uses these to make it look a lot like a single-node Elasticsearch. But it's just an example app; I have no immediate plans to try to turn this into a production app.
    Regarding scaling beyond a single embedded instance... Obviously there is a lot of interest in this area. The way I'm looking at this right now is that this is simply a layer above Bleve, much as Elasticsearch/Solr provide this as a layer above Lucene. There are a few enhancements we could add to streamline this, like allowing the Searches to operate across multiple indexes.
    My hope is that lots of apps spring up around Bleve. Maybe someone wires up the HTTP handlers and Raft and makes an Elasticsearch clone. Maybe people will do something I don't expect. Will we go there? Not right now, we've got our hands full. But if we do, it will be in a separate project, built on top of Bleve.
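
To make the "layer above" idea concrete, here is a minimal sketch of searching across several independent indexes and merging the results into one ranked list. It does not use Bleve and is not Bleve's API: `ToyIndex`, `Hit`, and `SearchAll` are hypothetical names, and the term-count scoring is an assumption chosen only to keep the example short.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Hit is one search result: a document ID and a naive score.
type Hit struct {
	ID    string
	Score int
}

// ToyIndex is a hypothetical in-memory inverted index (term -> doc ID -> count).
// It stands in for one embedded index; it is not how Bleve stores data.
type ToyIndex struct {
	postings map[string]map[string]int
}

// NewToyIndex returns an empty index.
func NewToyIndex() *ToyIndex {
	return &ToyIndex{postings: make(map[string]map[string]int)}
}

// Index lowercases the text, splits it on whitespace, and records term counts.
func (ix *ToyIndex) Index(id, text string) {
	for _, term := range strings.Fields(strings.ToLower(text)) {
		if ix.postings[term] == nil {
			ix.postings[term] = make(map[string]int)
		}
		ix.postings[term][id]++
	}
}

// Search returns every document containing term, scored by term count.
func (ix *ToyIndex) Search(term string) []Hit {
	var hits []Hit
	for id, n := range ix.postings[strings.ToLower(term)] {
		hits = append(hits, Hit{ID: id, Score: n})
	}
	return hits
}

// SearchAll queries each index and merges the hits into one list:
// highest score first, ties broken by ID so the order is deterministic.
func SearchAll(term string, indexes ...*ToyIndex) []Hit {
	var merged []Hit
	for _, ix := range indexes {
		merged = append(merged, ix.Search(term)...)
	}
	sort.Slice(merged, func(i, j int) bool {
		if merged[i].Score != merged[j].Score {
			return merged[i].Score > merged[j].Score
		}
		return merged[i].ID < merged[j].ID
	})
	return merged
}

func main() {
	a, b := NewToyIndex(), NewToyIndex()
	a.Index("a1", "go search library")
	b.Index("b1", "search search engine")
	b.Index("b2", "text indexing")
	for _, h := range SearchAll("search", a, b) {
		fmt.Println(h.ID, h.Score)
	}
}
```

A real layer would also have to reconcile scores computed against different index statistics and handle indexes on other machines; the sketch ignores both.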

kawera 11 years ago

Looks very interesting. I've been playing with golucene[1], another alternative for those looking for a JVM-free indexing/search engine.

[1] https://github.com/balzaczyy/golucene

kapilvt 11 years ago

Looks great! Looks like a loosely equivalent structure to Lucene, and it's a feature I've wanted in the golang ecosystem for a while.

kolev 11 years ago

Main website: http://www.blevesearch.com/

sciurus 11 years ago

Any plans to support the Lucene query parser syntax?

mschoch 11 years ago

We have support for a limited version of it right now. The syntax we support is documented here:
https://github.com/blevesearch/bleve/wiki/Query-String-Query
You can see an example query using it executed here:
http://wikisearch.blevesearch.com/search/?q=name:query%20%2B...
A few things are missing from the syntax. Numeric ranges only support simple one-sided syntax (age:>25), not a two-sided range (age:[25 to 30]). Date ranges aren't supported yet. Neither are complex boolean expressions using parens, AND, and OR. Those are things that we support in structured queries; we just haven't wired up the grammar for them yet. There are a handful of others we don't support at all yet, like wildcard and proximity searches.
The grammar is also still pretty simple, with an emphasis on making simple things work, not on things like escaping special characters within expressions.
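
As an illustration of how small such a grammar can be, here is a sketch of a parser for the subset described above: whitespace-separated terms, an optional `field:` prefix, and one-sided numeric ranges such as `age:>25`. It is not Bleve's parser. `Clause` and `ParseQuery` are hypothetical names, and treating `+` and `-` prefixes as "must" and "must not" is an assumption borrowed from Lucene-style syntax. Like the grammar described, it does no escaping of special characters.

```go
package main

import (
	"fmt"
	"strings"
)

// Clause is one parsed piece of a query string (hypothetical type).
type Clause struct {
	Occur string // "must" (+ prefix), "must_not" (- prefix), or "should"
	Field string // empty when no "field:" prefix was given
	Op    string // "match", or a one-sided range: ">", ">=", "<", "<="
	Value string
}

// ParseQuery splits q on whitespace and turns each token into a Clause.
// Tokens that are empty after stripping prefixes are skipped.
func ParseQuery(q string) []Clause {
	var clauses []Clause
	for _, tok := range strings.Fields(q) {
		c := Clause{Occur: "should", Op: "match"}

		// Leading + or - sets whether the clause is required or excluded.
		switch {
		case strings.HasPrefix(tok, "+"):
			c.Occur, tok = "must", tok[1:]
		case strings.HasPrefix(tok, "-"):
			c.Occur, tok = "must_not", tok[1:]
		}

		// "field:value" restricts the clause to one field.
		if i := strings.Index(tok, ":"); i > 0 {
			c.Field, tok = tok[:i], tok[i+1:]
		}

		// One-sided ranges only apply to fielded clauses.
		// Two-character operators are checked before their one-character prefixes.
		if c.Field != "" {
			for _, op := range []string{">=", "<=", ">", "<"} {
				if strings.HasPrefix(tok, op) {
					c.Op, tok = op, tok[len(op):]
					break
				}
			}
		}

		if tok == "" {
			continue
		}
		c.Value = tok
		clauses = append(clauses, c)
	}
	return clauses
}

func main() {
	for _, c := range ParseQuery("name:bleve +go -java age:>25") {
		fmt.Printf("%+v\n", c)
	}
}
```

Two-sided ranges, parenthesized boolean expressions, and quoting would need a real tokenizer rather than a split on whitespace, which is roughly where a hand-rolled parser like this stops being enough.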
