Open Sourcing Vespa, Yahoo’s Big Data Processing and Serving Engine

blog.vespa.ai

66 points by martinp 8 years ago · 5 comments

mkj 8 years ago

There's a lot in there.

Cluster file distribution with BitTorrent: https://github.com/vespa-engine/vespa/tree/master/filedistri...

toast0 8 years ago

If someone was familiar with Vespa in 2011, but hasn't had access to it until now, what's new since then?

  • tedd4u 8 years ago

    At Flickr, we worked closely with the Vespa team from 2011 through 2016 on a wide range of advancements:

       * partial document refeeding (e.g. quickly indexing a new field across 20+ billion documents without refeeding everything, while staying online and handling 100M+ free-text queries a day)
       * visual similarity search - check out the tensor ranking features [1] [2]
       * online elasticity - add/remove replicas / shards online. A must when it could take weeks+ to re-feed from scratch. This is non-trivial to make work smoothly at scale. 
       * latency / tail latency on complex queries: p90 reduced from 3,000 ms to 30 ms.
    
    This is a major gift to the open-source community: a battle-tested search engine that works reliably, without babysitting, on very large datasets under simultaneous high query and high feed volumes. Huge debt of gratitude to the team in Trondheim and to the Verizon/Oath/Yahoo legal & management teams for making this happen. :+1:

    [1] http://docs.vespa.ai/documentation/tensor-intro.html
    [2] http://docs.vespa.ai/documentation/tensor-user-guide.html

  • RealJon 8 years ago

    Not precisely sure where we were in 2011, but I think these are the biggest ones that came after, off the top of my head (i.e. sure to be missing something):

      - Merging content and index clusters into one, to make index clusters elastic and auto-recovering on data loss.
      - Fully realtime writes.
      - Support more advanced machine-learned ranking through tensors.
      - Streaming (personal) search supporting a large write rate.
      - Document references.
      - WAND and RANK operators.
      - Rank features over multivalue text fields.
      - Predicate fields.
      - Lots and lots of performance work.

groodt 8 years ago

Powers bits of Flickr. Interesting.
