Realtime funnel analysis using Solr and Cassandra

blog.getjaco.com

43 points by BlueHotDog2 10 years ago · 5 comments


>For funnel analysis, it’s not feasible to use this data model for getting back a summary of the funnel steps and the sessions matching it, since there’s no option in Solr to run a recursive query, which would allow to go over each session and check if it’s a match for the funnel.

I don't think this approach scales, even in an environment that supports recursive queries like PostgreSQL.

The more scalable approach would be either to use a commercial database system with explicit support for pattern matching, or to encode the conversion path as a string (e.g. "top page -> product page with SKU=1337 -> Purchase" becomes "T_SKU1337_P") and use REGEX/GROUP BY.
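A minimal sketch of the string-encoding idea, in Python rather than SQL. The event codes, the `_` separator, the sample sessions, and every function name here are assumptions made for illustration; none of them come from the article. Each session's path is joined into one string, a regex checks how many funnel steps it contains in order (other events may occur in between), and a counter plays the role of GROUP BY.

```python
import re
from collections import Counter

# Hypothetical event codes: T = top page, SKU<id> = product page, P = purchase.
# Codes must not contain "_", which is used as the separator.
SESSIONS = {
    "s1": ["T", "SKU1337", "P"],
    "s2": ["T", "SKU1337"],
    "s3": ["T", "SKU42", "P"],
    "s4": ["SKU1337", "T", "SKU1337", "P"],
}
FUNNEL = ["T", "SKU1337", "P"]


def encode_path(events):
    """Encode an ordered list of event codes as a single string."""
    return "_".join(events)


def funnel_pattern(steps):
    """Regex matching the steps in order, allowing other events in between."""
    gap = r"(?:_[^_]+)*?"  # zero or more unrelated events
    body = (gap + "_").join(re.escape(step) for step in steps)
    # Anchor both ends on token boundaries so "SKU1337" never matches "SKU13370".
    return re.compile(r"(?:^|_)" + body + r"(?:_|$)")


def funnel_depth(path, steps):
    """Number of leading funnel steps that the encoded path completes."""
    depth = 0
    for k in range(1, len(steps) + 1):
        if not funnel_pattern(steps[:k]).search(path):
            break
        depth = k
    return depth


# The GROUP BY part: count sessions by the deepest step they reached.
depth_counts = Counter(
    funnel_depth(encode_path(events), FUNNEL) for events in SESSIONS.values()
)

# Sessions that reached at least step 1, step 2, step 3.
reached = [
    sum(n for depth, n in depth_counts.items() if depth >= k)
    for k in range(1, len(FUNNEL) + 1)
]
print(reached)
```

In a database, the same shape would be a regex predicate over a precomputed path column followed by an aggregate, so the per-session work is a single string match instead of a recursive query.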

In any case, this sounds like a suboptimal use case for either Solr or Elasticsearch.

itayadler 10 years ago

Why do you think this approach isn't scalable? Would love to hear your input on that. Also, which commercial database systems do you think would be good for this?
- ktamura 10 years ago
  
  The suggested approach most likely requires a lot of recursive backtracking. Of course, there's an efficient way to implement this, and that's what most commercial databases' path analytics features do. Here's one example by Oracle: https://docs.oracle.com/database/121/DWHSG/pattern.htm
  I've always found it befuddling why so many developers want to use Solr/Elasticsearch for heavy analytics lifting. It's probably because
  1. SQL is not the most intuitive (although the most pervasive) API for data analysis
  2. Much of the data is already in Solr/Elasticsearch to make it searchable, perform simple roll-ups and filtering, etc., so it'd be great if you could run more complex analytics against it as well
  As to why Solr/Elasticsearch is not ideal: superior alternatives exist, namely OLAP databases.
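
For the simplest kind of funnel (exact-match steps, in order, with any other events allowed in between), one way to avoid backtracking is a single forward pass over each session's events. The sketch below is only an illustration of that idea under those assumptions; it is not a description of how Oracle or any other database implements pattern matching, and the function name and event codes are hypothetical.

```python
def steps_completed(events, steps):
    """Count how many funnel steps an ordered event list completes.

    Takes the earliest occurrence of each step in turn, which is sufficient
    for in-order matching with gaps, so every event is inspected at most once.
    """
    next_step = 0
    for event in events:
        if next_step == len(steps):
            break  # whole funnel already completed
        if event == steps[next_step]:
            next_step += 1
    return next_step


# Hypothetical codes: T = top page, SKU1337 = product page, P = purchase.
funnel = ["T", "SKU1337", "P"]
print(steps_completed(["SKU1337", "T", "SKU1337", "P"], funnel))
print(steps_completed(["T", "SKU42", "P"], funnel))
```

The cost is linear in the number of events per session. Richer patterns (time windows, negations, quantifiers) are where dedicated pattern-matching features become useful.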

graffitici 10 years ago

Why do people use C* in addition to ES? It seems like in this case most of the data could directly be piped into ES?

I understand that ES can lose data, or have some data storage problems, but one could just as well store all the incoming data on Hadoop or so, without having to bother with C*, no?

itayadler 10 years ago

C* makes it much easier to manage a Solr cluster as the data grows (specifically with DSE): with the tight integration you get all the benefits of C* (HA, eventual consistency, multi-DC replication, etc.).
