Things I keep needing to say


August 12, 2013

Some subjects just keep coming up. And so I keep saying things like:

Most generalizations about “Big Data” are false. “Big Data” is a horrific catch-all term, with many different meanings.

Most generalizations about Hadoop are false. Reasons include:

  • Hadoop is a collection of disparate things, most particularly data storage and application execution systems.
  • The transition from Hadoop 1 to Hadoop 2 will be drastic.
  • For key aspects of Hadoop — especially file format and execution engine — there are or will be widely varied options.

Hadoop won’t soon replace relational data warehouses, if indeed it ever does. SQL-on-Hadoop is still very immature. And you can’t replace data warehouses unless you have the power of SQL.

Note: SQL isn’t the only way to provide “the power of SQL”, but alternative approaches are just as immature.

Most generalizations about NoSQL are false. Different NoSQL products are … different. It’s not even accurate to say that all NoSQL systems lack SQL interfaces. (For example, SQL-on-Hadoop often includes SQL-on-HBase.)

“Big Data” doesn’t create rapid IT growth. If we only had traditional kinds of data, IT growth would be drastically negative, since Moore’s Law swamps traditional data growth. Whole new categories of data are always needed to fill the gap. And these days, they’re all categorized as “Big Data”.
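The arithmetic behind that claim can be sketched in a few lines of Python. The growth and price-decline rates below are purely illustrative assumptions, not measured figures, and `relative_spend` is a hypothetical helper name. The point is only that when cost per unit of capacity falls faster than data volume grows, total spend shrinks.

```python
def relative_spend(years, data_growth, cost_decline):
    """Spend relative to year 0, assuming data volume grows by
    `data_growth` per year while the cost per unit of capacity
    falls by `cost_decline` per year (both as fractions)."""
    yearly_factor = (1 + data_growth) * (1 - cost_decline)
    return yearly_factor ** years


# Assumption: cost per unit halves every two years, a common
# loose reading of Moore's Law (about a 29% decline per year).
MOORE_DECLINE = 1 - 0.5 ** 0.5

# Assumption: "traditional" data grows 20% per year.
traditional = relative_spend(5, 0.20, MOORE_DECLINE)

# Assumption: with new categories of data, volume grows 60% per year.
with_new_data = relative_spend(5, 0.60, MOORE_DECLINE)

print(f"traditional data only: {traditional:.2f}x of year-0 spend")
print(f"with new data types:   {with_new_data:.2f}x of year-0 spend")
```

Under these assumed rates the first figure comes out below 1 (shrinking spend) and the second above 1, which is the gap that new categories of data fill.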

The single central database is a myth. Things are never that simple, at least at large enterprises. Hence, in particular, the ideal EDW (Enterprise Data Warehouse) is a myth.

Analytic RDBMS and appliances aren’t necessarily expensive. Deals can be had. Yes, most vendors want at least a few hundred thousand dollars for most sales, but there are plenty of exceptions even to that rule. And at either large or small scales, things get very cheap, for example:

And Infobright is typically an economical option in between those extremes, if you’re cool with its focus on machine-generated data.

Columnar relational DBMS are relational. Examples include Sybase IQ, Vertica, ParAccel, Infobright and numerous others.

Yes, that’s a tautology. Even so, distressingly many people forget it, columnar RDBMS vendor employees not excepted.

Amazon Redshift proves very little about ParAccel. Amazon bought some stock in ParAccel, and got a cheap license to a subset of ParAccel’s code, perhaps in the same deal. Big whoop. Yes,

  • It is claimed that there are a lot of Redshift users, I presume low-end ones.
  • ParAccel is fast.*

But none of that speaks to some profound, ongoing Amazon/ParAccel/Actian relationship.

*I hear that ParAccel is usually faster than Vertica and other alternatives in POCs (Proofs of Concept) and benchmarks. But I also hear that ParAccel’s installation complexity continues to be a POC problem.

New technology in old categories of application will only be adopted as quickly as firms replace their apps. Yes, that’s a tautology too. Even so, it puts an upper bound on, for example, the speed with which on-premises applications will be replaced by cloud alternatives.

SAP HANA is not yet a serious OLTP (OnLine Transaction Processing) DBMS. Yes,

But the stories of HANA sales and deployment momentum sure seem concentrated on analytic use cases. And by the way — even among analytic DBMS vendors, I don’t hear much emphasis on competing vs. HANA.

Current BI trends reflect 1990s deja vu. The hottest business intelligence products and vendors are adopted by departments, on the strength of their snazzy interfaces and short adoption cycles.* That’s exactly how BI spread in the 1990s, only now the word “visualization” gets used more.

*A common phrase for that is land-and-expand.

And finally,

I’m not impressed that your future products will in some small ways be superior to what your competitors have had in production for over a year.

