Using C for a specialized data store

pixenomics.tumblr.com

39 points by saulkw 14 years ago · 18 comments

JoachimSchipper 14 years ago

Umm, we're talking about just over 18MB here (1200 * 1000 pixels, 16 bytes/pixel, see http://pixenomics.tumblr.com/post/16895861678/how-to-send-1-...). That you can just dump over the wire as a binary blob. Why are we talking about this again? Use your favourite language, just keep it in a big blob in memory, and have fun.
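As a rough illustration of the "big blob in memory" idea, here is a minimal C sketch. The grid dimensions and the 16 bytes per pixel come from the comment above; the names `blob_size`, `blob_alloc`, and `blob_dump` are hypothetical, and the flat byte-array layout is an assumption, not something the article confirms.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Figures quoted in the comment above; the flat layout is an assumption. */
enum { GRID_W = 1200, GRID_H = 1000, BYTES_PER_PIXEL = 16 };

/* Total size of the grid in bytes: 1200 * 1000 * 16 = 19,200,000 (about 18.3 MiB). */
static size_t blob_size(void) {
    return (size_t)GRID_W * GRID_H * BYTES_PER_PIXEL;
}

/* Allocate a zero-filled blob for the whole grid; the caller frees it. */
static uint8_t *blob_alloc(void) {
    return calloc(blob_size(), 1);
}

/* Write the whole blob to an already-open stream (file or socket wrapper).
 * Returns 0 on success, -1 if fewer bytes than expected were written. */
static int blob_dump(const uint8_t *blob, FILE *out) {
    return fwrite(blob, 1, blob_size(), out) == blob_size() ? 0 : -1;
}
```

A caller would allocate once at startup, mutate pixels in place, and call `blob_dump` whenever a full snapshot is needed.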

  • saulkwOP 14 years ago

    Memory isn't an issue. The issue is processing the data and turning the stored form into a format the client can read. A big blob isn't easy to send to the client unless it's an image or something similar, and then it becomes a problem when you want to manipulate or process the data.

    • blibble 14 years ago

      it's a 1000x1000 image... this is a trivial problem

      • saulkwOP 14 years ago

        Can you elaborate?

        • swah 14 years ago

          What he means is: you wouldn't look for a "solution" for writing a 1 MB text file to disk, because it's quite trivial and fast in any language.

nknight 14 years ago

Seems to me they skipped right over the most obvious option: Redis.

It's quite fast, you can use a Redis string as a random-access array up to 512MB/each, and there are several good ways to handle persistence/backup. I don't think there was a need for them to write any C themselves.
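To make the "Redis string as a random-access array" idea concrete, the sketch below computes the byte range of one pixel and formats the corresponding `GETRANGE` command text (Redis treats both the start and end offsets as inclusive). It assumes a row-major layout, a 1200-pixel-wide grid, and 16 bytes per pixel as quoted elsewhere in the thread; the function names and the key name are hypothetical, and no Redis client library is involved.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed layout: row-major, 1200 pixels per row, 16 bytes per pixel. */
enum { GRID_W = 1200, BYTES_PER_PIXEL = 16 };

/* Byte offset of pixel (x, y) inside one large string value. */
static size_t pixel_offset(unsigned x, unsigned y) {
    return ((size_t)y * GRID_W + x) * BYTES_PER_PIXEL;
}

/* Format "GETRANGE <key> <start> <end>" for one pixel.
 * Returns the snprintf result (length of the formatted command). */
static int format_getrange(char *buf, size_t len, const char *key,
                           unsigned x, unsigned y) {
    size_t start = pixel_offset(x, y);
    size_t end = start + BYTES_PER_PIXEL - 1; /* inclusive end offset */
    return snprintf(buf, len, "GETRANGE %s %zu %zu", key, start, end);
}
```

For example, with a hypothetical key named `pixels`, the pixel at x=2, y=1 maps to bytes 19232 through 19247. The same offsets would apply to `SETRANGE` when writing a pixel back.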

  • riobard 14 years ago

    The 4th paragraph explained why they didn't go this way:

    > We were reluctant to use a NoSQL solution as this would require retrieving the pixels through a socket, storing it in memory and then processing them. It makes more sense to process it where it’s stored.

    • ori_b 14 years ago

      Maybe I don't understand the problem, but that sounds like some serious premature optimization. 1.2 megapixels is not much data.

      • dagw 14 years ago

        According to the article, their Node solution took 4 seconds to run (down from 7 seconds after some optimization) and their C solution took 0.03 seconds. Now maybe they could have sped up their Node code more, but that sort of improvement hardly counts as premature optimization.

        • ori_b 14 years ago

          Since the usual expected slowdown for JIT-compiled scripts is somewhere on the order of 5 times (obviously, this is a very loose guess, and the number will vary by script, style, and workload), I wonder what they could have been doing to cause a 200x slowdown.

  • saulkwOP 14 years ago

    We were looking at that (as well as Riak), but processing the data would require pulling all of it into PHP. I guess you could do the processing in C, but then it's just as easy to store it there as well.

    • Mikushi 14 years ago

      Have you looked into the Lua scripting option for Redis? It allows some processing to happen on the server side, and it's quite powerful.

    • nknight 14 years ago

      I'm not clear why you're worried about that. Is it the pulling, or the processing?

      The pulling shouldn't be an issue -- I don't know about PHP, but in pure Python, I can pull an arbitrary 10MB string from Redis in ~85-90ms. With hiredis (C extension), that falls to about 47ms.

      I can't speak to processing, since I don't know exactly what transformations you're performing.

      • saulkwOP 14 years ago

        It's more the iteration over each pixel and its neighbors (of which there are 8), making it around 9.6 million iterations.

        We will probably head towards Redis in the future, when precise backups are essential. We're undecided on what will do this processing, though.
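The 9.6 million figure above matches 1200 * 1000 pixels times 8 neighbors. A minimal C sketch of that loop shape follows; it only counts in-bounds neighbor visits, since the actual per-pixel computation is not described in the thread. The function name is hypothetical, and skipping out-of-range neighbors at the edges is an assumption about how borders are handled.

```c
#include <stddef.h>

/* Count how many (pixel, neighbor) pairs an 8-neighbor sweep touches
 * on a w x h grid, skipping neighbors that fall outside the grid. */
static unsigned long count_neighbor_visits(int w, int h) {
    unsigned long visits = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    if (dx == 0 && dy == 0)
                        continue; /* the pixel itself is not a neighbor */
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || nx >= w || ny < 0 || ny >= h)
                        continue; /* off the edge of the grid */
                    visits++;     /* real code would read pixel (nx, ny) here */
                }
            }
        }
    }
    return visits;
}
```

With edge pixels having fewer than 8 neighbors, a 1200 x 1000 grid gives 9,586,804 visits, slightly under the 9.6 million upper bound.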

        • nivertech 14 years ago

          We built a GPU-accelerated NoSQL datastore. Using it, this can be accelerated 100x, provided you switch to a binary pixel format.

          • fsaintjacques 14 years ago

            Why would you use a GPU-accelerated storage when latency is the main goal?

            • nivertech 14 years ago

              GPUs do not accelerate raw storage retrieval; they accelerate processing, such as queries and map-reduce.

              Use an APU / HPU if PCIe latency is a problem.

              I understood that they are running something like a convolution (i.e., each pixel is calculated from the surrounding pixels). This will be fast using the OpenCL model.
