Using C for a specialized data store

pixenomics.tumblr.com

39 points by saulkw 14 years ago · 18 comments


Umm, we're talking about just over 18MB here (1200 * 1000 pixels, 16 bytes/pixel, see http://pixenomics.tumblr.com/post/16895861678/how-to-send-1-...). That you can just dump over the wire as a binary blob. Why are we talking about this again? Use your favourite language, just keep it in a big blob in memory, and have fun.
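The arithmetic in the comment above can be sketched in C. This is only an illustration: the field names in `Pixel` are hypothetical (the thread says 16 bytes per pixel but not what those bytes hold), and `blob_dump` assumes the destination is reachable through a `FILE *`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

enum { GRID_W = 1200, GRID_H = 1000 };

/* Hypothetical layout: the thread only states 16 bytes per pixel. */
typedef struct {
    uint32_t color;
    uint32_t owner;
    uint32_t price;
    uint32_t flags;
} Pixel;

static_assert(sizeof(Pixel) == 16, "Pixel must be exactly 16 bytes");

/* Total size of the grid in bytes: 1200 * 1000 * 16 = 19,200,000 (about 18.3 MiB). */
size_t blob_size(void) {
    return (size_t)GRID_W * GRID_H * sizeof(Pixel);
}

/* Allocate the whole grid as one zeroed, contiguous block. Returns NULL on failure. */
Pixel *blob_alloc(void) {
    return calloc((size_t)GRID_W * GRID_H, sizeof(Pixel));
}

/* Write the grid to a stream in one call. Returns the number of bytes written. */
size_t blob_dump(const Pixel *grid, FILE *out) {
    size_t written = fwrite(grid, sizeof(Pixel), (size_t)GRID_W * GRID_H, out);
    return written * sizeof(Pixel);
}
```

Because the grid is a single contiguous allocation, pixel (x, y) lives at index `y * GRID_W + x`, and no serialization step is needed before writing it out.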

saulkwOP 14 years ago

Memory isn't an issue. The issue is processing the data and turning the stored form into a format the client can read. A big blob isn't easy to send to the client unless it's an image or something similar, and then it becomes a problem when you want to manipulate or process the data.
- blibble 14 years ago
  
  it's a 1000x1000 image... this is a trivial problem
  - saulkwOP 14 years ago
    
    Can you elaborate?
    
    swah 14 years ago
    
    What he means is: you wouldn't look for a "solution" for writing a 1 MB text file to disk, because it's quite trivial and fast in any language.

nknight 14 years ago

Seems to me they skipped right over the most obvious option: Redis.

It's quite fast, you can use a Redis string as a random-access array up to 512MB/each, and there are several good ways to handle persistence/backup. I don't think there was a need for them to write any C themselves.
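As a rough sketch of the random-access idea: if the grid were stored row-major in one Redis string, a single pixel would map to a byte range that `GETRANGE` (or `SETRANGE` for writes) can address directly. The 1200 x 1000 grid and 16 bytes per pixel come from the thread; the row-major layout and the key name `pixels` are assumptions made for illustration. The code only formats the command text and does not talk to a server.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum { GRID_W = 1200, GRID_H = 1000, PIXEL_BYTES = 16 };

/* Byte offset of pixel (x, y) in a row-major blob. */
size_t pixel_offset(int x, int y) {
    assert(x >= 0 && x < GRID_W && y >= 0 && y < GRID_H);
    return ((size_t)y * GRID_W + (size_t)x) * PIXEL_BYTES;
}

/* Format "GETRANGE <key> <start> <end>" for one pixel.
   GETRANGE treats both offsets as inclusive, hence the "- 1".
   Returns the value reported by snprintf. */
int format_getrange(char *buf, size_t cap, const char *key, int x, int y) {
    size_t start = pixel_offset(x, y);
    size_t end = start + PIXEL_BYTES - 1;
    return snprintf(buf, cap, "GETRANGE %s %zu %zu", key, start, end);
}
```

The full grid is 19,200,000 bytes, so it fits comfortably under the 512MB per-string limit mentioned above.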

riobard 14 years ago

The 4th paragraph explained why they didn't go this way:
> We were reluctant to use a NoSQL solution as this would require retrieving the pixels through a socket, storing it in memory and then processing them. It makes more sense to process it where it’s stored.
- ori_b 14 years ago
  
  Maybe I don't understand the problem, but that sounds like some serious premature optimization. 1.2 Mpix is not much data.
  - dagw 14 years ago
    
    According to the article, their Node solution took 4 seconds to run (down from 7 seconds after some optimization) and their C solution 0.03 seconds. Now maybe they could have sped up their Node code more, but that sort of improvement hardly counts as premature optimization.
    
    ori_b 14 years ago
    
    Since the usual expected slowdown for JIT-compiled scripts is somewhere on the order of 5 times (obviously, this is a very loose guess, and the number will vary by script, style, and workload), I wonder what they could have been doing to cause a 200x slowdown.
saulkwOP 14 years ago

We were looking at that (as well as Riak), but processing the data would require pulling all the data into PHP. I guess you could do the processing in C, but it's then just as easy to store it there as well.
- Mikushi 14 years ago
  
  Have you looked into the Lua scripting option for Redis? It allows some processing to happen on the server side, and it's quite powerful.
  - saulkwOP 14 years ago
    
    That sounds like a good option. Thanks, will note it.
- nknight 14 years ago
  
  I'm not clear why you're worried about that. Is it the pulling, or the processing?
  The pulling shouldn't be an issue -- I don't know about PHP, but in pure Python, I can pull an arbitrary 10MB string from Redis in ~85-90ms. With hiredis (C extension), that falls to about 47ms.
  I can't speak to processing, since I don't know exactly what transformations you're performing.
  - saulkwOP 14 years ago
    
    It's more the iteration over each pixel and its neighbors (of which there are 8), making it around 9.6 million iterations.
    We will probably head towards Redis in the future when precise backups are essential. We're undecided on what will do this processing, though.
    
    nivertech 14 years ago
    
    We built a GPU-accelerated NoSQL datastore. Using it, this can be accelerated 100x, provided you switch to a binary pixel format.
    
    fsaintjacques 14 years ago
    
    Why would you use a GPU-accelerated storage when latency is the main goal?
    
    nivertech 14 years ago
    
    GPUs do not accelerate raw storage retrieval, but processing, such as queries and map-reduce.
    Use an APU / HPU if PCIe latency is a problem.
    I understood that they are running something like a convolution (i.e., each pixel calculated from surrounding pixels); this will be fast using the OpenCL model.
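The per-pixel neighbor pass discussed in this sub-thread can be sketched in plain C. The thread does not say what is actually computed per pixel, so summing the neighbors is a placeholder, and the one-byte-per-cell input is an assumption made to keep the example short. The function returns the number of neighbor reads so the "around 9.6 million" figure can be checked: with edge pixels having fewer than 8 neighbors, a 1200 x 1000 grid yields 9,586,804 reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* For every cell, write the sum of its in-bounds neighbours (up to 8) to dst.
   src and dst are row-major arrays of w * h elements and must not overlap.
   Returns the total number of neighbour reads performed. */
size_t neighbor_sum_pass(const uint8_t *src, uint16_t *dst, int w, int h) {
    assert(src != NULL && dst != NULL && w > 0 && h > 0);
    size_t reads = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            uint16_t sum = 0;
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int nx = x + dx;
                    int ny = y + dy;
                    if (dx == 0 && dy == 0) continue;          /* skip the cell itself */
                    if (nx < 0 || nx >= w || ny < 0 || ny >= h) continue; /* off the grid */
                    sum = (uint16_t)(sum + src[(size_t)ny * (size_t)w + (size_t)nx]);
                    reads++;
                }
            }
            dst[(size_t)y * (size_t)w + (size_t)x] = sum;
        }
    }
    return reads;
}
```

Each output cell depends only on `src`, never on other outputs, so the iterations are independent of one another. That independence is what makes this kind of pass a natural fit for the data-parallel model mentioned above.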
