Replica Strategy in HDFS Is Not Good Enough

notcode.github.io

2 points by garfee 12 years ago · 1 comment

brugidou 12 years ago

Comparing to MongoDB is a joke.

However, some more advanced strategies should be applied for very large HDFS clusters. The rack-aware strategy is actually better than what is described because the probability distribution is not perfectly uniform. It all depends on the hardware, the location, etc. But with a very large number of blocks, the probability of losing data when 3 nodes fail is unfortunately close to 1.
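A rough way to check the "close to 1" claim is a back-of-the-envelope calculation. The sketch below assumes each block's 3 replicas land on a uniformly random set of 3 distinct nodes, independently of other blocks. Real rack-aware placement is not uniform, so this is only an approximation; the function name and the example cluster sizes are made up for illustration.

```python
from math import comb


def p_loss_random(num_nodes: int, num_blocks: int, replicas: int = 3) -> float:
    """Probability that one specific simultaneous failure of `replicas`
    nodes destroys at least one block.

    Assumption (not from HDFS itself): every block picks its replica
    set uniformly at random among all possible node sets.
    """
    node_sets = comb(num_nodes, replicas)        # possible replica sets
    p_block_survives = 1.0 - 1.0 / node_sets     # block is not on the failed set
    return 1.0 - p_block_survives ** num_blocks  # at least one block is


if __name__ == "__main__":
    # Hypothetical cluster sizes, chosen only to show the trend.
    for blocks in (10**3, 10**6, 10**9):
        print(blocks, p_loss_random(1000, blocks))
```

Under these assumptions the probability stays tiny while the block count is far below the number of possible node triples, and it approaches 1 once the block count is well above it.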

We could try to imagine a better strategy that keeps replicas in cliques of nodes to mitigate the risk. It's a tradeoff between losing more data with lower probability or less data with high probability, I guess? Haven't done the math :)
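For what it's worth, the math can be sketched under simple assumptions: nodes are split into disjoint cliques of 3, each block lives entirely inside one clique, blocks are spread evenly across cliques, and the random baseline picks replica sets uniformly. None of this is HDFS's actual placement policy; the function and field names are hypothetical.

```python
from math import comb


def clique_tradeoff(num_nodes: int, num_blocks: int, replicas: int = 3) -> dict:
    """Compare random placement with clique placement for one
    simultaneous failure of `replicas` randomly chosen nodes.

    Assumes num_nodes is a multiple of `replicas` and that blocks are
    spread evenly over the cliques (illustrative model only).
    """
    node_sets = comb(num_nodes, replicas)   # all possible failed sets
    cliques = num_nodes // replicas         # disjoint replica groups

    # Random placement: loss is likely, but few blocks are lost per event.
    p_random = 1.0 - (1.0 - 1.0 / node_sets) ** num_blocks
    expected_lost_random = num_blocks / node_sets

    # Clique placement: loss needs the failed set to be exactly one clique,
    # and then every block stored in that clique is gone.
    p_clique = cliques / node_sets
    lost_if_hit = num_blocks / cliques
    expected_lost_clique = p_clique * lost_if_hit

    return {
        "p_random": p_random,
        "p_clique": p_clique,
        "lost_if_hit_clique": lost_if_hit,
        "expected_lost_random": expected_lost_random,
        "expected_lost_clique": expected_lost_clique,
    }


if __name__ == "__main__":
    # Hypothetical cluster: 999 nodes, one billion blocks.
    for name, value in clique_tradeoff(999, 10**9).items():
        print(name, value)
```

In this model the expected number of blocks lost per 3-node failure is the same for both strategies. Cliques only change the shape of the risk: a loss event becomes much rarer, but each one wipes out a whole clique's worth of blocks.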