Extracting Subset of Common Crawl Data on Laptop

avilpage.com

1 point by chillaranand 3 years ago · 1 comment

chillaranand (OP) 3 years ago

Each monthly Common Crawl dataset is ~100 TB. For some use cases we don't need the entire dataset, just a subset of it.

In this post, let's see how we can extract a subset of the data using just a laptop.
