GitHub - jake-low/hadoop.docker: Run a highly configurable, fully distributed Hadoop cluster with Docker!

Run a Hadoop cluster in Docker

This repository is a collection of Dockerfiles that define images for various Hadoop ecosystem components. With these images, you can launch a Hadoop cluster in whatever configuration you need, on your laptop, in a VM, or in the cloud, in seconds.

Apache Hadoop is an open-source framework for storing and processing very large data sets. Hadoop is typically installed and run on commodity physical hardware. The term "Hadoop" has come to refer both to the core project and to the ecosystem of data processing tools that exists around it.

Docker is an open-source container engine. It lets you package software together with its dependencies and distribute it so that it runs the same way wherever Docker is available. Docker containers are frequently likened to lightweight virtual machines.

Usage

Please see the EXAMPLES directory for guidance on how to make use of these images.

Contributing

This project is in active development. Feature requests and pull requests are welcome!