Big data seems unavoidable in today's internet applications, and Hadoop is probably the most popular big data implementation. The Hadoop ecosystem is very rich and relatively new to me, so I am working my way into it, starting with HDFS.
Set up HDFS with 2 VMs:
vm1: name node and data node
vm2: data node
Environment:
Windows 10
VMware Workstation 14 Player
CentOS 7
JDK 1.8
Hadoop 2.7.7
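For the two-VM layout and the Hadoop 2.7.7 version listed above, the configuration could look roughly like the sketch below. It writes the three files into a scratch directory rather than into a real installation. The port 9000, the replication factor of 2 and the `CONF_DIR` location are my assumptions, not values taken from the actual setup; `vm1` and `vm2` are assumed to resolve as host names on both machines.

```shell
#!/usr/bin/env bash
# Sketch: generate a minimal HDFS configuration for vm1 (name node + data node)
# and vm2 (data node). CONF_DIR is a hypothetical scratch directory; on a real
# install the files live in $HADOOP_HOME/etc/hadoop on every node.
CONF_DIR="${CONF_DIR:-./hadoop-conf-sketch}"
mkdir -p "$CONF_DIR"

# core-site.xml: tells every node where the name node lives.
# Port 9000 is an assumption; any free port works as long as all nodes agree.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://vm1:9000</value>
  </property>
</configuration>
EOF

# hdfs-site.xml: with only two data nodes, a replication factor of 2
# (assumed here) is the most that can be satisfied.
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF

# slaves: one host per line, each of which runs a data node.
# (Hadoop 2.x calls this file "slaves"; Hadoop 3.x renamed it to "workers".)
printf 'vm1\nvm2\n' > "$CONF_DIR/slaves"
```

The same files would then be copied into `etc/hadoop` under the Hadoop directory on both VMs, so that each node sees an identical configuration.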
set up SSH between vm1 and vm2 so that no password is required,
use the same directory structure on both VMs,
running start-dfs.sh on the name node actually starts the daemons on both VMs.
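The three points above can be sketched as one script, meant to be run on vm1 as the user that owns the Hadoop installation. It is only an illustration under assumptions: `HADOOP_HOME`, the key path and the helper names `run` and `start_hdfs_cluster` are hypothetical, and by default the script only prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: passwordless SSH from vm1, a directory check on vm2, then start HDFS.
# HADOOP_HOME and KEY_FILE are assumed locations, adjust them to the real setup.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop-2.7.7}"
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_rsa}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

start_hdfs_cluster() {
  # 1. Create a key pair with an empty passphrase unless one already exists.
  [ -f "$KEY_FILE" ] || run ssh-keygen -t rsa -N "" -f "$KEY_FILE"

  # 2. Authorize the key on both nodes. vm1 is included because start-dfs.sh
  #    also uses SSH to reach the data node on the local machine.
  for host in vm1 vm2; do
    run ssh-copy-id -i "$KEY_FILE.pub" "$host"
  done

  # 3. Same directory structure: Hadoop must sit at the same path on vm2.
  run ssh vm2 test -d "$HADOOP_HOME"

  # 4. Start the name node and every data node listed in the slaves file.
  run "$HADOOP_HOME/sbin/start-dfs.sh"
}

start_hdfs_cluster
```

Running it with `DRY_RUN=0` executes the steps for real. On a brand-new cluster the name node has to be formatted once with `hdfs namenode -format` before the first start. Afterwards, running `jps` on each VM should show a NameNode and a DataNode process on vm1 and a DataNode process on vm2.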