2019-12-10

Big data seems unavoidable in current internet applications, and Hadoop is probably the most popular big data platform. The Hadoop ecosystem is very rich and relatively new to me, so I am working my way into it, starting with HDFS.

HDFS Environment:

Set up HDFS with 2 VMs:

vm1: name node and data node
vm2: data node
Windows 10
VMware Workstation 14 Player
CentOS 7
JDK 1.8
Hadoop 2.7.7
Set up passwordless SSH between vm1 and vm2.
Use the same directory structure on both VMs.
Running start-dfs.sh on the name node starts the HDFS daemons on both VMs.
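
One way to confirm that start-dfs.sh brought up the daemons on both VMs is to probe their web UI ports. The sketch below is not part of the original setup. It assumes that the hostnames vm1 and vm2 resolve from the machine running it and that the Hadoop 2.x default ports are unchanged (50070 for the name node web UI, 50075 for the data node web UI). The class name HdfsPortCheck and the helper isPortOpen are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class HdfsPortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    // Any I/O failure (refused, timed out, unknown host) counts as "not open".
    public static boolean isPortOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: Hadoop 2.x default web UI ports, hostnames vm1 and vm2.
        String[][] checks = {
            {"vm1", "50070", "name node web UI"},
            {"vm1", "50075", "data node web UI"},
            {"vm2", "50075", "data node web UI"},
        };
        for (String[] check : checks) {
            boolean open = isPortOpen(check[0], Integer.parseInt(check[1]), 2000);
            System.out.printf("%s:%s (%s) -> %s%n",
                    check[0], check[1], check[2], open ? "open" : "closed");
        }
    }
}
```

If a port is reported closed, the usual things to check are the firewall on CentOS 7 and the daemon logs under the Hadoop logs directory.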

BD-1 : A Taste of HDFS of Hadoop

HDFS Environment:

HDFS console:

HDFS read and write:

Read:

Write:

HDFS access Java API: