BD-1: A Taste of Hadoop HDFS


Big data seems unavoidable in today's internet applications, and Hadoop is probably the most popular big data platform. The Hadoop ecosystem is very rich and relatively new to me, so I am charting a path into it, starting with HDFS.

HDFS Environment:

Set up HDFS with 2 VMs:

  • vm1: name node and data node

  • vm2: data node

  • Windows 10

  • VMware Workstation 14 Player

  • CentOS 7

  • JDK 1.8

  • Hadoop 2.7.7

  • Set up passwordless SSH between vm1 and vm2

  • Use the same directory structure on both VMs

  • Running start-dfs.sh on the name node starts the HDFS daemons on both VMs

HDFS console:

HDFS read and write:

Read:
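On a read, the client asks the name node for the file's block locations, then fetches each block directly from a data node that holds a replica. It prefers the nearest one and moves on to another replica if a node fails. The sketch below is a simplified model of that replica choice in plain JDK 8 code. It is not Hadoop's own client code. The class name ReadPlan, the "prefer local, else first live replica" rule, and the sample block locations are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified model of the HDFS read path (hypothetical, not Hadoop's client). */
public class ReadPlan {

    /**
     * For each block, picks the replica the client reads from: its own host if
     * that host holds a live replica, otherwise the first live replica listed.
     * Throws if every replica of some block sits on a dead node.
     */
    static List<String> chooseReplicas(List<List<String>> blockLocations,
                                       String clientHost, Set<String> deadNodes) {
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < blockLocations.size(); i++) {
            List<String> replicas = blockLocations.get(i);
            String pick = null;
            if (replicas.contains(clientHost) && !deadNodes.contains(clientHost)) {
                pick = clientHost; // local read, no network hop
            } else {
                for (String node : replicas) {
                    if (!deadNodes.contains(node)) { // skip failed data nodes
                        pick = node;
                        break;
                    }
                }
            }
            if (pick == null) {
                throw new IllegalStateException("no live replica for block " + i);
            }
            chosen.add(pick);
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Hypothetical locations the name node might report for a 3-block file.
        List<List<String>> locations = Arrays.asList(
                Arrays.asList("vm1", "vm2"),
                Arrays.asList("vm2", "vm1"),
                Arrays.asList("vm2"));
        Set<String> noneDead = Collections.emptySet();
        System.out.println(chooseReplicas(locations, "vm1", noneDead));
        // prints [vm1, vm1, vm2]
        System.out.println(chooseReplicas(locations, "vm1", new HashSet<>(Arrays.asList("vm1"))));
        // prints [vm2, vm2, vm2]
    }
}
```

When the client runs on vm1, which is also a data node in this setup, every block with a replica on vm1 is read locally. Only blocks stored solely on vm2 cross the network.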

Write:
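On a write, HDFS cuts the file into fixed-size blocks. The size is set by dfs.blocksize, which defaults to 128 MB in Hadoop 2.x, and the last block may be shorter. Each block is streamed through a pipeline of data nodes, so the client sends to the first node and that node forwards to the next, until dfs.replication copies exist. The sketch below models only the layout in plain JDK 8 code. The class name BlockLayout, the 300 MB file, and a replication factor of 2 are assumptions, since the setup above does not state them. With only two data nodes, the Hadoop default replication of 3 cannot be met. The round-robin placement is also a deliberate simplification: real HDFS uses a rack-aware policy that puts the first replica on the writer's node when it is a data node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of how an HDFS write lays a file out (hypothetical helper). */
public class BlockLayout {
    // Hadoop 2.x default for dfs.blocksize: 128 MB.
    static final long DEFAULT_BLOCK_SIZE = 128L * 1024 * 1024;

    /** Returns the byte length of each block a file of fileSize bytes is cut into. */
    static List<Long> splitIntoBlocks(long fileSize, long blockSize) {
        List<Long> blocks = new ArrayList<>();
        long remaining = fileSize;
        while (remaining > 0) {
            long len = Math.min(blockSize, remaining); // last block may be shorter
            blocks.add(len);
            remaining -= len;
        }
        return blocks;
    }

    /**
     * Assigns each block to `replication` distinct data nodes, rotating the
     * first node per block. Round-robin is a simplification, not HDFS's policy.
     */
    static List<List<String>> placeReplicas(int blockCount, List<String> dataNodes, int replication) {
        int copies = Math.min(replication, dataNodes.size()); // cannot exceed the node count
        List<List<String>> placement = new ArrayList<>();
        for (int b = 0; b < blockCount; b++) {
            List<String> pipeline = new ArrayList<>();
            for (int r = 0; r < copies; r++) {
                pipeline.add(dataNodes.get((b + r) % dataNodes.size()));
            }
            placement.add(pipeline);
        }
        return placement;
    }

    public static void main(String[] args) {
        long fileSize = 300L * 1024 * 1024; // hypothetical 300 MB file
        List<Long> blocks = splitIntoBlocks(fileSize, DEFAULT_BLOCK_SIZE);
        List<List<String>> placement = placeReplicas(blocks.size(), Arrays.asList("vm1", "vm2"), 2);
        for (int i = 0; i < blocks.size(); i++) {
            System.out.printf("block %d: %d MB -> %s%n", i, blocks.get(i) / (1024 * 1024), placement.get(i));
        }
        // prints:
        // block 0: 128 MB -> [vm1, vm2]
        // block 1: 128 MB -> [vm2, vm1]
        // block 2: 44 MB -> [vm1, vm2]
    }
}
```

Each inner list is read as a write pipeline. For block 0, the client streams to vm1, which forwards the data to vm2.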

HDFS access via the Java API:

  • Eclipse on Windows
  • Maven
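The Maven route pulls in the Hadoop client library (artifact org.apache.hadoop:hadoop-client, version 2.7.7 to match the cluster) and works through its FileSystem class. That requires the Hadoop jars on the classpath. As a dependency-free alternative, not the setup described above, HDFS also exposes the WebHDFS REST API on the name node's web port, and plain JDK 8 code can call it. This sketch makes several assumptions. It assumes the name node is reachable from the Windows machine as vm1 and uses the Hadoop 2.x default web port 50070. It also assumes WebHDFS has not been disabled in hdfs-site.xml, since it is on by default in 2.x. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Minimal WebHDFS client sketch using only the JDK (hypothetical helper). */
public class WebHdfsList {

    /** Builds a WebHDFS v1 URL, e.g. http://vm1:50070/webhdfs/v1/tmp?op=LISTSTATUS */
    static String webHdfsUrl(String host, int port, String path, String op) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("HDFS path must be absolute: " + path);
        }
        // Note: paths containing spaces or special characters would need URL-encoding.
        return "http://" + host + ":" + port + "/webhdfs/v1" + path + "?op=" + op;
    }

    /** Sends a GET request and returns the response body (JSON for LISTSTATUS). */
    static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            conn.setRequestMethod("GET");
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " for " + url);
            }
            StringBuilder body = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes host "vm1" and the Hadoop 2.x default name node web port 50070.
        String url = webHdfsUrl("vm1", 50070, "/", "LISTSTATUS");
        System.out.println(get(url)); // JSON listing of the HDFS root directory
    }
}
```

LISTSTATUS is answered by the name node itself. An op such as OPEN is redirected to a data node (port 50075 by default in 2.x), so the data node host names must also resolve from the client machine.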