Hadoop File System (HDFS)
HDFS is the Hadoop Distributed File System. The HDFS modules are split between partitions; currently, there is an HDFS cluster in both the Compute partition and the GPU partition.
Note: Permission to use HDFS is not granted automatically. You will need to request access before you can utilize HDFS.
HDFS Module
In this example, we will do simple file operations using `hadoop fs`.
To list the contents of your directory within HDFS, first load the required HDFS module.
Command:
module load hdfs/hdfs-rc
Output:
## Assuming all went well, you will have no output.
## You can test to see if the module is loaded via
## `module list`
Note: currently, the module `hdfs/hdfs-rc` auto-loads `java/openjdk/java-1.7.0-openjdk`. If you require a different version of Java, run `module unload java`, then `module load` the version of Java you want.
Now, you can list the contents of HDFS.
Command:
hadoop fs -ls /
Output:
[user@lewis4-r710-login-node223 ~]$ hadoop fs -ls /
Found 4 items
drwxr-xr-x - hdfs-resource hdfs-resource 0 2017-07-14 13:22 /group
drwxr-xr-x - hdfs-resource hdfs-resource 0 2017-06-16 11:31 /shared
drwxr-xr-x - hdfs-resource hdfs-resource 0 2018-02-13 14:45 /testing
drwx-wx-wx - user hdfs-resource 0 2017-06-29 11:23 /tmp
[user@lewis4-r710-login-node223 ~]$
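The listing above has one entry per line: permissions, replication factor, owner, group, size, date, time, and path. If you want to pull just the paths out of saved `hadoop fs -ls` output, a minimal awk sketch (using sample lines taken from the output above) looks like this:

```shell
# Extract the path column (8th field) from hadoop fs -ls output,
# skipping the "Found N items" summary line, which has fewer fields.
ls_paths() { awk 'NF >= 8 { print $8 }'; }

printf '%s\n' \
  'Found 2 items' \
  'drwxr-xr-x - hdfs-resource hdfs-resource 0 2017-07-14 13:22 /group' \
  'drwxr-xr-x - hdfs-resource hdfs-resource 0 2017-06-16 11:31 /shared' \
  | ls_paths
# prints:
# /group
# /shared
```

This works on captured output (e.g. `hadoop fs -ls / | ls_paths`) and does not require HDFS access to test.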
To write a file into HDFS, use `hadoop fs -put`. In this example, `$USER` is your username on the cluster and `$SOME_LOCAL_FILE` is the full path to a file on the local file system to be placed into HDFS. In the command and output examples below, they are replaced with `user` and `awesomefile.json` respectively.
Command:
hadoop fs -put /home/user/awesomefile.json /group/rc/user/
Output:
## None, assuming all went well.
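If you have several files to transfer, it can help to review the commands before running them. The following minimal sketch prints the `hadoop fs -put` command for each `.json` file in a local directory, using the `/group/rc/$USER/` destination from the example above (adjust the path for your own group):

```shell
# Dry-run helper: print the hadoop fs -put command for every .json file
# in a local directory. The /group/rc/$USER/ destination follows the
# example above; adjust it for your own group path.
put_cmds() {
  local src_dir=$1
  local dest="/group/rc/$USER/"
  local f
  for f in "$src_dir"/*.json; do
    printf 'hadoop fs -put %s %s\n' "$f" "$dest"
  done
}
```

Once the printed commands look right, run them directly (or replace `printf` with a direct `hadoop fs -put` invocation).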
We can now list the file we just placed in HDFS using `hadoop fs -ls /group/rc/user`.
Command:
hadoop fs -ls /group/rc/user
Output:
[user@lewis4-r710-login-node223 ~]$ hadoop fs -ls /group/rc/user
Found 1 items
-rw-r--r-- 3 user user 25 2018-02-13 15:28 /group/rc/user/awesomefile.json
To delete a file in HDFS, use `hadoop fs -rm`. Note that, by default, deleted files are moved to the HDFS trash rather than removed immediately, as the `TrashPolicyDefault` line in the output below shows.
Command:
hadoop fs -rm /group/rc/user/awesomefile.json
Output:
[user@lewis4-r710-login-node223 ~]$ hadoop fs -rm /group/rc/user/awesomefile.json
18/02/13 15:36:11 INFO fs.TrashPolicyDefault: Namenode trash configuration: ...
Deleted /group/rc/user/awesomefile.json
MRI-HDFS Modules
You can view information about the HDFS modules via `module help`:
Example for Compute Partition:
[example@c12-rc4-head ~]$ module help mri/mri-hdfs
----------- Module Specific Help for 'mri/mri-hdfs' ---------------
The mri-hdfs module loads the required modules and sets the needed
environmental variables to access HDFS on the Compute Partition
Use this module within the Compute Partition only.
#------------------------------------------------------------------
# HDFS INFO
#------------------------------------------------------------------
Location : hdfs://r630-node66:9090/
WebUI URL : http://r630-node66:50070/
#------------------------------------------------------------------
Example for GPU Partition:
[example@c12-rc4-head ~]$ module help mri/mri-hdfs-gpu
----------- Module Specific Help for 'mri/mri-hdfs-gpu' -----------
The mri-hdfs-gpu module loads the required modules and sets the needed
environmental variables to access HDFS on the GPU Partition
Use this module within the GPU Partition only.
#------------------------------------------------------------------
# HDFS INFO
#------------------------------------------------------------------
Location : hdfs://r730-node74:9090/
WebUI URL : http://r730-node74:50070/
#------------------------------------------------------------------
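The `Location` lines above give the namenode address you need when referring to HDFS by full URI, as the `spark-shell` example later in this page does. As a minimal sketch, assuming the Compute partition namenode `r630-node66:9090` reported by the module help above, a small helper can compose such URIs:

```shell
# Compose a full HDFS URI from the namenode address reported by
# `module help` (Compute partition shown; use r730-node74:9090 for GPU).
NAMENODE="hdfs://r630-node66:9090"
hdfs_uri() { printf '%s%s\n' "$NAMENODE" "$1"; }

hdfs_uri /example/littlelog.csv
# prints hdfs://r630-node66:9090/example/littlelog.csv
```

The resulting URI can be passed to tools such as Spark's `sc.textFile`, as shown in the final example below.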
Example Usage
All examples use the default amount of resources and assume a clean environment with no modules loaded. To clear any loaded modules, run `module purge`.
In the following example, we are using `srun` to submit an interactive job that simply lists the contents of a directory within the HDFS cluster:
[example@c12-rc4-head ~]$ module load mri/mri-hdfs
[example@c12-rc4-head ~]$ srun -N 1 -p Compute hadoop fs -ls /
Found 3 items
drwxrwxr-x - sspark idas 0 2016-02-26 10:12 /idas
drwxr-xr-x - example users 0 2016-02-21 14:10 /example
drwx-wx-wx - example users 0 2016-02-21 14:10 /tmp
In this example, we are placing a file from our home directory into a folder on the HDFS cluster:
[example@c12-rc4-head ~]$ module load mri/mri-hdfs
[example@c12-rc4-head ~]$ srun -N 1 -p Compute hadoop fs -put littlelog.csv /example/
[example@c12-rc4-head ~]$ srun -N 1 -p Compute hadoop fs -ls /example
Found 1 items
-rw-r--r-- 3 example users 1399 2016-02-26 16:54 /example/littlelog.csv
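The same steps can also be run non-interactively. A minimal batch-script sketch, assuming the `mri/mri-hdfs` module and the `/example` HDFS directory from the preceding example (submit with `sbatch`):

```shell
#!/bin/bash
#SBATCH -N 1
#SBATCH -p Compute
# Batch version of the interactive srun steps above; littlelog.csv and
# /example are illustrative paths from the preceding example.
module load mri/mri-hdfs
hadoop fs -put littlelog.csv /example/
hadoop fs -ls /example
```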
In the final `srun` example, we access the HDFS cluster and the file placed there in the previous example using a `spark-shell` (output trimmed to save space):
[example@c12-rc4-head ~]$ module load spark/spark-1.6.0-bin-hadoop2.6
[example@c12-rc4-head ~]$ srun -N 1 -p Compute --pty spark-shell
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.0
/_/
scala> val file = sc.textFile("hdfs://r630-node66:9090/example/littlelog.csv")
file: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:27
scala> file.toArray.foreach(println)
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
20120315 01:17:06,99.122.210.248,http://www.acme.com/SH55126545/VD55170364,{7AAB8415-E803-3C5D-7100-E362D7F67CA7},homestead,fl,usa
20120315 01:34:46,69.76.12.213,http://www.acme.com/SH55126545/VD55177927,{8D0E437E-9249-4DDA-BC4F-C1E5409E3A3B},coeur d alene,id,usa
20120315 17:23:53,67.240.15.94,http://www.acme.com/SH55126545/VD55166807,{E3FEBA62-CABA-11D4-820E-00A0C9E58E2D},queensbury,ny,usa
20120315 17:05:00,67.240.15.94,http://www.acme.com/SH55126545/VD55149415,{E3FEBA62-CABA-11D4-820E-00A0C9E58E2D},queensbury,ny,usa
20120315 01:27:53,98.234.107.75,http://www.acme.com/SH55126545/VD55179433,{49E0D2EE-1D57-48C5-A27D-7660C78CB55C},sunnyvale,ca,usa
20120315 02:09:38,75.85.165.38,http://www.acme.com/SH55126545/VD55179433,{F6F8B460-4204-4C26-A32C-B93826EDCB99},san diego,ca,usa
scala> exit
[example@c12-rc4-head ~]$