Listing files
hdfs dfs -ls
hdfs dfs -ls /path
Manipulating files
hdfs dfs -rm -r hdfs://path/to/file
Commands are similar to Unix FS-related ones: ls, mkdir, rmdir, cp, mv, rm...
File replication factor
- can be set per file
- cannot be set per directory for new files (= every new file gets the replication factor that is set globally for the whole HDFS, i.e. the dfs.replication setting)
- to check the current replication factor of a file, run hdfs dfs -ls and read the number in the column right after the permissions
- to set a new factor: hdfs dfs -setrep 2 /path/to/file
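Reading the replication column by eye gets tedious for larger listings. A minimal sketch of a helper that pulls it out of hdfs dfs -ls output is shown here. The function name replication_of and the sample line are made up for illustration, and the parsing assumes paths without spaces.

```shell
# Print "<replication factor> <path>" for every regular file in the
# output of `hdfs dfs -ls`. File lines start with "-"; directory lines
# start with "d" and show "-" instead of a replication factor, and the
# "Found N items" header is not a file either, so both are skipped.
# Assumption: paths contain no spaces (the path is taken as the last field).
replication_of() {
  awk '/^-/ { print $2, $NF }'
}

# Illustrative sample line (not real cluster output):
sample='-rw-r--r--   3 hdfs supergroup    1048576 2017-01-01 12:00 /path/to/file'
printf '%s\n' "$sample" | replication_of
# prints: 3 /path/to/file
```

On a real cluster the helper would be used as hdfs dfs -ls /path | replication_of. For a single file, hdfs dfs -stat %r /path/to/file should print the factor directly, and hdfs dfs -setrep -w 2 /path/to/file waits until the new factor has been reached.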
Decommissioning the datanode
- Put the node address into the file <HADOOP_CONF_DIR>/dfs.exclude
- Refresh the data nodes with the command su hdfs -c "hdfs dfsadmin -refreshNodes"
- Watch the HDFS admin website (port 50070) for the progress of decommissioning.
More info on Hortonworks site.
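The decommissioning steps above can be sketched as one shell function. This is only a sketch under assumptions: the function name decommission_datanode is hypothetical, HADOOP_CONF_DIR must be set, the dfs.hosts.exclude property in hdfs-site.xml is assumed to point at the dfs.exclude file, and the caller must be allowed to run commands as the hdfs user.

```shell
# Sketch: decommission one datanode. Nothing runs until the function is called.
# Assumptions: HADOOP_CONF_DIR is set, dfs.hosts.exclude points at
# $HADOOP_CONF_DIR/dfs.exclude, and `su hdfs` is permitted for the caller.
decommission_datanode() {
  node="$1"
  exclude_file="${HADOOP_CONF_DIR:?HADOOP_CONF_DIR is not set}/dfs.exclude"

  # Add the node to the exclude file only if it is not listed yet.
  grep -qxF -- "$node" "$exclude_file" 2>/dev/null \
    || printf '%s\n' "$node" >> "$exclude_file"

  # Make the NameNode re-read its include/exclude lists.
  su hdfs -c "hdfs dfsadmin -refreshNodes"

  # Command-line alternative to the admin website: the report lists a
  # "Decommission Status" for every datanode.
  su hdfs -c "hdfs dfsadmin -report"
}
```

A call would look like decommission_datanode dn1.example.com, where the hostname is a placeholder. The node should stay in service until its status changes from in progress to decommissioned.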
Rebalancing HDFS
HDFS is balanced automatically as new data blocks are added, but when you change the capacity of a data node, it is not rebalanced automatically.
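A rebalance can be started by hand with the hdfs balancer tool. The wrapper below is a small sketch: the function name rebalance_hdfs is hypothetical, and running the balancer as the hdfs user follows the su hdfs -c pattern used for dfsadmin on this page, which is an assumption about the cluster setup.

```shell
# Sketch: start the HDFS balancer. Nothing runs until the function is called.
# The threshold is the allowed difference, in percent, between a datanode's
# disk usage and the cluster average; 10 is the balancer's default.
# Assumption: the caller may run commands as the hdfs user.
rebalance_hdfs() {
  threshold="${1:-10}"
  su hdfs -c "hdfs balancer -threshold $threshold"
}
```

For example, rebalance_hdfs 5 asks for a tighter balance than the default. A lower threshold moves more blocks and takes longer to finish.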