Recently I ran into a scenario where HDFS users wanted to migrate all the data on their old volumes to newly added volumes. HDFS already has a DiskBalancer tool, but it does not meet this requirement. So we developed a new tool, DiskMigration, which migrates all the data on the current volumes to the new volumes and keeps the data distribution balanced at the same time. [Figure: the result our tool achieves, migrate and balance.]
We decide the data quota of each new volume according to its capacity, and move the data over in multiple steps.
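The capacity-based quota can be illustrated with a small sketch. It assumes the quota is simply proportional to each new volume's capacity, which the text above does not spell out. The function name `plan_quotas` and its parameters are hypothetical and are not part of the tool.

```python
def plan_quotas(amount_to_move, new_capacities):
    """Split amount_to_move across new volumes in proportion to capacity.

    ASSUMPTION: a plain proportional rule; the real tool may weigh
    volumes differently. Units are arbitrary but must be consistent
    (for example 1K-blocks, as printed by df).
    """
    total = sum(new_capacities.values())
    if total <= 0:
        raise ValueError("new volumes must have a positive total capacity")

    # Integer share for every volume, rounded down.
    quotas = {
        volume: amount_to_move * capacity // total
        for volume, capacity in new_capacities.items()
    }

    # Rounding down can leave a few units unassigned; hand them to the
    # largest volume so the quotas add up to amount_to_move exactly.
    remainder = amount_to_move - sum(quotas.values())
    largest = max(new_capacities, key=new_capacities.get)
    quotas[largest] += remainder
    return quotas
```

With two new volumes of equal capacity, each one receives half of the data to be moved.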
To use it, you only need to replace the hdfs.jar and hdfs-client.jar of your HDFS cluster.
hdfs diskbalancer -plan node1 -type diskMigrate
This creates the migration plan and saves it as JSON in HDFS, for example at /system/diskbalancer/2017-Jul-25-13-46-17/node1.plan.json.
hdfs diskbalancer -execute /system/diskbalancer/2017-Jul-25-13-46-17/node1.plan.json
This executes the plan in the background. You can query its progress with:
hdfs diskbalancer -query node1:9867
If it outputs 'Result: PLAN_DONE', the migration has finished. The result can be checked from the Linux command line:
df
Before the migration, there are two new, empty disks (/dev/vdb and /dev/vdd):
[root@node1 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/vda1 41152832 14576524 24479208 38% /
tmpfs 4030552 0 4030552 0% /dev/shm
/dev/vdb 103081248 62540 97775828 1% /tmp/hadoop1
/dev/vdc 103081248 4849888 92988480 5% /tmp/hadoop2
/dev/vdd 103081248 61112 97777256 1% /tmp/hadoop3
After the migration, all the data on the old disk (/dev/vdc) has been moved to the new disks:
[root@node1 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/vda1 41152832 14576528 24479204 38% /
tmpfs 4030552 0 4030552 0% /dev/shm
/dev/vdb 103081248 2325524 95512844 3% /tmp/hadoop1
/dev/vdc 103081248 61116 97777252 1% /tmp/hadoop2
/dev/vdd 103081248 2587360 95251008 3% /tmp/hadoop3
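The query step above can also be scripted, so that df is only checked once the plan is done. The sketch below polls `hdfs diskbalancer -query` until its output contains 'Result: PLAN_DONE'. It assumes the status is printed to standard output. The helper names, the polling interval and the retry limit are illustrative choices, not part of the tool.

```python
import subprocess
import time

DONE_MARKER = "Result: PLAN_DONE"  # completion marker quoted in this document


def query_status(datanode):
    """Run `hdfs diskbalancer -query <datanode>` and return what it prints.

    ASSUMPTION: the status line is written to standard output.
    """
    result = subprocess.run(
        ["hdfs", "diskbalancer", "-query", datanode],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def wait_for_plan(datanode, query=query_status, interval_s=30.0,
                  max_polls=120, sleep=time.sleep):
    """Poll until the query output contains DONE_MARKER.

    Returns True once the plan is done, or False if max_polls queries
    were made without seeing the marker. `query` and `sleep` can be
    swapped out, which keeps the loop testable without a cluster.
    """
    for attempt in range(max_polls):
        if DONE_MARKER in query(datanode):
            return True
        if attempt < max_polls - 1:
            sleep(interval_s)
    return False
```

For the example above, `wait_for_plan("node1:9867")` would return True once the DataNode reports that the plan is done.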
https://github.com/liumihust/ecs.hadoop/blob/master/DiskMigrationForHDFS.md