This is an example of how one can run a scheduled, rolling HBase major compaction on a table with Oozie. The compactions are performed by an hbase shell script that major compacts only one region per region server at a time. By default, only regions with a non-zero cost from a weight function are compacted, and regions are compacted in order of weight, highest first. For those using major compaction to improve data locality, compactions can be forced (see force_compaction below).
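The selection rule above can be illustrated with a small, simplified sketch. It is not the logic of rolling_compaction.rb (which runs inside the JRuby-based hbase shell). It only shows one simple way to realize the rule: group regions into rounds in which each region server compacts at most one region, highest weight first. The plan_rounds name and its whitespace-separated "server region weight" input format are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified illustration of the rolling schedule (hypothetical helper,
# not part of rolling_compaction.rb).
# stdin:  "<server> <region> <weight>" per line (assumed format)
# stdout: "<round> <server> <region>" per line
# $1:     "true" to also schedule zero-weight regions (forced compaction)
plan_rounds() {
  local force="${1:-false}"
  # Order all regions by weight, highest first; -s keeps ties in input order.
  LC_ALL=C sort -s -k3,3gr |
    awk -v force="$force" '
      # Unless forced, regions with zero weight are not compacted.
      force != "true" && ($3 + 0) == 0 { next }
      # The n-th region kept for a server lands in round n, so no round
      # ever contains two regions from the same region server.
      { print ++n[$1], $1, $2 }' |
    LC_ALL=C sort -s -k1,1n
}
```

Fed the lines "rs1 a 0.5", "rs1 b 2", "rs2 c 1" and "rs2 d 0", it schedules b (on rs1) and c (on rs2) in round 1 and a (on rs1) in round 2, skipping the zero-weight region d unless called as plan_rounds true.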
To use this example, one deploys the workflow, coordinator, and supporting files to HDFS, then submits workflow.properties to Oozie to kick off compactions.
- Modify example_workflows/workflow.properties and example_workflows/coordinator.properties to match your cluster configuration (look for the items in angle brackets).
- Modify example_workflows/coordinator.xml to match your desired compaction frequency.
- Upload example_workflows to HDFS (e.g. hdfs dfs -copyFromLocal example_workflows).
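Unfilled angle-bracket placeholders are easy to miss, so a check before uploading can help. This is a minimal sketch: the check_placeholders name is hypothetical, and it assumes that no legitimate value in the properties files contains a "<...>" pair.

```shell
#!/usr/bin/env bash
# Hypothetical pre-upload check: list any <placeholder> still present in the
# given files. Returns 0 if none are found, 1 if some remain, and 2 if grep
# itself fails (for example, a file is missing or unreadable).
check_placeholders() {
  local status=0
  grep -Hn '<[^>]*>' "$@" || status=$?
  case "$status" in
    0) echo "Replace the angle-bracket placeholders listed above first." >&2
       return 1 ;;
    1) return 0 ;;
    *) return 2 ;;
  esac
}

# Intended use (requires an HDFS client on the PATH):
#   check_placeholders example_workflows/workflow.properties \
#                      example_workflows/coordinator.properties &&
#     hdfs dfs -copyFromLocal example_workflows
```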
NOTE: workflow.xml has a hardcoded hbase path of /usr/bin/hbase; adjust it if hbase is installed elsewhere on your cluster.
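If hbase lives elsewhere, the hardcoded path can be rewritten before uploading. A minimal sketch using sed follows; set_hbase_path is a hypothetical helper, and it assumes the new path contains no "|" or "&" characters (both are special in the sed command used here). The path must be valid on the cluster nodes where YARN runs the Oozie action, which may differ from the machine you edit on.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the hardcoded /usr/bin/hbase in a workflow
# file with another path. Writes through a temp file (avoiding the GNU/BSD
# differences in "sed -i") and copies the result back so the original
# file's permissions are kept.
# Assumes the new path contains no "|" or "&" characters.
set_hbase_path() {
  local file="$1" new_path="$2" tmp
  tmp="$(mktemp)" || return 1
  if sed "s|/usr/bin/hbase|${new_path}|g" "$file" > "$tmp"; then
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Example (illustrative path):
#   set_hbase_path example_workflows/workflow.xml /opt/hbase/bin/hbase
```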
One can follow the steps below to deploy the workflow:
- Submit the job via:
  oozie job -config example_workflows/workflow.properties -run
- Confirm that the table was compacted by looking in the action's YARN logs for the string "Done compacting".
Example output from yarn logs -applicationId application_######:
Stdoutput Regions to compact for table clay_test:
Stdoutput myhost1.example.com,60200,1556069347115 has 1 region(s) to compact
Stdoutput Compacting myhost1.example.com,60200,1556069347115 region 1d8d46167cdd550b4ac10363c0982191
Heart beat
Heart beat
Stdoutput myhost1.example.com,60200,1556069347115 region 1d8d46167cdd550b4ac10363c0982191
Stdoutput Done compacting in 68.4029998779297 seconds
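Rather than scanning the YARN logs by eye, the "Done compacting" lines can be extracted with standard text tools. The sketch below (compaction_times is a hypothetical name) prints the duration in seconds of each completed compaction found on stdin, and relies only on the log format in the example output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read YARN log text on stdin and print the number of
# seconds reported by each "Done compacting in N seconds" line.
# Returns 1 if no such line was found.
compaction_times() {
  local times
  times="$(sed -n 's/.*Done compacting in \([0-9.][0-9.]*\) seconds.*/\1/p')"
  [ -n "$times" ] || return 1
  printf '%s\n' "$times"
}

# Intended use:
#   yarn logs -applicationId application_###### | compaction_times
```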
One can follow the steps below to deploy the coordinator:
- Submit the job via the following command, recording the coordinator ID it returns:
  oozie job -config example_workflows/coordinator.properties -run
- Verify that only one workflow job is running via:
  oozie job -info <coordinator ID>
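Recording the coordinator ID can be scripted. When -run succeeds, the Oozie CLI prints the new job's ID on a line of the form "job: <id>". The sketch below extracts that ID and then shows the coordinator's status; parse_job_id and submit_coordinator are hypothetical names, and it assumes the oozie client is on the PATH with OOZIE_URL set (or -oozie <url> added to each command).

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the job ID from "oozie job ... -run" output,
# which reports the new ID on a line of the form "job: <id>".
parse_job_id() {
  sed -n 's/^job: *//p'
}

# Hypothetical wrapper: submit the coordinator, remember its ID, and show
# its status (requires the oozie client and OOZIE_URL or -oozie <url>).
submit_coordinator() {
  local coord_id
  coord_id="$(oozie job -config example_workflows/coordinator.properties -run | parse_job_id)"
  [ -n "$coord_id" ] || { echo "coordinator submission failed" >&2; return 1; }
  echo "coordinator ID: $coord_id"
  oozie job -info "$coord_id"
}
```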
One may also run the rolling_compaction.rb script manually:
$ export table_name="<your table>"
$ export force_compaction="true|false"
$ ./rolling_compaction.rb
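The force_compaction value shown as "true|false" stands for one of the two words, not the literal string. A small guard like the following (check_compaction_env is a hypothetical name) can catch an unset table name or a mistyped flag before the script starts.

```shell
#!/usr/bin/env bash
# Hypothetical guard: check the values of table_name and force_compaction.
# They must also be exported, as in the commands above, for
# rolling_compaction.rb to see them.
check_compaction_env() {
  if [ -z "${table_name:-}" ]; then
    echo "table_name is not set" >&2
    return 1
  fi
  case "${force_compaction:-}" in
    true|false) return 0 ;;
    *) echo "force_compaction must be exactly true or false" >&2
       return 1 ;;
  esac
}

# Intended use:
#   check_compaction_env && ./rolling_compaction.rb
```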
If the hbase binary is not at /usr/bin/hbase, one can run:
$ export table_name="<your table>"
$ export force_compaction="true|false"
$ <path to your hbase binary> shell ./rolling_compaction.rb
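Both manual invocations can be folded into one wrapper. In this sketch the HBASE_BIN variable and the run_rolling_compaction name are hypothetical. When HBASE_BIN is unset, the wrapper falls back to /usr/bin/hbase, the path this example assumes; in either case it runs the script through the binary's shell subcommand, as in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run rolling_compaction.rb through "<hbase> shell",
# using $HBASE_BIN if set and /usr/bin/hbase otherwise. table_name and
# force_compaction must already be exported, as in the commands above.
run_rolling_compaction() {
  local hbase="${HBASE_BIN:-/usr/bin/hbase}"
  # Accept either an executable path or a command found on the PATH.
  if [ ! -x "$hbase" ] && ! command -v "$hbase" >/dev/null 2>&1; then
    echo "hbase binary not found: $hbase" >&2
    return 1
  fi
  "$hbase" shell ./rolling_compaction.rb
}

# Example (illustrative path):
#   HBASE_BIN=/opt/hbase/bin/hbase run_rolling_compaction
```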