We use Oozie as the workflow management application for some of our data processing pipelines. Although the Oozie developers have written a lot of documentation, several features and use cases are covered only minimally. How to configure the launcher job, for instance, is something I was only able to learn from the mailing lists.
The launcher job is used by Oozie to supervise some of its actions, e.g. java or mapreduce actions. The launcher job is executed as a Hadoop job with a single map task and zero reduce tasks.
In most cases we do not care much about the launcher. However, there are situations in which we would like to influence how the launcher job is executed. For example, we wanted to run the complete data processing pipeline with priority VERY_HIGH. Java and mapreduce Oozie actions provide a configuration element which can be populated with arbitrary Hadoop properties (with the exception of the namenode and jobtracker settings). However, Oozie applies these properties only to the real actions and not to the launcher job. To configure the launcher, one has to add the oozie.launcher. prefix to the corresponding Hadoop properties.
To prioritize the data processing pipeline via configuration parameters, we added the following XML blocks to the configuration elements of all our mapreduce and java actions:

  <property>
    <name>oozie.launcher.mapred.job.priority</name>
    <value>${priority}</value>
  </property>
  <property>
    <name>mapred.job.priority</name>
    <value>${priority}</value>
  </property>

When starting the Oozie workflows, we provide appropriate properties files which contain a priority key with the desired priority setting.
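To show where these properties sit inside a workflow definition, here is a minimal sketch of a java action. The workflow name, action name, main class, and the ${jobTracker} and ${nameNode} parameters are placeholders chosen for illustration, not taken from our actual pipeline, and the schema version is only an assumption.

```xml
<!-- Sketch only: all names and the main class are hypothetical. -->
<workflow-app name="priority-example-wf" xmlns="uri:oozie:workflow:0.2">
  <start to="example-java"/>
  <action name="example-java">
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <!-- Priority of the launcher job that supervises the action -->
        <property>
          <name>oozie.launcher.mapred.job.priority</name>
          <value>${priority}</value>
        </property>
        <!-- Priority of the Hadoop jobs started by the action itself -->
        <property>
          <name>mapred.job.priority</name>
          <value>${priority}</value>
        </property>
      </configuration>
      <main-class>com.example.ExampleMain</main-class>
    </java>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Action example-java failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
<!-- The properties file passed at submission time would then contain
     a line such as: priority=VERY_HIGH -->
```

Because ${priority} is resolved when the workflow is submitted, the same workflow definition can be run at a different priority simply by supplying a different properties file.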
Other useful applications of the oozie.launcher. configuration prefix could be
- to run the launcher job in a different queue than the workflow jobs themselves (oozie.launcher.mapred.job.queue.name, see OOZIE-9), or
- to use special Java options, such as increased heap space settings, for java actions (oozie.launcher.mapred.child.java.opts)
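
As a rough sketch of these two cases, the configuration element of an action might look like the following. The queue name and the heap size are made-up values for illustration; appropriate settings depend on the cluster.

```xml
<configuration>
  <!-- Run the launcher job in its own queue.
       "launchers" is a hypothetical queue name. -->
  <property>
    <name>oozie.launcher.mapred.job.queue.name</name>
    <value>launchers</value>
  </property>
  <!-- Give the launcher's map task a larger heap.
       The 1024 MB value is only an example. -->
  <property>
    <name>oozie.launcher.mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>
</configuration>
```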