Tuesday, February 7, 2012

Configure Oozie's Launcher Job

We use Oozie as the management application for some of our data processing pipelines. Although the Oozie developers have written a lot of documentation, several features and use cases are covered only minimally. How to configure the launcher job, for instance, is something I was only able to learn from the mailing lists.

The launcher job is used by Oozie to supervise some of its actions, e.g. java or mapreduce actions. The launcher job is executed as a Hadoop job with a single map task and zero reduce tasks. In most cases we do not care much about the launcher. However, there are some situations in which we would like to have some influence on the execution of the launcher job. For example, we wanted to run the complete data processing pipeline with priority VERY_HIGH. Java and mapreduce Oozie actions provide a configuration element which can be populated with arbitrary Hadoop properties (with the exception of namenode and jobtracker). However, Oozie applies these properties only to the real actions and not to the launcher job. To set a property on the launcher job as well, one has to add an oozie.launcher. prefix to the corresponding Hadoop property name.

To prioritize the data processing pipeline via configuration parameters, we added the following XML blocks to the configuration elements of all our mapreduce and java actions:


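    <property>
        <name>oozie.launcher.mapred.job.priority</name>
        <value>${priority}</value>
    </property>
    <property>
        <name>mapred.job.priority</name>
        <value>${priority}</value>
    </property>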

When starting the Oozie workflows, we provide appropriate properties files which contain a priority key with the desired priority setup.
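
The naming rule behind these two blocks can be sketched in a few lines of Java. This is only an illustration of the prefix convention, not part of Oozie: the class name LauncherProps and the method withLauncherCopies are hypothetical, and the sketch assumes Java 8 or later. It takes the Hadoop properties meant for an action and adds an oozie.launcher. copy of each one, which is what we did by hand in the XML above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not an Oozie API: illustrates the "oozie.launcher." naming rule.
public class LauncherProps {
    static final String LAUNCHER_PREFIX = "oozie.launcher.";

    // Returns the given action properties plus a launcher-prefixed copy of each one.
    // Launcher properties that were set explicitly are kept as they are.
    static Map<String, String> withLauncherCopies(Map<String, String> actionProps) {
        Map<String, String> result = new LinkedHashMap<>(actionProps);
        for (Map.Entry<String, String> e : actionProps.entrySet()) {
            if (!e.getKey().startsWith(LAUNCHER_PREFIX)) {
                result.putIfAbsent(LAUNCHER_PREFIX + e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapred.job.priority", "VERY_HIGH");
        // Print one key=value line per property, action and launcher variants alike.
        withLauncherCopies(props).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Each resulting key/value pair corresponds to one property block in the configuration element of an action.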

Other useful applications of the oozie.launcher. configuration prefix could be

  • to run the launcher job in a different queue than the workflow jobs themselves (oozie.launcher.mapred.job.queue.name, see OOZIE-9) or
  • to use special java options like increased heap space settings for java actions (oozie.launcher.mapred.child.java.opts)

Thursday, February 2, 2012

Base64 Decoding with Eclipse

There are very few things in software development that are as annoying as localization topics, especially dealing with dates in different time zones and (here is my all-time favorite) encoding issues.

We have a lot of data and use a bunch of different technologies, languages, and platforms to process it. This variety does not make encoding issues any easier. Someone in the company decided it could be a good idea to encode critical data, especially strings that are not under our direct control, with Base64. In this way, data exchange between different platforms and languages can be restricted to (relatively simple) ASCII data.

And thus, we now have to deal with a lot of Base64-encoded data. When writing unit tests, debugging, or manually validating production data, there is a frequent need to decode Base64 literals. Most of the time I used one of the many free online tools for this purpose. Although these tools do what they promise, the associated workflow is kind of messy: step through a unit test in Eclipse, copy some Base64 string to the clipboard, switch to the browser, find and open one of these conversion tools (if not already open), convert the string, copy the result, and take it back into Eclipse.

However, after a little preparation, leaving Eclipse is completely unnecessary. Eclipse has a feature called External Tools Configurations which allows arbitrary commands to be executed directly from Eclipse. Groovy, in turn, has its famous -e option for executing code in-line. Combining the two, it is possible to execute some Groovy helper code directly from Eclipse. Through metaprogramming, Groovy extends Java's String class with several features, one of them a built-in Base64 decoding method. The remainder of this post describes how to configure a simple Base64 decoding tool in Eclipse.

  1. Open the External Tools Configurations dialog.
  2. Create a new configuration, give it a name, specify the path to the Groovy executable, and finally insert the code.
    The Arguments: text area contains the following code:
    -e "def input = '${string_prompt:Base64 decoding}';  
    println new String(input.decodeBase64())"
    
    The Eclipse variable ${string_prompt} makes a popup dialog appear which prompts for an input value.
  3. Save the configuration

Base64 decoding can now be executed the following way:

  1. Select the newly created Tool
  2. Insert the string to convert and start conversion
  3. Read the result from the Console view
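
For comparison, what the Groovy one-liner does can be sketched in plain Java. This is not how Groovy's decodeBase64() is implemented; it is an equivalent that assumes Java 8 or later, where java.util.Base64 is available. The class name Base64Decode is made up for the example, and the sketch decodes the bytes as UTF-8 explicitly, whereas new String(bytes) in the Groovy snippet uses the platform default charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class: decodes a Base64 literal into a readable string.
public class Base64Decode {
    // Decodes standard (RFC 4648) Base64 and interprets the bytes as UTF-8.
    // Throws IllegalArgumentException if the input is not valid Base64.
    static String decode(String input) {
        byte[] raw = Base64.getDecoder().decode(input);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "SGVsbG8=" is the Base64 encoding of "Hello".
        System.out.println(decode("SGVsbG8="));
    }
}
```

Fixing the charset avoids results that differ between machines, which matters when the encoded strings come from several platforms.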