Sqoop to HBase with Oozie

Importing data directly into HBase via an Oozie workflow can be confusing. In this post, we’ll investigate using Sqoop to import data from MySQL into HBase executed by Oozie.

THE BASICS

Using Sqoop to import data into HBase via Oozie can be accomplished in 3 steps:

  1. Copy your hbase-site.xml, MySQL JDBC jar, and HBase jars into HDFS
  2. Create an Oozie workflow that has a Sqoop action, includes the hbase-site.xml in the “job-xml” element of the action, includes the mysql JDBC jar in a “file” element of the action, and includes the required HBase JARs in an “file” element of the action.
  3. Run your Oozie workflow

EXAMPLE

Here is an example workflow:

<workflow-app name="sqoop-import-hbase" xmlns="uri:oozie:workflow:0.4">
  <start to="sqoop-import"/>
  <action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <job-xml>/user/admin/hbase-site.xml</job-xml>
      <command>import --connect jdbc:mysql://example.com/sqoop  --username test --password test --table test --hbase-table test --column-family cf --hbase-row-key id --hbase-create-table</command>
      <file>/user/admin/mysql-connector-java-5.1.23.jar#mysql-connector-java-5.1.23.jar</file>
      <file>/user/admin/hbase-client-0.98.6-cdh5.3.0.jar#hbase-client-0.98.6-cdh5.3.0.jar</file>
      <file>/user/admin/hbase-common-0.98.6-cdh5.3.0.jar#hbase-common-0.98.6-cdh5.3.0.jar</file>
      <file>/user/admin/hbase-protocol-0.98.6-cdh5.3.0.jar#hbase-protocol-0.98.6-cdh5.3.0.jar</file>
      <file>/user/admin/htrace-core-2.04.jar#htrace-core-2.04.jar</file>
      <file>/user/admin/hbase-server-0.98.6-cdh5.3.0.jar#hbase-server-0.98.6-cdh5.3.0.jar</file>
      <file>/user/admin/hbase-hadoop-compat.jar#hbase-hadoop-compat.jar</file>
      <file>/user/admin/high-scale-lib-1.1.1.jar#high-scale-lib-1.1.1.jar</file>
    </sqoop>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <kill name="kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>

And its corresponding hbase-site.xml:

<configuration>
  <property>
    <name>hbase.master.ipc.address</name>
    <value>0.0.0.0</value>
  </property>
  ...
</configuration>

KEY POINTS

  1. hbase-site.xml should be uploaded to HDFS and included in the workflow.xml. In the above example, thehbase-site.xml file has been uploaded to /user/admin/hbase-site.xml in HDFS.
  2. The MySQL JDBC jar should be uploaded to HDFS and included in the workflow.xml. In the above example, the MySQL JDBC jar file has been uploaded to /user/admin/mysql-connector-java-5.1.23.jar.
  3. Make sure to include the share lib by setting oozie.use.system.libpath to “true” in the oozie job configuration.
  4. Some HBase JARs should be included with the Oozie workflow. When I was testing this out, the following JARs were needed:
    • hbase-client.jar
    • hbase-common.jar
    • hbase-protocol.jar
    • htrace-core.jar
    • hbase-server.jar
    • hbase-hadoop-compat.jar
    • high-scale-lib.jar

TROUBLE SHOOTING

Q: CLASSNOTFOUND EXCEPTION IN THE TASK LOGS OF THE LAUNCHER JOB:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], main() threw exception, org/cliffc/high_scale_lib/Counter
java.lang.NoClassDefFoundError: org/cliffc/high_scale_lib/Counter

A: A JAR file is missing. It needs to be uploaded to HDFS and included in the Oozie workflow via the “file” element.

Q: MISSING DRIVER CLASS IN LAUNCHER JOB LOGS:

ERROR org.apache.sqoop.Sqoop  - Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver

A: Driver can’t be found. Including the JDBC driver jar in a “file” element of your workflow should fix this.

Q: GSS INITIATE FAILED EXCEPTION IN THE TASK LOGS OF THE LAUNCHER JOB:

SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

A: Oozie credentials were not provided in a secure cluster. Try adding credentials to the workflow:

<workflow-app name="sqoop-import-hbase" xmlns="uri:oozie:workflow:0.4">
  <credentials>
    <credential name="hbase" type="hbase">
    </credential>
  </credentials>
  ...
  <action name="sqoop-import" cred="hbase">
    ...
  </action>
  ...
</workflow-app>

You’ll also need to provide credentials in the job properties that are submitted with the workflow:

credentials={"hbase": {"xml_name": "hbase", "properties": []}}

SUMMARY

Hopefully this helps you schedule your sqoop to HBase jobs.

The code examples are available at https://github.com/ingesttips/examples.

E: team@ingest.tips!

T: @abeaamase

Tweet about this on TwitterShare on FacebookShare on LinkedIn


'Sqoop to HBase with Oozie' have 2 comments

  1. February 26, 2015 @ 5:32 pm Nagaraj

    Excellent explanation. I followed the steps and it is working.

    Reply


Would you like to share your thoughts?

Your email address will not be published.