Playing with Kite in Sqoop2

Kite is a high-level data layer for Hadoop. Kite’s API is built around datasets. A dataset is a consistent interface for working with your data. Datasets are uniquely identified by URIs, e.g. dataset:hive:hive_db/hive_table. You have control of implementation details, such as whether to use Avro or Parquet format, HDFS or HBase storage, and snappy compression…

Sqoop2 integration with Sentry

Sqoop2 finally supports Role Based Access Control (RBAC) as described in this blog post. Similarly, Apache Sentry added bindings for Sqoop2 to provide RBAC as a service. Installing: Sentry Sqoop integration will be released as part of Sentry 1.6.0. Until then, this feature is available in trunk:
$ git clone https://github.com/apache/incubator-sentry.git
$ mvn clean install -DskipTests…

Role Based Access Control in Sqoop2

Brief Introduction: Sqoop 2 recently added several security features in the Sqoop 1.99.6 release that enable its use in environments where security concerns have to be addressed. These include simple authorization and third-party authorization through Sentry. This blog post will detail how to set up Sqoop2 with role-based access control. Role-based access control development in Sqoop2 was a…

PostgreSQL repository added to Sqoop2

If you’ve been watching Sqoop2 development over the past few months, you’ve probably noticed a lot has changed. In the spirit of growth and change, I’ve added another (not embedded) repository: PostgreSQL. This is really important for Sqoop2 for a few reasons: PostgreSQL is a mature database that has HA deployments; Apache Derby is no longer your…

Kerberos support in Sqoop2

Sqoop2 has a new security framework, which includes support for simple authentication and Kerberos authentication. This blog post will detail how to set up Sqoop2 with Kerberos. Bringing Kerberos support to Sqoop2 was a co-engineering effort of Intel and Cloudera. TLDR Setup: Set the following configuration properties in sqoop.properties. Make sure the principals and keytab provided exist.
org.apache.sqoop.authentication.kerberos.principal=sqoop/_HOST@
org.apache.sqoop.authentication.kerberos.keytab=/home/kerberos/sqoop.keytab
org.apache.sqoop.authentication.kerberos.http.principal=HTTP/_HOST@
org.apache.sqoop.authentication.kerberos.http.keytab=/home/kerberos/sqoop.keytab…

Sqoop 1.99.4 Release

Introduction: Sqoop 1.99.4 is the first release of Sqoop2 in roughly a year. It has gone through a few significant changes and is starting to look more like a generic data transfer tool. New features include pulling HDFS integration out into a connector, the intermediate data format, and a configuration verification tool. Also, there are a…


Sqoop 1 or Sqoop 2?

Sqoop1 and Sqoop2 are completely different code paths and, as such, have very different feature sets [1]. So, how do we know when we’re using Sqoop1 or Sqoop2? Here’s a quick list of ways. Different use cases: 1. Sqoop2 has a UI devoted to it, but Sqoop1 does not. The Hue project has a wonderful UI for…