Articles by Abraham Elmahrek

Abraham Elmahrek

I'm a software engineer interested in all things Big Data. I've worked on attribution systems, big data interfaces, and ingest pipelines. I'm a bit of an open book. If you have any questions, feel free to reach out.

PostgreSQL repository added to Sqoop2

If you’ve been watching Sqoop2 development over the past few months, you’ve probably noticed a lot has changed. In the spirit of growth and change, I’ve added another (not embedded) repository: PostgreSQL. This is really important for Sqoop2 for a few reasons: PostgreSQL is a mature database that has HA deployments; Apache Derby is no longer your…

Managing passwords in Sqoop

Sqoop makes it easy to transfer data in and out of Hadoop. In this post, we’ll cover the different options available for managing passwords, with the exception of data-source-specific integrations such as Oracle Wallet.

Motivation

Here’s a basic Sqoop command:

sqoop import --connect jdbc:mysql:// --username sqoop --password sqoop --table tbl

The username and password are both…

Kerberos support in Sqoop2

Sqoop2 has a new security framework, which includes support for simple authentication and Kerberos authentication. This blog post will detail how to set up Sqoop2 with Kerberos. Bringing Kerberos support to Sqoop2 was a co-engineering effort between Intel and Cloudera.

TLDR

Setup

Set the following configuration properties, and make sure the principals and keytab provided exist:

org.apache.sqoop.authentication.kerberos.principal=sqoop/_HOST@
org.apache.sqoop.authentication.kerberos.keytab=/home/kerberos/sqoop.keytab
org.apache.sqoop.authentication.kerberos.http.principal=HTTP/_HOST@
org.apache.sqoop.authentication.kerberos.http.keytab=/home/kerberos/sqoop.keytab
…

Sqoop 1.99.4 Release

Introduction

Sqoop 1.99.4 is the first release of Sqoop2 in roughly a year. It has gone through a few significant changes and is starting to look more like a generic data transfer tool. New features include pulling HDFS integration out into a connector, the intermediate data format, and a configuration verification tool. Also, there are a…