
A Round Trip From MySQL to Hive

Today we will show you how to transfer data between MySQL and Hive using Sqoop. The instructions in this post were performed on a single-node Hadoop cluster with HDFS and HiveServer2 installed. Before getting started, make sure you have a Hadoop cluster available. You can either set up a Hadoop cluster and Hive from scratch…
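As a sketch of what the round trip can look like (the connection string, credentials, and table names below are placeholders, not from the post), Sqoop's `--hive-import` flag handles the MySQL-to-Hive leg, and `sqoop export` moves the data back:

```shell
# Import a MySQL table into Hive (Sqoop creates the Hive table if needed).
sqoop import \
  --connect jdbc:mysql://localhost/ordersdb \
  --username sqoop --password sqoop \
  --table orders \
  --hive-import --hive-table orders

# Export the Hive table's warehouse files back into a MySQL table.
# Hive's default field delimiter is \001, so tell Sqoop how to parse it.
sqoop export \
  --connect jdbc:mysql://localhost/ordersdb \
  --username sqoop --password sqoop \
  --table orders_copy \
  --export-dir /user/hive/warehouse/orders \
  --input-fields-terminated-by '\001'
```

Both commands assume a running cluster and an existing `orders_copy` table on the MySQL side; the full post walks through the setup in detail.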


Managing passwords in Sqoop

Sqoop makes it easy to transfer data in and out of Hadoop. In this post, we’ll cover the different options available for managing passwords, with the exception of data-source-specific integrations such as Oracle Wallet. Motivation: here’s a basic Sqoop command: sqoop import --connect jdbc:mysql://example.com/sqoop --username sqoop --password sqoop --table tbl The username and password are both…
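Two commonly used safer alternatives (file paths here are illustrative): prompt for the password at runtime with `-P`, or read it from a protected file with `--password-file`:

```shell
# Option 1: prompt on the console instead of passing the password on the
# command line, where it would appear in shell history and `ps` output.
sqoop import --connect jdbc:mysql://example.com/sqoop \
  --username sqoop -P --table tbl

# Option 2: read the password from a file with restricted permissions.
# The file should contain only the password, with no trailing newline.
echo -n "sqoop" > /home/sqoop/.password
chmod 400 /home/sqoop/.password
sqoop import --connect jdbc:mysql://example.com/sqoop \
  --username sqoop --password-file file:///home/sqoop/.password --table tbl
```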


Sqoop 1.99.4 Release

Introduction: Sqoop 1.99.4 is the first release of Sqoop2 in roughly a year. It has gone through a few significant changes and is starting to look more like a generic data transfer tool. New features include pulling HDFS integration out into a connector, the intermediate data format, and a configuration verification tool. Also, there are a…


Parquet Support Arriving in Sqoop

Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases, enterprise data warehouses, and NoSQL systems. Sqoop can process data in various file formats, including CSV files, sequence files, and Avro data files. With the growing popularity of Parquet, there is strong demand for…
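For instance, once Parquet support lands, importing into Parquet is a matter of adding the `--as-parquetfile` flag alongside the other format flags Sqoop already offers (the connection details and paths below are placeholders):

```shell
# Import a MySQL table into HDFS as Parquet files instead of text or Avro.
sqoop import \
  --connect jdbc:mysql://example.com/sqoop \
  --username sqoop --table tbl \
  --as-parquetfile \
  --target-dir /user/sqoop/tbl_parquet
```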


Sqoop 1 or Sqoop 2?

Sqoop1 and Sqoop2 are completely different code paths and, as such, have very different feature sets [1]. So, how do we know when to use Sqoop1 or Sqoop2? Here’s a quick list of ways: Different use cases: 1. Sqoop2 has a UI devoted to it, but Sqoop1 does not. The Hue project has a wonderful UI for…