Articles by Qian Xu

Playing with Kite in Sqoop2

Kite is a high-level data layer for Hadoop. Kite’s API is built around datasets. A dataset is a consistent interface for working with your data. Datasets are uniquely identified by URIs, e.g. dataset:hive:hive_db/hive_table. You control the implementation details, such as whether to use Avro or Parquet format, HDFS or HBase storage, and Snappy compression…

A Round Trip From MySQL to Hive

Today we will show you how to transfer data between MySQL and Hive using Sqoop. The instructions in this blog were performed on a single-node Hadoop cluster with HDFS and HiveServer2 installed. Before getting started, please make sure that you already have a Hadoop cluster. You can either set up a Hadoop cluster and Hive from scratch…


Parquet Support Arriving in Sqoop

Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases, enterprise data warehouses and NoSQL systems. Sqoop can process data in various file formats, including CSV files, sequence files or Avro data files. As Parquet grows in popularity, there is strong demand for…