What is Confusing About Kafka?

I think Kafka is pretty simple. There are topics; producers write messages to topics and consumers read them. The rest is implementation details. My favorite reaction from my colleagues when they learn Kafka is “Is that it? I thought it’s difficult and put aside an entire week to learn this! Why does everyone make such…


Engineering in the Wild

1. Get rid of your manager 2. Proper design review requires the right environment 3. Always RTFM 4. Investigate the problem first … snow! 5. Always explore alternative solutions 6. Choose the path of least resistance 7. Find your drug of choice 8. Never code alone 9. Some Algorithms are more challenging: Paxos in real…

Chess Casey vs Ippo

Managing passwords in Sqoop

Sqoop makes it easy to transfer data in and out of Hadoop. In this post, we’ll cover the different options available for managing passwords, with the exception of data-source-specific integrations such as Oracle Wallet. Motivation Here’s a basic Sqoop command: sqoop import --connect jdbc:mysql://example.com/sqoop --username sqoop --password sqoop --table tbl The username and password are both…


Kite Adds JSON Support

Kite’s CSV format support is one of its most popular features. It provides a quick way to get CSV data into a recommended format (Avro or Parquet), without writing an Avro schema by hand or dealing directly with the file layout. In the recent 0.18.0 release, Kite adds the same level of support for JSON. Kite…

La Playa Tamarindo

Parquet row group size

Lately, we’ve had a lot of people asking about the configuration settings available when you store data in Parquet format. This is a great question and I want to go over a few of the basics about the format to answer it. Row groups Even though Parquet is a column-oriented format, the largest sections of data…