I spent a few days over the last two weeks attending the Strata London and BigAnalytics Israel conferences. I took some notes on what’s happening, for readers whose conference budget was spent less wisely.
Strata London took place only three months after Strata San Jose, so it was a bit surprising to discover that when you cross the pond, the conversation is completely different. At least I hope crossing the pond was the reason; I refuse to consider the possibility that our community and ecosystem change their interests dramatically every three months.
BigAnalytics Israel is a new conference (this was its first year), and I have to thank Ben Lorica for letting me know about this awesome event. The event sold out at around 1,000 attendees, and the place was PACKED. It was the first time I presented my work in my home country, and I was super stoked to discover how much interest there is in big data. Tons of startups and some of the larger enterprises attended, and everyone was excited about new developments in the field.
So, what are the topics of interest across the pond?
- Real-time data pipelines are HUGE. Everyone wants to talk about them and figure out how to build them. Architectures are still in flux, and not just at Strata: we see this internally at Cloudera and with our customers too. There are a number of plausible architectures, and we have yet to figure out the best one. My whiteboard has about 10 different diagrams right now, and I suspect many of your whiteboards are in similar condition. At Hadoop Summit next month, Ted Malaska and I will discuss the different architectures we came up with.
- Lambda vs. Kappa is an annoyingly big topic of debate. IMO, if you are asking "Lambda or Kappa?", you are asking the wrong question. The right questions are:
  - Do we need a batch layer? What value does it add?
  - Is there duplicate functionality between our batch and streaming systems? If there is, how do we make sure there is a single code base for each piece of functionality?
  - Can we recover from errors? Can we experiment with new processing algorithms and methods?
- Real-time anomaly detection is HUGE. It was big six months ago, and it has only grown since. So many companies are building some variant of this system (financial fraud, gaming, health care, biotech, manufacturing, security), and the possibilities are endless. This is a large part of what drives the real-time pipeline craze; old-school Splunk-like log processing drives the rest. Thanks to companies like Rocana (formerly ScalingData), the line between log processing and anomaly detection is growing thinner.
- People are REALLY concerned about the lack of in-place updates in HDFS or, conversely, the lack of fast SQL analytics in HBase. It looks like what everyone really wants is something just like Oracle, only built for unstructured data, that scales really well, is open source, and costs almost nothing.
- CERN are using Sqoop, Parquet and Impala! And they may add Kafka soon. How awesome is that? My code running somewhere near the LHC!
- Speaking of Hadoop becoming more like Oracle: it looks like everyone wants better instrumentation in Spark. This would allow better diagnosis of bottlenecks and easier performance tuning.
- Apache Flink is slowly becoming a thing. Superficially, it is a lot like Spark. When you dive in, though, it processes a continuous stream of data and can emulate micro-batches by pushing checkpoints through the stream. I forgot the name of the algorithm, but it looks pretty nifty. Whether the world needs a pretty nifty system that looks a lot like Spark is another question.
- I talked to a lot of Israeli startups this week. The differences from Silicon Valley startups are very clear: Israeli startups raise less money and are therefore very, very frugal. They buy basically no support for anything in their stack (MySQL, Cassandra, ELK, Hadoop, Kafka, whatever); they will self-support, with some help from consultants, to the bitter end. Being frugal, resourceful, and completely trusting your own skills is a very Israeli attitude, so I’m not even slightly surprised.
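On the Lambda vs. Kappa question of keeping "a single code base for each piece of functionality": one common answer is to put the business logic in one function and call it from both the batch path and the streaming path. Here is a minimal sketch, assuming a hypothetical `enrich` rule (flagging large transactions) and plain Python iterables standing in for a real batch engine and stream processor. None of these names come from any particular framework.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical business rule: the ONE place this logic lives.
def enrich(event: Dict) -> Dict:
    """Flag transactions above an illustrative threshold."""
    return {**event, "large": event["amount"] > 1000}

def run_batch(events: Iterable[Dict]) -> List[Dict]:
    # Batch path: process a complete, bounded dataset in one go.
    return [enrich(e) for e in events]

def run_stream(events: Iterable[Dict]) -> Iterator[Dict]:
    # Streaming path: process events lazily, one at a time, as they arrive.
    for e in events:
        yield enrich(e)

history = [{"id": 1, "amount": 50}, {"id": 2, "amount": 5000}]
batch_out = run_batch(history)
stream_out = list(run_stream(iter(history)))
# Because both paths share enrich(), they cannot drift apart.
assert batch_out == stream_out
```

The point of the design is that a fix to `enrich` lands in both paths at once; duplicated logic is what makes a batch layer expensive to keep around.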
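The anomaly detection bullet doesn't say how these systems actually detect anomalies, and real deployments vary widely. Purely as an illustration, here is one of the simplest streaming techniques, a rolling z-score: flag a point that sits more than `threshold` standard deviations from the mean of the previous `window` points. The function name, window size, and threshold are illustrative assumptions, not what any company mentioned above uses.

```python
import math
from collections import deque
from typing import Iterable, Iterator, Tuple

def detect_anomalies(values: Iterable[float], window: int = 5,
                     threshold: float = 3.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for points more than `threshold` standard
    deviations away from the mean of the previous `window` points."""
    recent = deque(maxlen=window)  # only the last `window` values are kept
    for i, x in enumerate(values):
        if len(recent) == window:
            mean = sum(recent) / window
            std = math.sqrt(sum((v - mean) ** 2 for v in recent) / window)
            if std > 0 and abs(x - mean) / std > threshold:
                yield i, x
        recent.append(x)  # note: anomalies also enter the window

readings = [10, 11, 9, 10, 11, 10, 50, 10]
print(list(detect_anomalies(readings)))  # [(6, 50)]
```

It needs only constant memory per stream, which is why this family of detectors fits real-time pipelines. A known weakness is visible in the code: a spike enters the window and inflates the standard deviation for the points that follow.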
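On the Flink bullet: Flink's documentation describes its checkpointing as inspired by the Chandy-Lamport distributed snapshot algorithm. Barrier markers flow through the stream with the data, and each operator snapshots its state when a barrier passes. Below is a toy, single-operator sketch of that idea. `Barrier`, `RunningSum`, and `recover` are hypothetical names, not Flink APIs. Real Flink also aligns barriers across multiple input channels, which this one-input toy does not show.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass(frozen=True)
class Barrier:
    checkpoint_id: int  # marker injected into the stream alongside records

Event = Union[int, Barrier]

class RunningSum:
    """Toy single-input operator whose state is a running sum."""
    def __init__(self, state: int = 0):
        self.state = state
        self.snapshots: Dict[int, int] = {}

    def process(self, stream: List[Event]) -> None:
        for item in stream:
            if isinstance(item, Barrier):
                # State now reflects every record before the barrier and
                # none after it, so snapshot it under this checkpoint id.
                self.snapshots[item.checkpoint_id] = item and self.state
            else:
                self.state += item

def recover(stream: List[Event], checkpoint_id: int, saved_state: int) -> int:
    """Restart from a snapshot and replay only records after its barrier."""
    start = stream.index(Barrier(checkpoint_id)) + 1
    op = RunningSum(saved_state)
    op.process(stream[start:])
    return op.state

stream: List[Event] = [1, 2, Barrier(1), 3, 4, Barrier(2), 5]
op = RunningSum()
op.process(stream)
print(op.state, op.snapshots)                   # 15 {1: 3, 2: 10}
print(recover(stream, 2, op.snapshots[2]))      # 15
```

The records between two barriers behave like a micro-batch: restoring a snapshot and replaying from its barrier yields the same state as an uninterrupted run.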
These were good conferences! Hadoop Summit is just around the corner, and I’m curious whether we’ll see similar themes again or something completely different.