What is Confusing About Kafka?

I think Kafka is pretty simple. There are topics; producers write messages to topics and consumers read them. The rest is implementation details. My favorite reaction from my colleagues when they learn Kafka is “Is that it? I thought it was difficult and set aside an entire week to learn this! Why does everyone make such a big deal out of Kafka?” The best systems are deceptively simple – like chess, they take an hour to learn and years to master.

But there’s a big difference between “deceptively simple” and “deceive yourself into believing your system is simple when it’s actually pretty challenging”.

On a recent call, Neha said “The most confusing behavior we have is how producing to a topic can return errors for a few seconds after the topic was already created”. As she said that, I remembered that indeed, this was once very confusing, but then I got used to it. Which got us thinking: what other things that Kafka does are very confusing to new users, but we got so used to them that we no longer even see the issue?

So, we conducted a highly unscientific Twitter survey and got the following results. I’m publishing them here with some comments:

  1. Sending messages after creating a topic doesn’t work. What?! (by @nehanarkhede)
  2. (Partition) Reassignments could be simplified. just set the repl factor for a topic, autofit mode will generate and apply new schema. (by @stevenleroux)
    Note: This should be addressed in KAFKA-1678. And there are tools that help: https://github.com/mesos/kafka#rebalancing-brokers-in-the-cluster
  3. Controller : async ops, get a better insight view of current operations & statuses. e.g. deleted topics in zk, reassigns cp status (by @stevenleroux)
  4. When high level consumer commits offset, it commits for all partitions it’s processing at once. Hard to externalize offset. (by @weschow)
  5. Does “the Java consumer API” count? (by @angrynoah)
    Note: This will be resolved in the next release with the new consumer API
  6. The sticky partitioning producer. (by @miguno) and “messages are not randomly distributed to all partitions when they are key-less!” (by @gbuisson)
    Note: This is fixed in the new producer, available in Kafka 0.8.2.0
  7. Inability of brokers to bind to both external/internal IPs at the same time (by @vanyatka)
  8. issues with hostnames in inter-node comm. when they don’t match up. Not really warned about in the docs, really confusing to debug (by @odwyerrob)
    Note: Docs could definitely be better here. We explain this in the FAQ, but users only find this after lots of confusion.
  9. Mirror-maker does not stop consuming when target cluster is down (by @erik_van_oosten)
  10. inability to delete a topic is the biggest WAT for devs at Chartbeat (by @djerrynyc)
    Note: We thought this is fixed in 0.8.2.0, but apparently there are still some issues
  11. How broker discovery works, e.g. that bootstrap broker list is static, or requires VIP. (@miguno)
  12. That reads can only be done against the partition leader but not also against ISRs. (@miguno, speaking for his users)
  13. The effects of consumer rebalancing: “why is my thread suddenly (not) seeing this data?” (@miguno, speaking for his users yet again)
  14. At least lately, it’s the offset request by timestamp. What you get back is not what you’re expecting. (by @bonkoif and @mthssdrbrg)
    Note: These are both Kafka experts. If these guys are confused, it must be bad!
  15. The Trial? (by @oraclenerd, there has to be one in the crowd)
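
To make item 1 concrete, here is a minimal sketch of how a client might ride out those first few seconds by retrying with a growing delay. It is a simulation in plain Python, not Kafka client code: `TransientSendError`, `FlakyTopic` and `send_with_retry` are hypothetical names, and the number of failures and the backoff values are arbitrary assumptions.

```python
import time


class TransientSendError(Exception):
    """Hypothetical stand-in for the error seen right after topic creation."""


def send_with_retry(send, message, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call send(message), retrying with exponential backoff on transient errors."""
    for attempt in range(attempts):
        try:
            return send(message)
        except TransientSendError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * (2 ** attempt))


class FlakyTopic:
    """Simulated topic that rejects the first `failures` sends, then accepts."""

    def __init__(self, failures):
        self.failures = failures
        self.log = []

    def send(self, message):
        if self.failures > 0:
            self.failures -= 1
            raise TransientSendError("topic not ready yet")
        self.log.append(message)
        return len(self.log) - 1  # offset of the appended message


topic = FlakyTopic(failures=3)
delays = []  # record the requested delays instead of actually sleeping
offset = send_with_retry(topic.send, "hello", sleep=delays.append)
print(offset, topic.log, delays)
```

The `sleep` parameter is injected only so the example runs instantly; a real client would likely lean on whatever retry settings its producer library offers rather than hand-rolling a loop like this.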

Now we know what we need to improve!

Have a pet peeve that’s not covered here? Please leave a comment (or a JIRA: https://issues.apache.org/jira/browse/KAFKA).

'What is Confusing About Kafka?' has 9 comments

  1. March 27, 2015 @ 2:54 pm Marcos

    Great post, Gwen. Keep doing the hard work.

  2. March 28, 2015 @ 11:17 am David

    As an architect of over 20 years in system development / design, I think the reason Kafka raises a lot of questions is because it will be thrown out in meetings and presentations as something we Simply Must Have! and touted by more evangelical members of dev teams as revolutionary, but ultimately it’s just another messaging platform. I say “just”… that’s unfair. It has many great features and a place among the pantheon of messaging software, certainly, but ultimately it doesn’t do anything substantively new. However there is a level of breathlessness that surrounds a lot of the “big data” (for want of a better term) related technologies, which leads I think to the uninitiated saying over and over, “but what IS it? what IS this incredible new thing that is so different?” and it takes a while for them to figure out there’s nothing revolutionary about the concept. It’s a messaging platform, that’s it.

    It’s at that point they relax, take a breath, and can start using it in earnest.

    • March 29, 2015 @ 4:14 am Gwen Shapira

      As a messaging platform, it is indeed – just another messaging platform with slightly different semantics and benefits. No biggy.

      I think a large part of the excitement is Kafka as a reliable data source for stream processing, which I think is indeed a game-changer. The other source of excitement is how well Kafka fits as a data source for Hadoop batch processing – no other messaging system supports this use-case, which is becoming increasingly important.

      I mostly agree that the excitement seems overblown, but it’s part of a cycle, right? Every 3-6 months we find something new to get all breathless about :)
      As an architect of 20 years you know how trend-driven our industry is.

      • March 29, 2015 @ 7:16 am David

        Indeed, Kafka has some impressive capabilities in the areas you mention. And who doesn’t like to have their game changed once in a while? As an architect of 20 years I recognise the value of that :)

  3. April 15, 2015 @ 5:51 pm James Cheng

    Gwen,

    Re #4 (When high level consumer commits offset, it commits for all partitions it’s processing at once. Hard to externalize offset. (by @weschow)),

    I’m trying to understand this:

    Say I have 2 topics, 3 partitions each. And I use a high level consumer:
    consumer.createMessageStreams({ ‘topicA’ : 3, ‘topicB’ : 3 })

    That will return me 6 streams. If I then call consumer.commitOffsets(), it will checkpoint my state across all 6 of the streams? And the checkpoint will be what, the offset of the last message that was received from the stream iterator? i.e. The offset of the most recent message returned by iter.next()?

    • May 12, 2015 @ 10:04 am Gwen Shapira

      As far as I can tell, it will commit the offset of the most recent message read by iter.next() for each thread, so it isn’t terrible unless your thread is doing some buffering of its own (ours do).
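
A toy simulation of that behavior, under the assumption that a commit records, for every stream, the offset of the last message handed out by `next()`. `SimulatedStream` and `SimulatedConsumer` are hypothetical classes written for this illustration; they are not the Kafka consumer API.

```python
class SimulatedStream:
    """One message stream; tracks the offset most recently handed out by next()."""

    def __init__(self, messages):
        self._messages = messages
        self.last_returned = None  # offset of the latest message returned

    def next(self):
        offset = 0 if self.last_returned is None else self.last_returned + 1
        self.last_returned = offset
        return self._messages[offset]


class SimulatedConsumer:
    """Hypothetical stand-in for a high-level consumer; not the Kafka API."""

    def __init__(self, streams):
        self.streams = streams
        self.committed = {}

    def commit_offsets(self):
        # One call checkpoints every stream, whatever each thread has processed.
        for name, stream in self.streams.items():
            if stream.last_returned is not None:
                self.committed[name] = stream.last_returned


stream_a = SimulatedStream(["a0", "a1", "a2"])
stream_b = SimulatedStream(["b0", "b1", "b2"])
consumer = SimulatedConsumer({("topicA", 0): stream_a, ("topicB", 0): stream_b})

buffered = [stream_a.next(), stream_a.next()]  # read into a buffer, not yet processed
processed = [stream_b.next()]                  # read and fully processed

consumer.commit_offsets()
print(consumer.committed)
```

After the commit, the checkpoint for `topicA` already covers both buffered messages even though nothing has processed them, which is the case where a thread doing its own buffering could lose data on a crash.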

  4. May 2, 2015 @ 5:14 am Jaikiran

    My experience so far with Kafka has been that the way the Java API is currently exposed (version 0.8.2.x), especially on the consumer side of things, adds a lot of confusion.

    I’m also curious about #1 and #13 points in that list you published. Is there any place I can read about these issues, especially #1? I do know that I had run into #1 but hadn’t been able to figure out if that was a problem with my sample code or something else.

    • May 12, 2015 @ 10:06 am Gwen Shapira

      I don’t know of anything published on #1 and #13. Maybe I should write something :)
      Meanwhile, feel free to email users@kafka.apache.org with such questions.

      Completely agree that exposing an API before the code is ready was a very confusing move!

  5. June 25, 2015 @ 5:50 pm Jeff Gong

    Great post Gwen! Been reading a lot of these as I’ve been learning Kafka and they’ve been really helpful. As a new user, I’ve particularly been stumped by point #13. Rebalancing is something I just don’t get about Kafka (well I understand why it’s necessary, but not much more than that) and the latency it causes during my own tests has stumped me. Could you perhaps write something about that or help me understand it? I think it’s an issue that a lot of people are running into, judging by various forums. Thanks!
