I think Kafka is pretty simple. There are topics; producers write messages to them and consumers read them. The rest is implementation details. My favorite reaction from my colleagues when they learn Kafka is “Is that it? I thought it was difficult and put aside an entire week to learn this! Why does everyone make such a big deal out of Kafka?” The best systems are deceptively simple – like chess, which takes an hour to learn and years to master.
But there’s a big difference between “deceptively simple” and “deceiving yourself into believing your system is simple when it’s actually pretty challenging”.
On a recent call, Neha said “The most confusing behavior we have is how producing to a topic can return errors for a few seconds after the topic was already created”. As she said that, I remembered that this was indeed very confusing once, but then I got used to it. Which got us thinking: what else does Kafka do that is very confusing to new users, but that we got so used to that we no longer even see the issue?
So, we conducted a highly unscientific Twitter survey and got the following results. I’m publishing them here with some comments:
- Sending messages after creating a topic doesn’t work. What?! (by @nehanarkhede)
- (Partition) Reassignments could be simplified. just set the repl factor for a topic, autofit mode will generate and apply new schema. (by @stevenleroux)
Note: This should be addressed in KAFKA-1678. And there are tools that help: https://github.com/mesos/kafka#rebalancing-brokers-in-the-cluster
- Controller : async ops, get a better insight view of current operations & statuses. e.g. deleted topics in zk, reassigns cp status (by @stevenleroux)
- When high level consumer commits offset, it commits for all partitions it’s processing at once. Hard to externalize offset. (by @weschow)
- Does “the Java consumer API” count? (by @angrynoah)
Note: This will be resolved in the next release with the new consumer API
- The sticky partitioning producer. (by @miguno) and “messages are not randomly distributed to all partitions when they are key-less!” (by @gbuisson)
Note: This is fixed in the new producer, available in Kafka 0.8.2.0
- Inability of brokers to bind to both external/internal IPs at the same time (by @vanyatka)
- issues with hostnames in inter-node comm. when they don’t match up. Not really warned about in the docs, really confusing to debug (by @odwyerrob)
Note: Docs could definitely be better here. We explain this in the FAQ, but users only find this after lots of confusion.
- Mirror-maker does not stop consuming when target cluster is down (by @erik_van_oosten)
- inability to delete a topic is the biggest WAT for devs at Chartbeat (by @djerrynyc)
Note: We thought this was fixed in 0.8.2.0, but apparently there are still some issues.
- How broker discovery works, e.g. that bootstrap broker list is static, or requires VIP. (@miguno)
- That reads can only be done against the partition leader but not also against ISRs. (@miguno, speaking for his users)
- The effects of consumer rebalancing: “why is my thread suddenly (not) seeing this data?” (@miguno, speaking for his users yet again)
- At least lately, it’s the offset request by timestamp. What you get back is not what you’re expecting. (by @bonkoif and @mthssdrbrg)
Note: These are both Kafka experts. If these guys are confused, it must be bad!
- The Trial? http://en.wikipedia.org/wiki/The_Trial (by @oraclenerd, there has to be one in the crowd)
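Going back to the first item on the list: until that behavior changes, one way to live with it is to retry for a few seconds after creating a topic. Here is a minimal sketch of that idea. It does not talk to Kafka at all; `TopicNotReadyError`, `send_with_retry` and `make_flaky_send` are hypothetical names invented for this illustration, and the fake `send` only simulates a topic that rejects the first few writes.

```python
import time


class TopicNotReadyError(Exception):
    """Hypothetical stand-in for the transient error seen right after topic creation."""


def send_with_retry(send, message, attempts=5, backoff_s=0.5, sleep=time.sleep):
    """Call send(message), retrying on TopicNotReadyError with a fixed pause between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return send(message)
        except TopicNotReadyError:
            if attempt == attempts:
                raise  # out of retries: surface the error to the caller
            sleep(backoff_s)


def make_flaky_send(failures):
    """Build a fake send() that fails `failures` times, then succeeds (simulation only)."""
    state = {"calls": 0}

    def send(message):
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TopicNotReadyError("topic not ready yet")
        return "ack:" + message

    return send


if __name__ == "__main__":
    # Simulate a freshly created topic that rejects the first two writes.
    print(send_with_retry(make_flaky_send(2), "hello", backoff_s=0.1))
```

A real producer may not need a hand-written loop like this: the new Java producer has `retries` and `retry.backoff.ms` settings that may cover the same case, so treat the sketch as an explanation of the idea rather than a recommendation.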
Now we know what we need to improve!
Have a pet peeve that’s not covered here? Please leave a comment (or a JIRA: https://issues.apache.org/jira/browse/KAFKA).