This page contains features available in the Community Edition or higher. Create a free account to access them. More details about the specific offerings of the Community Edition can be found in the Editions and Pricing page.

Use Merge Point To Normalize Kafka Topics

This page describes how to reduce the amount of Kafka Topics in order to lower the overhead by using the merge point feature.

3 minute read

Kafka excels at processing a high volume of messages but can encounter difficulties with excessive topics, which may lead to insufficient memory. The optimal Kafka setup involves minimal topics, utilizing the event key for logical data segregation.

On the contrary, MQTT shines when handling a large number of topics with a small number of messages. But when bridging MQTT to Kafka, the number of topics can become overwhelming. Specifically, with the default configuration, Kafka is able to handle around 100-150 topics. This is because there is a limit of 1000 partitions per broker, and each topic requires has 6 partitions by default.

So, if you are experiencing memory issues with Kafka, you may want to consider combining multiple topics into a single topic with different keys. The diagram below illustrates how this principle simplifies topic management.

graph LR event1(Topic: umh.v1.acme.anytown.foo.bar
Value: 1) event2(Topic: umh.v1.acme.anytown.foo.baz
Value: 2) event3(Topic: umh.v1.acme.anytown
Value: 3) event4(umh.v1.acme.anytown.frob
Value: 4) event1 --> bridge event2 --> bridge event3 --> bridge event4 --> bridge bridge{{Topic merge point: 3}} subgraph Topic: umh.v1.acme gmsg1(Key: anytown.foo.bar
Value: 1) gmsg2(Key: anytown.foo.baz
Value: 2) gmsg3(Key: anytown
Value: 3) gmsg4(Key: anytown.frob
Value: 4) end bridge --> gmsg1 bridge --> gmsg2 bridge --> gmsg3 bridge --> gmsg4

Before you begin

This tutorial is for advanced users. Contact us if you need assistance.

You need to have a UMH cluster. If you do not already have a cluster, you can create one by following the Getting Started guide.

You also need to access the system where the cluster is running, either by logging into it or by using a remote shell.

There are two configurations for the topic merge point: one in the Companion configuration for Benthos data sources and another in the Helm chart for data bridges.

Data Sources

To adjust the topic merge point for data sources, modify mgmtcompanion-config configmap. This can be easily done with the following command:

sudo $(which kubectl) edit configmap mgmtcompanion-config -n mgmtcompanion --kubeconfig /etc/rancher/k3s/k3s.yaml

This command opens the current configuration in the default editor, allowing you to set the umh_merge_point to your preferred value:

data:
  umh_merge_point: <numeric-value>

Ensure the value is at least 3 and update the lastUpdated field to the current Unix timestamp to trigger the automatic refresh of existing data sources.

Data Bridge

For data bridges, the merge point is defined individually in the Helm chart values for each bridge. Update the Helm chart installation with the new topicMergePoint value for each bridge. See the Helm chart documentation for more details.

Setting the topicMergePoint to -1 disables the merge feature.

Last modified February 5, 2025: temporarily store the requirements locally (#320) (9685467)