Kafka is no longer exclusively the domain of high-velocity Big Data use cases. Today, it is used by workloads and companies of all sizes, supporting asynchronous communication between even small groups of microservices.
But this expanded usage has led to problems with cost creep that threaten many companies’ bottom lines. And due to the complexity of the platform, many leaders are unsure how best to manage their costs without compromising data quality.
This article reveals exactly why Kafka costs spike – and how you can keep them under control.
Why Kafka Optimization Matters in 2024
A series of changes have sparked a renewed requirement for Kafka optimization:
- Higher Data Usage: In the past, roughly 80% of data would either be archived or deleted – with just 20% stored and routinely used. However, with the growing use of AI and other data-intensive solutions, this has flipped in recent years. Today, around 80% of data is stored, analyzed, fetched, and queried – with just 20% archived.
- Increased Data Complexity: While companies used to rely on simpler data formats like JSON and XML, these are now being replaced with Protocol Buffers (Protobuf) and Avro. These formats use different, schema-driven encodings and carry more complex events – placing greater demands on your system.
- Faster Dataflow: Not only is there a larger volume of more complex data, but most use cases require it to be delivered faster and with higher performance.
Kafka is the centerpiece that enables this new scale of high-speed, high-quality data streaming. However, Kafka costs are a constant challenge for every company that uses it – because costs can suddenly spike and get out of hand far faster than with other infrastructure components and traditional databases.
What Leads to Sudden Kafka Cost Increases?
Kafka usage is charged based on the volume of compute power and data storage. As a result, there are a handful of common factors that lead to higher costs:
- Increased Data Volume: A sudden increase in the volume of data input will inevitably increase costs. This often occurs when Kafka is introduced, and suddenly, all developers throughout an organization start using it.
- Number of Consumer Groups: Consumers within the same group share a topic's messages, so each message is read only once per group. Every additional consumer group, however, reads its own copy of every message – so the read traffic a topic generates is multiplied by the number of groups (see the quick sketch after this list).
- Latency Requirements: As companies strive to lower latency and increase throughput, they typically increase the number of brokers they use and make those brokers more robust. But this requires more CPU and memory – both of which increase Kafka cost.
- Retention Policies: Poorly optimized retention policies lead to unnecessary increases in static storage.
- Number of Partitions: The number of partitions in the system directly impacts CPU cycles, CPU utilization, memory utilization, and static storage.
- Number of Connections: While there is a system in place to kill idle connections in Kafka, cutting a large volume of connections at once can cause a CPU spike and increase costs.
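To make the consumer-group multiplier concrete, here is a quick, illustrative calculation – the ingest rate and group count below are made-up numbers, not benchmarks:

```python
# Illustrative only: consumer groups multiply the read traffic brokers must serve.
ingest_gb_per_day = 1000      # assume ~1 TB/day written to a topic
consumer_groups = 5           # each group reads every message once

read_gb_per_day = ingest_gb_per_day * consumer_groups
print(read_gb_per_day)        # ~5 TB/day of reads for 1 TB/day of writes
```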
Strikingly, many of these factors also damage Kafka’s performance – from “one-size-fits-all” configurations that cause streaming delays to poorly optimized consumer groups. By tackling them, organizations can reduce their Kafka spend and improve performance at the same time.
1. Set Appropriate Data Retention Periods
Don’t use a single retention policy for all topics; this can lead to data being stored either too long (which wastes spend) or too briefly (which may hurt future performance).
Instead, profile each topic individually to find its access patterns and create custom policies that reflect those trends. While this can take a lot of manual effort, it is more than worthwhile for the cost and performance improvements it produces.
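As a rough illustration, per-topic retention can be applied through Kafka’s admin API. The sketch below uses the confluent-kafka Python client; the topic names and retention windows are placeholders, not recommendations:

```python
# Minimal sketch: apply per-topic retention with the confluent-kafka AdminClient.
# Topic names and retention values are illustrative only.
from confluent_kafka.admin import AdminClient, ConfigResource

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# Hypothetical profiling result: how far back each topic's consumers actually read.
retention_ms = {
    "orders.events": 7 * 24 * 60 * 60 * 1000,   # 7 days: replay window for audits
    "clickstream.raw": 6 * 60 * 60 * 1000,      # 6 hours: consumed near-real-time only
}

resources = [
    ConfigResource(ConfigResource.Type.TOPIC, topic, set_config={"retention.ms": str(ms)})
    for topic, ms in retention_ms.items()
]

# Note: alter_configs is non-incremental (unspecified dynamic configs revert to defaults);
# newer client versions also offer incremental_alter_configs.
for resource, future in admin.alter_configs(resources).items():
    future.result()   # raises on failure
    print(f"Updated retention for {resource}")
```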
2. Use Tiered Storage
A relatively recent addition to the platform, Tiered Storage, offers another way to avoid needless data retention. Using your understanding of individual topics, you can offload cold, rarely read data to cheap object storage – avoiding wasted broker storage on topics that don’t require it.
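As a sketch, on Apache Kafka 3.6+ (or a managed service exposing the same KIP-405 settings – key names can vary by provider), tiering is controlled per topic with configs like these; the topic name and retention windows are illustrative:

```python
# Minimal sketch: keep only recent data on broker disks and tier the rest to object storage.
# Assumes brokers already have KIP-405 tiered storage enabled; config keys follow Apache Kafka 3.6+.
from confluent_kafka.admin import AdminClient, ConfigResource

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

tiered = ConfigResource(
    ConfigResource.Type.TOPIC,
    "clickstream.raw",                                    # illustrative topic name
    set_config={
        "remote.storage.enable": "true",                  # offload closed segments to the remote tier
        "local.retention.ms": str(24 * 60 * 60 * 1000),   # keep ~1 day on local broker disks
        "retention.ms": str(30 * 24 * 60 * 60 * 1000),    # total retention including the remote tier
    },
)

# Same caveat as above: alter_configs replaces the topic's dynamic config set.
for resource, future in admin.alter_configs([tiered]).items():
    future.result()
```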
3. Ditch JSON
Companies can cut their payload sizes by up to 50% while improving performance simply by switching to binary formats such as Protobuf and Avro. Better still, this switch will not drive up CPU utilization on your clients.
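To see the difference, here is a small, self-contained comparison using the fastavro library; the schema and event are invented for illustration, and real savings depend on the shape of your data (in production you would pair this with a schema registry so consumers can decode the payloads):

```python
# Minimal sketch: compare JSON vs. Avro payload sizes for the same event.
import io
import json

from fastavro import parse_schema, schemaless_writer

schema = parse_schema({
    "type": "record",
    "name": "PageView",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "url", "type": "string"},
        {"name": "ts", "type": "long"},
    ],
})

event = {"user_id": 123456789, "url": "/pricing", "ts": 1700000000000}

json_bytes = json.dumps(event).encode("utf-8")

buf = io.BytesIO()
schemaless_writer(buf, schema, event)   # binary encoding: no field names in the payload
avro_bytes = buf.getvalue()

print(len(json_bytes), len(avro_bytes))  # the Avro payload is typically much smaller
```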
4. Locate Inactive Partitions and Topics
Inactive partitions and topics can significantly affect memory, storage, and CPU utilization. Even in self-hosted Kafka deployments, they eat into the brokers’ available compute and drag down performance.
Companies should, therefore, take proactive steps to identify and eliminate these inactive partitions and topics – generating immediate savings and avoiding underutilized resources.
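One way to start, sketched below with the confluent-kafka Python client, is to flag topics whose partitions are all empty (low watermark equals high watermark); comparing high-watermark snapshots over time would also catch topics that still hold data but no longer receive writes. The bootstrap server and group id are placeholders:

```python
# Minimal sketch: flag candidate inactive topics by inspecting watermark offsets.
from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "topic-audit",        # illustrative; we never subscribe or commit offsets
    "enable.auto.commit": False,
})

metadata = consumer.list_topics(timeout=10)

for topic, topic_md in metadata.topics.items():
    if topic.startswith("__"):        # skip internal topics such as __consumer_offsets
        continue
    empty = True
    for partition_id in topic_md.partitions:
        low, high = consumer.get_watermark_offsets(
            TopicPartition(topic, partition_id), timeout=10
        )
        if high > low:                # partition still holds data
            empty = False
            break
    if empty:
        print(f"candidate for cleanup: {topic}")

consumer.close()
```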
5. Use a Managed Solution
One of the largest Kafka costs is actually the human labor required to manage the platform. With fewer man-hours, less management effort, and auto-scaling capacity, a managed Kafka solution is an instant cost saver and performance enhancer.
Optimize Your Kafka Costs with GlobalDots
There is no silver bullet for Kafka optimization. While the steps we’ve discussed will help most organizations cut costs and improve performance, your unique challenges and requirements will determine their real impact.
That is why so many companies choose to work with GlobalDots: a true innovation partner with over 20 years of expertise, we battle-test every product and strategy on the market – then we work closely with you to select the most impactful approach for your organization.