Introduction Have you ever wondered about real-time data streaming? If not, let's embark on a journey to explore the Kafka system and examine its strengths and weaknesses. Before diving in, we should first talk about Big Data, as we are surrounded by an enormous volume of data; such data...
Kafka is a distributed streaming platform designed for real-time data pipelines, stream processing, and data integration. AWS Lambda, on the other hand, is a serverless compute service that executes your code in response to events, managing the underlying compute resources for you. In organizations where Kafka plays a central role in...
Introduction As companies grow and their data streaming needs change, it's important to optimize resources to stay efficient and control costs. AWS has rolled out a new feature for Amazon Managed Streaming for Apache Kafka (Amazon MSK) that supports removing brokers from MSK clusters. This capability lets you adjust the size...
Introduction In the world of data management, companies seek to streamline operations and enhance scalability. One key journey involves migrating self-managed Apache Kafka clusters from AWS EC2 to Amazon MSK. We executed such a migration for a client with zero downtime, offering insights and strategies in this blog. Motivations Behind...
Introduction Apache Kafka is a distributed system consisting of servers and clients that communicate via a high-performance TCP network protocol. We have components generating events (Producers) and components that consume those events (Consumers). Consumers label themselves with a consumer group name so that each record published on a...
In today's fast-paced digital landscape, businesses thrive or falter based on their ability to harness and make sense of data in real time. Apache Kafka, an open-source distributed event streaming platform, has emerged as a pivotal tool for organizations aiming to excel in the world of data-driven decision-making. In this blog post, we'll...
Introduction In the dynamic realm of data integration, schema registries are crucial, ensuring data coherence, harmony, and structure. Amidst notable contenders, Confluent Schema Registry and AWS Glue Schema Registry shine as prime choices for efficient schema management. With businesses aiming to enhance operations within the extensive...
Apache Kafka is an open-source distributed event streaming platform that uses a publish-subscribe mechanism to stream records. Downloading and Installing Apache Kafka: Prerequisite: You should have Java (JDK) installed on your Windows machine. Step 1: Download the binary version from the official Kafka download page...
In a recent use case, we had to implement complex event processing in real time. We used Storm as the real-time processing engine, but since it doesn't provide batching of events, we turned to Esper to do the required job. Esper can be thought of as a complex event processing (CEP) component generally used for event...