Comparing Push Notification Systems: Uber’s RAMEN vs. Netflix’s RENO
In the world of large-scale technology, push notification systems are crucial for maintaining a seamless user experience. Uber and Netflix, two giants in the tech industry, have engineered sophisticated systems to handle the massive scale and complexity of delivering real-time notifications to millions of users globally. This article dives into the engineering marvels behind Uber’s RAMEN (Realtime Asynchronous Messaging Network) and Netflix’s RENO (Rapid Event Notification System), comparing their architecture, design principles, and unique challenges.
Uber’s Push Notification System: RAMEN
Initially, Uber relied on a polling-based approach to update its user interfaces, where the app would continuously pull updates from the server. However, this method led to several issues, including:
- Increased Battery Drain: Constant polling consumed significant battery power.
- App Sluggishness: Frequent server requests made the app slower.
- Network Congestion: Continuous polling caused unnecessary load on the network.
To overcome these challenges, Uber transitioned to a push-based framework, guided by the following principles:
- Easier Migration: From polling to push.
- Ease of Development: A system that simplifies the developer experience.
- Reliability: Ensuring the system consistently delivers notifications.
- Wire Efficiency: Minimizing data transfer between server and mobile apps.
The answer was a protocol that would minimize the amount of data transferred between the server and mobile apps. In 2015, Uber started working on this problem by designing an in-house protocol. The candidate transports were HTTP/1.1 with long polling, WebSockets, and Server-Sent Events (SSE), and Uber went with SSE. This led to the birth of RAMEN (Realtime Asynchronous Messaging Network), a simple, elegant protocol built on top of SSE.
As is evident from the diagram, the client starts receiving messages with seq=0 at the beginning of the session. It keeps receiving messages and stores the highest seq number it has seen. If delivery of a message, say seq=3, fails, the client sets up the connection again and resumes receiving from the highest stored seq number + 1. The same recovery applies to heartbeats: a heartbeat is sent every 4 seconds, and if no response arrives, the session resumes from the highest stored seq number + 1. At peak, this system pushed over 70,000 messages per second to three different types of apps while maintaining up to 600,000 concurrent streaming connections. As the number of users and the load grew, Uber started a revamp of the RAMEN server in 2017. It moved to a new state like below:
However, this system had limitations of its own. Loss of acknowledgments, poor connection stability, and transport limitations were a few of them. In 2019, Uber decided to migrate RAMEN to gRPC, as it supports many different RPC methods and is interoperable with the QUIC transport layer protocol.
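To make the sequence-number recovery described above more concrete, here is a minimal sketch in Python. It is not Uber's implementation: `FlakyStream`, `receive_all`, and `max_reconnects` are hypothetical names, and the "server" is an in-memory stand-in rather than a real SSE connection. It only illustrates the idea of remembering the highest seq number and reconnecting from the next one.

```python
from typing import Iterator, List, Optional, Tuple

Message = Tuple[int, str]  # (seq, payload)


class FlakyStream:
    """Hypothetical in-memory stand-in for the server side of a session.

    It replays messages starting at a requested seq number and drops the
    connection once, just before a chosen seq, to simulate a failure.
    """

    def __init__(self, payloads: List[str], fail_at: Optional[int] = None):
        self.payloads = payloads
        self.fail_at = fail_at

    def connect(self, from_seq: int) -> Iterator[Message]:
        for seq in range(from_seq, len(self.payloads)):
            if seq == self.fail_at:
                self.fail_at = None  # fail only once
                raise ConnectionError(f"lost connection before seq={seq}")
            yield seq, self.payloads[seq]


def receive_all(stream: FlakyStream, max_reconnects: int = 5) -> List[Message]:
    """Consume the stream, resuming from highest seen seq + 1 after a failure."""
    received: List[Message] = []
    highest_seq = -1  # nothing seen yet, so the first request starts at seq=0
    for _ in range(max_reconnects + 1):
        try:
            for seq, payload in stream.connect(from_seq=highest_seq + 1):
                received.append((seq, payload))
                highest_seq = seq
            return received  # stream ended normally
        except ConnectionError:
            continue  # set up the connection again from highest_seq + 1
    raise RuntimeError("gave up after too many reconnects")


stream = FlakyStream(["a", "b", "c", "d", "e"], fail_at=3)
print(receive_all(stream))
```

Because the client only ever asks for `highest_seq + 1`, a dropped connection produces neither gaps nor duplicates in this sketch: messages 0 to 2 arrive on the first connection and 3 to 4 on the second.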
Netflix’s Push Notification System: RENO
Like Uber, Netflix had many use cases that required pushing a tremendous volume of notifications to 220 million active members across the globe to keep their experience consistent. Its solution, RENO (Rapid Event Notification System), was designed with several key factors in mind:
- Single Events Source: Ensuring a centralized source for events.
- Event Prioritization: Categorizing events into high, medium, and low priority to manage traffic effectively.
- Hybrid Communication Model: Combining different communication methods to optimize delivery.
- Targeted Delivery: Tailoring notifications to specific user segments.
- Managing High RPS (Requests Per Second): Scaling the system to handle a large volume of requests without compromising performance.
RENO Architecture Breakdown
- Event Triggers: These can be member actions or system-driven updates that require refreshing the user’s experience.
- Event Management Engine: Netflix uses a framework called Manhattan, which listens to events and forwards them to queues.
- Event Priority-Based Queues: Amazon SQS queues are used to shard traffic based on priority, ensuring critical events are handled first.
- Event Priority-Based Clusters: AWS Instance Clusters process events by subscribing to the corresponding priority queues.
- Outbound Messaging System: This system delivers notifications to mobile devices. For web and other streaming devices, Netflix uses its own Zuul Push solution to maintain persistent connections.
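The priority-based sharding in this breakdown can be sketched roughly as follows. This is an illustration only, not Netflix's code: `Event`, `PriorityRouter`, `publish`, and `drain` are hypothetical names, and plain in-memory deques stand in for the Amazon SQS queues. The point is simply that each priority level gets its own queue, so a consumer for one priority never has to wait behind traffic of another.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List

PRIORITIES = ("high", "medium", "low")


@dataclass
class Event:
    member_id: str
    kind: str
    priority: str  # one of PRIORITIES


class PriorityRouter:
    """Hypothetical router: one in-memory queue per priority level."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[Event]] = {p: deque() for p in PRIORITIES}

    def publish(self, event: Event) -> None:
        """Forward an event to the queue that matches its priority."""
        if event.priority not in self.queues:
            raise ValueError(f"unknown priority: {event.priority!r}")
        self.queues[event.priority].append(event)

    def drain(self, priority: str) -> List[Event]:
        """Return, in arrival order, what a consumer of one queue would process."""
        queue = self.queues[priority]
        drained = list(queue)
        queue.clear()
        return drained


router = PriorityRouter()
router.publish(Event("m1", "profile_update", "low"))
router.publish(Event("m2", "playback_started", "high"))
router.publish(Event("m3", "plan_changed", "high"))

# The high-priority consumer sees only its own events, unaffected by the backlog elsewhere.
print([e.kind for e in router.drain("high")])
```

Keeping the queues separate is what lets each cluster scale on its own: a flood of low-priority events lengthens only the low-priority queue.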
Commonalities and Conclusion
Both RAMEN and RENO share several key design principles:
- Priority Queues: Both systems use queues to manage traffic based on message importance.
- Cassandra for Persistence: High-scale read/write operations are handled by Cassandra in both cases.
- Event Deduplication: Each system merges duplicate events to avoid unnecessary processing.
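As a rough illustration of what merging duplicate events could look like, here is a small Python sketch. The dedup key (member ID plus event type) and the keep-the-latest rule are assumptions made for this example, and `merge_duplicates` is a hypothetical helper. Neither system's actual merge logic is described in this article.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, int]  # (member_id, event_type, timestamp)


def merge_duplicates(events: Iterable[Event]) -> List[Event]:
    """Collapse events sharing (member_id, event_type), keeping the newest one.

    The key choice and the keep-latest rule are assumptions for illustration.
    """
    latest: Dict[Tuple[str, str], Event] = {}
    for event in events:
        member_id, event_type, timestamp = event
        key = (member_id, event_type)
        current = latest.get(key)
        if current is None or timestamp >= current[2]:
            latest[key] = event
    return list(latest.values())


batch = [
    ("m1", "watchlist_changed", 100),
    ("m2", "watchlist_changed", 101),
    ("m1", "watchlist_changed", 105),  # duplicate of the first, but newer
]
print(merge_duplicates(batch))
```

In this sketch three incoming events shrink to two, so downstream processing handles one update per member rather than one per raw event.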
Comparative analysis before we conclude
- Scalability: RAMEN handles up to 70,000 QPS and 600,000 concurrent connections. RENO manages notifications for 220 million users with high RPS, using priority queues to ensure timely delivery.
- Data Handling: RAMEN struggled with connection stability but improved with gRPC. RENO integrates a hybrid communication model and Cassandra for high-scale data operations.
- Event Management: RAMEN uses SSE and requires connection re-establishment on failure. RENO employs a centralized event management engine and sophisticated event prioritization.
Both Uber’s RAMEN and Netflix’s RENO showcase the engineering effort required to build scalable, reliable, and efficient push notification systems. While they share several architectural similarities, each system also presents unique solutions tailored to the specific needs of its platform. As these systems continue to evolve, they stand as testaments to the innovation driving the technology that keeps our apps running smoothly.
For those interested in deep-diving into the specifics, here are the links to the detailed blogs from Uber and Netflix: Uber RAMEN | Netflix RENO.