Elliott Hall - Intermediate

Tuesday May 19th, 15:30-16:15

CyberWeek challenge via Akka Sharded Cluster

About this Session

This talk is about our experience using Akka Sharded Cluster to meet the throughput requirements of CyberWeek 2019 at Zalando SE. It is not an introduction to Akka Sharded Cluster.

Popcorn, an Akka Sharded Cluster system, is part of the core platform of Zalando SE and serves offers to customers in the shop. During CyberWeek 2018, a system in the core platform became a bottleneck in processing Price, Stock and Product detail events, which hurt the customer experience by delaying discounts in the shop.

In 2019, Product Offer Platform, a core department at Zalando SE, ran a successful PoC of an Akka Sharded Cluster on Kubernetes, which evolved into what is called Popcorn today. There were multiple challenges in migrating the existing system.

The main challenge was that the existing system had to be migrated in incremental steps, which ruled out a big-bang switch. Hence, we decided to continue with PostgreSQL as our storage system to avoid any extra operational overhead. Other challenges included:

  • The uneven size of the events, ranging from 5KB to 100KB, which made the workload of the system very dynamic and unpredictable.
  • Prioritization of event types while polling messages from AWS SQS.
  • A backpressure implementation to avoid overwhelming and eventually choking Popcorn with events.
  • Keeping the order of events.
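
The last three points can be illustrated with a minimal sketch. This is not Popcorn's implementation: it uses plain Java collections in place of AWS SQS and Akka, and the class name, the priority order (Price before Stock before Product), and the hash-based shard function are all assumptions made for illustration. Polling is bounded by a `demand` value, which is one simple form of backpressure, and events for the same entity always map to the same shard, so their order is kept as long as each shard processes its events sequentially.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch: in-memory queues stand in for one SQS queue per event type.
final class PriorityPollingSketch {

    // Assumed priority order, highest first (enum declaration order is used below).
    enum EventType { PRICE, STOCK, PRODUCT }

    static final class Event {
        final EventType type;
        final String entityId;
        final long sequence;

        Event(EventType type, String entityId, long sequence) {
            this.type = type;
            this.entityId = entityId;
            this.sequence = sequence;
        }
    }

    private final Map<EventType, Queue<Event>> queues = new EnumMap<>(EventType.class);

    PriorityPollingSketch() {
        for (EventType t : EventType.values()) {
            queues.put(t, new ArrayDeque<>());
        }
    }

    void publish(Event e) {
        queues.get(e.type).add(e);
    }

    // Pull at most `demand` events, draining higher-priority queues first.
    // The consumer only asks for what it can handle, so it is never flooded.
    List<Event> poll(int demand) {
        List<Event> batch = new ArrayList<>();
        for (EventType t : EventType.values()) {
            Queue<Event> q = queues.get(t);
            while (batch.size() < demand && !q.isEmpty()) {
                batch.add(q.poll());
            }
        }
        return batch;
    }

    // The same entity ID always yields the same shard index.
    static int shardFor(String entityId, int numberOfShards) {
        return Math.floorMod(entityId.hashCode(), numberOfShards);
    }

    public static void main(String[] args) {
        PriorityPollingSketch source = new PriorityPollingSketch();
        source.publish(new Event(EventType.PRODUCT, "sku-1", 1));
        source.publish(new Event(EventType.PRICE, "sku-1", 2));
        source.publish(new Event(EventType.STOCK, "sku-2", 3));

        // With a demand of 2, the PRODUCT event stays queued for a later poll.
        for (Event e : source.poll(2)) {
            System.out.println(e.type + " " + e.entityId + " -> shard " + shardFor(e.entityId, 4));
        }
    }
}
```

In a real system the demand would come from the downstream consumers rather than a fixed number, and a strict priority order like this one can starve low-priority events under sustained load, so a weighted scheme may be preferable.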

All of these challenges were overcome by using tools and libraries provided by Lightbend, for example Alpakka AWS SQS and Akka Management. With Popcorn we were able to increase our throughput of 100KB events by 100 times and our overall throughput from 13.5K events/sec to 100K events/sec, supporting Zalando in achieving its 2019 CyberWeek targets.

Future plans are to further improve throughput by moving to the CQRS pattern with Akka Persistence and by replacing PostgreSQL with Cassandra. On the operational front, we have to improve our Split Brain mitigation and autoscaling strategies.

Speaker(s)

Rohit Sharma
@programmerohit

Rohit Sharma is a Lead Engineer at Zalando SE, Berlin, Germany. He currently works with Scala, Akka, AWS and Kubernetes to build Zalando's core platform and leads a team of 3 software engineers.

He has been a software developer for 5 years, with experience in the insurance, social media and e-commerce industries.

Abdelrahman Barakat

Abdelrahman Barakat is a lead engineer with more than a decade working on highly-available distributed systems. At Zalando SE, Abdelrahman works as a part of a core backend team, where he helped scale one of the systems to deliver more than seven times the throughput by using Akka Cluster Sharding.

Abdelrahman holds a post-graduate degree in Computer Science; however, he started as a self-taught developer.
