Amazon Robotics Achieves 35% Boost with DynamoDB Integration
Amazon Robotics achieves worldwide scale and improves engineering efficiency by 35% with Amazon DynamoDB.
Amazon Robotics (AR) designs advanced robotic solutions so the Amazon fulfillment network can meet delivery promises for millions of customers every day. AR builds critical software that controls over half a million mobile robots across hundreds of Amazon sites spanning North America, Europe, Asia, and Australia. With a focus on engineering efficiency, the AR Movement Sciences and Scheduling (MOSS) team turned to Amazon DynamoDB to store millions of real-time work requests that orchestrate mobile robot motion.
In this post, we discuss how the MOSS team migrated from a self-managed database to DynamoDB, a fully managed, multi-Region, multi-active NoSQL database that delivers single-digit-millisecond performance at any scale. This change improved the team’s operational and engineering efficiency by 35%, allowing their engineers to focus on innovating for customers instead of provisioning, patching, and managing servers.
Before DynamoDB: High operational load
Prior to migrating to DynamoDB, MOSS software engineers spent roughly a third of their time managing, tuning, and troubleshooting database clusters, deployed across more than 2,000 Amazon Elastic Compute Cloud (Amazon EC2) instances. These tasks included the following:
Adjusting capacity to keep up with ongoing growth, seasonal changes, and increasing business demands of Amazon’s global fulfillment network
Monitoring resource utilization and manually replacing hosts to maintain cluster health
Implementing and testing data backup and restore procedures
Updating the underlying operating system of their EC2 instances as well as software application dependencies with the latest security patches and OS versions
Planning and testing database version upgrades and other maintenance-related tasks
Why choose a managed service like DynamoDB?
The MOSS team sought several crucial benefits by moving from a self-managed database solution to an AWS-managed service like DynamoDB:
Simplified infrastructure management – AWS handles all database infrastructure and maintenance tasks, including provisioning and patching
Effortless scaling – You can scale globally with high availability, durability, and fault tolerance, eliminating the need for manual capacity planning and adjustments
Secure data storage and backup – You can benefit from the fine-grained access control in DynamoDB using AWS Identity and Access Management (IAM) policies and encryption at rest, data backups, and point-in-time recovery (PITR)
DynamoDB benefits: Adaptive scaling and transparent partition splitting
DynamoDB is designed for global reach and allows you to effortlessly scale your mission-critical workloads. As an AWS-managed solution, it automatically adjusts your application’s read and write capacities so you don’t need to be concerned with infrastructure or provisioning decisions as your workloads change or traffic patterns fluctuate.
So how does DynamoDB help you scale? First, it supports adaptive capacity. This means it will allocate capacity from other partitions in the event a partition becomes hot (takes on a higher volume of read and write traffic compared to other partitions). DynamoDB also supports partition splitting for heat, meaning the service will split a hot partition into two and will continue to do so until traffic is served without throttling. Partition splitting, just like adaptive capacity allocation, happens transparently, with no action or steps required by you.
Migration approach
To undertake this migration, the MOSS team only had to redesign the persistence layer of their service, which handles connections to the database. This approach allowed them to safely swap out their legacy database for DynamoDB without affecting any core service logic and gave them granular control during the migration to DynamoDB.
The database migration employed a primary-secondary strategy to ensure a smooth transition with zero downtime for clients. The procedure included the following steps:
The team introduced a DynamoDB table, which became the secondary database, and ensured all hosts could connect to it.
They promoted DynamoDB to the primary database, managing all new item insertions. Their legacy database, now secondary, was tasked only with updating items previously inserted when it was primary.
When all updates were handled by DynamoDB, they removed their legacy database as an endpoint.
The following diagram illustrates the different stages in this process.
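The phased cutover above can be sketched as a small routing layer. This is a hypothetical illustration, not the MOSS team's actual code: `Phase`, `PersistenceLayer`, and `DictStore` are invented names, and the two stores are assumed to expose simple `put`/`contains` operations.

```python
from enum import Enum

class Phase(Enum):
    LEGACY_PRIMARY = 1  # before migration: legacy handles all writes
    DYNAMO_PRIMARY = 2  # cutover: DynamoDB takes new inserts; legacy only updates its old items
    DYNAMO_ONLY = 3     # legacy endpoint removed

class PersistenceLayer:
    """Hypothetical dual-write router; `legacy` and `dynamo` are any
    objects exposing put(item) and contains(item_id)."""

    def __init__(self, legacy, dynamo, phase):
        self.legacy, self.dynamo, self.phase = legacy, dynamo, phase

    def insert(self, item):
        # New items go to whichever store is currently primary.
        store = self.legacy if self.phase is Phase.LEGACY_PRIMARY else self.dynamo
        store.put(item)

    def update(self, item):
        # During cutover, items inserted while legacy was primary are
        # still updated there; everything else goes to DynamoDB.
        if self.phase is Phase.DYNAMO_PRIMARY and self.legacy.contains(item["id"]):
            self.legacy.put(item)
        else:
            self.dynamo.put(item)

class DictStore:
    """In-memory stand-in for a database endpoint."""
    def __init__(self):
        self.items = {}
    def put(self, item):
        self.items[item["id"]] = item
    def contains(self, item_id):
        return item_id in self.items
```

Because all routing decisions live in this one layer, the rest of the service never needs to know which database is primary, which is what makes the swap safe and granular.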
Key learnings from testing
The MOSS team conducted extensive testing to confirm DynamoDB could meet their high-throughput and scaling needs. The following are some key learnings:
Call strategy – To manage high concurrency demands and improve service reliability, the MOSS team uses parallel batch calls with synchronized retries. This approach consolidates failed items, significantly reducing the total number of necessary server connections. Alternatively, non-consolidating strategies can be used in situations requiring fast retries, such as transient network issues or brief server outages.
Scaling considerations – Both provisioned and on-demand DynamoDB tables can provide high throughput for any service. With on-demand mode, DynamoDB instantly accommodates workloads as they ramp up or down to any previously reached traffic level. If you anticipate your traffic will exceed double your previous peak within a 30-minute period, you can pre-warm your table by setting it to provisioned mode at the anticipated traffic level and then switch back to on-demand mode later. This mitigates the risk of requests being throttled.
Partition sharding – To optimize query performance and minimize traffic throttling, the team employs two sharding techniques: automatic and manual. For non-sequential partition keys (such as unordered data), they use deterministic, uniformly distributed hashes so that data spreads evenly across partitions and partition splits occur uniformly. For sequential partition keys (such as time series data), they use manual sharding to ensure even distribution of traffic across their partitions.
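The consolidating call strategy described above can be sketched as follows. This is an illustration under assumptions, not the team's implementation: `batch_write` is a caller-supplied function assumed to return the sub-list of items it could not write, mirroring the `UnprocessedItems` semantics of DynamoDB's `BatchWriteItem` API, and 25 reflects that API's per-request item limit.

```python
import concurrent.futures
import time

def write_all(batch_write, items, batch_size=25, max_retries=5):
    """Send items in parallel batches, then consolidate every batch's
    failed items into fresh, full batches before the next synchronized
    retry. Consolidation keeps the number of retry connections low
    compared with retrying each partially failed batch independently."""
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    for attempt in range(max_retries + 1):
        with concurrent.futures.ThreadPoolExecutor() as pool:
            # Each batch returns its failed items; flatten them together.
            failed = [item for result in pool.map(batch_write, batches)
                      for item in result]
        if not failed:
            return []
        time.sleep(min(0.05 * 2 ** attempt, 1.0))  # back off, then retry together
        batches = [failed[i:i + batch_size] for i in range(0, len(failed), batch_size)]
    return failed  # items still unwritten after all retries
```

The trade-off mentioned in the post is visible here: the synchronized retry waits for every in-flight batch to finish before regrouping, so a non-consolidating strategy would retry individual batches sooner at the cost of more connections.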
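The pre-warm sequence for on-demand tables could look something like the following sketch, which uses the standard DynamoDB `UpdateTable` API via a boto3-style client. The function name, table name, and capacity numbers are illustrative assumptions; in practice you must wait for the table to return to `ACTIVE` status between the two calls, and the service limits how frequently a table can switch billing modes.

```python
def prewarm_table(client, table_name, peak_reads, peak_writes):
    """Sketch: briefly run the table in provisioned mode at the
    anticipated peak so DynamoDB allocates capacity for it, then
    return the table to on-demand (PAY_PER_REQUEST) mode."""
    client.update_table(
        TableName=table_name,
        BillingMode="PROVISIONED",
        ProvisionedThroughput={
            "ReadCapacityUnits": peak_reads,
            "WriteCapacityUnits": peak_writes,
        },
    )
    # ... wait for the table's TableStatus to become "ACTIVE" ...
    client.update_table(TableName=table_name, BillingMode="PAY_PER_REQUEST")
```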
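The two sharding techniques can be illustrated with key-building helpers. This is a generic sketch of write sharding, not the team's code: the shard count, function names, and hourly bucketing are assumptions you would tune to your own key schema and throughput.

```python
import hashlib

NUM_SHARDS = 10  # assumed shard count; tune to your write throughput

def hashed_key(item_id: str) -> str:
    """For non-sequential keys: a deterministic, uniformly distributed
    hash suffix, so the same id always maps to the same shard while
    data spreads evenly across partitions."""
    digest = hashlib.md5(item_id.encode()).hexdigest()
    return f"{item_id}#{int(digest, 16) % NUM_SHARDS}"

def manual_key(epoch_seconds: int, sequence: int) -> str:
    """For sequential (time-series) keys: rotate a shard suffix manually
    so consecutive writes in the same hour don't pile onto one partition."""
    hour_bucket = epoch_seconds // 3600
    return f"{hour_bucket}#{sequence % NUM_SHARDS}"
```

With either technique, reads that need a full logical key must fan out across all shard suffixes, which is the usual cost of trading a hot partition for parallel queries.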
Conclusion
Amazon Robotics achieved a 35% efficiency lift by migrating a critical resource to DynamoDB, freeing up almost 9,000 hours annually for their engineers to focus on driving innovation for customers instead of managing backend infrastructure. Looking ahead, Amazon Robotics plans to modernize and migrate additional workloads to AWS-managed services and will continue improving the efficiency of their teams.