Amazon is working on ways to eliminate the need for barcodes

Why multimodal identification is a crucial step in automating item identification at Amazon scale.


The barcode has been in use for nearly 50 years. It’s ubiquitous and all but infallible.

For Amazon, that’s not quite good enough.

When an item comes into an Amazon fulfillment center, employees use barcodes to verify its identity at several different points along its journey to a delivery vehicle. Each time, the item has to be picked up and the barcode located and scanned. Sometimes, the barcode is damaged or even missing.

That process is repeated millions of times across a massive catalogue of items of varying shapes and sizes, and it can’t easily be automated. Right now, there isn’t a robot versatile enough to manipulate any item that may come into a warehouse and then scan it.

The solution? Augment or even eliminate the barcode. Or, better still, eliminate the reliance on awkward and inefficient manual item identification altogether.

That’s what Amazon is researching using multimodal identification, or MMID. This process uses multiple modalities of information — for example, extracting the appearance and dimensions of an item from an image of that item — to automate identification.

An exterior view of the MMID process


The project is already proving its worth in fulfillment centers in Hamburg and Barcelona, where it’s being used on conveyor belts to flag trays with what Amazon calls virtual-physical mismatches — instances where the items in a tray don’t match the ones listed by the inventory system. While such mismatches are rare, at Amazon’s scale, they add up.

“Our north star vision is to use this in robotic manipulation,” says Nontas Antonakos, an applied science manager in Amazon’s computer vision group in Berlin who led the MMID team when the technology was initially developed and deployed. “Solving this problem, so robots can pick up items and process them without needing to find and scan a barcode, is fundamental. It will help us get packages to customers more quickly and accurately. And MMID is a cornerstone for achieving this.”

Developing MMID

The team wanted to start by teaching an algorithm to match an item with its photograph. But there hadn’t been a consistent effort to take images of items as they appeared in fulfillment centers, so training data wasn’t available. The first step was simply to take pictures of products as they moved along conveyor belts in fulfillment centers, building up a library of images.

Each image was then translated into a descriptive list of numbers, or a vector. The item’s dimensions also became a vector. Researchers then developed machine learning algorithms to extract those vectors and to match them with the corresponding vectors of candidate items. Powered by deep learning, the initial experiments achieved match rates of 75% to 80% — a pleasant surprise for the team.
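The matching step described above can be sketched in a few lines. This is a minimal illustration, not Amazon’s actual system: the function names are invented, and cosine similarity is assumed as the comparison metric between embedding vectors.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(query: np.ndarray, candidates: dict) -> tuple:
    """Return the candidate item whose embedding is closest to the query image's vector."""
    scores = {item: cosine_similarity(query, vec) for item, vec in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

In practice the vectors would come from a trained neural network, and a score below some threshold would mean "no confident match" rather than simply picking the nearest candidate.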

“It was a big leap for us,” says Antonakos. “We realized we had something worth investing in.” After extensive scientific investments, MMID currently achieves match rates near 99%.

Using modalities to generate a digital fingerprint


Those high match rates are due, in part, to the fact that Amazon’s inventory systems know exactly where each item is at each step of the fulfillment process. The algorithm does not need to match an item against Amazon’s entire catalogue of hundreds of millions of products — currently an impossible task. Each item comes from a particular tote, and each tote contains a few dozen products. So, the algorithm only has to match an item against the contents of a single tote.
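Restricting the search to a single tote, and combining the appearance and dimension modalities, might look like the following sketch. The weighting, helper names, and scoring formula are all assumptions for illustration; the source only states that multiple modalities are combined and that candidates are limited to one tote's contents.

```python
import numpy as np

def dimension_score(observed: np.ndarray, expected: np.ndarray) -> float:
    """Score how closely observed dimensions match the catalogue entry (1.0 = exact)."""
    rel_err = np.abs(observed - expected) / expected
    return float(np.clip(1.0 - rel_err.mean(), 0.0, 1.0))

def identify_in_tote(image_vec: np.ndarray, observed_dims: np.ndarray, tote: dict) -> tuple:
    """Match an item against only the few dozen items known to be in its tote.

    tote maps item_id -> (catalogue appearance vector, catalogue dimensions).
    """
    best_item, best_score = None, -1.0
    for item_id, (cat_vec, cat_dims) in tote.items():
        # Appearance modality: cosine similarity of embedding vectors.
        appearance = float(np.dot(image_vec, cat_vec) /
                           (np.linalg.norm(image_vec) * np.linalg.norm(cat_vec)))
        # Combine modalities with an assumed 70/30 weighting.
        score = 0.7 * appearance + 0.3 * dimension_score(observed_dims, cat_dims)
        if score > best_score:
            best_item, best_score = item_id, score
    return best_item, best_score
```

The key design point from the article survives even in this toy version: the candidate set is a few dozen items from one tote, not hundreds of millions from the whole catalogue.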

The MMID technology was first piloted in a fulfillment center in Szczecin, Poland, with a camera positioned above a single conveyor line taking pictures of “singulated” trays — trays that contain only one item. The singulated trays are ideal because it’s easier to identify a lone item than try to disambiguate multiple items and then attempt to identify each one.

Moreover, the singulated trays appear early enough in the fulfillment process that “we avoid this case where the items have made it all the way to the end of the process and someone has to deal with the error,” says Doug Morrison, a Robotics AI applied scientist who has been deeply involved in the project for the past two years. “We can then just recycle the incorrect item back into the system to its correct location.”

Using the MMID sensor platform at this stage also has the advantage of being non-intrusive: if the system detects a mismatch, the error can be addressed; if there is no mismatch, the line isn’t disrupted.

Looking ahead

Meanwhile, cameras are continually adding to the library of images with each item that rolls by.

“This is all the data we use later on to improve the system,” says Anton Milan, an Amazon senior applied scientist who was the science lead on the project for much of the first two years. “We get the data for free, and we don’t interrupt any processes.”

That learning process is essential. For example, the initial launch of MMID encountered an unexpected challenge owing, in part, to a Prime Day promotion. Several hundred Echo Dots were leaving the fulfillment center each hour, and they came in two colors: grey and blue. The algorithm couldn’t tell them apart.

“Aside from the barcode we couldn’t see, the packaging was nearly identical,” says Morrison. “There was a tiny image of a blue or grey dot, and our system got confused.”

An interior view of the MMID process


That led to a new and important feature: a confidence score that accompanies every mismatch detection. A high score signals a confidently detected mismatch and is the equivalent of, “Don’t let the tray go through,” says Morrison, whereas a low score equates to, “I’m not sure about this one, don’t take any action.”
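The decision rule Morrison describes can be sketched as a simple threshold check. The function name, threshold value, and action labels are invented for illustration; the source only describes the high-score/low-score behavior.

```python
def mismatch_action(expected_item: str, predicted_item: str, confidence: float,
                    threshold: float = 0.9) -> str:
    """Flag a tray only when the model confidently disagrees with the inventory record."""
    if predicted_item != expected_item and confidence >= threshold:
        return "flag_mismatch"   # high confidence: pull the tray for a recheck
    return "no_action"           # items agree, or confidence too low to act
```

This is what keeps look-alike cases, such as the grey and blue Echo Dots, from triggering spurious interventions: an uncertain prediction simply lets the tray continue.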

In the future, MMID might be integrated into other components of the fulfillment process, though there are obstacles to doing so. On a conveyor belt, the lighting and the speed of the item are relatively controlled and constant. When a person is picking up an item, in-hand identification introduces many more variables. The employee’s hand might occlude the item, making detection more challenging depending on how it is held. And if an item is being passed from one hand to the other, it has to be identified faster. Robotics researchers are working to address these challenges.

“This vision, of using MMID throughout the whole fulfillment process, to speed up and enable robotic automation, is going to be reached,” says Antonakos, “and when it is, it will be another step forward in our journey to get packages to customers more quickly and more accurately.”

