Ready for that long-awaited summer getaway? First, you’ll need to pack everything required for your trip into a suitcase, making sure it all fits snugly without crushing anything fragile.
Because humans have strong visual and spatial reasoning skills, this is usually a straightforward task, even if it takes some finagling to squeeze everything in.
For a robot, though, it is an extremely complex planning problem that requires reasoning about many actions, constraints, and mechanical capabilities at once. Finding a workable solution could take the robot a very long time, if it can come up with one at all.
Researchers at MIT and NVIDIA Research have developed a new algorithm that dramatically speeds up the robot’s planning process. Their approach enables a robot to “think ahead” by evaluating thousands of possible solutions in parallel and then refining the best ones to meet the constraints of the robot and its environment.
Rather than evaluating each possible action sequentially, as many current methods do, this new technique examines thousands of actions at the same time, resolving multistep manipulation tasks in mere seconds.
The researchers utilize the immense computing power of specialized processors known as graphics processing units (GPUs) to achieve this acceleration.
In a factory or warehouse, their approach could enable robots to quickly determine how to manipulate and tightly pack objects of varying shapes and sizes without damaging them, knocking anything over, or colliding with obstacles, even in tight spaces.
“This would be invaluable in industrial settings where time is money and you need to find an effective solution as fast as possible. If your algorithm takes minutes to come up with a plan instead of seconds, that costs the business money,” says MIT graduate student William Shen SM ’23, lead author of a paper on this technique.
Joining him on the paper are Caelan Garrett ’15, MEng ’15, PhD ’21, a senior research scientist at NVIDIA Research; Nishanth Kumar, an MIT graduate student; Ankit Goyal, an NVIDIA research scientist; Tucker Hermans, an NVIDIA research scientist and associate professor at the University of Utah; Leslie Pack Kaelbling, the Panasonic Professor of Computer Science and Engineering at MIT and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); Tomás Lozano-Pérez, an MIT professor of computer science and engineering and a CSAIL member; and Fabio Ramos, principal research scientist at NVIDIA and a professor at the University of Sydney. The findings will be presented at the Robotics: Science and Systems Conference.
Planning in parallel
The researchers’ algorithm is designed for a class of problems known as task and motion planning (TAMP). The goal of a TAMP algorithm is to produce a task plan for a robot, a high-level sequence of actions, together with a motion plan that specifies the low-level parameters, such as joint angles and gripper orientation, needed to carry out that high-level plan.
To formulate a plan for packing objects into a box, a robot must consider various factors, such as the final alignment of packed items so that they fit together, as well as how it will grasp and manipulate them with its arm and gripper.
This must be done while also determining how to avoid collisions and adhere to any user-defined constraints, such as a specific order for packing items.
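As a rough sketch of what such a combined plan contains, consider the following Python structure; all of the names here are hypothetical illustrations, not cuTAMP’s actual interfaces:

```python
from dataclasses import dataclass, field

# Hypothetical names for illustration only; not cuTAMP's actual interface.
@dataclass
class Action:
    name: str        # high-level step, e.g. "pick" or "place"
    obj: str         # object the step manipulates
    params: dict = field(default_factory=dict)  # low-level continuous values

# The task plan is the discrete sequence of actions; the motion plan fills
# in each action's continuous parameters (grasp pose, joint angles, etc.).
plan = [
    Action("pick",  "mug",  {"grasp_pose": (0.10, 0.00, 0.30)}),
    Action("place", "mug",  {"target_pose": (0.40, 0.20, 0.05)}),
    Action("pick",  "book", {"grasp_pose": (0.12, -0.05, 0.25)}),
    Action("place", "book", {"target_pose": (0.45, 0.25, 0.05)}),
]
```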
Given the numerous potential sequences of actions, randomly sampling possible solutions and testing them one by one could be exceedingly time-consuming.
“It’s a vast search space, and many actions the robot performs in that space don’t yield any constructive outcomes,” Garrett adds.
Instead, the researchers’ algorithm, dubbed cuTAMP, which is accelerated using NVIDIA’s parallel computing platform CUDA, simulates and refines thousands of solutions simultaneously. It accomplishes this by combining two techniques: sampling and optimization.
Sampling means picking a candidate solution to test. But rather than sampling at random, cuTAMP restricts its samples to those most likely to satisfy the problem’s constraints. This lets the algorithm explore potential solutions broadly while keeping the sampling space tractable.
“When we aggregate the findings of these samples, we obtain a considerably better starting point than if we had sampled randomly. This significantly enhances our ability to find solutions more expediently during optimization,” Shen remarks.
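In code, constrained sampling of this kind might look something like the following JAX sketch, where the pose bounds and batch size are made-up values for illustration, not numbers from the paper:

```python
import jax
import jax.numpy as jnp

# Draw thousands of candidate placements at once, restricted to a feasible
# region (here: x, y, yaw values that keep an object inside an assumed box
# footprint) rather than sampled uniformly over the whole workspace.
key = jax.random.PRNGKey(0)
low  = jnp.array([0.05, 0.05, -jnp.pi])   # min x, y, yaw (illustrative)
high = jnp.array([0.35, 0.25,  jnp.pi])   # max x, y, yaw (illustrative)

candidates = jax.random.uniform(key, (4096, 3), minval=low, maxval=high)
```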
Once cuTAMP has generated that set of samples, it executes a parallelized optimization process that calculates a cost, reflecting how effectively each sample avoids collisions and fulfills the robot’s motion requirements, along with any user-defined goals.
It concurrently updates the samples, selects the most promising candidates, and reiterates the process until it converges on a successful solution.
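A minimal sketch of that sample-optimize-select loop, again under assumed values: the cost function below is a stand-in (squared distance to a goal pose plus an out-of-bounds penalty), whereas cuTAMP’s real costs also account for collisions, robot kinematics, and user-defined goals:

```python
import jax
import jax.numpy as jnp

# Stand-in cost: squared distance to a goal pose plus a penalty for leaving
# the feasible bounds. Values are illustrative, not from the paper.
low, high = jnp.array([0.05, 0.05, -3.14]), jnp.array([0.35, 0.25, 3.14])
goal = jnp.array([0.20, 0.15, 0.0])

def cost(pose):
    overflow = jnp.maximum(low - pose, 0.0) + jnp.maximum(pose - high, 0.0)
    return jnp.sum((pose - goal) ** 2) + 10.0 * jnp.sum(overflow)

# Refine thousands of sampled candidates simultaneously with gradient steps,
# then keep the most promising one.
key = jax.random.PRNGKey(0)
poses = jax.random.uniform(key, (4096, 3), minval=low, maxval=high)
grad_step = jax.jit(jax.vmap(jax.grad(cost)))

for _ in range(100):
    poses = poses - 0.05 * grad_step(poses)

best = poses[jnp.argmin(jax.vmap(cost)(poses))]
```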
Harnessing accelerated computing
The researchers capitalize on GPUs, specialized processors that are vastly more efficient at parallel computation than general-purpose CPUs, to scale up the number of solutions they can evaluate and optimize at once. This parallelism is what makes the approach so fast.
“By utilizing GPUs, the computational expense of optimizing one solution is equivalent to optimizing hundreds or thousands of solutions,” Shen elaborates.
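The mechanism behind that claim can be illustrated with JAX’s batching and compilation transforms: once a cost function is vectorized and compiled, evaluating one candidate or thousands is a single batched call that the GPU executes largely in parallel (the toy cost here is, again, just a placeholder):

```python
import jax
import jax.numpy as jnp

def toy_cost(pose):            # placeholder cost for illustration
    return jnp.sum(pose ** 2)

# vmap vectorizes over the batch; jit compiles the batched function, so one
# kernel launch covers the whole batch.
eval_batch = jax.jit(jax.vmap(toy_cost))

key = jax.random.PRNGKey(0)
one_cost  = eval_batch(jax.random.uniform(key, (1, 3)))     # batch of 1
all_costs = eval_batch(jax.random.uniform(key, (4096, 3)))  # batch of 4096
```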
When they assessed their method on Tetris-like packing challenges in simulations, cuTAMP took only a few seconds to identify successful, collision-free plans that might require significantly more time with sequential planning methods.
And when implemented on a real robotic arm, the algorithm consistently identified a solution in under 30 seconds.
The system works across different robots and has been tested on a robotic arm at MIT as well as a humanoid robot at NVIDIA. Because cuTAMP does not rely on machine learning, it requires no training data, so it can be readily deployed in many new scenarios.
“You can present it with a completely new task and it will demonstrably solve it,” Garrett says.
The algorithm is general enough to apply to situations beyond packing, such as a robot using tools. A user could plug different skill types into the system to automatically expand a robot’s capabilities.
In the future, the researchers aim to incorporate large language models and vision-language models into cuTAMP, enabling a robot to formulate and execute a plan that achieves specific goals based on a user’s spoken commands.
This work is funded, in part, by the National Science Foundation (NSF), the Air Force Office of Scientific Research, the Office of Naval Research, the MIT Quest for Intelligence, NVIDIA, and the Robotics and Artificial Intelligence Institute.