Expanding Robot Perception

Robots have come a long way since the Roomba. Today, drones are starting to deliver packages to doorsteps, self-driving cars are navigating some roads, robotic dogs are assisting emergency responders, and still more robots are doing backflips and helping out on factory floors. Even so, Luca Carlone believes the best is yet to come.

Carlone, who recently received tenure as an associate professor in MIT’s Department of Aeronautics and Astronautics (AeroAstro), leads the SPARK Lab, where he and his students are working to close a key gap between humans and robots: perception. The group does both theoretical and experimental research, all aimed at expanding a robot’s understanding of its surroundings in ways that approach human perception. And perception, as Carlone frequently says, is more than detection.

Although robots have progressed tremendously in their capability to perceive and recognize objects in their environments, they still have significant gaps in their understanding of higher-level interactions with their surroundings. As humans, we intuitively perceive objects not only by their shapes and labels but also by their physical properties — how they can be manipulated and moved — and how they relate to one another, their broader context, and ourselves.

This level of human-like perception is what Carlone and his team aspire to instill in robots, enabling them to interact safely and effectively with individuals in various unstructured settings such as homes and workplaces.

Since joining the MIT faculty in 2017, Carlone has led his team in developing and applying perception and scene-recognition algorithms for a range of applications, including autonomous underground search-and-rescue robots, drones that can quickly pick up and manipulate objects, and self-driving cars. These advances may also benefit domestic robots that follow natural-language commands and even anticipate human needs based on higher-level contextual cues.

“Perception is a significant hurdle to having robots assist us in real-world applications,” Carlone asserts. “If we can integrate elements of cognition and reasoning into robot perception, I am confident they can provide substantial benefits.”

Widening Perspectives

Carlone was born and raised close to Salerno, Italy, near the picturesque Amalfi coast, where he was the youngest of three sons. His mother, a retired elementary school educator, taught mathematics, while his father, a retired history professor and publisher, embraced a meticulous approach in his historical studies. The brothers may have absorbed their parents’ perspectives subconsciously, as all three pursued engineering — the older two specializing in electronics and mechanical engineering, while Carlone gravitated towards robotics, known then as mechatronics.

It wasn’t until late in his undergraduate studies, however, that he found his passion for the field. Carlone attended the Polytechnic University of Turin, where he initially focused on theoretical work, specifically control theory, a field that uses mathematics to develop algorithms that automatically regulate the behavior of physical systems such as power grids, aircraft, vehicles, and robots. Then, in his final year, he enrolled in a robotics course that explored advances in manipulation and how robots can be programmed to operate.

“It was love at first sight. Utilizing algorithms and mathematics to develop a robot’s intelligence and enable it to move and interact with its environment is incredibly rewarding,” Carlone declares. “I instantly decided this would be my lifelong pursuit.”

He went on to a dual-degree program at the Polytechnic University of Turin and the Polytechnic University of Milan, where he earned master’s degrees in mechatronics and automation engineering, respectively. As part of this program, known as Alta Scuola Politecnica, Carlone also took management courses, which required him to work with students from different academic backgrounds to conceptualize, design, and market a new product. Carlone’s team developed a touchless table lamp that responded to a user’s hand gestures. The project pushed him to think about engineering from different perspectives.

“It was akin to learning different languages,” he remarks. “It was an early lesson in understanding the necessity to look beyond engineering confines and contemplate how technical projects can impact the real world.”

The Upcoming Generation

Carlone stayed in Turin to complete his PhD in mechatronics. During that time, he was free to choose his thesis topic, which he approached, as he recalls, “somewhat naively.”

“I pursued an area that the community considered comprehensively understood, and where many researchers believed there was nothing new to explore,” Carlone explains. “I underestimated the established nature of the topic, thinking I could still contribute new insights, and fortunately, I was able to do just that.”

The focus of his research was “simultaneous localization and mapping,” or SLAM: the problem of building and updating a map of a robot’s surroundings while simultaneously tracking the robot’s position within that map. Carlone devised a way to reframe the problem so that algorithms could produce more accurate maps without having to start from an initial estimate, as most SLAM methods did at the time. His work opened new avenues in a field where many roboticists believed further improvement was impossible.

“SLAM involves understanding the spatial arrangement of objects and how a robot traverses these spaces,” Carlone details. “Now I’m part of a community questioning what the next generation of SLAM will look like.”

In pursuit of an answer, he accepted a postdoc position at Georgia Tech, where he immersed himself in coding and computer vision, an interest that may have been shaped by a personal experience with impaired vision: While completing his PhD in Italy, he suffered a medical issue that severely affected his eyesight.

“For a year, I was at risk of possibly losing an eye,” Carlone shares. “That experience prompted me to consider the significance of vision, and artificial vision.”

He received excellent medical care, and the condition resolved completely, allowing him to continue his research. At Georgia Tech, his advisor, Frank Dellaert, introduced him to coding for computer vision and to building elegant mathematical models of complex three-dimensional problems. Dellaert was also among the first to develop an open-source SLAM library, called GTSAM, which Carlone quickly recognized as a vital tool. More broadly, he realized that making software available to all researchers unlocked enormous potential for progress in robotics as a whole.

“Historically, advancements in SLAM have been quite sluggish, as groups kept their codes proprietary, forcing every research group to essentially start from square one,” Carlone remarks. “Then, the emergence of open-source pipelines transformed the landscape, largely propelling the progress seen over the past decade.”

Spatial AI

After Georgia Tech, Carlone came to MIT in 2015 as a postdoc in the Laboratory for Information and Decision Systems (LIDS). During that time, he worked with Sertac Karaman, a professor of aeronautics and astronautics, to develop software that helps small drones navigate their surroundings using minimal onboard power. A year later, he was promoted to research scientist, and in 2017 he accepted a faculty position in AeroAstro.

“One aspect I fell in love with at MIT is that every decision is governed by questions like: What are our values? What is our mission? It’s never about small gains. The focus is genuinely about enhancing society,” Carlone notes. “This mindset has been truly invigorating.”

Today, Carlone’s group is developing ways to represent a robot’s environment that go beyond describing geometric shape and semantics. He is using deep learning and large language models to build algorithms that let robots understand their surroundings through a higher-level lens, so to speak. Over the past six years, his lab has released more than 60 open-source repositories, which are used by thousands of researchers and practitioners worldwide. The bulk of his research falls within a rapidly developing field known as “spatial AI.”

“Spatial AI is essentially SLAM taken to a whole new level,” Carlone asserts. “In essence, it involves allowing robots to think and perceive the world akin to humans, in helpful ways.”

It is an ambitious undertaking with potentially far-reaching implications: more intuitive, interactive robots that can help out in homes, in workplaces, on roads, and in remote and potentially hazardous locations. Carlone emphasizes that considerable work remains before robot perception approaches that of humans.

“I have twin daughters who are two years old, and I observe them effortlessly managing objects, carrying numerous toys, navigating through messy rooms, and quickly acclimatizing to new spaces. Robot perception has not yet reached the level of capability a toddler exhibits,” Carlone states. “However, we now possess new tools at our disposal. The future is promising.”

