Merging AI and underwater photography to reveal hidden ocean worlds

In the Northeast United States, the Gulf of Maine is one of the most biologically diverse marine ecosystems on the planet, home to whales, sharks, jellyfish, herring, plankton, and hundreds of other species. Yet even as this ecosystem supports rich biodiversity, it is undergoing rapid environmental change: the Gulf of Maine is warming faster than 99 percent of the world's oceans, with consequences that are still unfolding.

A new research initiative under development at MIT Sea Grant, called LOBSTgER (short for Learning Oceanic Bioecological Systems Through Generative Representations), brings together artificial intelligence and underwater photography to document marine life vulnerable to these changes and to share it with the public in new visual ways. Co-led by underwater photographer and MIT Sea Grant visiting artist Keith Ellenbogen and MIT mechanical engineering PhD student Andreas Mentzelopoulos, the project explores how generative AI can expand scientific storytelling by building on field-based photographic data.

Just as the 19th-century camera transformed our ability to document and reveal the natural world, capturing life with unprecedented detail and bringing distant or hidden environments into view, generative AI marks a new frontier in visual storytelling. And like early photography, it opens a creative and conceptual space, challenging how we define authenticity and how we communicate scientific and artistic perspectives.

In the LOBSTgER project, generative models are trained exclusively on a curated library of Ellenbogen's original underwater photographs, each made with artistic intent, technical precision, accurate species identification, and clear geographic context. By building a high-quality dataset grounded in real-world observation, the project ensures that the generated imagery remains both visually compelling and ecologically accurate. Moreover, LOBSTgER's models are built with custom code developed by Mentzelopoulos, insulating both the process and its outputs from biases that external data or pretrained models might introduce. LOBSTgER's generative AI builds upon real photography, expanding the researchers' visual vocabulary to deepen the public's connection to the natural world.
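
The project's own code is unpublished, but the idea of a closed, curated training set is easy to illustrate. The sketch below shows one conventional way to load such an archive in PyTorch so that a model can only ever learn from the photographer's own images; the class name, directory path, and image size are illustrative assumptions, not LOBSTgER's actual setup.

```python
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class CuratedPhotoDataset(Dataset):
    """Loads only the photographer's own images from a local folder.

    No external datasets or pretrained checkpoints are involved, so a
    model trained on this can only learn what the curated archive
    contains. Path and size are placeholders for illustration.
    """

    def __init__(self, root="data/underwater_archive", size=256):
        self.paths = sorted(Path(root).glob("*.jpg"))
        self.transform = transforms.Compose([
            transforms.Resize(size),          # shorter side -> `size`
            transforms.CenterCrop(size),      # square crop for the model
            transforms.ToTensor(),            # [0, 1] float tensor
            transforms.Normalize([0.5] * 3, [0.5] * 3),  # -> [-1, 1]
        ])

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        return self.transform(Image.open(self.paths[i]).convert("RGB"))
```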

At its core, LOBSTgER operates at the intersection of art, science, and technology. The project draws on the visual language of photography, the observational rigor of marine science, and the computational power of generative AI. By uniting these disciplines, the team is not only developing new ways to visualize marine life; it is also reframing how environmental stories can be told. This integrated approach makes LOBSTgER both a research tool and a creative experiment, one that reflects MIT's long tradition of interdisciplinary innovation.

Underwater photography in New England's coastal waters is notoriously difficult. Limited visibility, swirling sediment, bubbles, and the unpredictable movement of marine life all pose constant challenges. For the past several years, Ellenbogen has navigated these conditions while building a comprehensive record of the region's biodiversity through his project Space to Sea: Visualizing New England's Ocean Wilderness. This large dataset of underwater images is the foundation for training LOBSTgER's generative AI models. The photographs span diverse perspectives, lighting conditions, and animal behaviors, yielding a visual archive that is both artistically striking and biologically accurate.

LOBSTgER's custom diffusion models are trained to replicate not only the biodiversity Ellenbogen documents, but also the artistic style he uses to capture it. By learning from thousands of real underwater images, the models internalize fine-grained details such as natural lighting gradients, species-specific coloration, and even the atmospheric texture created by suspended particles and refracted sunlight. The result is imagery that not only looks visually authentic, but also feels immersive and emotionally resonant.
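
The article does not disclose the training objective, but diffusion models of this kind are conventionally trained with a denoising, noise-prediction loss: the network learns to recover the noise that was mixed into a photograph at a random diffusion step. Here is a minimal, hypothetical sketch of that standard DDPM-style step in PyTorch, where `model(noisy, t)` stands in for any noise-predicting network such as a U-Net and is an assumption, not the project's API.

```python
import torch
import torch.nn.functional as F

# Standard DDPM-style linear noise schedule over 1,000 diffusion steps.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative alpha-bar_t


def train_step(model, images, optimizer):
    """One denoising step: teach `model` to predict the injected noise.

    `images` is a batch of photographs normalized to [-1, 1], e.g. from
    the curated dataset sketched earlier.
    """
    b = images.shape[0]
    t = torch.randint(0, T, (b,), device=images.device)   # random timesteps
    noise = torch.randn_like(images)                      # Gaussian noise
    abar = alphas_cumprod.to(images.device)[t].view(b, 1, 1, 1)

    # Forward process q(x_t | x_0): blend image and noise at step t.
    noisy = abar.sqrt() * images + (1.0 - abar).sqrt() * noise

    loss = F.mse_loss(model(noisy, t), noise)  # simple DDPM objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```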

The models can both generate new, synthetic, yet scientifically grounded images unconditionally (i.e., with no user input or guidance) and enhance real photographs conditionally (i.e., image-to-image generation). By folding AI into the photographic workflow, Ellenbogen can use these tools to recover detail in murky water, adjust lighting to emphasize key subjects, or even simulate scenes that would be nearly impossible to capture in the field. The team also believes this approach may help other underwater photographers and image editors facing similar challenges. The hybrid method is intended to speed up curation and let storytellers build a more complete and cohesive visual narrative of life beneath the waves.
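
One common way to realize this kind of conditional enhancement with a diffusion model is the SDEdit-style recipe: partially noise the real photograph, then run the learned reverse process, so the result stays anchored to the actual scene. The sketch below is again a hypothetical illustration, reusing the schedule and `model` conventions from the previous snippet; note that as `strength` approaches 1 the photo's contribution vanishes and the loop reduces to unconditional sampling from pure noise.

```python
import torch

# Same linear schedule as in the training sketch above.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)


@torch.no_grad()
def img2img(model, photo, strength=0.5):
    """SDEdit-style enhancement: partially noise a real photo, then denoise.

    `strength` sets how far the result may drift from the original
    (near 0: almost untouched; 1.0: effectively unconditional sampling
    from pure noise). Hypothetical sketch, not the project's pipeline.
    """
    device = photo.device
    beta = betas.to(device)
    abar = alphas_cumprod.to(device)
    t_start = int(strength * (T - 1))

    # Jump the photograph forward to timestep t_start.
    x = abar[t_start].sqrt() * photo \
        + (1.0 - abar[t_start]).sqrt() * torch.randn_like(photo)

    # Plain DDPM ancestral sampling back down to t = 0.
    for t in reversed(range(t_start + 1)):
        t_batch = torch.full((photo.shape[0],), t, device=device, dtype=torch.long)
        eps = model(x, t_batch)                            # predicted noise
        x = (x - beta[t] / (1.0 - abar[t]).sqrt() * eps) / (1.0 - beta[t]).sqrt()
        if t > 0:                                          # posterior noise
            x = x + beta[t].sqrt() * torch.randn_like(x)
    return x.clamp(-1.0, 1.0)
```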

In one notable series, Ellenbogen captured high-resolution photographs of lion's mane jellyfish, blue sharks, American lobsters, and ocean sunfish (Mola mola) while free diving in coastal waters. "Getting a high-quality dataset is not easy," Ellenbogen says. "It requires multiple dives, missed opportunities, and unpredictable conditions. But those challenges are part of what makes underwater documentation both demanding and rewarding."

Mentzelopoulos has written original code to train a family of latent diffusion models for LOBSTgER, grounded in Ellenbogen's images. Developing such models demands substantial technical expertise, and training them from scratch is a complex process requiring hundreds of hours of computation and careful hyperparameter tuning.
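
The defining trait of a latent diffusion model (in the sense of Rombach et al.) is that the denoiser operates not on pixels but on a compressed latent representation produced by an autoencoder, which is a large part of what makes training from scratch tractable. Below is a toy sketch of that compression stage; the class, layer sizes, and shapes are placeholders, not Mentzelopoulos's architecture.

```python
import torch
import torch.nn as nn


class TinyAutoencoder(nn.Module):
    """Toy stand-in for the compression stage of a latent diffusion model.

    A production model would use a much deeper VAE; this only shows the
    shape contract: images are shrunk 8x per side, and diffusion (as in
    the earlier sketches) runs on the small latents, not on pixels.
    """

    def __init__(self, channels=3, latent_channels=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(128, latent_channels, 4, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, 128, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose2d(64, channels, 4, stride=2, padding=1),
        )

    def encode(self, x):
        return self.encoder(x)

    def decode(self, z):
        return self.decoder(z)


# Shape check: a 3x256x256 photo compresses to a 4x32x32 latent.
ae = TinyAutoencoder()
z = ae.encode(torch.randn(1, 3, 256, 256))
print(z.shape, ae.decode(z).shape)  # (1, 4, 32, 32), (1, 3, 256, 256)
```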

The project unfolds as two parallel processes: field documentation through photography, and model development through iterative training. Ellenbogen works in the field, capturing rare and fleeting encounters with marine species; Mentzelopoulos works in the lab, translating those moments into machine-learned representations that can extend and reinterpret the visual language of the ocean.

"The goal isn't to replace photography," Mentzelopoulos says. "It's to build on and extend it: making the invisible visible, and helping people perceive environmental complexity in a way that resonates both emotionally and intellectually. Our models aim to capture not just biological realism, but the emotional resonance that can spark real-world engagement and action."

LOBSTgER points toward a hybrid future, one that blends direct observation with technological interpretation. The team's long-term goal is to develop a comprehensive model that can visualize a wide range of species found in the Gulf of Maine, and eventually to apply similar approaches to marine ecosystems around the world.

The researchers see photography and generative AI as a continuum rather than a conflict. Photography records what is: the texture, light, and animal behavior of actual encounters. AI extends that vision beyond what is seen, toward what can be understood, inferred, or imagined from scientific data and artistic insight. Together, they offer a powerful framework for communicating science through image-making.

In a region where ecosystems are changing rapidly, the act of visualizing becomes more than documentation. It becomes a tool for awareness, engagement, and, ultimately, conservation. LOBSTgER is still in its early stages, and the team looks forward to sharing more discoveries, images, and insights as the project evolves.

Answer to the question posed by the lead image: the image on the left was generated by LOBSTgER's unconditional models; the image on the right is a real photograph.

For more information, contact Keith Ellenbogen and Andreas Mentzelopoulos.

