Vana is letting users own a piece of the AI models trained on their data

In February 2024, Reddit struck a $60 million deal with Google that lets the search giant use data from the platform to train its artificial intelligence models. Notably absent from the discussions were Reddit users, whose data were being sold.

The deal reflected the way the modern internet works: Big tech companies own virtually all our online data and get to decide what to do with it. Unsurprisingly, many platforms monetize their data, and the fastest-growing way to do so today is to sell it to AI companies, which are themselves massive tech companies using the data to train ever more powerful models.

The decentralized platform Vana, which started as a class project at MIT, is on a mission to give power back to the users. The company has created a fully user-owned network that allows individuals to upload their data and govern how it is used. AI developers can pitch users on ideas for new models, and if the users agree to contribute their data for training, they get proportional ownership in the models.

The idea is to give everyone a stake in the AI systems that will increasingly shape our society, while also unlocking new pools of data to advance the technology.

“This data is needed to create better AI systems,” says Vana co-founder Anna Kazlauskas ’19. “We’ve created a decentralized system to get better data, which sits inside big tech companies today, while still letting users retain ultimate ownership.”

From economics to the blockchain

Many high schoolers have pictures of pop stars or athletes on their bedroom walls. Kazlauskas had a picture of former U.S. Treasury Secretary Janet Yellen.

Kazlauskas came to MIT sure she would become an economist, but she ended up being one of five students to join the MIT Bitcoin Club in 2015, and that experience led her into the world of blockchains and cryptocurrency.

From her dorm room in MacGregor House, she began mining the cryptocurrency Ethereum, even occasionally scouring campus dumpsters for discarded computer chips.

“It got me interested in everything around computer science and networking,” Kazlauskas says. “That involved, from a blockchain perspective, distributed systems and how they can shift economic power to individuals, as well as artificial intelligence and econometrics.”

Kazlauskas met Art Abal, who was a student at Harvard University at the time, in the former Media Lab class Emergent Ventures, and the pair decided to work on new ways to obtain data to train AI systems.

“Our question was: How could you have a large number of people contributing to these AI systems using more of a distributed network?” Kazlauskas recalls.

Kazlauskas and Abal were trying to address the status quo, in which most models are trained by scraping public data on the internet. Big tech companies often also buy large datasets from other companies.

The founders’ approach evolved over the years and was informed by Kazlauskas’ experience working at the financial blockchain company Celo after graduation. But Kazlauskas credits her time at MIT with helping her think about these problems, and the instructor for Emergent Ventures, Ramesh Raskar, still helps Vana think through AI research questions today.

“It was great to have an open-ended opportunity to build, hack, and explore,” Kazlauskas says. “I think that ethos at MIT is really important. It’s just about building things, seeing what works, and continuing to iterate.”

Today, Vana takes advantage of a little-known law that allows users of most big tech platforms to export their data directly. Users can upload that information into secure digital wallets in Vana and contribute it to train models as they see fit.

AI developers can propose ideas for new open-source models, and people can pool their data to help train the model. In the blockchain world, these data pools are called data DAOs, short for decentralized autonomous organizations. Data can also be used to create personalized AI models and agents.

In Vana, data are used in a way that preserves user privacy, because the system doesn’t expose identifiable information. Once the model is created, users maintain ownership, so every time it is used, they are rewarded in proportion to how much their data helped train it.
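The article does not say how Vana measures each contributor’s share or how payouts are computed. Purely as an illustration of what “rewarded in proportion” means, here is a minimal sketch that assumes each contributor already has a non-negative integer contribution score (a hypothetical input) and splits a usage fee, in whole cents, accordingly. The function name `split_reward` and the largest-remainder rounding are assumptions made for this example, not Vana’s method.

```python
import math
from fractions import Fraction


def split_reward(fee_cents: int, scores: dict[str, int]) -> dict[str, int]:
    """Split a usage fee (integer cents) across contributors in proportion
    to hypothetical contribution scores.

    Uses the largest-remainder method so payouts always sum exactly to
    fee_cents even though each payout is a whole number of cents.
    """
    if any(s < 0 for s in scores.values()):
        raise ValueError("scores must be non-negative")
    total = sum(scores.values())
    if total == 0:
        raise ValueError("at least one contributor needs a positive score")

    # Exact proportional share for each contributor, as a fraction of a cent.
    exact = {user: Fraction(fee_cents * s, total) for user, s in scores.items()}

    # Round every share down to whole cents first.
    payouts = {user: math.floor(share) for user, share in exact.items()}

    # Give the cents lost to rounding to the largest fractional remainders.
    # sorted() is stable, so ties keep the original contributor order.
    leftover = fee_cents - sum(payouts.values())
    by_remainder = sorted(exact, key=lambda u: exact[u] - payouts[u], reverse=True)
    for user in by_remainder[:leftover]:
        payouts[user] += 1
    return payouts


if __name__ == "__main__":
    # Hypothetical scores: alice's data counted three times as much as bob's.
    print(split_reward(1000, {"alice": 3, "bob": 1}))
```

Using exact fractions and integer cents avoids floating-point drift, so the fee is always fully distributed and no contributor is paid more than one cent away from their exact proportional share.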

“From a developer’s perspective, now you can build these hyper-personalized health applications that take into account exactly what you ate, how you slept, and how you exercise,” Kazlauskas says. “Those applications aren’t possible today because of the walled gardens of the big tech companies.”

Crowdsourced, user-owned AI

Last year, a machine-learning engineer proposed using Vana user data to train an AI model that could generate Reddit posts. More than 140,000 Vana users contributed their Reddit data, which included posts, comments, messages, and more. Users decided on the terms under which the model could be used, and they kept ownership of the model after it was created.

Vana has enabled similar initiatives with user-contributed data from the social media platform X, sleep data from sources like Oura rings, and more. There are also collaborations that combine data pools to create broader AI applications.

“Let’s say users have Spotify data, Reddit data, and fashion data,” Kazlauskas explains. “Usually, Spotify isn’t going to collaborate with those types of companies, and there are even regulatory barriers to that. But users can do it if they grant access, so these cross-platform datasets can be used to create really powerful models.”

Vana has more than 1 million users and over 20 live data DAOs. More than 300 additional data pools have been proposed by users on Vana’s system, and Kazlauskas expects many of them to go into production this year.

“I think there’s a lot of promise in generalized AI models, personalized healthcare, and new consumer applications, because it’s tough to combine all that data or get access to it in the first place,” Kazlauskas says.

The data pools are allowing groups of users to accomplish something even the most powerful tech companies struggle with today.

“Today, big tech companies have built these data moats, so the best datasets aren’t available to anyone,” Kazlauskas says. “It’s a collective action problem, where my data on its own isn’t that valuable, but a data pool with tens of thousands or millions of people is really valuable. Vana allows those pools to be built. It’s a win-win: Users get to benefit from the rise of AI because they own the models. Then you don’t end up in a scenario where a single company controls an all-powerful AI model. You get better technology, and everyone benefits.”

