An Anomaly Detection Framework Anyone Can Use


Sarah Alnegheimish’s research lies at the intersection of machine learning and systems engineering. Her goal: to make machine learning systems more accessible, transparent, and trustworthy.

Alnegheimish is a doctoral candidate in Principal Research Scientist Kalyan Veeramachaneni’s Data-to-AI group at MIT’s Laboratory for Information and Decision Systems (LIDS). There, she devotes most of her effort to developing Orion, an open-source, user-friendly machine learning framework and time series library that detects anomalies automatically in large-scale industrial and operational settings.

Formative Influences

The daughter of a university educator and a teacher trainer, she understood from a young age that knowledge was meant to be shared freely. “I believe that growing up in a household where education was deeply valued is a big part of why I want to make machine learning tools accessible.” Alnegheimish’s own experience with open-source resources further fueled that ambition. “I began to see accessibility as essential for adoption. For emerging technology to have real impact, it has to be accessible and easy to evaluate by the people who need it. That is the fundamental aim of open-source development.”

Alnegheimish completed her undergraduate studies at King Saud University (KSU). “I was in the inaugural cohort of computer science majors. Before the program was introduced, the only computing option available was IT [information technology].” Being part of the first cohort was exciting, but it brought unique challenges. “All the faculty were teaching new material. Succeeding required learning independently. That was when I first encountered MIT OpenCourseWare: as a resource for teaching myself.”

Shortly after graduation, Alnegheimish became a researcher at the King Abdulaziz City for Science and Technology (KACST), Saudi Arabia’s national research facility. Through the Center for Complex Engineering Systems (CCES) at KACST and MIT, she began her research collaboration with Veeramachaneni. When she applied to MIT for graduate studies, his research group was her primary choice.

Developing Orion

Alnegheimish’s master’s thesis focused on time series anomaly detection: identifying unexpected behaviors or patterns in data that can give users vital information. For example, unusual patterns in network traffic data may signal cybersecurity risks, abnormal sensor readings in heavy machinery can foreshadow failures, and monitoring patient vitals can help prevent health complications. It was during her master’s research that Alnegheimish first began designing Orion.
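To make the task concrete, here is a minimal, purely illustrative sketch (independent of Orion) that injects a spike into a synthetic signal and flags it with a rolling z-score; the window size and threshold are arbitrary choices for the example.

```python
import numpy as np
import pandas as pd

# Synthetic signal: a slow oscillation plus noise, with one spike injected at index 500.
rng = np.random.default_rng(0)
values = np.sin(np.linspace(0, 4 * np.pi, 1000)) + rng.normal(0, 0.1, 1000)
values[500] += 5.0  # the injected anomaly
signal = pd.Series(values)

# Rolling z-score: how far each point sits from its local mean,
# measured in units of the local standard deviation.
window = 48
local_mean = signal.rolling(window, center=True).mean()
local_std = signal.rolling(window, center=True).std()
z_scores = (signal - local_mean) / local_std

# Points with an implausibly large deviation are flagged as anomalies;
# here only the injected spike crosses the threshold.
anomalies = signal[z_scores.abs() > 4]
print(anomalies)
```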

Orion uses statistical and machine learning-based models that are continuously logged and maintained. Users do not need machine learning expertise to work with the framework. They can analyze signals, compare anomaly detection techniques, and examine anomalies within a single end-to-end program. The framework, code, and datasets are all open source.

“With open source, accessibility and transparency come built in. You have unrestricted access to the code, so you can examine how the model operates by reading it. We have added another layer of transparency in Orion: we annotate every step in the model and present it to the user.” Alnegheimish says this transparency helps users build initial trust in the model before they see its reliability firsthand.

“We aim to consolidate all these machine learning algorithms into a single repository so anyone can utilize our models directly,” she explains. “It’s not just intended for the partners we collaborate with at MIT. Many public users are employing it. They come to the library, install it, and apply it to their own data. It’s proving to be an excellent resource for individuals to discover some of the most recent methodologies for anomaly detection.”
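A rough sketch of that workflow, based on the project’s public README (the package name orion-ml, the demo signal name, and the pipeline name lstm_dynamic_threshold are taken from those docs and may differ between versions):

```python
# pip install orion-ml

from orion import Orion
from orion.data import load_signal

# Load one of the public demo signals that ships with the library.
data = load_signal('S-1')

# Choose one of the bundled anomaly detection pipelines.
orion = Orion(pipeline='lstm_dynamic_threshold')

# The two commands described in the article: fit the pipeline, then detect.
orion.fit(data)
anomalies = orion.detect(data)

# `anomalies` is a table of detected intervals with start, end, and severity.
print(anomalies)
```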

Repurposing Models for Anomaly Detection

In her doctoral research, Alnegheimish is exploring new approaches to anomaly detection through Orion. “When I started my research, every machine learning model had to be trained from scratch on your data. Now we live in an era where we can use pre-trained models,” she observes. Using pre-trained models saves both time and computational cost. The challenge, however, is that time series anomaly detection is a new task for these models. “Historically, these models have been designed to forecast, not to detect anomalies,” Alnegheimish says. “We are extending their capabilities through prompt engineering, without any additional training.”
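Her exact pipelines are beyond the scope of this article, but the general recipe of repurposing a forecaster for anomaly detection can be sketched as follows. The function forecast_next_value is a hypothetical stand-in for a pre-trained model queried through a prompt, and the error threshold is deliberately simplified.

```python
import numpy as np

def forecast_next_value(history):
    """Hypothetical stand-in for a pre-trained forecaster.

    In the prompt-engineering setting, the recent history would be serialized
    into a prompt for a pre-trained model; here a naive "repeat the last value"
    forecast stands in so the sketch runs end to end.
    """
    return history[-1]

def detect_anomalies(signal, window=50, threshold=4.0):
    """Flag points whose forecast error is far larger than typical."""
    errors = np.zeros_like(signal)
    for t in range(window, len(signal)):
        errors[t] = abs(signal[t] - forecast_next_value(signal[t - window:t]))
    cutoff = errors.mean() + threshold * errors.std()
    return np.where(errors > cutoff)[0]

# Smooth synthetic signal with one abrupt jump.
signal = np.sin(np.linspace(0, 20, 500))
signal[300] += 2.0
print(detect_anomalies(signal))  # indices where the forecast error spikes, around the injected jump
```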

Because these models already capture the dynamics of time series data, Alnegheimish believes they have what is needed to support anomaly detection, and so far her findings support that hypothesis. While pre-trained models do not yet beat those trained on task-specific datasets, she is optimistic that they eventually will.

Designing for Accessibility

Alnegheimish describes the steps she has taken to make Orion accessible. “Before I came to MIT, I thought the essential part of research was developing the machine learning model itself or improving on its existing capabilities. Over time, I realized that the best way to make your research accessible and adoptable by others is to build systems that enable that access. During graduate school, I adopted the strategy of developing my models and systems in parallel.”

A key part of her system development was finding the right abstractions to use with her models. These abstractions provide a universal representation for all models built from simplified components. “Every model involves a sequence of steps that goes from raw input to the desired output. We have standardized the input and the output, which keeps the middle flexible and fluid. So far, every model we have tried has fit into our abstractions.” The abstractions she uses have proven stable and reliable over the past six years.
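Orion’s real abstractions are more elaborate, but the general idea can be illustrated with a simplified sketch: the input and output formats are fixed, while the steps in between are interchangeable. Everything below, class and function names included, is illustrative rather than the library’s actual interface.

```python
from typing import Callable, List, Tuple
import numpy as np
import pandas as pd

# The contract the abstraction enforces:
#   input  -> a dataframe with 'timestamp' and 'value' columns
#   output -> a list of (start, end) anomalous intervals
Interval = Tuple[int, int]
Step = Callable[[pd.DataFrame], pd.DataFrame]

class AnomalyPipeline:
    """Fixed input and output formats; any sequence of steps in between."""

    def __init__(self, steps: List[Step],
                 to_intervals: Callable[[pd.DataFrame], List[Interval]]):
        self.steps = steps
        self.to_intervals = to_intervals

    def detect(self, signal: pd.DataFrame) -> List[Interval]:
        data = signal
        for step in self.steps:        # e.g. impute -> model -> score
            data = step(data)
        return self.to_intervals(data)

# One possible "middle": score each point by its distance from the mean.
def zscore_step(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df['score'] = (df['value'] - df['value'].mean()).abs() / df['value'].std()
    return df

def threshold_intervals(df: pd.DataFrame, cutoff: float = 4.0) -> List[Interval]:
    flagged = df.loc[df['score'] > cutoff, 'timestamp'].tolist()
    return [(t, t) for t in flagged]   # single-point intervals, for brevity

rng = np.random.default_rng(0)
signal = pd.DataFrame({'timestamp': range(200),
                       'value': np.append(rng.normal(0, 1, 199), 8.0)})
pipeline = AnomalyPipeline([zscore_step], threshold_intervals)
print(pipeline.detect(signal))  # flags the out-of-range final point
```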

The value of building systems and models in parallel shows up in Alnegheimish’s mentoring. She has had the chance to supervise two master’s of engineering students. “All I showed them was the system itself and the documentation on how to use it. Both students successfully built their own models using the abstractions we follow. It reaffirmed that we are on the right path.”

Alnegheimish also explored the possibility of employing a large language model (LLM) as an interface between users and the system. The LLM agent she has deployed can connect to Orion without users needing to grasp the intricate details of its workings. “Consider ChatGPT. You have no awareness of the underlying model, yet it’s highly accessible to all.” In her software, users only need to know two commands: Fit and Detect. Fit enables users to train their model, while Detect allows them to identify anomalies.

“The ultimate aim of my work is to make AI more accessible to everyone,” she says. So far, Orion has amassed over 120,000 downloads, and more than a thousand users have starred the repository on GitHub. “Traditionally, the impact of research was measured in citations and publications. Now, open source lets you see adoption in real time.”
