Neural network models used in applications like medical image processing and speech recognition perform operations on highly complex data structures that require an enormous amount of computation to process. This is one reason deep-learning models consume so much energy.
To improve the efficiency of AI models, MIT researchers developed an automated system that enables developers of deep-learning algorithms to simultaneously take advantage of two types of data redundancy. This reduces the amount of computation, bandwidth, and memory storage needed for machine-learning operations.
Existing techniques for optimizing algorithms can be cumbersome, and they typically let developers exploit either sparsity or symmetry — two distinct types of redundancy that exist in deep-learning data structures — but not both at once.
By enabling a developer to build an algorithm from scratch that takes advantage of both redundancies at once, the MIT researchers' approach boosted the speed of computations by nearly 30 times in some experiments.
Because the system utilizes a user-friendly programming language, it could optimize machine-learning algorithms for a wide range of applications. The system could also help scientists who are not experts in deep learning but want to improve the efficiency of the AI algorithms they use to process data. In addition, the system could have applications in scientific computing.
“For a long time, capturing these data redundancies has required a lot of implementation effort. Instead, a scientist can tell our system what they would like to compute in a more abstract way, without telling the system exactly how to compute it,” says Willow Ahrens, an MIT postdoc and co-author of a paper on the system, which will be presented at the International Symposium on Code Generation and Optimization.
She is joined on the paper by lead author Radha Patel ’23, SM ’24, and senior author Saman Amarasinghe, a professor in the Department of Electrical Engineering and Computer Science (EECS) and a principal researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL).
Eliminating computation
In machine learning, data are often represented and manipulated as multidimensional arrays known as tensors. A tensor is like a matrix, which is a rectangular array of values arranged on two axes, rows and columns. But unlike a two-dimensional matrix, a tensor can have many dimensions, or axes, which makes tensors more difficult to manipulate.
Deep-learning models perform operations on tensors using repeated matrix multiplication and addition; this process is how neural networks learn complex patterns in data. The sheer volume of calculations that must be performed on these multidimensional data structures requires an enormous amount of computation and energy.
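As a minimal illustration (ours, not the paper's), here is what a tensor and the kind of repeated multiply-and-add at the heart of a neural network layer might look like in NumPy:

```python
import numpy as np

# A rank-3 tensor: a batch of 32 samples, each a 64x128 matrix of activations.
activations = np.random.rand(32, 64, 128)

# A dense layer applies the same weight matrix to every sample in the batch:
# repeated matrix multiplication followed by addition of a bias vector.
weights = np.random.rand(128, 256)
bias = np.random.rand(256)

outputs = activations @ weights + bias  # resulting shape: (32, 64, 256)
print(outputs.shape)
```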
But because of the way data are arranged within tensors, engineers can often speed up a neural network by skipping redundant computations.
For instance, if a tensor represents user review data from an e-commerce site, since not every user reviewed every product, most values in that tensor are likely zero. This type of data redundancy is called sparsity. A model can save time and computation by storing and operating on only the nonzero values.
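To make the idea concrete, here is a small sketch using SciPy's general-purpose sparse formats (our choice of tooling for illustration, not anything from SySTeC itself):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical user-by-product review matrix: most entries are zero
# because most users never reviewed most products.
dense = np.zeros((1000, 500))
dense[3, 42] = 5.0   # user 3 rated product 42
dense[17, 7] = 2.0   # user 17 rated product 7

sparse = csr_matrix(dense)  # stores only the nonzero values plus their indices
print(sparse.nnz)           # 2 stored values, versus 500,000 dense entries

# Operations on the sparse form skip the zeros entirely.
scores = sparse @ np.ones(500)
```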
In addition, a tensor is sometimes symmetric, meaning the top half and bottom half of the data structure are identical. In that case, the model only needs to operate on one half, cutting the amount of computation. This type of data redundancy is called symmetry.
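As a hedged sketch of how symmetry saves work (again our illustration), a symmetric matrix-vector product only needs to read one triangle of the matrix, because each stored entry can play both of its mirrored roles:

```python
import numpy as np

# Build a symmetric matrix: the entry at (i, j) equals the entry at (j, i).
n = 4
rng = np.random.default_rng(0)
A = rng.random((n, n))
A = (A + A.T) / 2  # symmetrize

x = rng.random(n)

# Matrix-vector product that touches only the upper triangle (j >= i).
y = np.zeros(n)
for i in range(n):
    y[i] += A[i, i] * x[i]          # diagonal entry contributes once
    for j in range(i + 1, n):
        y[i] += A[i, j] * x[j]      # upper-triangle entry, row role
        y[j] += A[i, j] * x[i]      # the same entry reused for its mirror

assert np.allclose(y, A @ x)        # matches the full dense product
```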
“But here, if you try to use both of these optimizations, the situation becomes quite complex,” Ahrens says.
To simplify the process, she and her collaborators built a new compiler, which is a computer program that translates complex code into a simpler language a machine can process. Their compiler, called SySTeC, can optimize computations by automatically taking advantage of both sparsity and symmetry in tensors.
They began building SySTeC by identifying three key optimizations they could perform using symmetry.
First, if the algorithm's output tensor is symmetric, then it only needs to compute one half of it. Second, if the input tensor is symmetric, then the algorithm only needs to read one half of it. Finally, if intermediate results of tensor operations are symmetric, the algorithm can skip those redundant calculations, as sketched below.
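Here is a minimal sketch of the first optimization (our example, not drawn from the paper): the Gram matrix A Aᵀ is symmetric by construction, so only one triangle needs to be computed and the other can be mirrored, roughly halving the dot products:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((5, 3))

# C = A @ A.T is symmetric, so compute only the lower triangle
# (j <= i) and copy each value into its mirrored position.
n = A.shape[0]
C = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1):
        C[i, j] = A[i] @ A[j]   # dot product of rows i and j
        C[j, i] = C[i, j]       # mirror instead of recomputing

assert np.allclose(C, A @ A.T)
```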
Concurrent optimizations
To use SySTeC, a developer inputs their program and the system automatically optimizes the code for all three types of symmetry. Then the second phase of SySTeC performs additional transformations so the program stores only nonzero data values, optimizing it for sparsity.
In the end, SySTeC generates ready-to-use code.
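SySTeC's actual input language and generated code are not shown in this article; as a rough sketch of the kind of kernel such a pipeline can emit, the following combines both redundancies, storing a symmetric sparse matrix as only the nonzeros of one triangle:

```python
import numpy as np
from scipy.sparse import coo_matrix

# A symmetric sparse matrix, stored as only the nonzeros of its
# lower triangle in coordinate format: (row, col, value).
rows = np.array([0, 2, 3, 3])
cols = np.array([0, 1, 0, 3])
vals = np.array([4.0, 1.5, 2.0, 3.0])
n = 4

x = np.arange(1.0, n + 1.0)

# Symmetric sparse matrix-vector product: iterate over stored
# nonzeros once, letting each off-diagonal entry act twice.
y = np.zeros(n)
for i, j, v in zip(rows, cols, vals):
    y[i] += v * x[j]
    if i != j:
        y[j] += v * x[i]

# Check against a dense reconstruction of the full symmetric matrix.
L = coo_matrix((vals, (rows, cols)), shape=(n, n)).toarray()
full = L + L.T - np.diag(np.diag(L))
assert np.allclose(y, full @ x)
```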
“In this way, we get the benefits of both optimizations. And the interesting thing with symmetry is, as your tensor has more dimensions, you can get even more savings on computation,” Ahrens says.
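A brief worked example of that scaling (our arithmetic, not a result from the paper): a fully symmetric tensor of order d with n indices per axis has only C(n + d - 1, d) distinct entries out of n^d total, so the potential savings factor approaches d! as n grows:

```python
from math import comb

n = 100
for d in (2, 3, 4):
    total = n ** d
    distinct = comb(n + d - 1, d)  # multisets of d indices chosen from n
    print(d, total / distinct)     # approaches d! = 2, 6, 24 for large n
```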
The researchers demonstrated speed improvements of nearly a factor of 30 with code generated automatically by SySTeC.
Because the system is automated, it could be especially useful in situations where a scientist wants to process data using an algorithm they are writing from scratch.
In the future, the researchers want to integrate SySTeC into existing sparse tensor compiler systems to create a seamless interface for users. In addition, they would like to use it to optimize code for more complicated programs.
This research is funded, in part, by Intel, the National Science Foundation, the Defense Advanced Research Projects Agency, and the Department of Energy.