Researchers in the Department of Electrical and Computer Engineering (ECE) have developed a new framework for building deep neural networks via grammar-guided network generators. In experimental testing, the new networks — called AOGNets — have outperformed existing state-of-the-art frameworks, including the widely used ResNet and DenseNet systems, in visual recognition tasks.
“AOGNets have better prediction accuracy than any of the networks we’ve compared it to,” says Dr. Tianfu Wu, an assistant professor in ECE and corresponding author of a paper on the work. “AOGNets are also more interpretable, meaning users can see how the system reaches its conclusions.”
The new framework uses a compositional grammar approach to system architecture that draws on best practices from previous network systems to more effectively extract useful information from raw data.
“We found that hierarchical and compositional grammar gave us a simple, elegant way to unify the approaches taken by previous system architectures, and to our best knowledge, it is the first work that makes use of grammar for network generation,” Wu says.
To test their new framework, the researchers built AOGNets and evaluated them on three image classification benchmarks: CIFAR-10, CIFAR-100 and ImageNet-1K.
“AOGNets obtained significantly better performance than all of the state-of-the-art networks under fair comparisons, including ResNets, DenseNets, ResNeXts and DualPathNets,” Wu says. “AOGNets also obtained the best model interpretability score using the network dissection metric in ImageNet. AOGNets further show great potential in adversarial defense and platform-agnostic deployment (mobile vs. cloud).”
The researchers also tested the performance of AOGNets in object detection and instance semantic segmentation on the Microsoft COCO (MS-COCO) benchmark, using the vanilla Mask R-CNN system.
“AOGNets obtained better results than the ResNet and ResNeXt backbones with smaller model sizes and similar or slightly better inference time,” Wu says. “The results show the effectiveness of AOGNets learning better features in object detection and segmentation tasks.”
These tests are relevant because image classification is one of the core tasks in visual recognition, and ImageNet is the standard large-scale classification benchmark. Similarly, object detection and segmentation are two core high-level vision tasks, and MS-COCO is one of the most widely used benchmarks for them.
“To evaluate new network architectures for deep learning in visual recognition, they are the golden testbeds,” Wu says. “AOGNets are developed under a principled grammar framework and obtain significant improvement in both ImageNet and MS-COCO, thus showing potentially broad and deep impacts for representation learning in numerous practical applications.”
The first author of the paper is Xilai Li, a Ph.D. student in ECE. A patent application has been filed for the work. The authors are interested in collaborating with potential academic and industry partners.