Neural Architecture Search Research Guide, Part 1

The process of designing a neural network, from training to experimenting with different parameters, is labor-intensive, challenging, and often tedious. But imagine if this process could be automated. Turning that idea into reality is what this guide is about.

We will explore a series of research papers that attempt to solve the challenging task of automating neural network design. Throughout this guide, we assume the reader has tried designing neural networks from scratch using a framework such as Keras or TensorFlow.

1. Neural Architecture Search with Reinforcement Learning

2016: Neural Architecture Search with Reinforcement Learning

This paper uses a recurrent neural network (RNN) to generate model descriptions of neural networks. The RNN is trained with reinforcement learning to maximize the expected validation accuracy of the architectures it generates, and the resulting method achieves an error rate of 3.65% on the CIFAR-10 dataset.

The neural architecture search proposed in this paper is gradient-based. The method builds on the observation that the structure and connectivity of a neural network can be described by a variable-length string. A neural network referred to as the controller is used to generate such strings. The child network specified by a string is then trained on the real data and evaluated on a validation set to obtain an accuracy. This accuracy is used to compute a policy gradient, which in turn updates the controller. As a result, architectures with higher accuracy are assigned a higher probability of being selected.

An overview of Neural Architecture Search.
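
To make the search loop concrete, below is a minimal, self-contained sketch of the controller's policy-gradient (REINFORCE) update over a toy one-decision search space. The reward function stands in for training a child network and measuring its validation accuracy; the filter choices, learning rate, and baseline are illustrative assumptions, not values from the paper.

```python
# Toy sketch of the NAS controller loop with REINFORCE (not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-decision search space: number of filters in a conv layer.
choices = [16, 32, 64, 128]
logits = np.zeros(len(choices))   # controller "policy" parameters
lr, baseline = 0.1, 0.0

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def child_reward(n_filters):
    # Stand-in for training the child network and measuring validation
    # accuracy; 64 filters is (arbitrarily) made the best choice here.
    return 1.0 - abs(n_filters - 64) / 128 + rng.normal(0, 0.01)

for step in range(200):
    probs = softmax(logits)
    a = rng.choice(len(choices), p=probs)   # controller samples an architecture
    R = child_reward(choices[a])            # "train" it, observe its accuracy
    baseline = 0.9 * baseline + 0.1 * R     # moving-average baseline lowers variance
    grad = -probs                           # d log p(a) / d logits for a categorical
    grad[a] += 1.0
    logits += lr * (R - baseline) * grad    # reinforce high-accuracy choices

print(dict(zip(choices, softmax(logits).round(2))))
```

After a few hundred updates the sampling distribution concentrates on the choice with the highest reward, which is exactly the behavior the controller relies on at a much larger scale.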

In neural architecture search, the controller generates the architectural hyperparameters of a neural network. In the figure below, the controller generates a convolutional neural network, predicting hyperparameters such as the filter height, filter width, and stride. Each prediction is made by a softmax classifier and then fed as input to the next time step. Once the controller has finished generating an architecture, a neural network with that architecture is built and trained.

How our controller recurrent neural network samples a simple convolutional network. It predicts filter height, filter width, stride height, stride width, and number of filters for one layer and repeats. Every prediction is carried out by a softmax classifier and then fed into the next time step as input.
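
The sequential sampling scheme in the figure can be sketched as follows: a tiny NumPy RNN makes one softmax prediction per decision, and an embedding of each sampled choice is fed into the next time step. The search space, layer sizes, and random weights below are hypothetical placeholders, not the paper's configuration.

```python
# Minimal sketch of a controller RNN sampling per-layer hyperparameters.
import numpy as np

rng = np.random.default_rng(0)
space = {"filter_height": [1, 3, 5, 7],
         "filter_width":  [1, 3, 5, 7],
         "stride":        [1, 2, 3]}

H, E = 16, 8                       # hidden size, embedding size (illustrative)
Wx = rng.normal(0, 0.1, (H, E))    # input-to-hidden weights
Wh = rng.normal(0, 0.1, (H, H))    # hidden-to-hidden weights
heads = {k: rng.normal(0, 0.1, (len(v), H)) for k, v in space.items()}
embed = {k: rng.normal(0, 0.1, (len(v), E)) for k, v in space.items()}

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample_layer():
    h, x = np.zeros(H), np.zeros(E)        # empty "start" input at the first step
    layer = {}
    for name, values in space.items():
        h = np.tanh(Wx @ x + Wh @ h)       # one RNN step
        probs = softmax(heads[name] @ h)   # softmax classifier for this decision
        idx = rng.choice(len(values), p=probs)
        layer[name] = values[idx]
        x = embed[name][idx]               # feed the sampled choice to the next step
    return layer

print(sample_layer())   # one sampled layer configuration
```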

Because training a child network takes hours, the authors use distributed training and asynchronous parameter updates to speed up the controller's learning process.

Distributed training for Neural Architecture Search. We use a set of S parameter servers to store and send parameters to K controller replicas. Each controller replica then samples m architectures and runs the multiple child models in parallel. The accuracy of each child model is recorded to compute the gradients with respect to θc, which are then sent back to the parameter servers.
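
A rough single-machine analogue of this setup, with threads standing in for the K controller replicas and a lock-protected array standing in for the S parameter servers, might look like the sketch below. The replica count, reward function, and learning rate are invented for illustration.

```python
# Toy sketch of asynchronous controller training (threads, one machine).
import threading
import numpy as np

choices = [16, 32, 64, 128]
theta = np.zeros(len(choices))    # shared controller parameters (theta_c)
lock = threading.Lock()

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reward(n_filters):            # stand-in for child-model validation accuracy
    return 1.0 - abs(n_filters - 64) / 128

def replica(m=8, steps=50, lr=0.05):
    rng = np.random.default_rng()
    for _ in range(steps):
        with lock:                # fetch the current parameters
            snapshot = theta.copy()
        probs = softmax(snapshot)
        grad = np.zeros_like(snapshot)
        for _ in range(m):        # sample m architectures per round
            a = rng.choice(len(choices), p=probs)
            g = -probs
            g[a] += 1.0
            grad += reward(choices[a]) * g
        with lock:                # push the update without waiting for others
            theta += lr * grad / m

threads = [threading.Thread(target=replica) for _ in range(4)]   # K = 4 replicas
for t in threads: t.start()
for t in threads: t.join()
print(softmax(theta).round(2))
```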

The error rates of this model compared with other models are shown below:

Performance of Neural Architecture Search and other state-of-the-art models on CIFAR-10.

2. Hierarchical Representations for Efficient Architecture Search

ICLR 2018: Hierarchical Representations for Efficient Architecture Search

The algorithm proposed in this paper achieves a top-1 error of 3.6% on CIFAR-10 and a top-1 error of 20.3% on ImageNet. The authors present a hierarchical representation for describing neural network architectures, show that a simple random search can find competitive image-classification architectures under this representation, and propose a scalable variant of evolutionary search.

For the flat architecture representation, the authors study a family of neural network architectures defined by single-source, single-sink computation graphs that transform the input at the source into the output at the sink. Each node in the graph corresponds to a feature map, and each directed edge is associated with an operation, such as pooling or convolution, that transforms the feature map of its input node and passes the result to its output node.

An example of a three-level hierarchical architecture representation.
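
A small sketch of this flat representation, with NumPy stand-ins for the edge operations and summation as the merge (the paper merges incoming feature maps by depthwise concatenation), assuming node 0 is the source and the highest-numbered node is the sink:

```python
# Flat representation sketch: a single-source, single-sink DAG whose nodes
# hold feature maps and whose edges carry operations.
import numpy as np

ops = {
    "identity": lambda x: x,
    "scale":    lambda x: 0.5 * x,        # stand-in for e.g. a 3x3 convolution
    "relu":     lambda x: np.maximum(x, 0),
}

# edges[(i, j)] = name of the operation on the edge from node i to node j.
edges = {(0, 1): "scale", (0, 2): "relu", (1, 3): "identity", (2, 3): "scale"}

def run(graph, x):
    sink = max(j for (_, j) in graph)
    feats = {0: x}                        # node 0 (the source) holds the input
    for j in range(1, sink + 1):
        incoming = [ops[op](feats[i]) for (i, k), op in graph.items() if k == j]
        feats[j] = np.sum(incoming, axis=0)   # merge incoming edges (sum here)
    return feats[sink]                    # the sink's feature map is the output

print(run(edges, np.array([-1.0, 2.0])))
```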

For the hierarchical architecture, there are several motifs at different levels. Lower-level motifs are used as building blocks during the construction of higher-level motifs, as the sketch after the figure below illustrates.

Architecture visualization.
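
The recursion from primitives to higher-level motifs can be sketched the same way as the flat case: a motif at level l is a small DAG whose edges are labelled with motifs from level l-1, and assembling it yields an ordinary callable that can itself label an edge one level up. The primitives and motif graphs below are invented for illustration, not taken from the paper.

```python
# Hierarchical representation sketch: lower-level motifs become the
# building blocks (edge operations) of higher-level motifs.
import numpy as np

primitives = {
    "identity": lambda x: x,
    "scale":    lambda x: 0.5 * x,         # stand-in for a conv/pool primitive
    "relu":     lambda x: np.maximum(x, 0),
}

def assemble(motif, lower):
    """Turn a motif (DAG edges labelled with lower-level motif names)
    into a single feature-map transformation."""
    def apply(x):
        sink = max(j for (_, j) in motif)
        feats = {0: x}
        for j in range(1, sink + 1):
            incoming = [lower[name](feats[i])
                        for (i, k), name in motif.items() if k == j]
            feats[j] = np.sum(incoming, axis=0)   # merge (the paper concatenates)
        return feats[sink]
    return apply

# Level-2 motifs built from level-1 primitives:
level2 = {
    "motif_a": assemble({(0, 1): "scale", (1, 2): "relu"}, primitives),
    "motif_b": assemble({(0, 1): "relu", (0, 2): "identity", (1, 2): "scale"},
                        primitives),
    "identity": primitives["identity"],
}

# A level-3 architecture whose edges are level-2 motifs:
architecture = assemble({(0, 1): "motif_a", (0, 2): "motif_b", (1, 2): "identity"},
                        level2)
print(architecture(np.array([-1.0, 2.0])))
```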

These are the error rates of different models on the CIFAR-10 test set:

Classification error on the CIFAR-10 test set obtained using state-of-the-art models as well as the best-performing architecture found using the proposed architecture search framework. Existing models are grouped as (from top to bottom): handcrafted architectures, architectures found using reinforcement learning, and architectures found using random or evolutionary search.