Neural Architecture Search Research Guide - 2

Previous in this series: Neural Architecture Search Research Guide - 1

3. Learning Transferable Architectures for Scalable Image Recognition

2017: Learning Transferable Architectures for Scalable Image Recognition

In this paper, the authors search for an architectural building block on a small dataset and then transfer the block to a larger dataset, since running the search directly on a large dataset would be cumbersome and time-consuming.

Concretely, the authors search for the best convolutional cell on the CIFAR-10 dataset and then apply it to ImageNet by stacking more copies of that cell together, each with its own parameters, to design the convolutional architecture. They call this the NASNet architecture.

They also introduce a regularization technique, ScheduledDropPath, that improves generalization in NASNet models. The resulting NASNet achieves a 2.4% error rate on CIFAR-10, and the largest NASNet model reaches 43.1% mean average precision (mAP) on COCO object detection.
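
The underlying DropPath idea drops each path in a cell with some probability during training; the scheduled variant ramps that probability up linearly as training progresses. A minimal PyTorch sketch of the idea (the function name and signature are illustrative, not the paper's code):

```python
import torch

def scheduled_drop_path(x, final_drop_prob, step, total_steps, training=True):
    # Minimal sketch of ScheduledDropPath: drop an entire path with a
    # probability that ramps up linearly over training. The function name
    # and signature are illustrative, not the paper's code.
    if not training or final_drop_prob == 0.0:
        return x
    # Scale the drop probability with training progress.
    keep_prob = 1.0 - final_drop_prob * step / total_steps
    # One Bernoulli draw per example: either keep the whole path or drop
    # it, then rescale survivors so the expected activation is unchanged.
    mask = torch.bernoulli(
        torch.full((x.size(0), 1, 1, 1), keep_prob, device=x.device))
    return x / keep_prob * mask
```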

Like the previous paper, this one uses the Neural Architecture Search (NAS) framework. In the proposed scheme, the overall structure of the convolutional network is fixed by hand: it consists of convolutional cells repeated several times, where each cell has the same architecture but different weights.

The network uses two types of cells: Normal Cells, convolutional cells that return a feature map of the same dimensions, and Reduction Cells, convolutional cells that return a feature map whose height and width are halved.

Scalable architectures for image classification consist of two repeated motifs termed Normal Cell and Reduction Cell. This diagram highlights the model architecture for CIFAR-10 and ImageNet. The number of Normal Cells stacked between Reduction Cells, N, can vary across experiments.
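
To make the stacking pattern concrete, here is a minimal PyTorch sketch of the fixed macro-structure, assuming hypothetical normal_cell_fn and reduction_cell_fn factories for whatever cell the search discovers; the purely sequential wiring is a simplification that omits the cells' two-input connections:

```python
import torch.nn as nn

class NASNetSkeleton(nn.Module):
    # Minimal sketch of the fixed macro-structure: N Normal Cells stacked
    # between Reduction Cells. `normal_cell_fn` and `reduction_cell_fn` are
    # hypothetical factories for whatever cell the search discovers; the
    # purely sequential wiring omits the cells' two-input skip connections.
    def __init__(self, normal_cell_fn, reduction_cell_fn, n=6, num_stages=3):
        super().__init__()
        layers = []
        for stage in range(num_stages):
            # N Normal Cells preserve the feature map dimensions...
            layers += [normal_cell_fn() for _ in range(n)]
            # ...then one Reduction Cell halves its height and width.
            if stage < num_stages - 1:
                layers.append(reduction_cell_fn())
        self.cells = nn.Sequential(*layers)

    def forward(self, x):
        return self.cells(x)
```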

In the search space proposed in this paper, each cell receives two initial hidden states as input, which are the outputs of the cells in the previous two layers, or the input image. Given these two initial hidden states, the controller RNN recursively predicts the rest of the structure of the convolutional cell.

Controller model architecture for recursively constructing one block of a convolutional cell. Each block requires selecting 5 discrete parameters, each of which corresponds to the output of a softmax layer. Example constructed block shown on right. A convolutional cell contains B blocks, hence the controller contains 5B softmax layers for predicting the architecture of a convolutional cell. In our experiments, the number of blocks B is 5.
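
A rough PyTorch sketch of such a controller is below; the vocabulary sizes, the reuse of one set of 5 heads across blocks (the paper uses 5B separate softmax layers), and feeding the raw hidden state forward instead of an embedding of the previous choice are all simplifying assumptions:

```python
import torch
import torch.nn as nn

class CellController(nn.Module):
    # Sketch of a controller RNN emitting 5 discrete choices per block:
    # two input hidden states, two operations, and one combine method.
    # Vocabulary sizes are illustrative assumptions, and the 5 heads are
    # reused across blocks (the paper uses 5B separate softmax layers).
    def __init__(self, hidden=100, num_inputs=2, num_ops=13, num_combines=2, B=5):
        super().__init__()
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.start = nn.Parameter(torch.zeros(1, hidden))  # learned start token
        self.heads = nn.ModuleList([
            nn.Linear(hidden, n)  # one softmax head per decision in a block
            for n in (num_inputs, num_inputs, num_ops, num_ops, num_combines)
        ])
        self.B = B

    def sample_cell(self):
        h = c = torch.zeros(1, self.lstm.hidden_size)
        x, choices = self.start, []
        for _ in range(self.B):          # B blocks per cell
            for head in self.heads:      # 5 decisions per block
                h, c = self.lstm(x, (h, c))
                probs = torch.softmax(head(h), dim=-1)
                choices.append(torch.multinomial(probs, 1).item())
                x = h  # simplification: feed the state, not a choice embedding
        return choices
```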

Below is the performance of Neural Architecture Search on the CIFAR-10 dataset:

Performance of Neural Architecture Search and other state-of-the-art models on CIFAR-10. All results for NASNet are the mean accuracy across 5 runs.

4. Efficient Neural Architecture Search via Parameter Sharing

2018: Efficient Neural Architecture Search via Parameter Sharing

The authors propose a method called Efficient Neural Architecture Search (ENAS). In this approach, a controller discovers neural network architectures by searching for an optimal subgraph within a single large computational graph. The controller is trained to select the subgraph that achieves the best accuracy on the validation set.

The model corresponding to the selected subgraph is then trained to minimize the canonical cross-entropy loss. Parameters are shared among the child models, which is what lets ENAS deliver strong performance: on CIFAR-10, ENAS reaches a 2.89% test error, compared with 2.65% for standard Neural Architecture Search (NAS).

To improve the efficiency of NAS, the paper forces all child models to share weights, which avoids training every child model from scratch to convergence.

The search space of NAS is represented as a single directed acyclic graph (DAG). A recurrent cell is designed by introducing a DAG with N nodes, where nodes represent local computations and edges represent the flow of information between the N nodes.
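
A minimal PyTorch sketch of this weight sharing, assuming one linear transform per possible edge and a small illustrative set of activations; a sampled child architecture simply indexes into the shared modules:

```python
import torch
import torch.nn as nn

class SharedDAG(nn.Module):
    # Sketch of ENAS weight sharing: the full DAG owns one weight matrix per
    # possible edge j -> i, and every sampled child model reuses the subset
    # of edges its architecture activates. Dimensions and the activation set
    # are illustrative assumptions.
    ACTIVATIONS = {"tanh": torch.tanh, "relu": torch.relu,
                   "sigmoid": torch.sigmoid, "identity": lambda t: t}

    def __init__(self, num_nodes=4, hidden=32):
        super().__init__()
        self.edges = nn.ModuleDict({
            f"{j}->{i}": nn.Linear(hidden, hidden)
            for i in range(1, num_nodes) for j in range(i)
        })

    def forward(self, x, arch):
        # `arch` picks a previous node and an activation for each node
        # i >= 1, e.g. [(0, "tanh"), (0, "relu"), (1, "identity")].
        states = [x]
        for i, (prev, act) in enumerate(arch, start=1):
            edge = self.edges[f"{prev}->{i}"]
            states.append(self.ACTIVATIONS[act](edge(states[prev])))
        # Loose-end nodes (never used as an input) are averaged to form
        # the cell's output, as in the figure below.
        used = {prev for prev, _ in arch}
        loose = [s for i, s in enumerate(states) if i > 0 and i not in used]
        return torch.stack(loose).mean(dim=0)
```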

The ENAS controller is an RNN that decides which computations are performed at each node of the DAG and which edges are activated. The controller network is an LSTM with 100 hidden units.
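
For illustration, here is a uniform random stand-in for the controller that produces the per-node decisions consumed by the SharedDAG sketch above; the real controller replaces the uniform choices with LSTM softmax outputs:

```python
import random

def sample_arch(num_nodes=4, acts=("tanh", "relu", "sigmoid", "identity")):
    # Uniform random stand-in for the controller: for each non-input node,
    # pick a previous node (which edge to activate) and an activation
    # (which computation to run). The real ENAS controller makes these
    # choices with a 100-unit LSTM and softmax heads, not uniform sampling.
    return [(random.randrange(i), random.choice(acts)) for i in range(1, num_nodes)]
```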

An example of a recurrent cell in our search space with 4 computational nodes. Left: The computational DAG that corresponds to the recurrent cell. The red edges represent the flow of information in the graph. Middle: The recurrent cell. Right: The outputs of the controller RNN that result in the cell in the middle and the DAG on the left. Note that nodes 3 and 4 are never sampled by the RNN, so their results are averaged and are treated as the cell’s output.
The graph represents the entire search space while the red arrows define a model in the search space, which is decided by a controller. Here, node 1 is the input to the model whereas nodes 3 and 6 are the model’s outputs.

ENAS learns two sets of parameters: the parameters of the controller LSTM and the shared parameters of the child models. In the first phase of training, the shared parameters of the child models are trained; in the second phase, the parameters of the controller LSTM are trained. The two phases alternate throughout ENAS training.
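
A sketch of this alternating loop, assuming shared_dag(x, arch) returns class logits and a hypothetical controller.sample() that returns an architecture together with its sampling log-probability:

```python
import torch
import torch.nn.functional as F

def train_enas(shared_dag, controller, train_batches, valid_batches,
               shared_opt, controller_opt, epochs=10):
    # Sketch of ENAS's alternating optimization, assuming shared_dag(x, arch)
    # returns class logits and the hypothetical controller.sample() returns
    # an architecture plus the log-probability of sampling it.
    for _ in range(epochs):
        # Phase 1: train the shared parameters on the training set, with
        # architectures sampled from the (fixed) controller policy.
        for x, y in train_batches:
            arch, _ = controller.sample()
            loss = F.cross_entropy(shared_dag(x, arch), y)
            shared_opt.zero_grad(); loss.backward(); shared_opt.step()
        # Phase 2: train the controller with REINFORCE, using validation
        # accuracy as the reward (moving-average baseline omitted).
        for x, y in valid_batches:
            arch, log_prob = controller.sample()
            with torch.no_grad():
                reward = (shared_dag(x, arch).argmax(-1) == y).float().mean()
            controller_opt.zero_grad()
            (-log_prob * reward).backward()
            controller_opt.step()
```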

An example run of a recurrent cell in our search space with 4 computational nodes, which represent 4 layers in a convolutional network. Top: The output of the controller RNN. Bottom Left: The computational DAG corresponding to the network’s architecture. Red arrows denote the active computational paths. Bottom Right: The complete network. Dotted arrows denote skip connections.

Here is how ENAS performs on the CIFAR-10 dataset:

Classification errors of ENAS and baselines on CIFAR-10. In this table, the first block presents DenseNet, one of the state-of-the-art architectures designed by human experts. The second block presents approaches that design the entire network. The last block presents techniques that design modular cells which are combined to build the final network.