
    Relative Neural Architecture Search via Slow-Fast Learning

First author: Hao Tan [PDF]

NAS: Neural Architecture Search

    automating the design of artificial neural networks

    Motivation

To benefit from the merits of differentiable NAS and population-based NAS while overcoming their respective deficiencies.

    Deficiencies

    Differentiable NAS

Gradient-based search can be ineffective due to a lack of proper diversity.

    Population-Based NAS

Search efficiency is poor due to stochastic crossover/mutation and the large number of performance evaluations required.

    Method

    continuous encoding scheme

Inspired by H. Liu, K. Simonyan, and Y. Yang, "DARTS: Differentiable architecture search," in Proceedings of the International Conference on Learning Representations, 2019.

    Cell-based architecture

Two types of cells: the normal cell and the reduction cell (which down-samples to reduce the feature map size)

Encodes the nodes and the operations separately (each represented by a real-valued interval)

The network is a DAG (directed acyclic graph)

Differences from DARTS: 1) no requirement of differentiability; 2) the operations between pairwise nodes are directly encoded as real values

Pros: provides more flexibility and versatility

Networks can achieve promising performance and high transferability across different tasks by adjusting the total number of cells in the final architecture. A minimal sketch of the decoding step is given below.
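To make the encoding concrete, here is a minimal Python sketch of how a real-valued encoding could be decoded into discrete nodes and operations. The operation list and the unit-interval mapping are illustrative assumptions, not the paper's exact settings.

```python
# Minimal sketch of decoding a continuous cell encoding.
# The operation list and value ranges are illustrative assumptions,
# not the exact settings used in the RelativeNAS paper.
OPS = ["none", "skip_connect", "sep_conv_3x3", "sep_conv_5x5",
       "dil_conv_3x3", "max_pool_3x3", "avg_pool_3x3"]

def decode_op(value: float) -> str:
    """Map a real value in [0, len(OPS)) to a discrete operation.

    Each operation owns a unit-length sub-interval, so e.g. any
    value in [2.0, 3.0) decodes to "sep_conv_3x3".
    """
    index = min(int(value), len(OPS) - 1)  # clamp the upper boundary
    return OPS[index]

def decode_cell(alpha):
    """Decode a list of (node_value, op_value) pairs into discrete edges."""
    return [(int(node_val), decode_op(op_val)) for node_val, op_val in alpha]

print(decode_cell([(0, 2.7), (1, 5.1)]))
# -> [(0, 'sep_conv_3x3'), (1, 'max_pool_3x3')]
```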

    slow-fast learning paradigm

Inspired by SlowFast networks (Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 6202-6211), which use a Fast pathway (high frame rate) and a Slow pathway (low frame rate) for video recognition.

In each pair of architecture vectors, the one with worse performance is regarded as slow-learning and the one with better performance as fast-learning.

Architecture vectors are updated via a pseudo-gradient mechanism determined by the slow-learning and fast-learning vectors.

At each generation, the population is randomly divided into $N/2$ pairs; in each pair, $\boldsymbol{\alpha}_{p, s}^{g}$ is updated by learning from $\boldsymbol{\alpha}_{p, f}^{g}$ with:

$$\Delta \boldsymbol{\alpha}_{p, s}^{g}=\lambda_{1}\left(\boldsymbol{\alpha}_{p, f}^{g}-\boldsymbol{\alpha}_{p, s}^{g}\right)+\lambda_{2} \Delta \boldsymbol{\alpha}_{p, s}^{g-1}$$
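A minimal NumPy sketch of one generation of this update; sampling $\lambda_1, \lambda_2$ uniformly and the toy population/encoding sizes are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def slow_fast_update(alpha_slow, alpha_fast, delta_prev):
    """One slow-fast step: the worse (slow) vector learns from the
    better (fast) one, plus a momentum-like term from the previous
    update. Uniform sampling of lam1/lam2 is an assumption here."""
    lam1, lam2 = rng.uniform(0.0, 1.0, size=2)
    delta = lam1 * (alpha_fast - alpha_slow) + lam2 * delta_prev
    return alpha_slow + delta, delta

# One generation: randomly split the population into N/2 pairs and
# update the worse member of each pair; the better one stays put.
N, dim = 8, 14
population = rng.uniform(size=(N, dim))   # continuous encodings
deltas = np.zeros_like(population)        # previous-step updates
fitness = rng.uniform(size=N)             # placeholder: higher = better
for i, j in rng.permutation(N).reshape(-1, 2):
    s, f = (i, j) if fitness[i] < fitness[j] else (j, i)
    population[s], deltas[s] = slow_fast_update(
        population[s], population[f], deltas[s])
```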

Thanks to the pseudo-gradient-based mechanism, RelativeNAS is applicable to any other generic continuously encoded search space.

novel performance estimation strategy

Adopts an operation collection as a weight set to estimate performance.

The weight set is not directly trained but updated in an online manner.

This makes it intuitively feasible for RelativeNAS to use performance estimation to obtain approximate validation losses of candidate architectures.


Pros: saves substantial computation cost, as sketched below.
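As a hedged illustration of the idea, here is a runnable toy: a linear model stands in for a real network, and the single gradient step and averaging of inherited weights are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
OPS = ["op_a", "op_b", "op_c"]
DIM = 4
# Shared weight set: one weight tensor per candidate operation.
# It is never trained directly; candidates update it online.
weight_set = {op: rng.normal(scale=0.1, size=(DIM, DIM)) for op in OPS}

def estimate(arch, x_tr, y_tr, x_va, y_va, lr=0.1):
    """Estimate a candidate's quality without training from scratch.

    The candidate (a list of ops, averaged into one linear map as a
    toy stand-in for a real network) inherits weights from the shared
    set, takes a single gradient step on a training batch, writes the
    updated weights back, and returns its validation loss as a cheap
    proxy for true performance.
    """
    k = len(arch)
    W = sum(weight_set[op] for op in arch) / k        # inherit weights
    residual = x_tr @ W - y_tr
    grad = x_tr.T @ residual / (len(x_tr) * k)        # MSE gradient per op
    for op in arch:                                   # online write-back
        weight_set[op] = weight_set[op] - lr * grad
    W = sum(weight_set[op] for op in arch) / k
    return float(np.mean((x_va @ W - y_va) ** 2))     # validation-loss proxy

x_tr, y_tr = rng.normal(size=(32, DIM)), rng.normal(size=(32, DIM))
x_va, y_va = rng.normal(size=(16, DIM)), rng.normal(size=(16, DIM))
print(estimate(["op_a", "op_c"], x_tr, y_tr, x_va, y_va))
```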

    Result

CIFAR-10 is used as the dataset.

• It takes only about nine hours on a single GTX 1080 Ti, or seven hours on a Tesla V100, to complete the search procedure.

• RelativeNAS + Cutout achieves a low test error (2.34%) with a moderate parameter count (3.93M), making it efficient.

    Transferability Analyses

    • Intra-task Transferability: CIFAR-100, ImageNet
    • Inter-task Transferability: Object Detection, Semantic Segmentation, Keypoint Detection

    Conclusion

This work combines the merits of differentiable NAS and population-based NAS, making the search both more effective and more efficient. Moreover, the proposed slow-fast learning paradigm is potentially applicable to other generic learning/optimization tasks.
