Designing architectures for deep neural networks requires expert knowledge and substantial computation time. We propose a technique to accelerate model selection by learning an auxiliary HyperNet that generates the weights of a main model conditioned on that model’s architecture. By comparing the relative validation performance of networks with HyperNet-generated weights, we can effectively search over a wide range of architectures at the cost of a single training run. To facilitate this search, we develop a flexible mechanism based on memory read-writes that allows us to define a wide range of network connectivity patterns, with ResNet, DenseNet, and FractalNet blocks as special cases. We validate our method (SMASH) on CIFAR-100 and CIFAR-10, achieving competitive performance with similarly-sized hand-designed networks.
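The core idea above can be illustrated with a minimal toy sketch: an auxiliary network maps a fixed-length architecture encoding to the flattened weights of a main model, and candidate architectures are then ranked by validation loss under those generated weights. This is not the SMASH implementation; the encoding scheme, the linear HyperNet, and names such as `encode_arch` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

IN_DIM, OUT_DIM, ENC_DIM = 4, 3, 8

# Toy "HyperNet": a single matrix mapping an architecture encoding
# to the flattened weights of a one-layer main model. (Illustrative;
# the real HyperNet is itself a learned deep network.)
H = rng.normal(scale=0.1, size=(IN_DIM * OUT_DIM, ENC_DIM))

def encode_arch(width: int) -> np.ndarray:
    """Hypothetical one-hot architecture encoding (here: a width choice)."""
    c = np.zeros(ENC_DIM)
    c[width % ENC_DIM] = 1.0
    return c

def generated_weights(c: np.ndarray) -> np.ndarray:
    """HyperNet forward pass: main-model weights conditioned on c."""
    return (H @ c).reshape(IN_DIM, OUT_DIM)

def main_model(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    return x @ W

# Rank candidate architectures by validation loss under generated
# weights -- no per-architecture training run is needed.
x_val = rng.normal(size=(16, IN_DIM))
y_val = rng.normal(size=(16, OUT_DIM))

def val_loss(width: int) -> float:
    W = generated_weights(encode_arch(width))
    return float(np.mean((main_model(x_val, W) - y_val) ** 2))

best = min(range(1, ENC_DIM + 1), key=val_loss)
print("best candidate width:", best)
```

In the full method, the HyperNet is trained so that its generated weights make each sampled architecture perform well, after which the relative validation scores serve as a cheap proxy for ranking architectures before the winner is trained normally.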
Andrew Brock's background is in computational heat transfer and fluid dynamics for hybrid rocketry, control systems engineering for full-body haptic feedback, and social dance instruction. He holds an MSME from Cal Poly SLO.