summary
This paper investigates foundational questions about Bayesian neural networks (BNNs) by running full-batch Hamiltonian Monte Carlo (HMC) on modern architectures. Its primary goal is to draw accurate samples from the posterior in order to understand the properties of BNNs, setting aside computational cost and practicality. After showing how to apply full-batch HMC effectively to modern neural architectures, the authors find that (1) BNNs can achieve significant performance gains over standard training and deep ensembles, but are less robust to domain shift; (2) a single long HMC chain can provide performance comparable to that of multiple shorter chains; (3) the cold posterior effect is largely an artifact of data augmentation; (4) Bayesian model average (BMA) performance is robust to the choice of prior scale; and (5) while cheaper alternatives such as deep ensembles and SGMCMC can provide good generalization, their predictive distributions are distinct from that of HMC.