Paper walkthrough: Multiple Adaptive Bayesian Linear Regression for Scalable Bayesian Optimization with Warm Start




  • Scalable Bayesian Optimization Using Deep Neural Networks
  • Scalable Hyperparameter Transfer Learning

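At this point the algorithmic complexity drops further, from N(D^3 + D^2) to ND^2 + D^3 (here N = T). (This complexity estimate still needs to be verified.)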


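As shown above, the sub-task datasets D_i are fed into the feedforward network NN together. This rests on the premise that the basis vectors (or latent representation vectors) of the target functions being fitted are similar to some degree. The NN weights are therefore shared, and the network's outputs are passed to each task's own BO warm-start estimator, which computes the GP parameters, the error, and the partial derivatives. The gradients with respect to the NN weights are then summed (or weighted) and fed back into the NN, which completes one training step. As a result, both the NN's latent representation coefficients and each sub-task's GP parameters are learned.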
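The shared-network training objective described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the one-layer `tanh` feature map, the function names `features`, `task_nll`, and `joint_objective`, and the per-task precisions `alpha` (weight prior) and `beta` (noise) are all hypothetical choices. Each task gets its own Bayesian linear regression head on top of the shared features, and the per-task negative log marginal likelihoods are summed into one loss. The gradient step itself is not shown; in practice it would come from an automatic differentiation framework.

```python
import numpy as np


def features(X, W, b):
    """Shared feature map phi(x) = tanh(X W + b).

    A single hidden layer is an assumption made for brevity; W and b are
    the weights shared by every task.
    """
    return np.tanh(X @ W + b)


def task_nll(Phi, y, alpha, beta):
    """Negative log marginal likelihood of one task's Bayesian linear regression.

    Model (assumed): y = Phi w + noise, w ~ N(0, I / alpha), noise ~ N(0, I / beta).
    Only a D x D matrix is factorised, never an N x N one.
    """
    N, D = Phi.shape
    K = np.eye(D) + (beta / alpha) * Phi.T @ Phi      # D x D
    e = Phi.T @ y                                     # D-vector
    _, logdet_K = np.linalg.slogdet(K)
    # y^T C^{-1} y with C = Phi Phi^T / alpha + I / beta, via the Woodbury identity
    quad = beta * (y @ y - (beta / alpha) * e @ np.linalg.solve(K, e))
    return 0.5 * (N * np.log(2 * np.pi) - N * np.log(beta) + logdet_K + quad)


def joint_objective(tasks, W, b, alphas, betas):
    """Sum of per-task losses; every task sees the same W and b."""
    total = 0.0
    for (X, y), alpha, beta in zip(tasks, alphas, betas):
        total += task_nll(features(X, W, b), y, alpha, beta)
    return total
```

Because every term of the sum depends on the same `W` and `b`, differentiating `joint_objective` with respect to them automatically adds up the per-task gradients, which is the "summed and fed back" step in the text. A weighted variant would simply multiply each term by a task weight.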




  1. Would you mind sharing why your algorithm works more robustly than the one in [21], as you mention in the sixth line on page 3?
  • What we meant there is that L-BFGS does not come with hyperparameters such as the SGD step size (note that [21] uses SGD). This is an advantage, because you would otherwise have to set the step size for each specific BO problem, and if you simply fix the same step size for all BO problems, your algorithm may not perform as robustly.
  2. I wonder whether I missed the proof that ABLR scales linearly, or whether you consider it prerequisite knowledge and so did not mention it. Could you point out where I can find it?
  • The idea is that instead of inverting an N x N matrix when computing the predictive mean and variance, you invert a D x D matrix, so the scaling is D^3 instead of N^3. This can be observed directly from equations (1), (2), (3), where the most expensive operation is the matrix inversion.
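The second answer can be checked numerically with a small sketch. It uses the standard Bayesian linear regression predictive formulas, not the paper's equations (1) to (3), which are not reproduced on this page. The symbols are assumptions: `Phi` is the N x D feature matrix, `alpha` the weight-prior precision, and `beta` the noise precision. `blr_predict_primal` solves a D x D system, `blr_predict_dual` solves the equivalent N x N kernel system, and both should return the same mean and variance.

```python
import numpy as np


def blr_predict_primal(Phi, y, phi_star, alpha, beta):
    """Predictive mean/variance from a D x D system: O(N D^2 + D^3)."""
    D = Phi.shape[1]
    K = (beta / alpha) * Phi.T @ Phi + np.eye(D)          # D x D
    mean = (beta / alpha) * phi_star @ np.linalg.solve(K, Phi.T @ y)
    var = phi_star @ np.linalg.solve(K, phi_star) / alpha + 1.0 / beta
    return mean, var


def blr_predict_dual(Phi, y, phi_star, alpha, beta):
    """Same prediction from the N x N kernel (GP) view: O(N^3)."""
    N = Phi.shape[0]
    G = Phi @ Phi.T / alpha + np.eye(N) / beta            # N x N
    k_star = Phi @ phi_star / alpha
    mean = k_star @ np.linalg.solve(G, y)
    var = (phi_star @ phi_star / alpha
           - k_star @ np.linalg.solve(G, k_star)
           + 1.0 / beta)
    return mean, var


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Phi = rng.normal(size=(200, 5))     # N = 200 observations, D = 5 features
    y = rng.normal(size=200)
    phi_star = rng.normal(size=5)
    print(blr_predict_primal(Phi, y, phi_star, alpha=2.0, beta=5.0))
    print(blr_predict_dual(Phi, y, phi_star, alpha=2.0, beta=5.0))
```

With D fixed, the only N-dependent work in the primal version is forming `Phi.T @ Phi` and `Phi.T @ y`, which is linear in N. That is the sense in which the method scales linearly in the number of observations.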

Thanks to Valerio Perrone for answering the questions posed on this page.

