While benchmarking the per-layer timing of a neural network, I ran into a puzzling problem: switching between Caffe's own GPU implementation and the cuDNN implementation made a large difference for convolution, but almost none for the pooling layer. After digging into the code, I found the cause in the layer_factory mechanism. This post walks through the following aspects:
1. The factory pattern
2. layer_factory in detail
3. The pitfall in layer_factory
4. Impact analysis
1. The factory pattern
The factory pattern is one of the classic design patterns. It addresses situations where, at coding time, you cannot foresee which concrete class needs to be instantiated, and where the system should not depend on the details of how product objects are created, composed, and represented. Its drawback is limited extensibility, so it fits best in projects that rarely need to add new product types.
The factory pattern involves three roles (see the minimal sketch after this list):
Factory: produces concrete products according to some logic.
Abstract product: the parent class of the concrete products, usually realized as an interface in Java or an abstract class in C++.
Concrete product: the actual product instance.
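A minimal C++ sketch of these three roles (Shape, Circle, and ShapeFactory are illustrative names, not from Caffe):

#include <memory>
#include <stdexcept>
#include <string>

// Abstract product role: the common parent of all concrete products.
class Shape {
 public:
  virtual ~Shape() {}
  virtual double Area() const = 0;
};

// Concrete product role: an actual product instance.
class Circle : public Shape {
 public:
  explicit Circle(double r) : r_(r) {}
  double Area() const override { return 3.14159265 * r_ * r_; }
 private:
  double r_;
};

// Factory role: produces a concrete product according to some logic; callers
// never name the concrete class directly.
class ShapeFactory {
 public:
  static std::unique_ptr<Shape> Create(const std::string& type) {
    if (type == "circle") return std::unique_ptr<Shape>(new Circle(1.0));
    throw std::runtime_error("unknown shape type: " + type);
  }
};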
2. layer_factory in detail
As is well known, Caffe 1.0 ships three families of operator implementations: the CPU version, Caffe's own CUDA version, and the cuDNN version. The layer_factory files are responsible for assembling Caffe's operators; in factory-pattern terms, they select the appropriate version of each operator at runtime according to the user's settings.
The following draws on http://zhuanlan.zhihu.com/hacker-and-painter/20456649
layer_factory.hpp is the header of layer_factory:
/**
 * @brief A layer factory that allows one to register layers.
 * During runtime, registered layers could be called by passing a LayerParameter
 * protobuffer to the CreateLayer function:
 *
 *     LayerRegistry<Dtype>::CreateLayer(param);
 *
 * There are two ways to register a layer. Assuming that we have a layer like:
 *
 *   template <typename Dtype>
 *   class MyAwesomeLayer : public Layer<Dtype> {
 *     // your implementations
 *   };
 *
 * and its type is its C++ class name, but without the "Layer" at the end
 * ("MyAwesomeLayer" -> "MyAwesome").
 *
 * If the layer is going to be created simply by its constructor, in your c++
 * file, add the following line:
 *
 *    REGISTER_LAYER_CLASS(MyAwesome);
 *
 * Or, if the layer is going to be created by another creator function, in the
 * format of:
 *
 *    template <typename Dtype>
 *    shared_ptr<Layer<Dtype> > GetMyAwesomeLayer(const LayerParameter& param) {
 *      // your implementation
 *    }
 *
 * (for example, when your layer has multiple backends, see GetConvolutionLayer
 * for a use case), then you can register the creator function instead, like
 *
 * REGISTER_LAYER_CREATOR(MyAwesome, GetMyAwesomeLayer)
 *
 * Note that each layer type should only be registered once.
 */
#ifndef CAFFE_LAYER_FACTORY_H_
#define CAFFE_LAYER_FACTORY_H_

#include <map>
#include <string>

#include "caffe/common.hpp"
#include "caffe/proto/caffe.pb.h"

namespace caffe {

template <typename Dtype>
class Layer;

// LayerRegistry's job is simple: it puts each layer class and its type string
// into a map so layers can be looked up and created flexibly by name. In
// short, it registers classes.
template <typename Dtype>
class LayerRegistry {
 public:
  // Creator is a function pointer that returns a shared_ptr<Layer<Dtype> >.
  typedef shared_ptr<Layer<Dtype> > (*Creator)(const LayerParameter&);
  // CreatorRegistry maps type strings to their Creators.
  typedef std::map<string, Creator> CreatorRegistry;

  static CreatorRegistry& Registry() {
    static CreatorRegistry* g_registry_ = new CreatorRegistry();
    return *g_registry_;
  }

  // Adds a creator.
  // Insert the (type, function pointer) pair into the registry.
  static void AddCreator(const string& type, Creator creator) {
    CreatorRegistry& registry = Registry();
    CHECK_EQ(registry.count(type), 0)
        << "Layer type " << type << " already registered.";
    registry[type] = creator;
  }

  // Get a layer using a LayerParameter.
  // Given the layer type, create the layer.
  static shared_ptr<Layer<Dtype> > CreateLayer(const LayerParameter& param) {
    LOG(INFO) << "Creating layer " << param.name();
    // Read the type string from the parameter.
    const string& type = param.type();
    // Check that a Creator has been registered for the given type.
    CreatorRegistry& registry = Registry();
    CHECK_EQ(registry.count(type), 1) << "Unknown layer type: " << type
        << " (known types: " << LayerTypeList() << ")";
    // Invoke the Creator function registered for this layer type.
    return registry[type](param);
  }

 private:
  // Layer registry should never be instantiated - everything is done with its
  // static variables.
  // The constructor is private to forbid instantiation: the class consists
  // solely of static functions.
  LayerRegistry() {}

  // Return the list of registered layer types.
  static string LayerTypeList() {
    // Get the registry.
    CreatorRegistry& registry = Registry();
    string layer_types;
    // Walk the registry and append each type to the layer_types string.
    for (typename CreatorRegistry::iterator iter = registry.begin();
         iter != registry.end(); ++iter) {
      if (iter != registry.begin()) {
        layer_types += ", ";
      }
      layer_types += iter->first;
    }
    return layer_types;
  }
};

// LayerRegisterer
// A registerer for user-defined layers,
// used by the macros below.
template <typename Dtype>
class LayerRegisterer {
 public:
  // Constructor of the layer registerer.
  LayerRegisterer(const string& type,
                  shared_ptr<Layer<Dtype> > (*creator)(const LayerParameter&)) {
    // LOG(INFO) << "Registering layer type: " << type;
    // This simply forwards to LayerRegistry's AddCreator, which puts the
    // creator into the registry.
    LayerRegistry<Dtype>::AddCreator(type, creator);
  }
};

// For convenience, Caffe also provides macros for registering your own layer
// classes. This one generates the two static registerers g_creator_f_##type
// and g_creator_d_##type (the float and double instantiations of creator).
#define REGISTER_LAYER_CREATOR(type, creator)                                  \
  static LayerRegisterer<float> g_creator_f_##type(#type, creator<float>);     \
  static LayerRegisterer<double> g_creator_d_##type(#type, creator<double>)    \

/* Register your own class with type name `type`. Suppose type = bias; the
   macro then generates the following code. First, a function that simply calls
   your class's constructor and returns an instance:
     Creator_biasLayer(const LayerParameter& param)
   Next, a static variable of type LayerRegisterer<float> named
   g_creator_f_biasLayer (the float instantiation; in effect it binds your
   class's type string and creator function into the registry):
     static LayerRegisterer<float> g_creator_f_biasLayer("bias", Creator_biasLayer)
   And finally the LayerRegisterer<double> counterpart g_creator_d_biasLayer
   (the double instantiation):
     static LayerRegisterer<double> g_creator_d_biasLayer("bias", Creator_biasLayer)
*/
#define REGISTER_LAYER_CLASS(type)                                             \
  template <typename Dtype>                                                    \
  shared_ptr<Layer<Dtype> > Creator_##type##Layer(const LayerParameter& param) \
  {                                                                            \
    return shared_ptr<Layer<Dtype> >(new type##Layer<Dtype>(param));           \
  }                                                                            \
  REGISTER_LAYER_CREATOR(type, Creator_##type##Layer)

}  // namespace caffe

#endif  // CAFFE_LAYER_FACTORY_H_
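As a quick illustration of how the pieces fit together, here is a hypothetical snippet following the registration recipe from the doc comment above (MyAwesomeLayer and MakeMyAwesome are illustrative names, not part of Caffe):

#include "caffe/layer.hpp"
#include "caffe/layer_factory.hpp"

namespace caffe {

// Hypothetical layer for illustration only; its type string is "MyAwesome".
template <typename Dtype>
class MyAwesomeLayer : public Layer<Dtype> {
 public:
  explicit MyAwesomeLayer(const LayerParameter& param) : Layer<Dtype>(param) {}
  virtual inline const char* type() const { return "MyAwesome"; }
  // Stub implementations of Layer's pure virtuals.
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
                           const vector<Blob<Dtype>*>& top) {}
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
                            const vector<bool>& propagate_down,
                            const vector<Blob<Dtype>*>& bottom) {}
};

// Path 1: the layer is created by its constructor alone. This expands to a
// Creator_MyAwesomeLayer function plus two static LayerRegisterer objects.
REGISTER_LAYER_CLASS(MyAwesome);

// Path 2 (alternative): register a custom creator function instead, e.g. to
// pick between multiple backends, as GetConvolutionLayer does:
//   REGISTER_LAYER_CREATOR(MyAwesome, GetMyAwesomeLayer);

// At runtime, the net builder looks the layer up by its type string:
shared_ptr<Layer<float> > MakeMyAwesome() {
  LayerParameter param;
  param.set_type("MyAwesome");
  return LayerRegistry<float>::CreateLayer(param);
}

}  // namespace caffe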
With the header covered, here is the implementation part (it differs slightly from the 1.0 release, but not in any way that matters here).
layer_factory.cpp:
// Make sure we include Python.h before any system header
// to avoid _POSIX_C_SOURCE redefinition
#ifdef WITH_PYTHON_LAYER
#include <boost/python.hpp>
#endif
#include <string>

#include "caffe/layer.hpp"
#include "caffe/layer_factory.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/vision_layers.hpp"

#ifdef WITH_PYTHON_LAYER
#include "caffe/python_layer.hpp"
#endif

namespace caffe {

// A function that returns a convolution layer instance.
// Get convolution layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetConvolutionLayer(
    const LayerParameter& param) {
  // Read from the parameter which engine to use: CUDNN, CAFFE, or DEFAULT.
  // As caffe.proto shows, engine is an enum type.
  ConvolutionParameter_Engine engine = param.convolution_param().engine();
  if (engine == ConvolutionParameter_Engine_DEFAULT) {
    engine = ConvolutionParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = ConvolutionParameter_Engine_CUDNN;
#endif
  }
  if (engine == ConvolutionParameter_Engine_CAFFE) {
    // Construct Caffe's own convolution layer directly.
    return shared_ptr<Layer<Dtype> >(new ConvolutionLayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == ConvolutionParameter_Engine_CUDNN) {
    // Construct the cuDNN convolution layer.
    return shared_ptr<Layer<Dtype> >(new CuDNNConvolutionLayer<Dtype>(param));
#endif
  } else {  // Anything else is an error.
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}

// Register the convolution layer: type name "Convolution", created by
// GetConvolutionLayer.
REGISTER_LAYER_CREATOR(Convolution, GetConvolutionLayer);

// Get a pooling layer instance; same logic as the convolution layer.
// Get pooling layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetPoolingLayer(const LayerParameter& param) {
  PoolingParameter_Engine engine = param.pooling_param().engine();
  if (engine == PoolingParameter_Engine_DEFAULT) {
    engine = PoolingParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = PoolingParameter_Engine_CUDNN;
#endif
  }
  if (engine == PoolingParameter_Engine_CAFFE) {
    return shared_ptr<Layer<Dtype> >(new PoolingLayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == PoolingParameter_Engine_CUDNN) {
    PoolingParameter p_param = param.pooling_param();
    if (p_param.pad() || p_param.pad_h() || p_param.pad_w() ||
        param.top_size() > 1) {
      LOG(INFO) << "CUDNN does not support padding or multiple tops. "
                << "Using Caffe's own pooling layer.";
      return shared_ptr<Layer<Dtype> >(new PoolingLayer<Dtype>(param));
    }
    return shared_ptr<Layer<Dtype> >(new CuDNNPoolingLayer<Dtype>(param));
#endif
  } else {
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}

// Register the pooling layer.
REGISTER_LAYER_CREATOR(Pooling, GetPoolingLayer);

// Register the ReLU layer.
// Get relu layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetReLULayer(const LayerParameter& param) {
  ReLUParameter_Engine engine = param.relu_param().engine();
  if (engine == ReLUParameter_Engine_DEFAULT) {
    engine = ReLUParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = ReLUParameter_Engine_CUDNN;
#endif
  }
  if (engine == ReLUParameter_Engine_CAFFE) {
    return shared_ptr<Layer<Dtype> >(new ReLULayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == ReLUParameter_Engine_CUDNN) {
    return shared_ptr<Layer<Dtype> >(new CuDNNReLULayer<Dtype>(param));
#endif
  } else {
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}

REGISTER_LAYER_CREATOR(ReLU, GetReLULayer);

// Register the sigmoid layer.
// Get sigmoid layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetSigmoidLayer(const LayerParameter& param) {
  SigmoidParameter_Engine engine = param.sigmoid_param().engine();
  if (engine == SigmoidParameter_Engine_DEFAULT) {
    engine = SigmoidParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = SigmoidParameter_Engine_CUDNN;
#endif
  }
  if (engine == SigmoidParameter_Engine_CAFFE) {
    return shared_ptr<Layer<Dtype> >(new SigmoidLayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == SigmoidParameter_Engine_CUDNN) {
    return shared_ptr<Layer<Dtype> >(new CuDNNSigmoidLayer<Dtype>(param));
#endif
  } else {
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}

REGISTER_LAYER_CREATOR(Sigmoid, GetSigmoidLayer);

// Register the softmax layer.
// Get softmax layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetSoftmaxLayer(const LayerParameter& param) {
  SoftmaxParameter_Engine engine = param.softmax_param().engine();
  if (engine == SoftmaxParameter_Engine_DEFAULT) {
    engine = SoftmaxParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = SoftmaxParameter_Engine_CUDNN;
#endif
  }
  if (engine == SoftmaxParameter_Engine_CAFFE) {
    return shared_ptr<Layer<Dtype> >(new SoftmaxLayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == SoftmaxParameter_Engine_CUDNN) {
    return shared_ptr<Layer<Dtype> >(new CuDNNSoftmaxLayer<Dtype>(param));
#endif
  } else {
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}

REGISTER_LAYER_CREATOR(Softmax, GetSoftmaxLayer);

// Register the tanh layer.
// Get tanh layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetTanHLayer(const LayerParameter& param) {
  TanHParameter_Engine engine = param.tanh_param().engine();
  if (engine == TanHParameter_Engine_DEFAULT) {
    engine = TanHParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = TanHParameter_Engine_CUDNN;
#endif
  }
  if (engine == TanHParameter_Engine_CAFFE) {
    return shared_ptr<Layer<Dtype> >(new TanHLayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == TanHParameter_Engine_CUDNN) {
    return shared_ptr<Layer<Dtype> >(new CuDNNTanHLayer<Dtype>(param));
#endif
  } else {
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}

REGISTER_LAYER_CREATOR(TanH, GetTanHLayer);

// Register the Python layer.
#ifdef WITH_PYTHON_LAYER
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetPythonLayer(const LayerParameter& param) {
  Py_Initialize();
  try {
    bp::object module = bp::import(param.python_param().module().c_str());
    bp::object layer = module.attr(param.python_param().layer().c_str())(param);
    return bp::extract<shared_ptr<PythonLayer<Dtype> > >(layer)();
  } catch (bp::error_already_set) {
    PyErr_Print();
    throw;
  }
}

REGISTER_LAYER_CREATOR(Python, GetPythonLayer);
#endif

// Layers that use their constructor as their default creator should be
// registered in their corresponding cpp files. Do not register them here.
}  // namespace caffe
3. The pitfall in layer_factory
In the current code, the registration of the Pooling layer contains this snippet:
// CuDNN assumes layers are not being modified in place, thus
// breaking our index tracking for updates in some cases in Caffe.
// Until there is a workaround in Caffe (index management) or
// cuDNN, use Caffe layer to max pooling, or don't use in place
// layers after max pooling layers
if (param.pooling_param().pool() == PoolingParameter_PoolMethod_MAX) {
  return shared_ptr<Layer<Dtype> >(new PoolingLayer<Dtype>(param));
} else {
  return shared_ptr<Layer<Dtype> >(new CuDNNPoolingLayer<Dtype>(param));
}
As a direct consequence, whenever you use MaxPool you always get Caffe's own .cu implementation; the cuDNN path can never be taken. This explains why our MaxPool timings never changed in the earlier benchmark. A benchmarking-only workaround is sketched below.
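If one only wants to time CuDNNPoolingLayer on max pooling, a local experiment could drop the guard. Here is a hedged sketch of such a patched GetPoolingLayer (it deliberately reintroduces the in-place correctness hazard the comment warns about, so it must not be used when in-place layers follow a max pooling layer):

// Patched for benchmarking only: the PoolMethod_MAX guard is removed, so
// max pooling is also handed to cuDNN.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetPoolingLayer(const LayerParameter& param) {
  PoolingParameter_Engine engine = param.pooling_param().engine();
  if (engine == PoolingParameter_Engine_DEFAULT) {
    engine = PoolingParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = PoolingParameter_Engine_CUDNN;
#endif
  }
  if (engine == PoolingParameter_Engine_CAFFE) {
    return shared_ptr<Layer<Dtype> >(new PoolingLayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == PoolingParameter_Engine_CUDNN) {
    PoolingParameter p_param = param.pooling_param();
    if (p_param.pad() || p_param.pad_h() || p_param.pad_w() ||
        param.top_size() > 1) {
      return shared_ptr<Layer<Dtype> >(new PoolingLayer<Dtype>(param));
    }
    // The original code returned PoolingLayer here whenever
    // pool() == PoolingParameter_PoolMethod_MAX; that branch is gone.
    return shared_ptr<Layer<Dtype> >(new CuDNNPoolingLayer<Dtype>(param));
#endif
  } else {
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}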
4. Impact analysis
But why do the Caffe authors avoid cuDNN's MaxPool? Consulting the NVIDIA cuDNN User Manual, we find the following:
4.144. cudnnPoolingForward
cudnnStatus_t cudnnPoolingForward(
    cudnnHandle_t                    handle,
    const cudnnPoolingDescriptor_t   poolingDesc,
    const void                      *alpha,
    const cudnnTensorDescriptor_t    xDesc,
    const void                      *x,
    const void                      *beta,
    const cudnnTensorDescriptor_t    yDesc,
    void                            *y)
This function computes pooling of input values (i.e., the maximum or average of several adjacent values) to produce an output with smaller height and/or width.
Note: All tensor formats are supported; best performance is expected when using HW-packed tensors. Only 2 and 3 spatial dimensions are allowed.
Note: The dimensions of the output tensor yDesc can be smaller or bigger than the dimensions advised by the routine cudnnGetPooling2dForwardOutputDim or cudnnGetPoolingNdForwardOutputDim.
Parameters
- handle: Input. Handle to a previously created cuDNN context.
- poolingDesc: Input. Handle to a previously initialized pooling descriptor.
- alpha, beta: Input. Pointers to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Refer to this section for additional details.
- xDesc: Input. Handle to the previously initialized input tensor descriptor. Must be of type FLOAT, DOUBLE, HALF, or INT8. See cudnnDataType_t.
- x: Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.
- yDesc: Input. Handle to the previously initialized output tensor descriptor. Must be of type FLOAT, DOUBLE, HALF, or INT8. See cudnnDataType_t.
- y: Output. Data pointer to GPU memory associated with the output tensor descriptor yDesc.
The possible error values returned by this function and their meanings are listed below.
Returns
- CUDNN_STATUS_SUCCESS: The function launched successfully.
- CUDNN_STATUS_BAD_PARAM: At least one of the following conditions are met:
  - The dimensions n, c of the input tensor and output tensors differ.
  - The datatype of the input tensor and output tensors differs.
- CUDNN_STATUS_NOT_SUPPORTED: The function does not support the provided configuration. See the following for some examples of non-supported configurations:
  - The wStride of input tensor or output tensor is not 1.
- CUDNN_STATUS_EXECUTION_FAILED: The function failed to launch on the GPU.
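To make the calling convention concrete, here is a minimal sketch of a max-pooling forward call (assuming cuDNN v5 or later; the 1x1x4x4 shape is an arbitrary example, and error handling is reduced to one macro):

#include <cudnn.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define CHECK_CUDNN(call)                                             \
  do {                                                                \
    cudnnStatus_t s = (call);                                         \
    if (s != CUDNN_STATUS_SUCCESS) {                                  \
      fprintf(stderr, "cuDNN error: %s\n", cudnnGetErrorString(s));   \
      exit(1);                                                        \
    }                                                                 \
  } while (0)

int main() {
  cudnnHandle_t handle;
  CHECK_CUDNN(cudnnCreate(&handle));

  // 2x2 max pooling, stride 2, no padding, on a 1x1x4x4 NCHW input.
  cudnnPoolingDescriptor_t pool_desc;
  CHECK_CUDNN(cudnnCreatePoolingDescriptor(&pool_desc));
  CHECK_CUDNN(cudnnSetPooling2dDescriptor(pool_desc, CUDNN_POOLING_MAX,
      CUDNN_NOT_PROPAGATE_NAN, /*windowH=*/2, /*windowW=*/2,
      /*padH=*/0, /*padW=*/0, /*strideH=*/2, /*strideW=*/2));

  cudnnTensorDescriptor_t x_desc, y_desc;
  CHECK_CUDNN(cudnnCreateTensorDescriptor(&x_desc));
  CHECK_CUDNN(cudnnCreateTensorDescriptor(&y_desc));
  CHECK_CUDNN(cudnnSetTensor4dDescriptor(x_desc, CUDNN_TENSOR_NCHW,
      CUDNN_DATA_FLOAT, 1, 1, 4, 4));
  CHECK_CUDNN(cudnnSetTensor4dDescriptor(y_desc, CUDNN_TENSOR_NCHW,
      CUDNN_DATA_FLOAT, 1, 1, 2, 2));

  float *x, *y;
  cudaMalloc((void**)&x, 16 * sizeof(float));
  cudaMalloc((void**)&y, 4 * sizeof(float));
  cudaMemset(x, 0, 16 * sizeof(float));

  // Note the signature: only x goes in and only y comes out; there is no
  // argument through which an argmax mask could be returned.
  const float alpha = 1.0f, beta = 0.0f;
  CHECK_CUDNN(cudnnPoolingForward(handle, pool_desc, &alpha, x_desc, x,
                                  &beta, y_desc, y));

  cudaFree(x); cudaFree(y);
  cudnnDestroyTensorDescriptor(x_desc);
  cudnnDestroyTensorDescriptor(y_desc);
  cudnnDestroyPoolingDescriptor(pool_desc);
  cudnnDestroy(handle);
  return 0;
}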
The curious thing about this API is that only the input tensor and the output tensor are passed in and out, so there is no way to produce or update a max-pooling mask. I do not quite follow the cuDNN designers' reasoning here; as things stand, if correctness is to be preserved at this point, Caffe simply cannot use cuDNN's PoolingForward for max pooling.
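For contrast, Caffe's own max-pooling kernel records, for every output element, the index of the input element that produced the maximum; the backward pass then routes the gradient through exactly those indices. A simplified sketch of that idea (modeled on Caffe's MaxPoolForward in pooling_layer.cu, with the batch and channel dimensions trimmed for brevity):

#include <cfloat>

// Simplified from Caffe's MaxPoolForward: besides writing the pooled output,
// the kernel stores the argmax index into `mask`, which the backward kernel
// later uses to scatter gradients. cudnnPoolingForward has no such output.
template <typename Dtype>
__global__ void MaxPoolForwardWithMask(
    const int nthreads, const Dtype* bottom, const int height, const int width,
    const int pooled_h, const int pooled_w, const int kernel, const int stride,
    Dtype* top, int* mask) {
  for (int index = blockIdx.x * blockDim.x + threadIdx.x; index < nthreads;
       index += blockDim.x * gridDim.x) {
    const int pw = index % pooled_w;
    const int ph = index / pooled_w;
    const int hstart = ph * stride;
    const int wstart = pw * stride;
    const int hend = min(hstart + kernel, height);
    const int wend = min(wstart + kernel, width);
    Dtype maxval = -FLT_MAX;
    int maxidx = -1;
    // Scan the pooling window, remembering where the maximum came from.
    for (int h = hstart; h < hend; ++h) {
      for (int w = wstart; w < wend; ++w) {
        if (bottom[h * width + w] > maxval) {
          maxidx = h * width + w;
          maxval = bottom[maxidx];
        }
      }
    }
    top[index] = maxval;
    mask[index] = maxidx;  // this is the state cuDNN's API cannot expose
  }
}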