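This section walks through the inference (prediction) workflow, which has three main stages: preparing the input data, running the predictor, and fetching the output data.
I. Feeding the input data
A minimal usage example looks like this: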
    vector<string> input_names = predictor->GetInputNames();
    unique_ptr<Tensor> input_t = predictor->GetInputHandle(input_names[0]);
    input_t->Reshape(input_shape);
    input_t->CopyFromCpu(input.data());
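Let's go through this flow one step at a time.
1. GetInputNames
The call path is slightly indirect. The public header exposes the API in the paddle_infer namespace, so the call first enters the function under paddle_infer, which then forwards to AnalysisPredictor::GetInputNames on the predictor object that was actually created.
This step returns the names of the input nodes. Here idx2feeds_ is a std::map<size_t, std::string> that stores the names taken from the ops in the model file whose op->Type is feed.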
    // Interface class implementation
    namespace paddle_infer {
    std::vector<std::string> Predictor::GetInputNames() {
      return predictor_->GetInputNames();
    }
    }  // namespace paddle_infer

    // Actual implementation
    std::vector<std::string> AnalysisPredictor::GetInputNames() {
      std::vector<std::string> input_names;
      for (auto &item : idx2feeds_) {
        input_names.push_back(item.second);
      }
      return input_names;
    }
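The forwarding pattern above can be sketched without Paddle. The following is a minimal illustration, not Paddle's code: ToyPredictor, ToyAnalysisPredictor, AddFeed, and DemoInputNames are hypothetical names invented for this sketch, and the feed names "image" and "im_shape" are made up. It shows a thin facade forwarding to the real implementation, and that iterating a std::map yields the names in ascending feed-index order regardless of insertion order.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for AnalysisPredictor: owns the index -> feed-name map.
class ToyAnalysisPredictor {
 public:
  void AddFeed(std::size_t idx, std::string name) {
    idx2feeds_[idx] = std::move(name);
  }

  std::vector<std::string> GetInputNames() const {
    std::vector<std::string> names;
    // std::map iterates in ascending key order, so names come out by index.
    for (const auto &item : idx2feeds_) {
      names.push_back(item.second);
    }
    return names;
  }

 private:
  std::map<std::size_t, std::string> idx2feeds_;
};

// Hypothetical stand-in for paddle_infer::Predictor: a thin forwarding facade.
class ToyPredictor {
 public:
  explicit ToyPredictor(std::shared_ptr<ToyAnalysisPredictor> impl)
      : impl_(std::move(impl)) {}

  std::vector<std::string> GetInputNames() const {
    return impl_->GetInputNames();
  }

 private:
  std::shared_ptr<ToyAnalysisPredictor> impl_;
};

// Registers two feeds out of order and reads them back through the facade.
std::vector<std::string> DemoInputNames() {
  auto impl = std::make_shared<ToyAnalysisPredictor>();
  impl->AddFeed(1, "im_shape");
  impl->AddFeed(0, "image");
  return ToyPredictor(impl).GetInputNames();
}
```

In this toy, DemoInputNames() returns "image" first even though it was registered second, because the map is keyed by feed index.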
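2. GetInputHandle
This call looks up the memory region that belongs to a given node name. As described earlier, the Scope holds the information for every node, so this step fetches the input node's memory region from the Scope. Note that the scope held by the executor is the predictor's sub_scope.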
    namespace paddle_infer {
    std::unique_ptr<Tensor> Predictor::GetInputHandle(const std::string &name) {
      return predictor_->GetInputTensor(name);
    }
    }  // namespace paddle_infer
    std::unique_ptr<ZeroCopyTensor> AnalysisPredictor::GetInputTensor(
        const std::string &name) {
      PADDLE_ENFORCE_NOT_NULL(
          executor_->scope()->FindVar(name),
          platform::errors::PreconditionNotMet(
              "The variable named %s is not found in the scope of the exector.",
              name));
      // Hand the executor's scope to the handle
      std::unique_ptr<ZeroCopyTensor> res(
          new ZeroCopyTensor(static_cast<void *>(executor_->scope())));
      res->input_or_output_ = true;
      res->SetName(name);
      // Choose the place that matches the device
      if (platform::is_cpu_place(place_)) {
        res->SetPlace(PaddlePlace::kCPU);
      } else if (platform::is_xpu_place(place_)) {
        if (config_.lite_engine_enabled()) {
          // Currently, Paddle-Lite's XPU user interface only supports the transfer
          // of host data pointers. If it is currently used as a subgraph, execution
          // efficiency will be sacrificed, so it is temporarily set to cpu place.
          // And, the current lite engine of xpu must execute all parts of the
          // model.
          res->SetPlace(PaddlePlace::kCPU);
        } else {
          auto xpu_place = BOOST_GET_CONST(platform::XPUPlace, place_);
          res->SetPlace(PaddlePlace::kXPU, xpu_place.GetDeviceId());
        }
      } else if (platform::is_npu_place(place_)) {
        auto npu_place = BOOST_GET_CONST(platform::NPUPlace, place_);
        res->SetPlace(PaddlePlace::kNPU, npu_place.GetDeviceId());
      } else {
        auto gpu_place = BOOST_GET_CONST(platform::CUDAPlace, place_);
        res->SetPlace(PaddlePlace::kGPU, gpu_place.GetDeviceId());
      }
      return res;
    }
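The place-selection branch above reduces to a small decision table. The sketch below restates it as a standalone function so the XPU special case is easy to see. ToyDevice, ToyPlace, and SelectHandlePlace are hypothetical names for this illustration, and the device id of -1 for CPU is a placeholder chosen for the toy, not something taken from the code above.

```cpp
#include <cassert>

// Hypothetical device kinds mirroring the four branches in GetInputTensor.
enum class ToyDevice { kCPU, kGPU, kXPU, kNPU };

struct ToyPlace {
  ToyDevice device;
  int device_id;  // -1 is a toy placeholder meaning "no device id"
};

// Maps the predictor's device to the place recorded on the input handle.
ToyPlace SelectHandlePlace(ToyDevice predictor_device, int device_id,
                           bool lite_engine_enabled) {
  switch (predictor_device) {
    case ToyDevice::kCPU:
      return {ToyDevice::kCPU, -1};
    case ToyDevice::kXPU:
      // With the lite engine enabled, the handle is pinned to CPU,
      // matching the comment in the original branch.
      if (lite_engine_enabled) {
        return {ToyDevice::kCPU, -1};
      }
      return {ToyDevice::kXPU, device_id};
    case ToyDevice::kNPU:
      return {ToyDevice::kNPU, device_id};
    case ToyDevice::kGPU:
      return {ToyDevice::kGPU, device_id};
  }
  return {ToyDevice::kCPU, -1};  // unreachable for valid enum values
}
```

One difference from the original is that the toy enumerates GPU explicitly, whereas GetInputTensor treats GPU as the final else branch.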
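3. ZeroCopyTensor::Reshape
This step operates on the input tensor and resets its dimension information. We use it as an opportunity to look at the tensor operations in detail.
3.1 The base class and interface is paddle_infer::Tensor (paddle_tensor.h / inference/api/details/zero_copy_tensor.cc). ZeroCopyTensor (paddle_api.h) is a subclass of paddle_infer::Tensor and mainly overrides the copy-related functions, which are covered in the next subsection.
3.2 The actual reshape logic lives in Tensor::Reshape: it fetches the Variable (framework/variable.h) with the matching name from the sub_scope and operates on it.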
    void Tensor::Reshape(const std::vector<int> &shape) {
      // Check that a name has been set
      PADDLE_ENFORCE_EQ(
          name_.empty(), false,
          paddle::platform::errors::PreconditionNotMet(
              "Need to SetName first, so that the corresponding tensor can "
              "be retrieved."));
      // Check that this is an input; only inputs may be reshaped
      PADDLE_ENFORCE_EQ(input_or_output_, true,
                        paddle::platform::errors::PermissionDenied(
                            "Can't reshape the output tensor, it is readonly"));
      // Get the scope, then look up the variable with this name and update it.
      // This is the sub_scope, which only holds non-persistable nodes.
      auto *scope = static_cast<paddle::framework::Scope *>(scope_);
      auto *var = scope->FindVar(name_);
      PADDLE_ENFORCE_NOT_NULL(
          var, paddle::platform::errors::PreconditionNotMet(
                   "No tensor called [%s] in the runtime scope", name_));
      auto *tensor = var->GetMutable<paddle::framework::LoDTensor>();
      tensor->Resize(paddle::framework::make_ddim(shape));
    }
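To make the three guards in Reshape concrete, here is a minimal sketch that mimics them with toy types. ToyScope, ToyVariable, ToyHandle, and ReshapeThrows are hypothetical names, and plain C++ exceptions stand in for the PADDLE_ENFORCE macros. Like the real handle, the toy stores the scope as a void pointer plus a name and resolves the variable only when it is needed.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical variable that only carries a shape.
struct ToyVariable {
  std::vector<int> dims;
};

// Hypothetical scope: a name -> variable table.
struct ToyScope {
  std::unordered_map<std::string, ToyVariable> vars;

  ToyVariable *FindVar(const std::string &name) {
    auto it = vars.find(name);
    return it == vars.end() ? nullptr : &it->second;
  }
};

// Hypothetical handle holding an opaque scope pointer and a variable name.
class ToyHandle {
 public:
  ToyHandle(void *scope, std::string name, bool is_input)
      : scope_(scope), name_(std::move(name)), is_input_(is_input) {}

  void Reshape(const std::vector<int> &shape) {
    if (name_.empty()) throw std::logic_error("set a name first");
    if (!is_input_) throw std::logic_error("output tensor is read-only");
    auto *scope = static_cast<ToyScope *>(scope_);
    ToyVariable *var = scope->FindVar(name_);
    if (var == nullptr) throw std::runtime_error("no variable: " + name_);
    var->dims = shape;  // only the shape metadata changes here
  }

 private:
  void *scope_;
  std::string name_;
  bool is_input_;
};

// Returns true if Reshape on the given handle throws.
bool ReshapeThrows(ToyHandle handle, const std::vector<int> &shape) {
  try {
    handle.Reshape(shape);
  } catch (const std::exception &) {
    return true;
  }
  return false;
}
```

In the toy, reshaping an output handle or a handle whose name is missing from the scope throws, while an input handle updates the stored shape.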
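3.3 var->GetMutable: this call creates storage of the requested type inside the Variable. The storage type here is LoDTensor (lod_tensor.h): a LoDTensor object is created inside a PlaceholderImpl and assigned to the Variable's holder_ member.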
    template <typename T>
    T* GetMutable() {
      if (!holder_) {
        holder_.reset(new PlaceholderImpl<T>());
      } else {
        PADDLE_ENFORCE_EQ(
            holder_->Type(), VarTypeTrait<T>::kId,
            platform::errors::InvalidArgument(
                "The Variable type must be %s, but the type it holds is %s.",
                ToTypeName(VarTypeTrait<T>::kId),
                ToTypeName(holder_->Type())));
      }
      return static_cast<T*>(holder_->Ptr());
    }
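PlaceholderImpl is a class template that wraps T. This way the Variable class itself does not need to be a template; it only keeps a Placeholder pointer as a member: std::shared_ptr<Placeholder> holder_;. When a PlaceholderImpl is constructed, it stores the pointer to obj_ together with the type id of obj_. The ids are defined in proto::VarType, and the mapping from C++ types to ids is registered in advance.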
    // Placeholder hides type T, so it doesn't appear as a template
    // parameter of Variable.
    template <typename T>
    struct PlaceholderImpl : public Placeholder {
      static_assert(
          IsRegisteredVarType<T>(),
          "Not registered type. Please register T inside var_type_traits.h");
      PlaceholderImpl() { this->Init(&obj_, VarTypeTrait<T>::kId); }

     private:
      T obj_;
    };
The static_assert checks that T is a registered type. The registration list is in framework/var_type_traits.h:
    REG_PROTO_VAR_TYPE_TRAIT(LoDTensor, proto::VarType::LOD_TENSOR);
    REG_PROTO_VAR_TYPE_TRAIT(SelectedRows, proto::VarType::SELECTED_ROWS);
    REG_PROTO_VAR_TYPE_TRAIT(std::vector<Scope *>, proto::VarType::STEP_SCOPES);
    REG_PROTO_VAR_TYPE_TRAIT(LoDRankTable, proto::VarType::LOD_RANK_TABLE);
    REG_PROTO_VAR_TYPE_TRAIT(LoDTensorArray, proto::VarType::LOD_TENSOR_ARRAY);
    REG_PROTO_VAR_TYPE_TRAIT(platform::PlaceList, proto::VarType::PLACE_LIST);
    REG_PROTO_VAR_TYPE_TRAIT(ReaderHolder, proto::VarType::READER);
    REG_PROTO_VAR_TYPE_TRAIT(FeedList, proto::VarType::FEED_LIST);
    REG_PROTO_VAR_TYPE_TRAIT(FetchList, proto::VarType::FETCH_LIST);
    REG_PROTO_VAR_TYPE_TRAIT(int, proto::VarType::INT32);
    REG_PROTO_VAR_TYPE_TRAIT(float, proto::VarType::FP32);
    REG_PROTO_VAR_TYPE_TRAIT(Vocab, proto::VarType::VOCAB);
    REG_PROTO_VAR_TYPE_TRAIT(String, proto::VarType::STRING);
    REG_PROTO_VAR_TYPE_TRAIT(Strings, proto::VarType::STRINGS);
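The Placeholder / PlaceholderImpl / type-trait combination is a type-erasure pattern, and it can be reproduced in a few dozen lines. The sketch below is an illustration under simplifying assumptions, not Paddle's implementation: ToyTypeId, ToyTypeTrait, ToyVariable, and MismatchThrows are hypothetical names, explicit template specializations replace the REG_PROTO_VAR_TYPE_TRAIT macro, and a C++ exception replaces PADDLE_ENFORCE_EQ.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical type ids standing in for proto::VarType values.
enum class ToyTypeId { kUnregistered, kInt, kFloatVector };

// Trait table: the primary template marks a type as unregistered,
// each specialization "registers" one type.
template <typename T>
struct ToyTypeTrait {
  static constexpr ToyTypeId kId = ToyTypeId::kUnregistered;
};
template <>
struct ToyTypeTrait<int> {
  static constexpr ToyTypeId kId = ToyTypeId::kInt;
};
template <>
struct ToyTypeTrait<std::vector<float>> {
  static constexpr ToyTypeId kId = ToyTypeId::kFloatVector;
};

// A non-template variable that can hold exactly one registered type.
class ToyVariable {
 public:
  template <typename T>
  T *GetMutable() {
    if (!holder_) {
      // First call: create the object and remember its type id.
      holder_ = std::make_shared<PlaceholderImpl<T>>();
    } else if (holder_->type != ToyTypeTrait<T>::kId) {
      throw std::invalid_argument("variable holds a different type");
    }
    return static_cast<T *>(holder_->ptr);
  }

 private:
  // The base hides T behind a void pointer plus a type id.
  struct Placeholder {
    virtual ~Placeholder() = default;
    void *ptr = nullptr;
    ToyTypeId type = ToyTypeId::kUnregistered;
  };

  template <typename T>
  struct PlaceholderImpl : Placeholder {
    static_assert(ToyTypeTrait<T>::kId != ToyTypeId::kUnregistered,
                  "type is not registered in ToyTypeTrait");
    PlaceholderImpl() {
      ptr = &obj;
      type = ToyTypeTrait<T>::kId;
    }
    T obj{};
  };

  std::shared_ptr<Placeholder> holder_;
};

// Returns true if asking for a second, different type throws.
bool MismatchThrows() {
  ToyVariable var;
  var.GetMutable<int>();
  try {
    var.GetMutable<std::vector<float>>();
  } catch (const std::invalid_argument &) {
    return true;
  }
  return false;
}
```

In this toy, a second GetMutable call with the same type returns the same pointer, which is the property the later FindTensor step relies on. Requesting an unregistered type such as double fails at compile time through the static_assert.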
3.4 LoDTensor lives in the paddle::framework namespace; take care not to confuse it with the paddle_infer::Tensor discussed above. Its parent class is paddle::framework::Tensor (framework/tensor.h), and Resize is simply the parent's function:
    Tensor& Tensor::Resize(const DDim& dims) {
      dims_ = dims;
      return *this;
    }
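As the function body shows, Resize only records the new dimensions. A minimal sketch of that idea follows, with a hypothetical ToyTensor whose element count is derived from the stored dims on demand. The rule that an empty shape yields 1 is a choice made for this toy and is not a statement about Paddle's DDim.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical tensor that stores shape metadata only.
class ToyTensor {
 public:
  // Records the shape and returns *this so calls can be chained.
  ToyTensor &Resize(const std::vector<std::int64_t> &dims) {
    dims_ = dims;
    return *this;
  }

  // Element count is the product of all dimensions (1 for an empty shape
  // in this toy).
  std::int64_t numel() const {
    return std::accumulate(dims_.begin(), dims_.end(), std::int64_t{1},
                           std::multiplies<std::int64_t>());
  }

  const std::vector<std::int64_t> &dims() const { return dims_; }

 private:
  std::vector<std::int64_t> dims_;
};
```

Because nothing else changes in Resize, calling it repeatedly is cheap; the byte size needed for a copy is computed later from numel().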
4. ZeroCopyTensor::CopyFromCpu
This is the step that actually copies memory. We go through it in four parts, marked 1 to 4 in the code below.
    template <typename T>
    void Tensor::CopyFromCpu(const T *data) {
      // 1.
      EAGER_GET_TENSOR(paddle::framework::LoDTensor);
      PADDLE_ENFORCE_GE(tensor->numel(), 0,
                        paddle::platform::errors::PreconditionNotMet(
                            "You should call Tensor::Reshape(const "
                            "std::vector<int> &shape)"
                            "function before copying data from cpu."));
      // 2.
      size_t ele_size = tensor->numel() * sizeof(T);
      // 3.
      if (place_ == PlaceType::kCPU) {
        auto *t_data = tensor->mutable_data<T>(paddle::platform::CPUPlace());
        std::memcpy(static_cast<void *>(t_data), data, ele_size);
      // 4.
      } else if (place_ == PlaceType::kGPU) {
    #if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
        paddle::platform::DeviceContextPool &pool =
            paddle::platform::DeviceContextPool::Instance();
        paddle::platform::CUDAPlace gpu_place(device_);
        auto *t_data = tensor->mutable_data<T>(gpu_place);
        auto *dev_ctx = static_cast<const paddle::platform::CUDADeviceContext *>(
            pool.Get(gpu_place));
        paddle::memory::Copy(gpu_place, static_cast<void *>(t_data),
                             paddle::platform::CPUPlace(), data, ele_size,
                             dev_ctx->stream());
    #else
        PADDLE_THROW(paddle::platform::errors::Unavailable(
            "Can not create tensor with CUDA place because paddle is not compiled "
            "with CUDA."));
    #endif
      } else if (place_ == PlaceType::kXPU) {
        ...  // Kunlun XPU specific
      } else if (place_ == PlaceType::kNPU) {
        ...  // Huawei Ascend specific
      } else {
        PADDLE_THROW(paddle::platform::errors::InvalidArgument(
            "The analysis predictor supports CPU, GPU, NPU and XPU now."));
      }
    }
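The CPU branch boils down to three actions: compute the byte size from the element count, obtain a destination buffer, and memcpy into it. The sketch below reproduces just that path with a hypothetical ToyCpuTensor. The storage is a plain byte vector, and the requirement that Resize be called first is enforced with a C++ exception; both are simplifications for this illustration and do not reflect how Paddle manages memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical CPU-only tensor backed by a byte vector.
class ToyCpuTensor {
 public:
  void Resize(const std::vector<int> &dims) { dims_ = dims; }

  std::size_t numel() const {
    std::size_t n = 1;
    for (int d : dims_) n *= static_cast<std::size_t>(d);
    return n;
  }

  // Sizes the buffer for numel() elements of T and returns its start.
  template <typename T>
  void *mutable_data() {
    buffer_.resize(numel() * sizeof(T));
    return buffer_.data();
  }

  // Copies numel() elements of T from host memory into the tensor.
  template <typename T>
  void CopyFromCpu(const T *data) {
    if (dims_.empty()) {
      throw std::logic_error("call Resize before CopyFromCpu");
    }
    const std::size_t byte_size = numel() * sizeof(T);
    void *dst = mutable_data<T>();
    std::memcpy(dst, data, byte_size);
  }

  // Reads element i back out by value (memcpy avoids aliasing issues).
  template <typename T>
  T At(std::size_t i) const {
    T value;
    std::memcpy(&value, buffer_.data() + i * sizeof(T), sizeof(T));
    return value;
  }

  std::size_t ByteSize() const { return buffer_.size(); }

 private:
  std::vector<int> dims_;
  std::vector<unsigned char> buffer_;
};
```

The caller is responsible for passing a source buffer that holds at least numel() elements, which is also true of the real CopyFromCpu: the copy length comes from the tensor's shape, not from the source pointer.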
4.1 Fetch the LoDTensor pointer that was created in the matching var of the scope and assign it to tensor_.
1) Entry point of the call:

    EAGER_GET_TENSOR(paddle::framework::LoDTensor);

2) The macro calls FindTensor to obtain the pointer:

    #define EAGER_GET_TENSOR(tensor_type)    \
      if (!tensor_) {                        \
        tensor_ = FindTensor<tensor_type>(); \
      }                                      \
      auto *tensor = static_cast<tensor_type *>(tensor_);

3) The actual logic: GetMutable is called on the matching var in the scope. Because Reshape already called this interface and created the object, and the type requested in this call matches the type created then, the call directly returns the pointer to the previously created LoDTensor object.

    template <typename T>
    void *Tensor::FindTensor() const {
      PADDLE_ENFORCE_EQ(
          name_.empty(), false,
          paddle::platform::errors::PreconditionNotMet(
              "Need to SetName first, so that the corresponding tensor can "
              "be retrieved."));
      auto *scope = static_cast<paddle::framework::Scope *>(scope_);
      auto *var = scope->FindVar(name_);
      PADDLE_ENFORCE_NOT_NULL(
          var, paddle::platform::errors::PreconditionNotMet(
                   "No tensor called [%s] in the runtime scope", name_));
      auto *tensor = var->GetMutable<T>();
      return tensor;
    }
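The effect of EAGER_GET_TENSOR is a lazy, cached lookup: the scope is searched only when the cached pointer is still null. A minimal sketch of that caching behavior is shown below. ToyScope, ToyHandle, and the lookups counter are hypothetical and exist only so the caching can be observed; a member function is used in place of the macro.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical scope that counts how often it is searched.
struct ToyScope {
  std::unordered_map<std::string, float> vars;
  int lookups = 0;

  float *FindVar(const std::string &name) {
    ++lookups;
    auto it = vars.find(name);
    return it == vars.end() ? nullptr : &it->second;
  }
};

// Hypothetical handle that caches the resolved pointer as void*.
class ToyHandle {
 public:
  ToyHandle(ToyScope *scope, std::string name)
      : scope_(scope), name_(std::move(name)) {}

  float *Get() {
    if (cached_ == nullptr) {
      // Slow path: runs on the first call (and again if the lookup failed).
      cached_ = scope_->FindVar(name_);
    }
    return static_cast<float *>(cached_);
  }

 private:
  ToyScope *scope_;
  std::string name_;
  void *cached_ = nullptr;
};
```

In the toy, two consecutive Get() calls return the same pointer and the scope is searched once. The cached pointer is only as valid as the object it points to, so the handle must not outlive the scope entry.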
4.2