PaddlePaddle inference 源码分析(四)

本节介绍预测处理的流程。预测处理流程主要分为3部分,包括准备输入数据、执行、获取输出数据。

一、放入输入数据

简单的使用方法如下所示:

vector<string> input_names = predictor->GetInputNames();
unique_ptr<Tensor> input_t = predictor->GetInputHandle(input_names[0]);
input_t->Reshape(input_shape);
input_t->CopyFromCpu(input.data());

我们按照这个流程一步一步来深入

1、GetInputNames

这个调用有点绕,因为对外提供的头文件是paddle_infer作用域,因此这里的实际实现是先在paddle_infer下函数调用,然后调用了实际创建出来的AnalysisPredictor::GetInputNamse。

这一步是获取输入的节点名称。这里idx2feeds_是std::map<size_t, std::string>,保存的是模型文件中op->Type==feed的名称

// 接口类实现
namespace paddle_infer {
std::vector<std::string> Predictor::GetInputNames() {
  return predictor_->GetInputNames();
}
}

// 实际实现
std::vector<std::string> AnalysisPredictor::GetInputNames() {
  std::vector<std::string> input_names;
  for (auto &item : idx2feeds_) {
    input_names.push_back(item.second);
  }
  return input_names;
}

2、GetInputHandle

作用是根据节点名称获取到对应的内存区域。前文介绍过Scope中保存了所有节点的信息,这里就是拿到输入节点Scope的内存区域.这里executor保存的scope是predictor的sub_scope。

namespace paddle_infer {
std::unique_ptr<Tensor> Predictor::GetInputHandle(const std::string &name) {
  return predictor_->GetInputTensor(name);
}
}
std::unique_ptr<ZeroCopyTensor> AnalysisPredictor::GetInputTensor(
    const std::string &name) {
  PADDLE_ENFORCE_NOT_NULL(
      executor_->scope()->FindVar(name),
      platform::errors::PreconditionNotMet(
          "The variable named %s is not found in the scope of the exector.",
          name));
  // 拿到scope
  std::unique_ptr<ZeroCopyTensor> res(
      new ZeroCopyTensor(static_cast<void *>(executor_->scope())));
  res->input_or_output_ = true;
  res->SetName(name);
  // 根据设备获取对应place
  if (platform::is_cpu_place(place_)) {
    res->SetPlace(PaddlePlace::kCPU);
  } else if (platform::is_xpu_place(place_)) {
    if (config_.lite_engine_enabled()) {
      // Currently, Paddle-Lite's XPU user interface only supports the transfer
      // of host data pointers. If it is currently used as a subgraph, execution
      // efficiency will be sacrificed, so it is temporarily set to cpu place.
      // And, the current lite engine of xpu must execute all parts of the
      // model.
      res->SetPlace(PaddlePlace::kCPU);
    } else {
      auto xpu_place = BOOST_GET_CONST(platform::XPUPlace, place_);
      res->SetPlace(PaddlePlace::kXPU, xpu_place.GetDeviceId());
    }
  } else if (platform::is_npu_place(place_)) {
    auto npu_place = BOOST_GET_CONST(platform::NPUPlace, place_);
    res->SetPlace(PaddlePlace::kNPU, npu_place.GetDeviceId());
  } else {
    auto gpu_place = BOOST_GET_CONST(platform::CUDAPlace, place_);
    res->SetPlace(PaddlePlace::kGPU, gpu_place.GetDeviceId());
  }
  return res;
}

3、ZeroCopyTensor::Reshape

这一步骤的作用就是操作输入tensort,重新确定输入数据的维度信息。这里我们会详细介绍一下tensor的操作。

3.1 基类及接口是paddle_infer::Tensor(paddle_tensor.h/inference/api/details/zero_copy_tensor.cc).ZeroCopyTensor(paddle_api.h)是paddle_infer::Tensor的子类,主要重写了copy相关的函数。会在下一小结具体讲述。

3.2 实际的reshape操作作用在Tensor::Reshape中,实际逻辑为从sub_scope中取出对应名称的Variable(framework/variable.h)并对其进行操作。

void Tensor::Reshape(const std::vector<int> &shape) {
  // 判断是否设置了name
  PADDLE_ENFORCE_EQ(
      name_.empty(), false,
      paddle::platform::errors::PreconditionNotMet(
          "Need to SetName first, so that the corresponding tensor can "
          "be retrieved."));
  // 判断是否为input,只有input才能重新设置
  PADDLE_ENFORCE_EQ(input_or_output_, true,
                    paddle::platform::errors::PermissionDenied(
                        "Can't reshape the output tensor, it is readonly"));
  // 获取scope,然后取出对应名称节点的变量并进行设置。这里使用的是sub_scope,其中保存的都是非永久性的节点
  auto *scope = static_cast<paddle::framework::Scope *>(scope_);
  auto *var = scope->FindVar(name_);
  PADDLE_ENFORCE_NOT_NULL(
      var, paddle::platform::errors::PreconditionNotMet(
               "No tensor called [%s] in the runtime scope", name_));
  auto *tensor = var->GetMutable<paddle::framework::LoDTensor>();
  tensor->Resize(paddle::framework::make_ddim(shape));
}

3.3 var->GetMutable,这里实际在Variable中创建对应类型的存储数据。存储数据用LoDTensor(lod_tensor.h),创建一个LoDTensor的对象赋值给

  template <typename T>
  T* GetMutable() {
    if (!holder_) {
      holder_.reset(new PlaceholderImpl<T>());
    } else {
      PADDLE_ENFORCE_EQ(
          holder_->Type(), VarTypeTrait<T>::kId,
          platform::errors::InvalidArgument(
              "The Variable type must be %s, but the type it holds is %s.",
              ToTypeName(VarTypeTrait<T>::kId), ToTypeName(holder_->Type())));
    }
    return static_cast<T*>(holder_->Ptr());
  }

PlaceholderImpl是一个模板类,用于包装T,这样Variable类在构造时不需要包含模板,只需要把Placeholder指针作为成员变量即可std::shared_ptr<Placeholder> holder_;PlaceholderImpl构造时会保存obj指针,同时保存obj的类型序号,序号实际在proto::VarType中定义。对应关系实现已注册好。

  // Placeholder hides type T, so it doesn't appear as a template
  // parameter of Variable.
  template <typename T>
  struct PlaceholderImpl : public Placeholder {
    static_assert(
        IsRegisteredVarType<T>(),
        "Not registered type. Please register T inside var_type_traits.h");
    PlaceholderImpl() { this->Init(&obj_, VarTypeTrait<T>::kId); }

   private:
    T obj_;
  };

这里会检查T类型是否已注册,注册列表详见framework/var_type_traits.h

REG_PROTO_VAR_TYPE_TRAIT(LoDTensor, proto::VarType::LOD_TENSOR);
REG_PROTO_VAR_TYPE_TRAIT(SelectedRows, proto::VarType::SELECTED_ROWS);
REG_PROTO_VAR_TYPE_TRAIT(std::vector<Scope *>, proto::VarType::STEP_SCOPES);
REG_PROTO_VAR_TYPE_TRAIT(LoDRankTable, proto::VarType::LOD_RANK_TABLE);
REG_PROTO_VAR_TYPE_TRAIT(LoDTensorArray, proto::VarType::LOD_TENSOR_ARRAY);
REG_PROTO_VAR_TYPE_TRAIT(platform::PlaceList, proto::VarType::PLACE_LIST);
REG_PROTO_VAR_TYPE_TRAIT(ReaderHolder, proto::VarType::READER);
REG_PROTO_VAR_TYPE_TRAIT(FeedList, proto::VarType::FEED_LIST);
REG_PROTO_VAR_TYPE_TRAIT(FetchList, proto::VarType::FETCH_LIST);
REG_PROTO_VAR_TYPE_TRAIT(int, proto::VarType::INT32);
REG_PROTO_VAR_TYPE_TRAIT(float, proto::VarType::FP32);
REG_PROTO_VAR_TYPE_TRAIT(Vocab, proto::VarType::VOCAB);
REG_PROTO_VAR_TYPE_TRAIT(String, proto::VarType::STRING);
REG_PROTO_VAR_TYPE_TRAIT(Strings, proto::VarType::STRINGS);

3.4 LoDTensor这里命名空间为paddle::framework,注意与之前paddle_infer::Tensor区分开。LoDTensor的父类为paddle::framework::Tensor(framework/tensor.h),Resize操作也是直接使用父类函数

Tensor& Tensor::Resize(const DDim& dims) {
  dims_ = dims;
  return *this;
}

4. ZeroCopyTensor::CopyFromCpu 

这一步真正进行内存拷贝。我们分4步详细介绍

template <typename T>
void Tensor::CopyFromCpu(const T *data) {
  1.EAGER_GET_TENSOR(paddle::framework::LoDTensor);
  PADDLE_ENFORCE_GE(tensor->numel(), 0,
                    paddle::platform::errors::PreconditionNotMet(
                        "You should call Tensor::Reshape(const "
                        "std::vector<int> &shape)"
                        "function before copying data from cpu."));
  2.size_t ele_size = tensor->numel() * sizeof(T);

  3.if (place_ == PlaceType::kCPU) {
    auto *t_data = tensor->mutable_data<T>(paddle::platform::CPUPlace());
    std::memcpy(static_cast<void *>(t_data), data, ele_size);
  4.} else if (place_ == PlaceType::kGPU) {
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
    paddle::platform::DeviceContextPool &pool =
        paddle::platform::DeviceContextPool::Instance();
    paddle::platform::CUDAPlace gpu_place(device_);
    auto *t_data = tensor->mutable_data<T>(gpu_place);
    auto *dev_ctx = static_cast<const paddle::platform::CUDADeviceContext *>(
        pool.Get(gpu_place));

    paddle::memory::Copy(gpu_place, static_cast<void *>(t_data),
                         paddle::platform::CPUPlace(), data, ele_size,
                         dev_ctx->stream());
#else
    PADDLE_THROW(paddle::platform::errors::Unavailable(
        "Can not create tensor with CUDA place because paddle is not compiled "
        "with CUDA."));
#endif
  } else if (place_ == PlaceType::kXPU) {
   ...// 昆仑xpu相关
  } else if (place_ == PlaceType::kNPU) {
   ...// 华为昇腾相关
  } else {
    PADDLE_THROW(paddle::platform::errors::InvalidArgument(
        "The analysis predictor supports CPU, GPU, NPU and XPU now."));
  }
}

4.1 取出scope对应var中创建的LoDTensor指针,赋值给tensor_

1调用入口
EAGER_GET_TENSOR(paddle::framework::LoDTensor);
2 调用FindTensor获取指针
#define EAGER_GET_TENSOR(tensor_type)    \
  if (!tensor_) {                        \
    tensor_ = FindTensor<tensor_type>(); \
  }                                      \
  auto *tensor = static_cast<tensor_type *>(tensor_);
3 实际逻辑,在scope对应var中使用GetMutable,由于Reshape时已经调用该接口进行了创建,而且本地调用类型与创建类型一致,会直接获取之前创建的LoDTensor对象指针。
template <typename T>
void *Tensor::FindTensor() const {
  PADDLE_ENFORCE_EQ(
      name_.empty(), false,
      paddle::platform::errors::PreconditionNotMet(
          "Need to SetName first, so that the corresponding tensor can "
          "be retrieved."));
  auto *scope = static_cast<paddle::framework::Scope *>(scope_);
  auto *var = scope->FindVar(name_);
  PADDLE_ENFORCE_NOT_NULL(
      var, paddle::platform::errors::PreconditionNotMet(
               "No tensor called [%s] in the runtime scope", name_));
  auto *tensor = var->GetMutable<T>();
  return tensor;
}

4.2 

上一篇:Python——逻辑异或运算及图示(采用逻辑异或运算方法实现)(tkinter实现)


下一篇:tkinter布局定位方法place示例讲解,place与grid,pack方法混用