TVM Learning (7): Operators | FPGA 开发圈

Submitted by judy on Tuesday, 02/23/2021 - 10:58

Author: 安平博 (An Pingbo), Senior Engineer at Xilinx; Source: the "AI加速" (AI Acceleration) WeChat official account

Relay operators

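The previous chapter walked through TVM's frontend flow, in which the frontend converts TensorFlow operators into TVM operators. This chapter uses convolution, the most common operation in CNNs, as an example of how Relay operators are represented. In python/tvm/relay/frontend/tensorflow.py, _convert_map contains: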

_convert_map = {
…
'Conv2D'                            : _conv('conv'),
…
}

The _conv function reorders the weights, inputs, and outputs according to the layout, and then calls AttrCvt to obtain the op:

out = AttrCvt(
    op_name=_dimension_picker(
        'conv', surfix="_transpose" if opname == 'conv_transpose' else ""),
    ignores=['explicit_paddings'],
    transforms={
        'kernel_shape': 'kernel_size',
        'data_format': 'data_layout',
        'dilations': ('dilation', (0, 0)),
        'group': ('groups', 1)},
    custom_check=_dimension_constraint())([inputs_data, inputs[1]], attr)
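To make the transforms dictionary concrete, below is a minimal pure-Python sketch of the attribute renaming it drives. It is a simplified model, not TVM's AttrCvt: the function convert_attrs is hypothetical, and the rule format is our reading of the call above, where a plain string renames an attribute, a (new_name, default) tuple renames it and falls back to the default when the value is None, and attributes without a rule pass through unchanged.

```python
def convert_attrs(attrs, transforms, ignores=()):
    """Hypothetical, simplified model of AttrCvt's attribute rewriting."""
    out = {}
    for key, value in attrs.items():
        if key in ignores:
            continue  # dropped, like 'explicit_paddings' above
        if key not in transforms:
            out[key] = value  # no rule: passed through unchanged
            continue
        rule = transforms[key]
        if isinstance(rule, tuple):
            new_name, default = rule  # rename, with a fallback value
            out[new_name] = default if value is None else value
        else:
            out[rule] = value  # plain rename
    return out


rules = {
    "kernel_shape": "kernel_size",
    "data_format": "data_layout",
    "dilations": ("dilation", (0, 0)),
    "group": ("groups", 1),
}
tf_attrs = {"kernel_shape": (3, 3), "data_format": "NHWC",
            "explicit_paddings": [], "dilations": None, "strides": (1, 1)}
print(convert_attrs(tf_attrs, rules, ignores=["explicit_paddings"]))
# {'kernel_size': (3, 3), 'data_layout': 'NHWC', 'dilation': (0, 0), 'strides': (1, 1)}
```

The real class does more (excluded and disabled attributes, extra attributes, custom checks), but the renaming step is the part that turns TensorFlow attribute names into Relay ones.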

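AttrCvt is defined in python/tvm/relay/frontend/common.py. Its comments show that this class handles operator conversion: it maps the given op_name to a Relay operator. It first checks the incoming attrs, raising an error or emitting a warning for unsupported attributes; where a transform is specified, it replaces the attribute with the converted one; finally it calls get_relay_op. That function looks up the op matching op_name in _op (the tvm.relay.op module) and returns it. All ops live in the python/tvm/relay/op package, and conv is defined in op/nn. In nn.py the call chain is conv2d -> _make.conv2d(). _make.py is what exposes the C++ functions to Python, and it consists of a single line of code: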

tvm._ffi._init_api("relay.op.nn._make", __name__)

The _init_api function lives in python/tvm/_ffi/registry.py:

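# Excerpt from python/tvm/_ffi/registry.py (sys, list_global_func_names,
# get_global_func and _get_api are available in that module)
def _init_api(namespace, target_module_name=None):
    target_module_name = (
        target_module_name if target_module_name else namespace)
    # Strip a leading "tvm." so the prefix matches the registered names
    if namespace.startswith("tvm."):
        _init_api_prefix(target_module_name, namespace[4:])
    else:
        _init_api_prefix(target_module_name, namespace)


def _init_api_prefix(module_name, prefix):
    module = sys.modules[module_name]

    # Walk every global function registered on the C++ side
    for name in list_global_func_names():
        if not name.startswith(prefix):
            continue

        # e.g. "relay.op.nn._make.conv2d" -> "conv2d"
        fname = name[len(prefix)+1:]
        target_module = module

        # Skip names nested more than one level below the prefix
        if fname.find(".") != -1:
            continue
        # Wrap the PackedFunc and attach it to the target module
        f = get_global_func(name)
        ff = _get_api(f)
        ff.__name__ = fname
        ff.__doc__ = ("TVM PackedFunc %s. " % fname)
        setattr(target_module, ff.__name__, ff)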

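In effect, it fetches each function registered in C++ under the given prefix by name and attaches it to the _make module. The result is the same as if conv2d and its related functions had been defined directly in _make.py.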
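The mechanism can be imitated in plain Python to see why a one-line _init_api call is enough. In this sketch a hypothetical dictionary GLOBAL_FUNCS stands in for the C++ global function table, and init_api_prefix is a simplified copy of _init_api_prefix that skips the PackedFunc wrapping; only the prefix matching, the skipping of deeper dotted names, and the setattr onto the module follow the code above.

```python
import sys
import types

# Hypothetical stand-in for the C++ global function table; keys are the
# dotted names that TVM_REGISTER_GLOBAL would use.
GLOBAL_FUNCS = {
    "relay.op.nn._make.conv2d": lambda data, weight: f"conv2d({data}, {weight})",
    "relay.op.nn._make.dense": lambda data, weight: f"dense({data}, {weight})",
    "relay.op._make.add": lambda lhs, rhs: f"add({lhs}, {rhs})",
}


def init_api_prefix(module_name, prefix):
    """Attach every function registered directly under `prefix` to a module."""
    module = sys.modules[module_name]
    for name, func in GLOBAL_FUNCS.items():
        if not name.startswith(prefix):
            continue
        fname = name[len(prefix) + 1:]
        if "." in fname:
            continue  # deeper namespaces are skipped, as in _init_api_prefix
        setattr(module, fname, func)


# An empty module playing the role of relay/op/nn/_make.py.
make = types.ModuleType("relay.op.nn._make")
sys.modules["relay.op.nn._make"] = make
init_api_prefix("relay.op.nn._make", "relay.op.nn._make")
print(make.conv2d("x", "w"))  # conv2d(x, w)
```

Because the module is filled in at import time, nothing in _make.py has to be edited when a new C++ function is registered under the same prefix.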

Next, we move over to C++ and look at how the conv operator is implemented.

src/relay/op/nn/convolution.cc contains the following code:

TVM_REGISTER_GLOBAL("relay.op.nn._make.conv2d")
    .set_body_typed([](Expr data, Expr weight, Array<IndexExpr> strides,
                       Array<IndexExpr> padding, Array<IndexExpr> dilation, int groups,
                       IndexExpr channels, Array<IndexExpr> kernel_size, String data_layout,
                       String kernel_layout, String out_layout, DataType out_dtype) {
      return MakeConv<Conv2DAttrs>(data, weight, strides, padding, dilation, groups, channels,
                                   kernel_size, data_layout, kernel_layout, out_layout,
                                   out_dtype, "nn.conv2d");
    });

RELAY_REGISTER_OP("nn.conv2d")
    .describe(R"code(2D convolution layer (e.g. spatial convolution over images).

This layer creates a convolution kernel that is convolved
with the layer input to produce a tensor of outputs.

- **data**: This depends on the `layout` parameter. Input is 4D array of shape
            (batch_size, in_channels, height, width) if `layout` is `NCHW`.
- **weight**: (channels, in_channels, kernel_size[0], kernel_size[1])
- **out**:  This depends on the `layout` parameter. Output is 4D array of shape
            (batch_size, channels, out_height, out_width) if `layout` is `NCHW`.

)code" TVM_ADD_FILELINE)
    .set_attrs_type<Conv2DAttrs>()
    .set_num_inputs(2)
    .add_argument("data", "Tensor", "The input tensor.")
    .add_argument("weight", "Tensor", "The weight tensor.")
    .set_support_level(2)
    .add_type_rel("Conv2D", Conv2DRel<Conv2DAttrs>)
    .set_attr<FInferCorrectLayout>("FInferCorrectLayout", ConvInferCorrectLayout<Conv2DAttrs>);

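On the C++ side, the TVM_REGISTER_GLOBAL macro registers a function under a global name (here the conv2d constructor relay.op.nn._make.conv2d). The macro is defined as: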

#define TVM_REGISTER_GLOBAL(OpName) \
  TVM_STR_CONCAT(TVM_FUNC_REG_VAR_DEF, __COUNTER__) = ::tvm::runtime::Registry::Register(OpName)

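The registration is implemented in the Registry class: its Register function adds the named entry through a global manager. set_body (and the set_body_typed variant used above) wraps a lambda or an ordinary function into the registry's uniform function representation, the PackedFunc.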
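As a rough model of the Register/set_body pattern, here is a Python sketch. The Registry class, its _manager dictionary, and the get helper are hypothetical stand-ins for the C++ ::tvm::runtime::Registry and its global manager; the point is only that register creates a named entry, and set_body stores any callable in one uniform slot and returns the entry so calls can be chained.

```python
class Registry:
    """Hypothetical Python model of a global function registry."""

    _manager = {}  # name -> entry; stands in for the C++ global manager

    def __init__(self, name):
        self.name = name
        self.func = None

    @classmethod
    def register(cls, name, override=False):
        """Create a named entry, refusing silent duplicates."""
        if name in cls._manager and not override:
            raise ValueError(f"Global function {name} is already registered")
        entry = cls(name)
        cls._manager[name] = entry
        return entry

    def set_body(self, func):
        """Store any callable in one uniform slot; return self for chaining."""
        self.func = func
        return self

    @classmethod
    def get(cls, name):
        """Look up a registered function by name, or None if absent."""
        entry = cls._manager.get(name)
        return entry.func if entry is not None else None


def add(a, b):
    return a + b


Registry.register("demo.add").set_body(add)                 # ordinary function
Registry.register("demo.mul").set_body(lambda a, b: a * b)  # lambda
print(Registry.get("demo.add")(2, 3), Registry.get("demo.mul")(2, 3))  # 5 6
```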

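The function registered through set_body_typed is therefore a lambda that builds a conv expression via MakeConv. MakeConv first constructs an attrs object from the conv parameters it receives (strides, kernel size, layouts, and so on), then fetches the already-registered conv op by name, and finally packages the op, the inputs, and the attrs together into a Call node.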
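The three steps of MakeConv (build attrs, look up the op by name, wrap everything in a Call) can be sketched with Python dataclasses. All classes and the OP_TABLE dictionary here are hypothetical simplifications; the real Conv2DAttrs also carries fields such as channels, out_layout, and out_dtype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical op handle: just a registered name."""
    name: str


@dataclass(frozen=True)
class Conv2DAttrs:
    """Hypothetical, trimmed-down attribute record for conv2d."""
    strides: tuple = (1, 1)
    padding: tuple = (0, 0)
    dilation: tuple = (1, 1)
    groups: int = 1
    kernel_size: tuple = ()
    data_layout: str = "NCHW"
    kernel_layout: str = "OIHW"


@dataclass(frozen=True)
class Call:
    """Hypothetical call node: which op, on which inputs, with which attrs."""
    op: Op
    args: tuple
    attrs: Conv2DAttrs


# Stand-in for ops already registered (RELAY_REGISTER_OP plays this role in TVM).
OP_TABLE = {"nn.conv2d": Op("nn.conv2d")}


def make_conv(data, weight, op_name="nn.conv2d", **params):
    attrs = Conv2DAttrs(**params)           # 1. pack parameters into an attrs object
    op = OP_TABLE[op_name]                  # 2. fetch the registered op by name
    return Call(op, (data, weight), attrs)  # 3. bundle op, inputs and attrs


call = make_conv("data", "weight", strides=(2, 2), kernel_size=(3, 3))
print(call.op.name, call.attrs.strides, call.attrs.groups)  # nn.conv2d (2, 2) 1
```

Note that the Call only records which op to apply and how; no convolution is computed at this stage.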

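The Op itself is registered into a shared AttrRegistry by RELAY_REGISTER_OP. An Op does not contain the operator's computation; it only records information about the operator, such as its inputs, outputs, and attributes.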
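To illustrate that an Op carries metadata rather than computation, the sketch below models the chained RELAY_REGISTER_OP builder in Python. OpEntry, OP_REGISTRY, and register_op are hypothetical; the method names simply mirror the C++ calls shown earlier, and nothing in the entry knows how to compute a convolution.

```python
class OpEntry:
    """Hypothetical registry entry: operator metadata only, no compute."""

    def __init__(self, name):
        self.name = name
        self.description = ""
        self.num_inputs = None
        self.arguments = []    # (name, type_key, description) triples
        self.support_level = None
        self.type_rels = {}    # relation name -> type-inference callable
        self.attrs = {}        # extra attributes, e.g. "FInferCorrectLayout"

    # Each setter returns self so calls can be chained as in RELAY_REGISTER_OP.
    def describe(self, text):
        self.description = text
        return self

    def set_num_inputs(self, n):
        self.num_inputs = n
        return self

    def add_argument(self, name, type_key, description):
        self.arguments.append((name, type_key, description))
        return self

    def set_support_level(self, level):
        self.support_level = level
        return self

    def add_type_rel(self, rel_name, rel_func):
        self.type_rels[rel_name] = rel_func
        return self

    def set_attr(self, key, value):
        self.attrs[key] = value
        return self


OP_REGISTRY = {}


def register_op(name):
    """Return the entry for `name`, creating it on first use."""
    if name not in OP_REGISTRY:
        OP_REGISTRY[name] = OpEntry(name)
    return OP_REGISTRY[name]


(register_op("nn.conv2d")
    .describe("2D convolution layer (e.g. spatial convolution over images).")
    .set_num_inputs(2)
    .add_argument("data", "Tensor", "The input tensor.")
    .add_argument("weight", "Tensor", "The weight tensor.")
    .set_support_level(2)
    .add_type_rel("Conv2D", lambda types: types)  # placeholder relation
    .set_attr("FInferCorrectLayout", "placeholder"))

entry = OP_REGISTRY["nn.conv2d"]
print(entry.num_inputs, [a[0] for a in entry.arguments])  # 2 ['data', 'weight']
```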

TOPI operators

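TOPI is TVM's own operator library. Its operators are expressed with te (tensor expressions) in a numpy-like style: just as numpy has np.sum(), TVM has te.sum. This makes it convenient to build computation graphs in TVM's own language. Let's use an official example to trace through the TOPI operator code in depth.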

import tvm
from tvm import te

n = te.var("n")
m = te.var("m")
A = te.placeholder((n, m), name="A")
k = te.reduce_axis((0, m), "k")
B = te.compute((n,), lambda i: te.sum(A[i, k], axis=k), name="B")
s = te.create_schedule(B.op)
print(tvm.lower(s, [A], simple_mode=True))

The printed function is:

primfn(A_1: handle) -> ()
  attr = {"global_symbol": "main", "tir.noalias": True}
  buffers = {A: Buffer(A_2: Pointer(float32), float32, [n: int32, m: int32], [stride: int32, stride_1: int32], type="auto")}
  buffer_map = {A_1: A} {
  attr [B: Pointer(float32)] "storage_scope" = "global";
  allocate(B, float32, [n]);
  for (i: int32, 0, n) {
    B[i] = 0f32
    for (k: int32, 0, m) {
      B[i] = ((float32*)B[i] + (float32*)A_2[((i*stride) + (k*stride_1))])
    }
  }
}
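Read back into Python, the lowered loop nest is a plain row-sum reduction. The sketch below transcribes it with a flat list in place of the buffer A_2 and explicit stride arguments that mirror the symbolic stride and stride_1 in the printout; the function name row_sum_flat is hypothetical. Passing different strides shows why the strides stay symbolic: the same loop works for row-major and column-major storage.

```python
def row_sum_flat(A_flat, n, m, stride, stride_1):
    """Hypothetical transcription of the lowered loop nest for B[i] = sum_k A[i, k]."""
    B = [0.0] * n                 # allocate(B, float32, [n])
    for i in range(n):            # for (i: int32, 0, n)
        B[i] = 0.0                # B[i] = 0f32
        for k in range(m):        # for (k: int32, 0, m)
            B[i] = B[i] + A_flat[i * stride + k * stride_1]
    return B


# The 2x3 matrix [[1, 2, 3], [4, 5, 6]] stored row-major: stride = 3, stride_1 = 1.
print(row_sum_flat([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2, 3, 3, 1))  # [6.0, 15.0]
# The same matrix stored column-major: stride = 1, stride_1 = 2.
print(row_sum_flat([1.0, 4.0, 2.0, 5.0, 3.0, 6.0], 2, 3, 1, 2))  # [6.0, 15.0]
```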

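te.var is similar to a variable in TensorFlow: it creates a symbolic variable, used here for the unknown dimensions n and m. Its call chain is python/tvm/te/operation.py -> python/tvm/tir/expr.py -> src/tir/ir/expr.cc. Var inherits from PrimExpr, and constructing a Var creates a VarNode, which stores the variable's dtype, name, and related information.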

te.placeholder is likewise similar to a TensorFlow placeholder. Under the hood it creates a PlaceholderOp, which stores the name, shape, and dtype.
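As a rough picture of what these nodes hold, here are two hypothetical Python dataclasses recording the fields named above. They are not TVM classes; the defaults (int32 for variables, float32 for placeholders) follow what te.var and te.placeholder use when no dtype is given, but treat the classes themselves as illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarNode:
    """Hypothetical stand-in for a TIR VarNode: a named, typed symbolic scalar."""
    name: str
    dtype: str = "int32"


@dataclass(frozen=True)
class PlaceholderOp:
    """Hypothetical stand-in for a PlaceholderOp: name, shape and dtype only."""
    name: str
    shape: tuple
    dtype: str = "float32"


# Mirrors n = te.var("n"); m = te.var("m"); A = te.placeholder((n, m), name="A")
n, m = VarNode("n"), VarNode("m")
A = PlaceholderOp("A", (n, m))
print(A.shape[0].name, A.dtype)  # n float32
```

The shape holds variables rather than integers, which is why the lowered code above loops to n and m instead of fixed bounds.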

Next, let's focus on compute. Its Python entry point is in python/tvm/te/operation.py, and its main job is to convert the lambda into an operator.

def compute(shape, fcompute, name="compute", tag="", attrs=None):

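fcompute is the lambda expression. compute first reads the parameter names and their count from fcompute.__code__, then creates an IterVar for each parameter together with the matching extent from shape. An IterVar is the abstraction of a loop variable, much like the counter of a for loop. The IterVars are then passed to fcompute to build the body. Depending on the type of that body, compute creates either a TensorComputeOp (when the body is a TensorIntrinCall) or a ComputeOp. Following the ComputeOp path leads to src/te/operation/compute_op.cc, which likewise constructs a ComputeOpNode that records the data and expression information.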

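# Excerpt from the body of compute() in python/tvm/te/operation.py
# One data-parallel IterVar (iter type 0, kDataPar) per output dimension
dim_var = [tvm.tir.IterVar((0, s), x, 0) for x, s in zip(arg_names, shape[:out_ndim])]
# Calling the lambda with the loop variables builds the body expression
body = fcompute(*[v.var for v in dim_var])

if isinstance(body, _tensor.TensorIntrinCall):
    # Extra trailing dimensions are covered by the tensor intrinsic (iter type 4, kOpaque)
    for i, s in enumerate(shape[out_ndim:]):
        var_name = "ax" + str(i)
        dim_var.append(tvm.tir.IterVar((0, s), var_name, 4))
    op_node = _ffi_api.TensorComputeOp(name,
                                       tag,
                                       dim_var,
                                       body.reduce_axis,
                                       out_ndim,
                                       body.intrin,
                                       body.tensors,
                                       body.regions,
                                       body.scalar_inputs)
else:
    if not isinstance(body, (list, tuple)):
        body = [body]
    body = convert(body)
    op_node = _ffi_api.ComputeOp(
        name, tag, attrs, dim_var, body)

# Return a single tensor, or a tuple when the op has several outputs
num = op_node.num_outputs
outputs = tuple(op_node.output(i) for i in range(num))
return outputs[0] if num == 1 else outputs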
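The first half of compute, from lambda to IterVars to body, can be reproduced in plain Python to see what __code__ provides. This is a sketch under simplifying assumptions: IterVar and sketch_compute are hypothetical, loop variables are plain strings rather than tir.Var objects, and both the varargs case and the TensorComputeOp branch are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IterVar:
    """Hypothetical stand-in for an IterVar: a loop variable over [0, extent)."""
    name: str
    extent: int


def sketch_compute(shape, fcompute):
    """Simplified model of how compute() turns a lambda into IterVars and a body."""
    code = fcompute.__code__
    out_ndim = code.co_argcount
    # Positional parameter names of the lambda, e.g. ("i", "j").
    arg_names = code.co_varnames[:out_ndim]
    if out_ndim > len(shape):
        raise ValueError("fcompute takes more arguments than shape has dimensions")
    # One IterVar per output dimension, named after the lambda's parameters.
    dim_var = [IterVar(x, s) for x, s in zip(arg_names, shape[:out_ndim])]
    # Calling fcompute with the loop variables builds the body; here the
    # "variables" are plain strings, so the body is a string expression.
    body = fcompute(*[v.name for v in dim_var])
    return dim_var, body


dims, body = sketch_compute((4, 5), lambda i, j: f"A[{i}, {j}] * 2")
print(dims)  # [IterVar(name='i', extent=4), IterVar(name='j', extent=5)]
print(body)  # A[i, j] * 2
```

In TVM the same call produces a symbolic expression tree instead of a string, which ComputeOp then stores as its body.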
