create_dl_layer_loss_ctc (Operator)

Name

create_dl_layer_loss_ctc - Create a CTC (Connectionist Temporal Classification) loss layer.

Signature

create_dl_layer_loss_ctc( : : DLLayerInput, DLLayerInputLengths, DLLayerTarget, DLLayerTargetLengths, LayerName, GenParamName, GenParamValue : DLLayerLossCTC)

Herror T_create_dl_layer_loss_ctc(const Htuple DLLayerInput, const Htuple DLLayerInputLengths, const Htuple DLLayerTarget, const Htuple DLLayerTargetLengths, const Htuple LayerName, const Htuple GenParamName, const Htuple GenParamValue, Htuple* DLLayerLossCTC)

void CreateDlLayerLossCtc(const HTuple& DLLayerInput, const HTuple& DLLayerInputLengths, const HTuple& DLLayerTarget, const HTuple& DLLayerTargetLengths, const HTuple& LayerName, const HTuple& GenParamName, const HTuple& GenParamValue, HTuple* DLLayerLossCTC)

HDlLayer HDlLayer::CreateDlLayerLossCtc(const HDlLayer& DLLayerInputLengths, const HDlLayer& DLLayerTarget, const HDlLayer& DLLayerTargetLengths, const HString& LayerName, const HTuple& GenParamName, const HTuple& GenParamValue) const

HDlLayer HDlLayer::CreateDlLayerLossCtc(const HDlLayer& DLLayerInputLengths, const HDlLayer& DLLayerTarget, const HDlLayer& DLLayerTargetLengths, const HString& LayerName, const HString& GenParamName, const HString& GenParamValue) const

HDlLayer HDlLayer::CreateDlLayerLossCtc(const HDlLayer& DLLayerInputLengths, const HDlLayer& DLLayerTarget, const HDlLayer& DLLayerTargetLengths, const char* LayerName, const char* GenParamName, const char* GenParamValue) const

HDlLayer HDlLayer::CreateDlLayerLossCtc(const HDlLayer& DLLayerInputLengths, const HDlLayer& DLLayerTarget, const HDlLayer& DLLayerTargetLengths, const wchar_t* LayerName, const wchar_t* GenParamName, const wchar_t* GenParamValue) const   (Windows only)

static void HOperatorSet.CreateDlLayerLossCtc(HTuple DLLayerInput, HTuple DLLayerInputLengths, HTuple DLLayerTarget, HTuple DLLayerTargetLengths, HTuple layerName, HTuple genParamName, HTuple genParamValue, out HTuple DLLayerLossCTC)

HDlLayer HDlLayer.CreateDlLayerLossCtc(HDlLayer DLLayerInputLengths, HDlLayer DLLayerTarget, HDlLayer DLLayerTargetLengths, string layerName, HTuple genParamName, HTuple genParamValue)

HDlLayer HDlLayer.CreateDlLayerLossCtc(HDlLayer DLLayerInputLengths, HDlLayer DLLayerTarget, HDlLayer DLLayerTargetLengths, string layerName, string genParamName, string genParamValue)

def create_dl_layer_loss_ctc(dllayer_input: HHandle, dllayer_input_lengths: HHandle, dllayer_target: HHandle, dllayer_target_lengths: HHandle, layer_name: str, gen_param_name: MaybeSequence[str], gen_param_value: MaybeSequence[Union[int, float, str]]) -> HHandle

Description

The operator create_dl_layer_loss_ctc creates a Connectionist Temporal Classification (CTC) loss layer whose handle is returned in DLLayerLossCTC. See the reference cited below for details about the CTC loss.

With this loss layer, sequence-to-sequence (Seq2Seq) models can be trained, for example a model that reads text in an image. To do so, sequences are compared: the network prediction DLLayerInput with sequence length DLLayerInputLengths is compared to the given DLLayerTarget with sequence length DLLayerTargetLengths.
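To make the comparison of prediction and target concrete, the following is a minimal plain-Python sketch of the standard CTC forward recursion from the reference cited below. It is an illustration only, not the implementation used by this operator. The function name `ctc_loss` and its arguments are hypothetical, the probabilities are assumed to be already softmax-normalized, and the blank is assumed to be class 0 as in the example further down. It works in the probability domain, so it is only suitable for short sequences.

```python
import math


def ctc_loss(probs, target, blank=0):
    """Negative log-likelihood of `target` given per-step class probabilities.

    probs  -- list of T rows, each holding C probabilities (assumed softmaxed)
    target -- list of class ids without blanks
    blank  -- id of the blank class (assumption: 0)
    """
    # Extended target: a blank before, between, and after all labels.
    ext = [blank]
    for label in target:
        ext += [label, blank]

    num_steps, ext_len = len(probs), len(ext)

    # alpha[s]: total probability of all alignments that end in ext[s].
    alpha = [0.0] * ext_len
    alpha[0] = probs[0][ext[0]]
    if ext_len > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, num_steps):
        new_alpha = [0.0] * ext_len
        for s in range(ext_len):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # A blank may be skipped only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # Valid alignments end in the last label or in the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if ext_len > 1 else 0.0)
    return -math.log(likelihood) if likelihood > 0 else math.inf
```

With two time steps, two classes, and uniform probabilities, the target `[1]` has three valid alignments of probability 0.25 each, so the loss is `-log(0.75)`. To respect a per-item input length, pass only the first rows of `probs` that belong to that item.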

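The following variables are important for understanding the input shapes:

T: maximum length of the input sequence (the width of DLLayerInput)
S: maximum length of the target sequence (the width of DLLayerTarget)
C: number of classes, including the blank class with ID 0 (the depth of DLLayerInput)

This layer expects multiple layers as input:

DLLayerInput: the network prediction, with shape [T,1,C]
DLLayerInputLengths: the input sequence length of each item in the batch, with shape [1,1,1]
DLLayerTarget: the target sequences, with shape [S,1,1]
DLLayerTargetLengths: the target sequence length of each item in the batch, with shape [1,1,1]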

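The parameter LayerName sets an individual layer name. Note that if a model is created using create_dl_model, each layer of the created network must have a unique name.

A typical use of the CTC loss in a convolutional neural network (CNN) is as follows: the input sequence is expected to be encoded by CNN layers with an output shape of [width: T, height: 1, depth: C]. Typically, at the end of a large fully convolutional classifier, an average pooling layer pools the height down to 1. It is important that the last layer is wide enough to hold sufficient information. To obtain the sequence prediction in the output depth, a 1x1 convolutional layer with C kernels is added after the pooling layer. In this use case, the CTC loss receives this convolutional layer as the input layer DLLayerInput. The width of the input layer determines the maximum output sequence length of the model.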
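Because the input width bounds the output length, it can help to check how many time steps a given target needs. Under standard CTC, two identical adjacent labels must be separated by a blank, so each such repeat costs one extra step. The helper below is a hypothetical plain-Python sketch of that rule. Whether the operator itself validates this condition is not stated in this section.

```python
def min_input_width(target):
    """Smallest number of time steps for which CTC can emit `target`.

    target -- list of class ids without blanks
    """
    # Each pair of identical neighbours needs one separating blank.
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats
```

For the target `[1,2,1]` used in the example further down, three steps are enough, while `[1,1]` also needs three because of the separating blank.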

The CTC loss can be applied to a batch of input items whose input and target sequence lengths differ. T and S are the maximum lengths. In DLLayerInputLengths and DLLayerTargetLengths, the length of each item in the batch must be specified individually.
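One way to prepare such a batch is to pad every target sequence to the maximum length S and to record the true length of each item separately. The following plain-Python sketch shows the idea. The function name `pad_targets` and the padding value 0 are assumptions made for illustration; this section does not state how unused target positions should be filled.

```python
def pad_targets(targets, max_len, pad_value=0):
    """Pad label sequences to `max_len` and return their true lengths.

    targets   -- list of label sequences (lists of class ids)
    max_len   -- maximum target length S
    pad_value -- filler for unused positions (assumption: 0)
    """
    padded, lengths = [], []
    for seq in targets:
        if len(seq) > max_len:
            raise ValueError("target sequence longer than max_len")
        lengths.append(len(seq))
        padded.append(list(seq) + [pad_value] * (max_len - len(seq)))
    return padded, lengths
```

The padded rows correspond to the content of the target layer and the returned lengths to the content of the target lengths layer, one entry per batch item.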

Restrictions

The following generic parameters GenParamName and the corresponding values GenParamValue are supported:

'is_inference_output'

Determines whether apply_dl_model includes the output of this layer in the dictionary DLResultBatch even if the layer is not specified in Outputs ('true'), or not ('false').

Default: 'false'

Certain parameters of layers created with create_dl_layer_loss_ctc can be set and retrieved using further operators. The following tables give an overview of which parameters can be set using set_dl_model_layer_param and which can be retrieved using get_dl_model_layer_param or get_dl_layer_param. Note that the operators set_dl_model_layer_param and get_dl_model_layer_param require a model created by create_dl_model.

Layer Parameters  set  get
'input_layer' (DLLayerInput, DLLayerInputLengths, DLLayerTarget, and/or DLLayerTargetLengths)
'name' (LayerName)
'output_layer' (DLLayerLossCTC)
'shape'
'type'
Generic Layer Parameters  set  get
'is_inference_output'
'num_trainable_params'

Execution Information

Parameters

DLLayerInput (input_control)  dl_layer (handle)

Input layer with the network prediction.

DLLayerInputLengths (input_control)  dl_layer (handle)

Input layer that specifies the input sequence length of each item in the batch.

DLLayerTarget (input_control)  dl_layer (handle)

Input layer that specifies the target sequences. If the input dimensions of the CNN change, the width of this layer is automatically adjusted to match the width of the DLLayerInput layer.

DLLayerTargetLengths (input_control)  dl_layer (handle)

Input layer that specifies the target sequence length of each item in the batch.

LayerName (input_control)  string (string)

Name of the output layer.

GenParamName (input_control)  attribute.name(-array) (string)

Generic input parameter names.

Default: []

List of values: 'is_inference_output'

GenParamValue (input_control)  attribute.value(-array) (string / integer / real)

Generic input parameter values.

Default: []

Suggested values: 'true', 'false'

DLLayerLossCTC (output_control)  dl_layer (handle)

CTC loss layer.

Example (HDevelop)

* Create a simple Seq2Seq model which overfits to a single output sequence.

* Input sequence length
T := 6
* Number of classes including blank (blank is always class_id: 0)
C := 3
* Batch Size
N := 1
* Maximum length of target sequences
S := 3

* Model creation
create_dl_layer_input ('input', [T,1,1], [], [], Input)
create_dl_layer_dense (Input, 'dense', T*C, [], [], DLLayerDense)
create_dl_layer_reshape (DLLayerDense, 'dense_reshape', [T,1,C], [], [],\
                         ConvFinal)

* Training part

* Specify the shapes without batch-size
* (batch-size will be specified in the model).
create_dl_layer_input ('ctc_input_lengths', [1,1,1], [], [],\
                       DLLayerInputLengths)
create_dl_layer_input ('ctc_target', [S,1,1], [], [], DLLayerTarget)
create_dl_layer_input ('ctc_target_lengths', [1,1,1], [], [],\
                       DLLayerTargetLengths)
* Create the loss layer
create_dl_layer_loss_ctc (ConvFinal, DLLayerInputLengths, DLLayerTarget,\
                          DLLayerTargetLengths, 'ctc_loss', [], [],\
                          DLLayerLossCTC)

* Get all names so that users can set values
get_dl_layer_param (ConvFinal, 'name', CTCInputName)
get_dl_layer_param (DLLayerInputLengths, 'name', CTCInputLengthsName)
get_dl_layer_param (DLLayerTarget, 'name', CTCTargetName)
get_dl_layer_param (DLLayerTargetLengths, 'name', CTCTargetLengthsName)

* Inference part
create_dl_layer_softmax (ConvFinal, 'softmax', [], [], DLLayerSoftMax)
create_dl_layer_depth_max (DLLayerSoftMax, 'prediction', 'argmax', [], [],\
                           DLLayerDepthMaxArg, _)

* Setting a seed because the weights of the network are randomly initialized
set_system ('seed_rand', 35)

create_dl_model ([DLLayerLossCTC,DLLayerDepthMaxArg], DLModel)

set_dl_model_param (DLModel, 'batch_size', N)
set_dl_model_param (DLModel, 'runtime', 'gpu')
set_dl_model_param (DLModel, 'learning_rate', 1)

* Create input sample for training
InputSequence := [0,1,2,3,4,5]
TargetSequence := [1,2,1]
create_dict (InputSample)
set_dict_tuple (InputSample, 'input', InputSequence)
set_dict_tuple (InputSample, 'ctc_input_lengths', |InputSequence|)
set_dict_tuple (InputSample, 'ctc_target', TargetSequence)
set_dict_tuple (InputSample, 'ctc_target_lengths', |TargetSequence|)
Eps := 0.01

PredictedSequence := []
dev_inspect_ctrl ([InputSequence, TargetSequence, CTCLoss, PredictedValues,\
                  PredictedSequence])
MaxIterations := 15
for I := 0 to MaxIterations by 1
  apply_dl_model (DLModel, InputSample, ['prediction','softmax'], \
                  DLResultBatch)
  get_dict_object (Softmax, DLResultBatch, 'softmax')
  get_dict_object (Prediction, DLResultBatch, 'prediction')
  PredictedValues := []
  for t := 0 to T-1 by 1
      get_grayval (Prediction, 0, t, PredictionValue)
      PredictedValues := [PredictedValues, PredictionValue]
  endfor
  train_dl_model_batch (DLModel, InputSample, DLTrainResult)

  get_dict_tuple (DLTrainResult, 'ctc_loss', CTCLoss)
  if (CTCLoss < Eps)
      break
  endif
  stop()
endfor

* Rudimentary implementation of fastest path prediction
PredictedSequence := []
LastV := -1
for I := 0 to |PredictedValues|-1 by 1
  V := PredictedValues[I]
  if (V == 0)
      LastV := -1
      continue
  endif
  if (|PredictedSequence| > 0 and V == LastV)
      continue
  endif
  PredictedSequence := [PredictedSequence, V]
  LastV := PredictedSequence[|PredictedSequence|-1]
endfor
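
The decoding loop at the end of the example can also be sketched in plain Python. It collapses consecutive repeats and drops blanks, which is commonly called best path (greedy) decoding. The function name `best_path_decode` is hypothetical, and the blank is assumed to be class 0 as stated in the example.

```python
def best_path_decode(predicted_values, blank=0):
    """Collapse repeated labels and remove blanks from per-step argmax ids.

    predicted_values -- list of class ids, one per time step
    blank            -- id of the blank class (assumption: 0)
    """
    decoded = []
    last = None  # last non-blank label since the most recent blank
    for value in predicted_values:
        if value == blank:
            # A blank ends the current run, so the same label may repeat.
            last = None
            continue
        if value == last:
            # Same label as the previous step: part of the same run.
            continue
        decoded.append(value)
        last = value
    return decoded
```

For instance, the per-step predictions `[0, 1, 1, 0, 2, 1]` decode to `[1, 2, 1]`, and `[1, 0, 1]` decodes to `[1, 1]` because the blank separates the two identical labels.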

References

Graves, Alex, et al. "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks." Proceedings of the 23rd International Conference on Machine Learning. 2006.

Module

Deep Learning Training