create_dl_layer_loss_ctc (Operator)

Name

create_dl_layer_loss_ctc: Create a CTC (Connectionist Temporal Classification) loss layer.

Signature

create_dl_layer_loss_ctc( : : DLLayerInput, DLLayerInputLengths, DLLayerTarget, DLLayerTargetLengths, LayerName, GenParamName, GenParamValue : DLLayerLossCTC)

Description

The operator create_dl_layer_loss_ctc creates a Connectionist Temporal Classification (CTC) loss layer whose handle is returned in DLLayerLossCTC. See the reference cited below for details about the CTC loss.

With this loss layer it is possible to train sequence-to-sequence models (Seq2Seq). For example, it can be used to train a model that recognizes text in an image. To do so, sequences are compared: the network prediction DLLayerInput with sequence length DLLayerInputLengths is compared to the given DLLayerTarget with sequence length DLLayerTargetLengths.
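
The comparison described above can be illustrated with the forward algorithm from the reference cited at the end of this page. The following is a minimal sketch in plain Python for a single batch item, not the HALCON implementation. The function names (`ctc_loss`, `log_softmax`, `logsumexp`) and the nested-list layout `scores[t][c]` are assumptions made for illustration.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over the class scores of one time step."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]

def logsumexp(values):
    """log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def ctc_loss(scores, input_length, target, target_length, blank=0):
    """Negative log-likelihood of the target given raw scores[t][c].

    Only the first input_length time steps and the first target_length
    labels are used, mirroring the separate length inputs of the layer.
    """
    # Extended label sequence with blanks interleaved: [b, l1, b, l2, ..., b]
    ext = [blank]
    for label in target[:target_length]:
        ext += [label, blank]
    n = len(ext)
    log_probs = [log_softmax(scores[t]) for t in range(input_length)]

    # alpha[s]: log-probability of all partial alignments ending in ext[s]
    alpha = [-math.inf] * n
    alpha[0] = log_probs[0][ext[0]]
    if n > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, input_length):
        new = [-math.inf] * n
        for s in range(n):
            cands = [alpha[s]]
            if s >= 1:
                cands.append(alpha[s - 1])
            # The blank between two labels may be skipped only if they differ.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(cands) + log_probs[t][ext[s]]
        alpha = new

    # Valid alignments end in the last label or in the trailing blank.
    tail = [alpha[n - 1]]
    if n > 1:
        tail.append(alpha[n - 2])
    return -logsumexp(tail)
```

With two time steps, two classes and uniform scores, the target `[1]` has three valid alignments of probability 0.25 each, so the sketch returns -log(0.75). A target that cannot fit into the given number of time steps yields an infinite loss.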

The following variables are important for understanding the input shapes:

T: Maximum input sequence length (i.e., the width of DLLayerInput)
S: Maximum output sequence length (i.e., the width of DLLayerTarget)
C: Number of classes, including 0 as the blank class ID (i.e., the depth of DLLayerInput)

This layer expects multiple layers as input:

DLLayerInput: Specifies the network prediction.
Shape: [T,1,C]

DLLayerInputLengths: Specifies the input sequence length of each item in the batch.
Shape: [1,1,1]

DLLayerTarget: Specifies the target sequences.
Shape: [S,1,1]

DLLayerTargetLengths: Specifies the target sequence length of each item in the batch.
Shape: [1,1,1]

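The parameter LayerName sets the name of the individual layer. Note that if a model is created with create_dl_model, each layer of the created network must have a unique name.

The CTC loss is typically applied in a convolutional neural network (CNN) as follows. The input sequence is expected to be encoded by the CNN layers into an output of shape [width: T, height: 1, depth: C]. Typically, at the end of a large fully convolutional classifier, the height is pooled down to 1 by an average pooling layer. It is important that the last layer is wide enough to hold sufficient information. To obtain the sequence prediction in the output depth, a 1x1 convolutional layer with C kernels is added after the pooling layer. In this use case, the CTC loss receives this convolutional layer as its input layer DLLayerInput. The width of the input layer determines the maximum output sequence length of the model.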
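
How the width T limits the output sequence length follows from a general property of CTC: two identical neighboring labels need a blank between them, so a target needs at least as many time steps as it has labels plus adjacent repeats. The sketch below checks this in plain Python. The helper name `min_input_length` is hypothetical and not part of HALCON.

```python
def min_input_length(target):
    """Smallest number of time steps a CTC alignment of `target` needs.

    Every label occupies one step, and each pair of identical neighbors
    needs one additional blank step to keep them separate.
    """
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats

def fits(target, input_length):
    """True if `target` can be aligned to `input_length` time steps."""
    return min_input_length(target) <= input_length
```

For the target `[1,2,1]` used in the example on this page, three time steps are sufficient, so it fits into T = 6. A target such as `[1,1,2]` needs four.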

The CTC loss can be applied to a batch of input items whose input and target sequence lengths differ. T and S denote the maximum lengths. The individual length of each item in the batch must be specified in DLLayerInputLengths and DLLayerTargetLengths.
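
Preparing a batch with differing target lengths can be sketched as follows: each target is filled up to the maximum length S, and its true length is recorded separately. This is plain Python for illustration only. The helper name `pad_targets` and the padding value 0 are assumptions, since this page does not state which value the unused positions should hold.

```python
def pad_targets(targets, max_len, pad_value=0):
    """Pad every label sequence to max_len and return the true lengths.

    pad_value is an assumed filler; positions beyond the recorded length
    are expected to be ignored by a length-aware loss.
    """
    padded, lengths = [], []
    for seq in targets:
        if len(seq) > max_len:
            raise ValueError("target sequence is longer than S")
        padded.append(list(seq) + [pad_value] * (max_len - len(seq)))
        lengths.append(len(seq))
    return padded, lengths
```

For S = 3, the targets `[1,2,1]` and `[2]` become `[1,2,1]` and `[2,0,0]` with lengths 3 and 1.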

Restrictions

A model containing this layer cannot be trained on a CPU.
A model containing this layer cannot be trained with a 'batch_size_multiplier' != 1.0.
The input layer DLLayerInput must not be a softmax layer, because the softmax is computed internally by this layer. For inference, an additional softmax layer must be connected to DLLayerInput (see create_dl_layer_softmax).
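
The softmax that is attached separately for inference turns the raw scores of each time step into class probabilities along the depth axis. The sketch below shows that step in plain Python, followed by the per-step argmax. It is only an illustration: the function names and the nested-list layout `score_rows[t][c]` are assumptions, and in a HALCON model this work is done by the softmax and depth-max layers.

```python
import math

def softmax(scores):
    """Numerically stable softmax over the class scores of one time step."""
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]

def per_step_prediction(score_rows):
    """Return the class probabilities and the most likely class per time step."""
    probs = [softmax(row) for row in score_rows]
    classes = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return probs, classes
```

Because softmax preserves the ordering of the scores, the argmax is the same with or without it. The softmax layer matters when the probabilities themselves are needed.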

The following generic parameters GenParamName and the corresponding values GenParamValue are supported:

'is_inference_output':

Determines whether apply_dl_model includes the output of this layer in the dictionary DLResultBatch even if the layer is not specified in Outputs ('true') or not ('false').

Default: 'false'

Certain parameters of layers created with the operator create_dl_layer_loss_ctc can be set and retrieved with other operators. The following tables give an overview of which parameters can be set with set_dl_model_layer_param and which can be retrieved with get_dl_model_layer_param or get_dl_layer_param. Note that the operators set_dl_model_layer_param and get_dl_model_layer_param require a model created by create_dl_model.

Layer Parameters	set	get
'input_layer' (DLLayerInput, DLLayerInputLengths, DLLayerTarget, and/or DLLayerTargetLengths)
'name' (LayerName)
'output_layer' (DLLayerLossCTC)
'shape'
'type'

Generic Layer Parameters	set	get
'is_inference_output'
'num_trainable_params'

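Execution Information

Multithreading type: reentrant (runs in parallel with non-exclusive operators).
Multithreading scope: global (may be called from any thread).
Processed without parallelization.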

Parameters

DLLayerInput (input_control) dl_layer → (handle)

Input layer with the network predictions.

DLLayerInputLengths (input_control) dl_layer → (handle)

Input layer that specifies the input sequence length of each item in the batch.

DLLayerTarget (input_control) dl_layer → (handle)

Input layer that specifies the target sequences. If the input dimensions of the CNN change, the width of this layer is automatically adjusted to the width of the DLLayerInput layer.

DLLayerTargetLengths (input_control) dl_layer → (handle)

Input layer that specifies the target sequence length of each item in the batch.

LayerName (input_control) string → (string)

Name of the output layer.

GenParamName (input_control) attribute.name(-array) → (string)

Generic input parameter names.

Default: []

List of values: 'is_inference_output'

GenParamValue (input_control) attribute.value(-array) → (string / integer / real)

Generic input parameter values.

Default: []

Suggested values: 'true', 'false'

DLLayerLossCTC (output_control) dl_layer → (handle)

CTC loss layer.

Example (HDevelop)

* Create a simple Seq2Seq model which overfits to a single output sequence.

* Input sequence length
T := 6
* Number of classes including blank (blank is always class_id: 0)
C := 3
* Batch Size
N := 1
* Maximum length of target sequences
S := 3

* Model creation
create_dl_layer_input ('input', [T,1,1], [], [], Input)
create_dl_layer_dense (Input, 'dense', T*C, [], [], DLLayerDense)
create_dl_layer_reshape (DLLayerDense, 'dense_reshape', [T,1,C], [], [],\
                         ConvFinal)

* Training part

* Specify the shapes without batch-size
* (batch-size will be specified in the model).
create_dl_layer_input ('ctc_input_lengths', [1,1,1], [], [],\
                       DLLayerInputLengths)
create_dl_layer_input ('ctc_target', [S,1,1], [], [], DLLayerTarget)
create_dl_layer_input ('ctc_target_lengths', [1,1,1], [], [],\
                       DLLayerTargetLengths)
* Create the loss layer
create_dl_layer_loss_ctc (ConvFinal, DLLayerInputLengths, DLLayerTarget,\
                          DLLayerTargetLengths, 'ctc_loss', [], [],\
                          DLLayerLossCTC)

* Get all names so that users can set values
get_dl_layer_param (ConvFinal, 'name', CTCInputName)
get_dl_layer_param (DLLayerInputLengths, 'name', CTCInputLengthsName)
get_dl_layer_param (DLLayerTarget, 'name', CTCTargetName)
get_dl_layer_param (DLLayerTargetLengths, 'name', CTCTargetLengthsName)

* Inference part
create_dl_layer_softmax (ConvFinal, 'softmax', [], [], DLLayerSoftMax)
create_dl_layer_depth_max (DLLayerSoftMax, 'prediction', 'argmax', [], [],\
                           DLLayerDepthMaxArg, _)

* Setting a seed because the weights of the network are randomly initialized
set_system ('seed_rand', 35)

create_dl_model ([DLLayerLossCTC,DLLayerDepthMaxArg], DLModel)

set_dl_model_param (DLModel, 'batch_size', N)
set_dl_model_param (DLModel, 'runtime', 'gpu')
set_dl_model_param (DLModel, 'learning_rate', 1)

* Create input sample for training
InputSequence := [0,1,2,3,4,5]
TargetSequence := [1,2,1]
create_dict (InputSample)
set_dict_tuple (InputSample, 'input', InputSequence)
set_dict_tuple (InputSample, 'ctc_input_lengths', |InputSequence|)
set_dict_tuple (InputSample, 'ctc_target', TargetSequence)
set_dict_tuple (InputSample, 'ctc_target_lengths', |TargetSequence|)
Eps := 0.01

PredictedSequence := []
dev_inspect_ctrl ([InputSequence, TargetSequence, CTCLoss, PredictedValues,\
                  PredictedSequence])
MaxIterations:= 15
for I := 0 to MaxIterations by 1
  apply_dl_model (DLModel, InputSample, ['prediction','softmax'], \
                  DLResultBatch)
  get_dict_object (Softmax, DLResultBatch, 'softmax')
  get_dict_object (Prediction, DLResultBatch, 'prediction')
  PredictedValues := []
  for t := 0 to T-1 by 1
      get_grayval (Prediction, 0, t, PredictionValue)
      PredictedValues := [PredictedValues, PredictionValue]
  endfor
  train_dl_model_batch (DLModel, InputSample, DLTrainResult)

  get_dict_tuple (DLTrainResult, 'ctc_loss', CTCLoss)
  if (CTCLoss < Eps)
      break
  endif
  stop()
endfor

* Rudimentary implementation of fastest path prediction
PredictedSequence := []
LastV := -1
for I := 0 to |PredictedValues|-1 by 1
  V := PredictedValues[I]
  if (V == 0)
      LastV := -1
      continue
  endif
  if (|PredictedSequence| > 0 and V == LastV)
      continue
  endif
  PredictedSequence := [PredictedSequence, V]
  LastV :=  PredictedSequence[|PredictedSequence|-1]
endfor
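
The collapse rule in the last loop of the example (drop blanks, and merge repeated labels unless a blank separates them) can also be written compactly. The following plain-Python sketch mirrors that loop and assumes the same per-step argmax values as input. The function name `ctc_greedy_decode` is hypothetical.

```python
def ctc_greedy_decode(predicted_values, blank=0):
    """Collapse per-step class IDs into a label sequence.

    A label is emitted when it is not the blank and differs from the
    class of the previous time step, so a blank between two identical
    labels keeps both of them.
    """
    decoded = []
    last = None
    for value in predicted_values:
        if value != blank and value != last:
            decoded.append(value)
        last = value
    return decoded
```

For example, the per-step values `[1,1,0,2,2,1]` collapse to `[1,2,1]`, the target sequence of the example, while `[1,0,1]` yields `[1,1]` because the blank separates the two identical labels.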

References

Graves, Alex, et al. "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks." Proceedings of the 23rd International Conference on Machine Learning. 2006.

Module

Deep Learning Training
