apply_deep_ocr (Operator)

Name

apply_deep_ocr - Apply a Deep OCR model on a set of images for inference.

Signature

apply_deep_ocr(Image : : DeepOcrHandle, Mode : DeepOcrResult)

Description

The operator apply_deep_ocr applies the Deep OCR model given by DeepOcrHandle to the tuple of input images Image. It returns DeepOcrResult, a tuple containing one result dictionary per input image.

The operator apply_deep_ocr poses the following requirements on the input Image:

Image type: byte.
Number of channels: 1 or 3.

Further, apply_deep_ocr preprocesses the given Image to match the model specifications. This means that byte images are normalized and converted to type real. For Mode = 'auto' or 'detection', the input Image is padded to the model input dimensions and, if it has only a single channel, converted into a three-channel image. For Mode = 'recognition', three-channel images are automatically converted into single-channel images.
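The preprocessing steps above can be illustrated with a small plain-Python sketch. It is not HALCON code: images are modeled as nested lists indexed as `image[channel][row][col]`, and the helper names `normalize`, `pad_to`, and `preprocess` are hypothetical. The concrete choices (scaling byte values to [0, 1], zero padding at the bottom and right, replicating a gray channel, averaging three channels into one) are assumptions made for illustration; the operator's actual normalization range, padding value, and color conversion are not specified here.

```python
# Plain-Python sketch of the preprocessing described for apply_deep_ocr.
# NOT the HALCON implementation: normalization range, padding value and
# color conversion below are illustrative assumptions.

def normalize(image):
    """Convert byte values (0..255) to floats; ASSUMPTION: scaled to [0, 1]."""
    return [[[v / 255.0 for v in row] for row in channel] for channel in image]


def pad_to(image, height, width, fill=0.0):
    """Pad every channel at the bottom/right to height x width (ASSUMPTION: zero padding)."""
    if len(image[0]) > height or len(image[0][0]) > width:
        raise ValueError("image exceeds model input size; resizing is out of scope here")
    padded = []
    for channel in image:
        rows = [row + [fill] * (width - len(row)) for row in channel]
        rows += [[fill] * width for _ in range(height - len(rows))]
        padded.append(rows)
    return padded


def preprocess(image, mode, model_height, model_width):
    """Mimic the mode-dependent steps: normalize, then pad and/or convert channels."""
    if len(image) not in (1, 3):
        raise ValueError("expected 1 or 3 channels")
    img = normalize(image)
    if mode in ("auto", "detection"):
        if len(img) == 1:
            # Single channel -> three channels (ASSUMPTION: gray value replicated).
            img = [[row[:] for row in img[0]] for _ in range(3)]
        img = pad_to(img, model_height, model_width)
    elif mode == "recognition":
        if len(img) == 3:
            # Three channels -> one channel (ASSUMPTION: plain channel mean).
            h, w = len(img[0]), len(img[0][0])
            img = [[[sum(img[c][r][k] for c in range(3)) / 3.0 for k in range(w)]
                    for r in range(h)]]
    else:
        raise ValueError(f"unsupported mode: {mode!r}")
    return img
```

For example, `preprocess([[[0, 255], [255, 0]]], "detection", 3, 4)` returns three identical 3×4 channels whose first row is `[0.0, 1.0, 0.0, 0.0]`.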

The parameter Mode specifies the mode and thereby which component is executed. Supported values:

'auto' (A): Both parts are executed, the detection of the words and their recognition.
'detection' (DET): Only the detection part is executed. Hence, the model only localizes the word regions within the image.
'recognition' (REC): Only the recognition part is executed. Hence, the model expects images that contain only a tight crop of a single word.

Note that the model must have been created with the required components, see create_deep_ocr.
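A hypothetical helper makes the relation between Mode and the model components explicit. The mapping follows the mode list above; how the set of available components is obtained from a real model is left open, and the names `REQUIRED_COMPONENTS` and `check_mode` are illustrative only, not part of the HALCON API.

```python
# Hypothetical illustration, not a HALCON API: which components each Mode runs.
REQUIRED_COMPONENTS = {
    "auto": {"detection", "recognition"},   # detect words, then recognize them
    "detection": {"detection"},             # localize word regions only
    "recognition": {"recognition"},         # read a tightly cropped word only
}


def check_mode(mode, available_components):
    """Return the components Mode needs, or raise if the model lacks one of them."""
    if mode not in REQUIRED_COMPONENTS:
        raise ValueError(f"unsupported mode: {mode!r}")
    missing = REQUIRED_COMPONENTS[mode] - set(available_components)
    if missing:
        raise ValueError(f"mode {mode!r} needs missing component(s): {sorted(missing)}")
    return sorted(REQUIRED_COMPONENTS[mode])
```

With a detection-only model, `check_mode("auto", ["detection"])` raises a `ValueError`, mirroring the requirement that the model contain every component the chosen Mode executes.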

The entries of the output dictionary DeepOcrResult depend on the applied Mode (marked by its abbreviation):

image (A, DET, REC):

Preprocessed image.

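score_maps (A, DET):

Scores, given as an image with four channels:

Character score: Score for the character detection.
Link score: Score for the connection of detected character centers to a connected word.
Orientation 1: Sine component of the predicted word orientation.
Orientation 2: Cosine component of the predicted word orientation.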
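Because the orientation is predicted as a sine and a cosine component, an angle can be recovered with `atan2`, and several per-pixel predictions can be averaged without the wrap-around problem of averaging raw angles. The following plain-Python sketch assumes that the two channel values map directly to the sine and cosine of the word angle in the same convention as phi; the function names are hypothetical.

```python
import math


def orientation_from_components(sin_component, cos_component):
    """Word angle in radians, in (-pi, pi], from the two orientation channels.

    ASSUMPTION: channel 3 holds sin(angle) and channel 4 holds cos(angle).
    """
    return math.atan2(sin_component, cos_component)


def mean_orientation(samples):
    """Circular mean of (sin, cos) pairs, e.g. sampled inside one word region.

    Summing the components before atan2 avoids the jump at +/-pi that
    a plain average of angles would suffer from.
    """
    s = sum(p[0] for p in samples)
    c = sum(p[1] for p in samples)
    return math.atan2(s, c)
```

Averaging the angles 179° and -179° directly would give 0°, while `mean_orientation` returns an angle of magnitude π (180°), which is the expected result.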

words (A, DET):

Dictionary with the following entries. Each entry is a tuple containing one value per found word.

word (A): Recognized word.
char_candidates (A): Dictionary with information about each character of every recognized word. It contains one key/value pair per word: the index of the word as key and a tuple of dictionaries, one per character, as value. Each character dictionary contains the following key/value pairs:
- 'candidate': Tuple with the best 'recognition_num_char_candidates' candidates.
- 'confidence': Softmax-based confidence values of the best candidates. Note that these values are not calibrated and should be used with care. The confidence values can differ significantly between different models.
word_image (A): Preprocessed image part containing the word.
row (A, DET): Localized word: center point, row coordinate.
col (A, DET): Localized word: center point, column coordinate.
phi (A, DET): Localized word: angle phi.
length1 (A, DET): Localized word: half edge length 1.
length2 (A, DET): Localized word: half edge length 2.
line_index (A, DET): Line index of the localized word, if 'detection_sort_by_line' is set to 'true'.
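Since every entry of words is a tuple with one value per found word, it is often convenient to transpose the dictionary into one record per word, for example to assemble text lines from line_index. The sketch below models the result dictionary as a plain Python dict of tuples, which is an assumption (in HALCON it is a dictionary handle); `words_to_records` and `text_lines` are hypothetical helpers, and ordering the words of a line by their col value assumes roughly horizontal text.

```python
from collections import defaultdict


def words_to_records(words):
    """Transpose the columnar 'words' dict (one tuple per key) into one dict per word.

    Entries that are not tuples (e.g. the 'char_candidates' dictionary) are skipped.
    """
    keys = [k for k, v in words.items() if isinstance(v, tuple)]
    count = len(words[keys[0]]) if keys else 0
    return [{k: words[k][i] for k in keys} for i in range(count)]


def text_lines(words):
    """Join recognized words per line_index, ordered left to right by column.

    ASSUMPTION: 'detection_sort_by_line' was 'true', so line_index is present,
    and the text is roughly horizontal, so 'col' gives the reading order.
    """
    lines = defaultdict(list)
    for rec in words_to_records(words):
        lines[rec["line_index"]].append(rec)
    return [" ".join(r["word"] for r in sorted(lines[idx], key=lambda r: r["col"]))
            for idx in sorted(lines)]
```

For instance, with made-up values `word = ("world", "Hello", "again")`, `col = (80.0, 10.0, 12.0)` and `line_index = (0, 0, 1)`, `text_lines` returns `["Hello world", "again"]`.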

The localization of a word is given by the parameters of an oriented rectangle; for more information, see gen_rectangle2.
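To draw or crop a localized word outside of HALCON, the oriented-rectangle parameters can be turned into corner points. The sketch assumes the gen_rectangle2 convention as commonly described for HALCON: phi is measured counterclockwise from the column axis while rows point downwards, and length1/length2 are half edge lengths. Please verify the angle convention against the gen_rectangle2 reference before relying on it; `rectangle2_corners` is a hypothetical helper.

```python
import math


def rectangle2_corners(row, col, phi, length1, length2):
    """Corner points (row, col) of an oriented rectangle in gen_rectangle2 style.

    ASSUMPTION: phi is mathematically positive w.r.t. the column axis and the
    row axis points down, so the length1 direction is (-sin phi, cos phi).
    Corners are returned in polygon order (each one adjacent to the next).
    """
    d1 = (-math.sin(phi), math.cos(phi))  # direction of length1 as (row, col)
    d2 = (math.cos(phi), math.sin(phi))   # perpendicular direction of length2
    corners = []
    for s1, s2 in ((1, 1), (1, -1), (-1, -1), (-1, 1)):
        corners.append((row + s1 * length1 * d1[0] + s2 * length2 * d2[0],
                        col + s1 * length1 * d1[1] + s2 * length2 * d2[1]))
    return corners
```

With phi = 0 the rectangle is axis-parallel: for row = 10, col = 20, length1 = 5 and length2 = 2 the corners span rows 8 to 12 and columns 15 to 25.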

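word_boxes_on_image (A, DET):

Dictionary with the localized words in the coordinate system of the preprocessed image given in image. The entries are tuples containing one value per found word.

row (A, DET): Localized word: center point, row coordinate.
col (A, DET): Localized word: center point, column coordinate.
phi (A, DET): Localized word: angle phi.
length1 (A, DET): Localized word: half edge length 1.
length2 (A, DET): Localized word: half edge length 2.

The localization of a word is given by the parameters of an oriented rectangle; for more information, see gen_rectangle2.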

word_boxes_on_score_maps (A, DET):

Dictionary with the localized words in the coordinate system of the score images given in score_maps. The entries are the same as for word_boxes_on_image above.
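Both dictionaries describe the same words, only in different coordinate systems. If the score maps are a uniformly scaled version of the preprocessed image (an assumption; the actual scale and any pixel-center offset depend on the model), a box can be moved between the two systems by scaling the position and the half edge lengths while keeping phi. In practice, word_boxes_on_image already provides the image coordinates, so such a conversion is only needed when processing score_maps yourself. The helper `score_map_box_to_image` is hypothetical.

```python
def score_map_box_to_image(box, score_map_size, image_size):
    """Rescale a word box from score-map to preprocessed-image coordinates.

    ASSUMPTION: the score maps are a uniformly scaled version of the
    preprocessed image, with one isotropic factor and no offset.
    Sizes are (height, width) tuples.
    """
    (sm_h, sm_w), (im_h, im_w) = score_map_size, image_size
    f_row, f_col = im_h / sm_h, im_w / sm_w
    if abs(f_row - f_col) > 1e-9:
        # Anisotropic scaling would distort the angle; not handled in this sketch.
        raise ValueError("anisotropic scaling is not supported")
    return {
        "row": box["row"] * f_row,
        "col": box["col"] * f_col,
        "phi": box["phi"],                 # angle is unchanged by isotropic scaling
        "length1": box["length1"] * f_row,
        "length2": box["length2"] * f_row,
    }
```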

word (REC):

Recognized word.

char_candidates (REC):

Tuple of dictionaries with information about each character of the recognized word.

Each character dictionary contains the following key/value pairs:

'candidate': Tuple with the best 'recognition_num_char_candidates' candidates.
'confidence': Softmax-based confidence values of the best candidates. Note that these values are not calibrated and should be used with care. The confidence values can differ significantly between different models.
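The 'confidence' values are described as Softmax-based. As a generic illustration of that idea (not the model's internal code), the sketch below turns hypothetical raw per-character scores into Softmax values and keeps the k best candidates, where k plays the role of 'recognition_num_char_candidates'. The names `top_k_candidates`, `scores`, and `alphabet` are assumptions for illustration.

```python
import math


def top_k_candidates(scores, alphabet, k):
    """Softmax over raw per-character scores, keeping the k best candidates.

    'scores' and 'alphabet' are hypothetical inputs: one raw score per symbol.
    Returns a dict shaped like one character entry of char_candidates.
    """
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(alphabet, probs), key=lambda p: p[1], reverse=True)[:k]
    return {
        "candidate": tuple(c for c, _ in ranked),
        "confidence": tuple(p for _, p in ranked),
    }
```

Such values only express the relative scores within one prediction and are not calibrated, so a confidence threshold tuned for one model should be re-validated before it is used with another model.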

The recognition component can be retrained on custom data to further improve its performance. For more information, see the chapter OCR / Deep OCR.

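Attention

System requirements: To run this operator on GPU (see get_deep_ocr_param), cuDNN and cuBLAS are required. For further details, please refer to the “Installation Guide”, section “Requirements for Deep Learning and Deep-Learning-Based Methods”. Alternatively, this operator can be run on CPU.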

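Execution Information

Multithreading type: reentrant (runs in parallel with non-exclusive operators).
Multithreading scope: global (may be called from any thread).
Automatically parallelized on internal data level.

This operator returns a handle. Note that the state of an instance of this handle type may be changed by specific operators even though the handle is used as an input parameter by those operators.

This operator supports canceling timeouts and interrupts.

This operator supports breaking timeouts and interrupts.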

Parameters

Image (input_object) (multichannel-)image(-array) → object (byte)

Input image.

DeepOcrHandle (input_control) deep_ocr → (handle)

Handle of the Deep OCR model.

Mode (input_control) string → (string)

Inference mode.

Default: []

List of values: 'auto', 'detection', 'recognition'

DeepOcrResult (output_control) dict(-array) → (handle)

Tuple of result dictionaries.

Result

If the parameters are valid, the operator apply_deep_ocr returns the value 2 (H_MSG_TRUE). If necessary, an exception is raised.

Possible Predecessors

get_deep_ocr_param, set_deep_ocr_param, create_deep_ocr

Module

OCR/OCV
