Skip to main content

OCR (Optical Character Recognition)

Basic Introduction

OCR (Optical Character Recognition) technology refers to the use of artificial intelligence models to extract text information from images. Its core task is to convert text content in images into editable text data, which is widely applied in fields such as document digitization, information extraction, and data entry.

Currently, the OCR model available on Model Plaza includes: GOT-OCR2_0

GOT-OCR2_0

GOT-OCR2_0 provides a powerful OCR solution, capable of extracting text information from images with high accuracy, speed, and comprehensiveness. It supports multiple languages and is suitable for various scenarios, such as extracting information from ID cards, bank cards, PDF documents, tables, license plates, handwritten text, equipment nameplates, mathematical formulas, and other images.

Usage Method

You can click GOT-OCR2_0 for free online experience. The following are code invocation examples.

curl https://moark.ai/v1/images/ocr \
-X POST \
-H "Authorization: Bearer 私人令牌" \
-F "model=GOT-OCR2_0"
-F "image=@path/to/image.jpg"
-F "response_format=text"

Parameter Description:

  • Private token: Used to verify the identity of the caller. Click Private Token to obtain it.
  • model: Fill in GOT-OCR2_0 to specify the use of the OCR large model.
  • image: The image file to be processed with OCR.
    • Supports png, jpg, jpeg, webp, gif formats.
    • Maximum resolution support: 4096X4096.
    • File size does not exceed 3MB.
  • response_format: Format type.
    • Value text returns plain text content.
    • Value format returns content in mathpix-markdown format, recommended for images with mathematical formulas.

Usage Example

image-20250114152140833

After executing the above code, the response will be:

{
"text": "美迪兰(南京)医疗设备有限公司\n名称:..."
}

The online experience effect of GOT-OCR2_0 is as follows:

image-20250114152442007