Structured Output

Introduction

JSON is one of the most widely used data exchange formats. JSON data that conforms to a JSON Schema can be parsed and validated programmatically. MoArk's models support structured output: developers can specify a JSON Schema in a request so that the model's generated response conforms to the expected format. Including the guided_json parameter in the request removes the concern that the model will omit required keys or generate invalid enumeration values.

Advantages

Some advantages of using JSON structured output include:

  1. Data format guarantee: Ensures the model output always conforms to the predefined JSON Schema format, eliminating the need for additional format validation steps
  2. Type safety: Automatically ensures correct data types (such as strings, integers, booleans), avoiding type conversion errors
  3. Required field control: Ensures key fields are not omitted through the required property, improving data integrity
  4. Enumeration value constraints: Can restrict certain fields to specific values, preventing generation of invalid options
  5. Standardized error handling: When the model cannot generate valid content, it can return error messages in a standard format for easier program processing
  6. Reduced parsing complexity: Eliminates the need to write complex text parsing logic, directly obtaining structured data
  7. API integration friendliness: Generated JSON can be directly used for API calls, database storage, or other system integrations
  8. Simplified prompts: No need to repeatedly emphasize output format requirements in prompts, allowing prompts to focus more on business logic
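
The type, required-field, and enumeration constraints in items 2 to 4 correspond to the standard JSON Schema keywords `type`, `required`, and `enum`. The following is a minimal sketch of a schema that uses all three. The `membership` field and its values are hypothetical and exist only to illustrate `enum`; which keywords a given model endpoint honors is an assumption to verify.

```python
import json

# Hypothetical schema for illustration only: the "membership" field is not
# part of this document's example and is here to demonstrate "enum".
user_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "The user's name"},
        "age": {"type": "integer", "description": "The user's age"},
        "membership": {
            "type": "string",
            "enum": ["free", "pro", "enterprise"],
            "description": "The user's plan",
        },
    },
    # Keys listed here must appear in the generated object.
    "required": ["name", "membership"],
}

# Serialize to a JSON string, the form guided_json takes in this document.
guided_json = json.dumps(user_schema)
print(guided_json)
```

Building the schema as a dict and serializing it with `json.dumps` avoids quoting mistakes that are easy to make in a handwritten JSON string.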

Example Code

python
"guided_json": """{
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "description": "The user's name"
        },
        "age": {
            "type": "integer",
            "description": "The user's age"
        },
        "city": {
            "type": "string",
            "description": "The user's city"
        }
    },
    "required": ["name", "age", "city"]
}""",
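
The fragment above is a single key of a larger request. As a rough sketch of where it sits, the code below builds a complete request body without sending it. It assumes an OpenAI-compatible chat completions body and assumes that guided_json is placed at the top level of that body, which is where the OpenAI Python client puts keys passed through `extra_body`. The helper name `build_request_body` is hypothetical.

```python
import json

# The same schema as above, written as a dict and serialized on demand.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "The user's name"},
        "age": {"type": "integer", "description": "The user's age"},
        "city": {"type": "string", "description": "The user's city"},
    },
    "required": ["name", "age", "city"],
}


def build_request_body(user_text: str) -> dict:
    """Build a chat request body with guided_json as a top-level key (assumed layout)."""
    return {
        "model": "Qwen2.5-72B-Instruct",
        "messages": [{"role": "user", "content": user_text}],
        "guided_json": json.dumps(schema),
    }


body = build_request_body("Mark lives in New York.")
print(json.dumps(body, indent=2))
```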

Adding a system prompt that describes the expected output improves reliability. The model then extracts the data from the input and generates standard JSON:

python
from langchain_openai import ChatOpenAI

model_name = "Qwen2.5-72B-Instruct"
base_url = "https://moark.ai/v1"
your_access_token = ""  # Fill in your access token

llm = ChatOpenAI(
    model=model_name,
    api_key=your_access_token,
    base_url=base_url,
    streaming=True,
    temperature=0.1,
    presence_penalty=1.05,
    top_p=0.9,
    # extra_body carries guided_json alongside the standard parameters
    extra_body={
        "guided_json": """{
            "type": "object",
            "properties": {
                "name": {
                    "type": "string",
                    "description": "The user's name"
                },
                "age": {
                    "type": "integer",
                    "description": "The user's age"
                },
                "city": {
                    "type": "string",
                    "description": "The user's city"
                }
            },
            "required": ["name", "age", "city"]
        }"""
    },
)

prompt = [
    {"role": "system", "content": "You are an intelligent assistant. Output data in JSON format. If age cannot be determined, set age to 0"},
    {"role": "user", "content": """
On a sunny spring afternoon, Mark was walking down the streets of New York. It was April 15th, 2023,
just as the cherry blossoms were in full bloom. As a young photography enthusiast, he carried his camera,
hoping to capture the most beautiful moments of this season. New York's spring is always brief but stunning,
with every park and street filled with vitality and energy.
"""},
]

# Print each streamed chunk as it arrives
for response in llm.stream(prompt):
    if response.content:
        print(response.content, end="")

Output JSON:

{ "name": "Mark", "age": 0, "city": "New York" }
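
Because the response is streamed, the JSON arrives in pieces and can only be parsed once all chunks have been joined. A minimal sketch of that step follows, using simulated chunks in place of the `response.content` values. The helper name `parse_user` and the way the text is split into chunks are hypothetical.

```python
import json


def parse_user(chunks):
    """Join streamed text chunks and parse them as one JSON object."""
    text = "".join(chunks)
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    # Defensive check of the keys the caller relies on.
    missing = [key for key in ("name", "city") if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data


# Simulated chunks standing in for response.content from llm.stream(prompt).
chunks = ['{ "name": "Mark", ', '"age": 0, ', '"city": "New York" }']
user = parse_user(chunks)

# The system prompt uses 0 as the "age unknown" sentinel.
age = user["age"] if user.get("age") else None
print(user["name"], user["city"], age)
```

Keeping a small defensive check on the client side is inexpensive, and it means a change in endpoint behavior surfaces as a clear error.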