Text Generation
Overview
Text generation creates contextually relevant, logically coherent content from user-provided prompts. Whether you are expanding existing material or drafting entirely new text, the feature can be integrated via a simple API call. Full compatibility with OpenAI-style interfaces keeps integration complexity low for developers.
Key Features:
- Multi-Scenario Support: Ideal for copywriting, code generation, summarization, and more.
- Style Control: Fine-tune randomness, length, and tone through configurable parameters.
- Seamless Integration: Compatible with popular frameworks like LangChain and LlamaIndex, supporting function calling and tool chaining.
- Efficient Response: Streamed output (stream) reduces latency and enhances interactivity.
MoArk integrates a variety of high-performance text generation models. Explore their capabilities in the AI Model Hub or get started quickly with the Basic Text Generation Example.
Quick Start: Basic Text Generation Example
Example 1: Quick Use of Large Model Capabilities via curl
curl offers a clear view of the raw request path and parameters, which is especially useful when a framework abstracts them away:
Bash:
curl https://moark.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your_access_token" \
-d '{
"model": "Qwen2.5-72B-Instruct",
"stream": false,
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Can a mouse be treated with mouse medicine?"
}
]
}'
JavaScript:
async function query(data) {
const response = await fetch('https://moark.ai/v1/chat/completions', {
headers: {
Authorization: 'Bearer your_access_token',
'Content-Type': 'application/json',
},
method: 'POST',
body: JSON.stringify(data),
});
const result = await response.json();
return result;
}
query({
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Can a mouse be treated with mouse medicine?"
}
],
"model": "Qwen2.5-72B-Instruct",
"stream": false,
"max_tokens": 512,
"temperature": 0.7,
"top_p": 0.7,
"frequency_penalty": 1
}).then((response) => {
console.log(JSON.stringify(response));
});
AI Model Response:
{
"id": "chat-476266af435142d2bb7d342ea54694f2",
"object": "chat.completion",
"created": 1731401912,
"model": "Qwen2.5-72B-Instruct",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "No, mouse medicine is not used to treat mice. It is a poison that kills mice. If a mouse is sick, it should be treated with veterinary medicine.",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}],
"usage": {
"prompt_tokens": 27,
"total_tokens": 57,
"completion_tokens": 30
},
"prompt_logprobs": null
}
Example 2: Using the OpenAI Client to Call MoArk's Model API
MoArk's Serverless API is compatible with the OpenAI-style API favored by developers and the community.
Any tool supporting the OpenAI API can directly work with MoArk's Serverless API.
- Obtain Access Credentials: Log in to the MoArk console, and create an access token in Workbench -> Access Tokens.
- Install the Client: Using Python as an example, install the OpenAI client first:
pip install openai -i https://mirrors.cloud.tencent.com/pypi/simple
If you have experience with JavaScript, you can use the OpenAI Node.js client.
You can then generate the specified content with a simple API call:
from openai import OpenAI

base_url = "https://moark.ai/v1"
model_name = "Qwen2.5-72B-Instruct"
# Obtain your access token at https://moark.ai/dashboard/settings/tokens
api_key = "Your access token"
client = OpenAI(base_url=base_url, api_key=api_key)
completion = client.chat.completions.create(
    model=model_name,  # Specify the model name, e.g., Qwen2.5-72B-Instruct; see https://moark.ai/serverless-api
    stream=True,
    temperature=0.7,
    top_p=0.95,
    frequency_penalty=1.05,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a concise Python tutorial."}
    ]
)
for chunk in completion:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
In addition to plain text, the model can return structured data in JSON format. This feature is called Structured Output and is enabled in MoArk through the guided_json parameter.
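As a minimal sketch: guided_json is not part of the standard OpenAI parameter set, so with the OpenAI Python client it can be passed through extra_body (assuming the endpoint accepts vLLM-style guided-decoding extensions; the schema and prompt below are illustrative):
from openai import OpenAI

client = OpenAI(base_url="https://moark.ai/v1", api_key="Your access token")

# Illustrative JSON Schema describing the desired output shape
order_schema = {
    "type": "object",
    "properties": {
        "product": {"type": "string"},
        "quantity": {"type": "integer"},
        "urgent": {"type": "boolean"}
    },
    "required": ["product", "quantity"]
}

completion = client.chat.completions.create(
    model="Qwen2.5-72B-Instruct",
    messages=[
        {"role": "user", "content": "I need 3 wireless keyboards shipped as soon as possible."}
    ],
    # guided_json is a non-standard parameter, so it is passed via extra_body
    extra_body={"guided_json": order_schema}
)
print(completion.choices[0].message.content)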
Streamed Response
For long text generation, enabling streaming with stream=True lets you retrieve results in real time, reducing waiting time:
from openai import OpenAI
# Initialize client
client = OpenAI(
base_url="https://moark.ai/v1",
api_key="Your access token"
)
stream = client.chat.completions.create(
model="Qwen2.5-72B-Instruct",# Replace with the specified model name.
messages=[
{"role": "user", "content": "Write a short article about the development history of artificial intelligence."}
],
stream=True, # Enable streamed response
temperature=0.6
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
Core Parameters
- tool_choice: Set to auto to let the model choose tools automatically, or force a specific tool with the following format:
"tool_choice": {"type": "function", "function": {"name": "function_name"}}
- guided_json: Makes the model respond according to the specified JSON Schema. It is not recommended to pass tools and guided_json at the same time. More information about JSON Schema can be found at: JSON Schema
- guided_choice: Makes the model choose one of the provided string options. It is not recommended to pass tools and guided_choice at the same time.
For example, to classify the sentiment of user input, you can pass:
"guided_choice": ["positive", "negative", "neutral"]
Other parameters can be found in the API documentation.
Model Selection
When generating content through the API, a key choice is the model you want to use, which is the model parameter in the above code example. You can find a complete list of available models in the Model Square. When selecting a text generation model, you need to consider the following factors:
Task Scenario Adaptation
- General Content Creation: For copywriting or storytelling, choose models with balanced language understanding and generation.
- Specialized Tasks: For code generation or math reasoning, prioritize models with proven expertise in those domains.
- Long Text Processing: For multi-thousand-word content, select models with large context windows (e.g., 32K, 128K).
Functional Requirements
- Tool Calling: When integrating with external systems (e.g., function calling, multi-modal tools), prioritize models with the "Function Calling" label (e.g., kimi-k2-instruct, Qwen2.5-72B-Instruct) to ensure deep interaction with business systems.
- Multilingual Support: For tasks involving both Chinese and English, or for small languages, focus on models with the "Multi-language" label (e.g., ERNIE-4.5-Turbo).
Practical Selection Tips
- Rapid Prototyping: Start with free models (e.g., Qwen3-8B, Qwen3-4B) to validate task feasibility.
- Production Optimization: Migrate to specialized models (e.g., kimi-k2-instruct for long context, DeepSeek-R1 for reasoning) for enhanced performance.
By aligning your use case with these criteria, you can precisely match MoArk's models to your business needs, maximizing both efficiency and quality.
Prompt Engineering
Prompt engineering is the process of writing effective instructions for models to generate content that meets your needs. It's about crafting clear and precise prompts to guide the model's output. While models are good at following instructions, the quality of the results depends on how well you communicate with them.
- Core Principle (Model Specificity): Different models, and even different versions within the same series, respond differently to the same prompt, so tailor prompts to the model you use.
Message Roles and Instruction Following
By using the instructions parameter or message roles, you can layer instructions to control the model's response logic.
- instructions parameter: global instruction priority
The instructions parameter provides high-level guidance (tone, goal, examples, etc.) that takes priority over the input content.
import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'https://moark.ai/v1',
apiKey: 'Your access token',
});
const response = await client.responses.create({
model: 'target-model-name',
// Define global style/rules
instructions: 'Respond in the tone of a kind grandmother, using simple analogies and warm, conversational language',
input: 'Are semicolons optional in JavaScript?',
});
console.log(response.output_text);
// Example output:"Oh honey, in JavaScript, semicolons are like the period at the end of a sentence—technically optional, but they help make things clear. I'd say go ahead and use them, so your code doesn't get confused!"
- Message Roles: Refined Scene Control
By combining the system, user, and assistant roles, you can define interaction logic more flexibly, as shown in the sketch below. For advanced AI persona customization (e.g., anime style, expert assistant), you can read more about: Customizing AI Chat Character Styles
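For example, the three roles can be combined in one messages array. The sketch below seeds the conversation with a prior user/assistant turn so the model picks up the established pattern (the model name and prompts are illustrative):
from openai import OpenAI

client = OpenAI(base_url="https://moark.ai/v1", api_key="Your access token")

completion = client.chat.completions.create(
    model="Qwen2.5-72B-Instruct",
    messages=[
        # system: global persona and rules for the whole conversation
        {"role": "system", "content": "You are a patient senior engineer who explains concepts in plain language."},
        # user/assistant pair: a prior turn that sets the expected style
        {"role": "user", "content": "What does 'idempotent' mean?"},
        {"role": "assistant", "content": "An operation is idempotent if running it twice has the same effect as running it once."},
        # the new user turn the model should answer in the same style
        {"role": "user", "content": "Is HTTP DELETE idempotent?"}
    ]
)
print(completion.choices[0].message.content)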
Prompt Formatting Techniques
Use Markdown and XML tags to structure prompts, helping the model understand boundaries and improve output accuracy.
Recommended Structure (developer message):
- Identity: Define the assistant's role, style, and goal.
- Instructions: Clear rules (do's and don'ts).
- Examples: Input/output pairs.
- Context: Private data or supplementary info (place at the end for easy dynamic replacement).
Example: Code Generation Prompt
# Identity
You are a coding assistant that enforces snake_case variable naming in JavaScript and writes code compatible with Internet Explorer 6.
# Instructions
Use snake_case (e.g., my_variable) instead of camelCase (e.g., myVariable).
Use the older "var" keyword for variable declaration to support legacy browsers.
Do not format your response in Markdown; return only the requested code.
# Examples
<user_query>
How to declare a string variable for a name?
</user_query>
<assistant_response>
var first_name = "Anna";
</assistant_response>
Few-Shot Learning
Include a few input/output examples in your prompt to guide the model—no fine-tuning required. The model implicitly learns the pattern and applies it to new inputs.
Example: Sentiment Classification
# Identity
You are a helpful assistant that classifies short product reviews as positive, negative, or neutral.
# Instructions
* Respond with only one word—no extra formatting or commentary.
* Your response must be exactly one of: "positive", "negative", or "neutral".
# Examples
<product_review id="example-1">
I love these headphones—the sound quality is amazing!
</product_review>
<assistant_response id="example-1">
positive
</assistant_response>
<product_review id="example-2">
Battery life is average, but it feels cheap.
</product_review>
<assistant_response id="example-2">
neutral
</assistant_response>
<product_review id="example-3">
Customer service is terrible; I will never buy from them again.
</product_review>
<assistant_response id="example-3">
negative
</assistant_response>
Combining Prompts with Function Calling
When prompts require external data (e.g., real-time weather, database queries), extend model capabilities via function calling. The model determines whether to call a tool, which one, and with what parameters. After the client executes the function, return the result to the model.
For detailed implementation methods, refer to: Function Calling
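As a minimal sketch of the flow (the get_weather function and its schema are hypothetical; the full round trip, including returning the tool result to the model, is covered in the Function Calling guide):
from openai import OpenAI
import json

client = OpenAI(base_url="https://moark.ai/v1", api_key="Your access token")

# Hypothetical tool schema describing a client-side get_weather function
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]

completion = client.chat.completions.create(
    model="Qwen2.5-72B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Shanghai right now?"}],
    tools=tools
)

# If the model decided to call the tool, parse its arguments, execute the
# function client-side, then send the result back in a "tool" message.
tool_calls = completion.choices[0].message.tool_calls
if tool_calls:
    args = json.loads(tool_calls[0].function.arguments)
    print(tool_calls[0].function.name, args)  # e.g. get_weather {'city': 'Shanghai'}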
Context Management & Performance Optimization
- Context Window Planning: Models have a finite context window.
  - Sizes vary by model.
  - For long text: chunk the input or use summarization to compress context (see the sketch at the end of this section).
- Prompt Caching Optimization: Place reusable fixed content (e.g., system instructions) at the beginning of the prompt to leverage caching and reduce compute costs:
# Efficient structure: static content first, dynamic content last
messages = [
{"role": "system", "content": "Fixed system instruction, e.g., 'You are an e-commerce agent who must politely answer order inquiries'"}, # Cacheable
{"role": "user", "content": "Dynamic input: When will my order #12345 ship?"} # Processed in real time
]