Vision Models
Function Description
Vision models are a category of artificial intelligence models specifically designed to process and understand visual information such as images and videos. These models, leveraging deep learning technologies, can "see" and interpret image content like humans do, and perform various vision-related tasks.
The core capabilities of vision models include image recognition, content understanding, feature extraction, and pattern matching. They are widely applied in fields such as autonomous driving, medical image analysis, security monitoring, and content moderation, helping people process and utilize visual data more efficiently.
Vision Models Overview
Loading Serverless API service list...
Usage Method
Provide an image along with corresponding prompts, and the model will return text-based results according to the requirements specified in the prompts.

Example Code
from openai import OpenAI
client = OpenAI(
base_url="https://moark.ai/v1",
api_key="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX", # Replace with your access token
)
response = client.chat.completions.create(
messages=[
{
"role": "system",
"content": "You are a helpful and harmless assistant. You should think step-by-step."
},
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://example.com/image.jpg" # Replace with your image URL
}
},
{
"type": "text",
"text": "Please describe this image" # Replace with your prompt
}
]
}
],
model="InternVL3-78B",
stream=True,
max_tokens=512,
temperature=0.7,
top_p=1,
extra_body={
"top_k": 1,
},
frequency_penalty=0,
)
fullResponse = ""
print("Response:")
# Print streaming response
for chunk in response:
delta = chunk.choices[0].delta
# If is thinking content, print it in gray
if hasattr(delta, 'reasoning_content') and delta.reasoning_content:
fullResponse += delta.reasoning_content
print(f"\033[90m{delta.reasoning_content}\033[0m", end="", flush=True)
elif delta.content:
fullResponse += delta.content
print(delta.content, end="", flush=True)
Application Scenarios
🛡️ Content Moderation
- Violation Detection: Identify inappropriate or non-compliant image content
- Quality Assessment: Evaluate the quality and suitability of images
- Copyright Protection: Detect image copyright and source information
🏥 Medical Diagnosis
- Medical Image Analysis: Analyze X-rays, CT scans, MRI images, etc.
- Lesion Detection: Identify and locate pathological areas
- Auxiliary Diagnosis: Provide diagnostic suggestions and references for doctors
🚗 Autonomous Driving
- Environmental Perception: Identify road, vehicles, pedestrians, etc.
- Obstacle Detection: Real-time detection of obstacles on the road
- Traffic Sign Recognition: Identify traffic lights, signs, etc.
🛒 E-commerce Retail
- Product Recognition: Automatically identify and classify products
- Product Recommendation: Recommend products based on visual similarity
- Inventory Management: Manage inventory through image recognition
🏭 Industrial Detection
- Quality Control: Detect defects and quality issues in products
- Safety Monitoring: Monitor the safety status of industrial environments
- Device Maintenance: Evaluate device status through visual detection