Vision Models

Function Description

Vision models are a category of artificial intelligence models specifically designed to process and understand visual information such as images and videos. These models, leveraging deep learning technologies, can "see" and interpret image content like humans do, and perform various vision-related tasks.

The core capabilities of vision models include image recognition, content understanding, feature extraction, and pattern matching. They are widely applied in fields such as autonomous driving, medical image analysis, security monitoring, and content moderation, helping people process and utilize visual data more efficiently.

Vision Models Overview

Loading Serverless API service list...

Usage Method

Provide an image along with corresponding prompts, and the model will return text-based results according to the requirements specified in the prompts.

Vision Model Experience

Example Code

python
from openai import OpenAI

client = OpenAI(
	base_url="https://moark.ai/v1",
	api_key="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX", # Replace with your access token
)

response = client.chat.completions.create(
	messages=[
		{
			"role": "system",
			"content": "You are a helpful and harmless assistant. You should think step-by-step."
		},
		{
			"role": "user",
			"content": [
				{
					"type": "image_url",
					"image_url": {
						"url": "https://example.com/image.jpg"  # Replace with your image URL
					}
				},
				{
					"type": "text",
					"text": "Please describe this image" # Replace with your prompt
				}
			]
		}
	],
	model="InternVL3-78B",
	stream=True,
	max_tokens=512,
	temperature=0.7,
	top_p=1,
	extra_body={
		"top_k": 1,
	},
	frequency_penalty=0,
)

fullResponse = ""
print("Response:")
# Print streaming response
for chunk in response:
	delta = chunk.choices[0].delta
	# If is thinking content, print it in gray
	if hasattr(delta, 'reasoning_content') and delta.reasoning_content:
		fullResponse += delta.reasoning_content
		print(f"\033[90m{delta.reasoning_content}\033[0m", end="", flush=True)
	elif delta.content:
		fullResponse += delta.content
		print(delta.content, end="", flush=True)

Application Scenarios

🛡️ Content Moderation

Violation Detection: Identify inappropriate or non-compliant image content
Quality Assessment: Evaluate the quality and suitability of images
Copyright Protection: Detect image copyright and source information

🏥 Medical Diagnosis

Medical Image Analysis: Analyze X-rays, CT scans, MRI images, etc.
Lesion Detection: Identify and locate pathological areas
Auxiliary Diagnosis: Provide diagnostic suggestions and references for doctors

🚗 Autonomous Driving

Environmental Perception: Identify road, vehicles, pedestrians, etc.
Obstacle Detection: Real-time detection of obstacles on the road
Traffic Sign Recognition: Identify traffic lights, signs, etc.

🛒 E-commerce Retail

Product Recognition: Automatically identify and classify products
Product Recommendation: Recommend products based on visual similarity
Inventory Management: Manage inventory through image recognition

🏭 Industrial Detection

Quality Control: Detect defects and quality issues in products
Safety Monitoring: Monitor the safety status of industrial environments
Device Maintenance: Evaluate device status through visual detection

Function Description​

Vision Models Overview​

Usage Method​

Example Code​

Application Scenarios​

🛡️ Content Moderation​

🏥 Medical Diagnosis​

🚗 Autonomous Driving​

🛒 E-commerce Retail​

🏭 Industrial Detection​