Image Segmentation
Function Description
Image segmentation is a technique that precisely separates different objects or regions in an image. Unlike traditional image processing, image segmentation identifies which object each pixel in an image belongs to, achieving precise object separation and returning the contour or mask information of each detected object.
Supported Model
- sam3
How to Call the Interface
Prerequisites
- Obtain an API Token (create from MoLiArk Console)
- An image to process (local file or network image URL)
Python Call Example
```python
import requests

API_URL = "https://ai.gitee.com/v1/images/segmentation"
API_TOKEN = "<your_api_token>"  # Replace with your actual token

headers = {
    "Authorization": f"Bearer {API_TOKEN}"
}

def cut_image(image_path, prompt=""):
    data = {
        "model": "sam3",
        "prompt": prompt
    }
    if image_path.startswith(("http://", "https://")):
        # Network image: pass URL directly
        data["image"] = image_path
        response = requests.post(API_URL, headers=headers, data=data)
    else:
        # Local image: upload via file
        with open(image_path, "rb") as image_file:
            files = {
                "image": (image_path, image_file, "image/jpeg")
            }
            response = requests.post(API_URL, headers=headers, files=files, data=data)
    return response.json()

# Example 1: Segment people in a local image
result = cut_image("path/to/your/photo.jpg", prompt="person")
print(f"Found {result['num_segments']} people")

# Example 2: Segment all objects in a network image (without prompt)
result2 = cut_image("https://example.com/image.jpg")
print(f"Recognized {result2['num_segments']} objects")
```
Interface Parameter Description
| Parameter Name | Type | Required | Description | Example Value |
|---|---|---|---|---|
| model | string | Yes | Model name, fixed as sam3 | "sam3" |
| image | file/string | Yes | Image file / network image URL | Local file / "https://..." |
| prompt | string | No | Prompt text specifying the objects to segment; multiple classes can be comma-separated | "person", "car", "dog,cat" |
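For instance, a comma-separated prompt such as "dog,cat" asks for several categories in a single call. A small usage sketch with the `cut_image` helper defined above (the file path is a placeholder):

```python
# Multi-class prompt: comma-separated labels segment several categories at once
pets = cut_image("path/to/pets.jpg", prompt="dog,cat")
print(f"Found {pets['num_segments']} objects")
```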
Processing the Returned Results
Result Data Structure
The interface returns data in JSON format, containing the following fields:
```json
{
  "num_segments": 3,
  "segments": [
    {
      "id": 1,
      "label": "person",
      "confidence": 0.95,
      "bbox": [120, 80, 320, 450],
      "mask": {
        "encoding": "rle",
        "size": [512, 512],
        "counts": "H4sIAAAAAAAA..."
      }
    },
    {
      "id": 2,
      "label": "dog",
      "confidence": 0.87,
      "bbox": [350, 300, 480, 420],
      "mask": {...}
    }
  ]
}
```
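A short sketch of consuming this structure with the `cut_image` helper from above (the 0.5 confidence threshold is an arbitrary example, not an API default):

```python
result = cut_image("path/to/your/photo.jpg", prompt="person")

for seg in result["segments"]:
    # Skip low-confidence detections; 0.5 is an arbitrary example threshold
    if seg["confidence"] < 0.5:
        continue
    print(f"#{seg['id']} {seg['label']}: confidence={seg['confidence']:.2f}, bbox={seg['bbox']}")
```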
Mask Decoding Method Explanation (Frontend/Client)
The mask field returned by the interface uses COCO RLE counts; it must be decoded together with the image height and width to obtain an H × W binary (0/1) mask matrix.
RLE Decoding Example (TypeScript)
```typescript
/**
 * Decode a COCO RLE format mask
 * @param rleString - RLE encoded string
 * @param height - Image height
 * @param width - Image width
 * @returns 2D array [height][width]; 1 indicates foreground, 0 indicates background
 */
export function rleDecode(
  rleString: string,
  height: number,
  width: number,
): number[][] {
  rleString = atob(rleString);

  // Decode RLE counts
  const counts: number[] = [];
  let p = 0;
  while (p < rleString.length) {
    let x = 0;
    let k = 0;
    let more = true;
    while (more) {
      const c = rleString.charCodeAt(p) - 48;
      x |= (c & 0x1f) << (5 * k);
      more = (c & 0x20) !== 0;
      p++;
      k++;
      if (!more && (c & 0x10) !== 0) {
        x |= -1 << (5 * k);
      }
    }
    if (counts.length > 2) {
      x += counts[counts.length - 2];
    }
    counts.push(x);
  }

  // Create flat array from RLE counts (column-first order)
  const total = height * width;
  const mask = new Uint8Array(total);
  let pos = 0;
  let val = 0;
  for (const count of counts) {
    for (let i = 0; i < count && pos < total; i++) {
      mask[pos] = val;
      pos++;
    }
    val = 1 - val; // Switch between 0 and 1
  }

  // Convert flat array to 2D array (column-first to row-first)
  const result: number[][] = Array.from({ length: height }, () =>
    new Array(width).fill(0),
  );
  // COCO RLE uses column-first (Fortran) order storage
  let idx = 0;
  for (let x = 0; x < width; x++) {
    for (let y = 0; y < height; y++) {
      if (idx < total) {
        result[y][x] = mask[idx];
        idx++;
      }
    }
  }
  return result;
}
```
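For server-side use, the same logic can be ported to Python. The sketch below mirrors the TypeScript decoder above (base64-decode the counts string, expand the run lengths, then reshape from column-major order); it assumes the counts string is delivered exactly as that example expects, and uses numpy only for convenience:

```python
import base64
import numpy as np

def rle_decode(rle_string: str, height: int, width: int) -> np.ndarray:
    """Decode the mask counts string into a (height, width) 0/1 array.

    Mirrors the TypeScript example above: base64-decode, then expand the
    LEB128-style run lengths used by COCO compressed RLE.
    """
    data = base64.b64decode(rle_string)

    # Decode the run-length counts
    counts = []
    p = 0
    while p < len(data):
        x = 0
        k = 0
        more = True
        while more:
            c = data[p] - 48
            x |= (c & 0x1F) << (5 * k)
            more = (c & 0x20) != 0
            p += 1
            k += 1
            if not more and (c & 0x10):
                x |= -1 << (5 * k)
        if len(counts) > 2:
            x += counts[-2]
        counts.append(x)

    # Expand the counts into a flat 0/1 array (column-major pixel order)
    mask = np.zeros(height * width, dtype=np.uint8)
    pos = 0
    val = 0
    for count in counts:
        mask[pos:pos + count] = val
        pos += count
        val = 1 - val

    # COCO RLE stores pixels column-first, so reshape in Fortran order
    return mask.reshape((height, width), order="F")
```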
You can visit the MoArk Example Code Repository for more example code.
Online Experience
You can also directly experience the image segmentation function online.
Usage Method
Upload reference image → Input prompt → Click run

Example Effects
| Reference Image | prompt | Result |
|---|---|---|
![]() | kids | ![]() |
![]() | apple | ![]() |
Usage Scenarios
Through the image segmentation interface, you can add advanced visual analysis capabilities to your applications, with pixel-level precision, without training models or building complex infrastructure:
Basic Usage Scenarios
- Intelligent matting: Automatically separate subjects like people, products, etc., and remove backgrounds
- Object counting: Count specific objects in images
- Image annotation: Prepare annotation data for training AI models
Practical Applications
# Example 1: E-commerce product matting
product_result = cut_image("product_photo.jpg", prompt="shoes, handbag")
# Get precise contours of each product for creating transparent background product images
# Example 2: Surveillance video analysis
security_result = cut_image("camera_feed.jpg", prompt="person, vehicle")
# Identify people and vehicles in the scene for security monitoring
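Counting objects per label only needs the fields already shown in the response structure. A brief sketch building on the surveillance example above:

```python
from collections import Counter

security_result = cut_image("camera_feed.jpg", prompt="person, vehicle")

# Tally detections per label for a simple monitoring summary
per_label = Counter(seg["label"] for seg in security_result["segments"])
for label, count in per_label.items():
    print(f"{label}: {count}")
```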



