Vectorization and Reranking
Basic Introduction
Vectorization and reranking are key technologies in natural language processing and information retrieval. Vectorization converts text into numerical vectors that capture its meaning, which lets computers compare texts by semantic similarity. Reranking then reorders a set of candidate results by their relevance to a query. Together they play a central role in applications such as search engines, recommendation systems, and semantic analysis.
All vectorization and reranking models on MoArk are free to use.
Model List
The vectorization and reranking models currently available on MoArk are listed in the Serverless API service list on the MoArk website.
Vectorization Models
Vectorization (embedding) models convert text into high-dimensional vectors such that semantically similar texts lie close to each other in the vector space. These models are typically built on deep learning architectures such as the Transformer, which capture the semantic information in text.
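Closeness in the vector space is commonly measured with cosine similarity, the cosine of the angle between two vectors. The sketch below computes it in plain Python. The three-dimensional vectors are hand-picked toy values for illustration only; real embeddings have hundreds or thousands of dimensions and come from a model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between vectors a and b (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors, hand-picked for illustration (not real model output).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(round(cosine_similarity(cat, kitten), 3))
print(round(cosine_similarity(cat, invoice), 3))
```

With these toy values the "cat"/"kitten" pair scores close to 1, while the "cat"/"invoice" pair scores close to 0, which is the property a vector search relies on.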
The most important parameter of a vectorization model is its output dimension. Higher-dimensional vectors can generally encode more information, but they also raise computation and storage costs, because the size of every stored vector grows linearly with its dimension. The vectorization models on MoArk cover dimensions from 512 to 4096; choose one that balances retrieval quality against cost for your use case.
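To make the storage side of this trade-off concrete, the sketch below estimates the raw size of a vector collection. It assumes each vector component is stored as a 32-bit float (4 bytes) and ignores index overhead and compression; the one-million-vector corpus is an arbitrary example.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as uncompressed float32


def raw_index_size_bytes(num_vectors: int, dimension: int) -> int:
    """Raw storage for num_vectors dense float32 vectors, excluding index overhead."""
    return num_vectors * dimension * BYTES_PER_FLOAT32


num_vectors = 1_000_000  # arbitrary example corpus size
for dim in (512, 1024, 2048, 4096):
    size = raw_index_size_bytes(num_vectors, dim)
    print(f"{dim:>5} dims: {dim * BYTES_PER_FLOAT32:>6} bytes/vector, "
          f"{size / 1e9:.3f} GB total")
```

At 512 dimensions each vector takes 2,048 bytes (about 2 GB per million vectors); at 4096 dimensions it takes 16,384 bytes (about 16 GB per million), an eightfold difference.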
Sample code:
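from openai import OpenAI
client = OpenAI(
    base_url="https://moark.ai/v1",
    api_key="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",  # Replace with your access token
    default_headers={"X-Failover-Enabled": "true"},
)
response = client.embeddings.create(
    input="Hello, I am MoArk.",
    model="Qwen3-Embedding-4B",
)
# The vector for the input text is in the first item of response.data
embedding = response.data[0].embedding
print(len(embedding))  # number of dimensions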
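The OpenAI client's `embeddings.create` also accepts a list of strings as `input` and returns one vector per string. The helper below is a hypothetical sketch for embedding a long list of texts in batches. The batch size of 32 is an assumption, not a documented MoArk limit, and the results are re-sorted by each item's `index` field so the output order matches the input order.

```python
from typing import Any, Iterator, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(client: Any, texts: Sequence[str], model: str,
              batch_size: int = 32) -> list[list[float]]:
    """Embed `texts` in batches (batch_size=32 is an assumed, not documented, limit).

    `client` is expected to be an openai.OpenAI instance configured as above.
    """
    vectors: list[list[float]] = []
    for batch in batched(texts, batch_size):
        response = client.embeddings.create(input=list(batch), model=model)
        # Sort by index so the vectors line up with this batch's input order.
        for item in sorted(response.data, key=lambda d: d.index):
            vectors.append(item.embedding)
    return vectors
```

For example, `vectors = embed_all(client, ["first text", "second text"], model="Qwen3-Embedding-4B")` returns a list of two vectors in input order.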
Multimodal Vectorization
MoArk supports multimodal vectorization models, which accept multiple input types such as text, images, and videos. These models map data from different modalities into a shared vector space, enabling cross-modal retrieval and analysis, for example finding the images that best match a text query.
The following multimodal vectorization models are currently deployed on MoArk:
- jina-embeddings-v4 with a vector dimension of 2048
- jina-clip-v1 with a vector dimension of 768
- jina-clip-v2 with a vector dimension of 1024
Sample code:
from openai import OpenAI
client = OpenAI(
    base_url="https://moark.ai/v1",
    api_key="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",  # Replace with your access token
)
# Text and image inputs can be mixed in a single request
response = client.embeddings.create(
    input=[
        {"text": "a blue cat"},
        {"text": "a dog"},
        {"image": "https://example.com/image.jpg"},
    ],
    model="jina-embeddings-v4",
)
# One embedding is returned per input item
for item in response.data:
    print(item.index, len(item.embedding))
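Because a multimodal model places text and images in the same vector space, a text query can be compared directly with image vectors. The sketch below ranks candidate images against a text query by cosine similarity. The image URLs and three-dimensional vectors are hypothetical toy values; in practice both the query vector and the image vectors would come from the same model (vectors from different models, such as jina-clip-v1 at 768 dimensions and jina-clip-v2 at 1024, cannot be compared).

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; both vectors must come from the same embedding model."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch: vectors from different models?")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_images(query_vec: list[float],
                image_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (image_url, score) pairs, most similar to the text query first."""
    scored = [(url, cosine(query_vec, vec)) for url, vec in image_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for real model output.
query = [0.1, 0.9, 0.2]  # e.g. the vector for {"text": "a blue cat"}
images = {
    "https://example.com/cat.jpg": [0.2, 0.8, 0.1],
    "https://example.com/car.jpg": [0.9, 0.0, 0.3],
}
for url, score in rank_images(query, images):
    print(f"{score:.3f}  {url}")
```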
Reranking Models
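Reranking models reorder search results to improve the accuracy and relevance of information retrieval. Given a query and a list of candidate documents, a reranker scores how relevant each document is to the query, and the candidates are then sorted by that score. Rerankers are usually slower than vector search but more accurate, so they are typically applied only to a shortlist of candidates already retrieved with vectorization.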
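The retrieve-then-rerank pattern can be sketched as a small function. Here `retrieve` and `rerank_scores` are hypothetical callables standing in for an embedding-based vector search and a call to the rerank endpoint; the shortlist size of 50 and the final `top_n` of 5 are arbitrary defaults, not MoArk recommendations.

```python
from typing import Callable


def retrieve_then_rerank(
    query: str,
    retrieve: Callable[[str, int], list[str]],
    rerank_scores: Callable[[str, list[str]], list[float]],
    shortlist_size: int = 50,
    top_n: int = 5,
) -> list[tuple[str, float]]:
    """Two-stage search: cheap vector retrieval, then rerank only the shortlist."""
    candidates = retrieve(query, shortlist_size)  # stage 1: vector search
    scores = rerank_scores(query, candidates)     # stage 2: one score per candidate
    if len(scores) != len(candidates):
        raise ValueError("reranker must return one score per candidate")
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return ranked[:top_n]
```

Keeping the two stages as separate callables makes it easy to swap the vector store or the reranking model without touching the control flow.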
Sample code:
import requests
API_URL = "https://moark.ai/v1/rerank"
API_TOKEN = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"  # Replace with your access token
headers = {
    "X-Failover-Enabled": "true",
    "Authorization": f"Bearer {API_TOKEN}",
}
def query(payload):
    # POST the payload as JSON (timeout in seconds is an arbitrary choice)
    response = requests.post(API_URL, headers=headers, json=payload, timeout=30)
    response.raise_for_status()  # Raise an error for 4xx/5xx responses
    return response.json()
output = query({
    "query": "How to read a CSV file in Python?",
    "documents": [
        "Use pandas: `import pandas as pd; df = pd.read_csv('data.csv')`",
        "You can read CSV files with numpy.loadtxt()",
        "To write JSON files, use json.dump() in Python",
        "CSV means Comma Separated Values. Python files can be opened using read() method.",
    ],
    "model": "Qwen3-Reranker-4B",
})
print(output)
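The shape of `output` is not described here. Many rerank APIs (for example Jina's and Cohere's) return a `results` list whose entries carry an `index` into the submitted `documents` and a `relevance_score`. Assuming MoArk follows that convention, which should be confirmed against the printed output, the helper below maps the results back to the document text, highest score first. It requires keeping the documents list in a variable (hypothetically `docs`) rather than inline in the payload.

```python
from typing import Any


def reorder_documents(documents: list[str],
                      output: dict[str, Any]) -> list[tuple[str, float]]:
    """Map rerank results back to document text, highest score first.

    Assumes (unconfirmed for MoArk) a response shaped like
    {"results": [{"index": int, "relevance_score": float}, ...]}.
    """
    results = output.get("results")
    if results is None:
        raise KeyError(f"no 'results' field in response: {output!r}")
    pairs = [(documents[r["index"]], float(r["relevance_score"])) for r in results]
    return sorted(pairs, key=lambda p: p[1], reverse=True)
```

Usage, under the same assumption: `for text, score in reorder_documents(docs, output): print(f"{score:.3f}  {text}")`.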
Application Scenarios
- Information Retrieval: Vectorization enables semantic search, which matches queries to documents by meaning rather than by exact keywords, and reranking refines the order of the top results, improving both the efficiency and the accuracy of search engines.
- Recommendation Systems: Vectorization and reranking can find and order content similar to what a user has engaged with, providing personalized recommendations.
- Semantic Analysis: Vectorizing text makes it possible to measure the semantic relationships between texts and to capture their contextual information.
- Natural Language Processing: In tasks such as machine translation and sentiment analysis, vectors serve as the numerical representation of the language data that models process.
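For the recommendation scenario, one simple content-based approach (an illustrative technique, not a MoArk feature) is to average the vectors of items a user liked into a profile vector and recommend the unseen items closest to it. The item names and three-dimensional vectors below are hypothetical toy values; in practice they would be embeddings of item descriptions, and the catalog should contain only items the user has not seen.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale v to unit length so dot products equal cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0:
        raise ValueError("zero vector")
    return [x / n for x in v]


def recommend(liked: list[list[float]], catalog: dict[str, list[float]],
              k: int = 3) -> list[str]:
    """Rank catalog items by cosine similarity to the mean liked-item vector."""
    if not liked:
        raise ValueError("need at least one liked item")
    dim = len(liked[0])
    profile = normalize([sum(v[i] for v in liked) / len(liked) for i in range(dim)])

    def score(vec: list[float]) -> float:
        return sum(p * u for p, u in zip(profile, normalize(vec)))

    ranked = sorted(catalog, key=lambda name: score(catalog[name]), reverse=True)
    return ranked[:k]


# Hypothetical toy data.
liked = [[1.0, 0.0, 0.2], [0.8, 0.1, 0.0]]
catalog = {
    "article-a": [0.9, 0.05, 0.1],
    "article-b": [0.0, 1.0, 0.0],
    "article-c": [0.1, 0.2, 0.9],
}
print(recommend(liked, catalog, k=2))
```

A reranker could then reorder this shortlist against a query such as the user's latest search.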