Async Parsing: Submit Task

POST /parse/async

Python

import json
import requests

url = "https://somark.tech/api/v1/parse/async"

data = {
    "output_formats": ["markdown", "json"],
    "api_key": "sk-***",  # replace with your own API key
    # Object-valued parameters are sent as JSON strings in the multipart form.
    "element_formats": json.dumps({
        "image": "url",
        "formula": "latex",
        "table": "html",
        "cs": "image",
    }),
    "feature_config": json.dumps({
        "enable_text_cross_page": False,
        "enable_table_cross_page": False,
        "enable_title_level_recognition": False,
        "enable_inline_image": False,
        "enable_table_image": True,
        "enable_image_understanding": True,
        "keep_header_footer": False,
    }),
}

# Open the file in a context manager so the handle is closed after the upload.
with open("example.pdf", "rb") as f:
    files = {"file": ("example.pdf", f)}
    response = requests.post(url, data=data, files=files)

# The endpoint returns a task_id right away; the parsing result is fetched separately.
task_id = response.json()["data"]["task_id"]
print(f"Task submitted, task_id: {task_id}")

{
  "code": 0,
  "message": "任务已提交",
  "data": {
    "task_id": "c5e6c983f28a4e6eb5d6c061343a8642",
    "status": "queuing"
  }
}

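Path change: this endpoint's path has changed from /extract/async to /parse/async. The old path will be retired on 2026-12-31; migrate to the new path before then.

Parameter change: extract_config has been renamed to feature_config. Replace the extract_config field in your requests with feature_config.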
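
The two changes above can be applied mechanically to an existing request. The following is a minimal sketch, not an official migration tool: `migrate_request` is a hypothetical helper name, and it assumes your old requests used a URL ending in /extract/async with an extract_config form field.

```python
from typing import Any, Dict, Tuple

OLD_PATH = "/extract/async"
NEW_PATH = "/parse/async"


def migrate_request(url: str, data: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return a copy of (url, data) using the new path and parameter name.

    Hypothetical helper for illustration; the inputs are not modified.
    """
    # Only rewrite the URL when it actually ends with the old path.
    if url.endswith(OLD_PATH):
        url = url[: -len(OLD_PATH)] + NEW_PATH

    new_data = dict(data)
    if "extract_config" in new_data:
        value = new_data.pop("extract_config")
        # If feature_config is already present, keep it and drop the old field.
        new_data.setdefault("feature_config", value)
    return url, new_data
```

Passing an old-style URL and form data through this helper yields a request that targets /parse/async and carries feature_config instead of extract_config; everything else is left untouched.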

Async parsing requires two endpoints used together; calling the submit-task endpoint on its own does not return the parsing result.

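1. Call this endpoint to submit a task; it returns a task_id immediately.
2. Use that task_id to call the result query endpoint and poll the task status.
3. Once the status indicates success, read the parsing result from the result query endpoint. A polling interval of 3 to 5 seconds is recommended.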
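
The polling step can be sketched as a small loop. This is an illustration under stated assumptions, not the service's client: the result query endpoint is not described on this page, so `fetch_status` is a hypothetical callable that you supply to wrap it, the status is assumed to sit under data.status as in the submit response, and the terminal status names are passed in by the caller because their exact values are not given here.

```python
import time
from typing import Any, Callable, Dict, Iterable


def poll_task(
    fetch_status: Callable[[], Dict[str, Any]],
    terminal_statuses: Iterable[str],
    interval: float = 3.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status() until data.status is one of terminal_statuses.

    fetch_status is a hypothetical wrapper around the result query endpoint.
    interval defaults to 3 seconds, the low end of the recommended 3 to 5.
    """
    terminal = set(terminal_statuses)
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status()
        status = (payload.get("data") or {}).get("status")
        if status in terminal:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"task still in status {status!r} after {timeout} seconds"
            )
        # Wait before asking again so the service is not hammered.
        sleep(interval)
```

Injecting `sleep` keeps the loop easy to exercise without real waiting; in normal use you would leave it at its default and pass only `fetch_status` and the set of terminal status names documented for the result query endpoint.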

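The output_formats, element_formats, and feature_config parameters are described the same way as for synchronous parsing. For authentication, limits, and mode selection, see the API overview.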

Request body

multipart/form-data

file

Required

The file to parse. Supported formats: PDF, images, Word, PPT, and Excel.

api_key

string

Required

API key, in the form sk-***

output_formats

enum<string>[]

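Output formats; multiple values may be given. Defaults to ["markdown", "json"] when omitted. Supported values are json, markdown, and zip; zip bundles all output files into a single archive.

Available options: json, markdown, zip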

element_formats

object

Element format configuration; controls the output format of each element type.

feature_config

object

Feature configuration (this parameter was renamed from extract_config to feature_config).

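The request body fields above can be assembled with a small helper before handing them to an HTTP client. This is a sketch only: `build_form_fields` is a hypothetical name, the allowed output formats are the three options listed on this page, and the object-valued fields are JSON-encoded because the Python example on this page sends them that way.

```python
import json
from typing import Any, Dict, Iterable, Optional

# Options listed for output_formats on this page.
ALLOWED_OUTPUT_FORMATS = {"json", "markdown", "zip"}


def build_form_fields(
    api_key: str,
    output_formats: Optional[Iterable[str]] = None,
    element_formats: Optional[Dict[str, Any]] = None,
    feature_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the non-file multipart fields (hypothetical helper)."""
    fields: Dict[str, Any] = {"api_key": api_key}

    if output_formats is not None:
        formats = list(output_formats)
        unknown = [f for f in formats if f not in ALLOWED_OUTPUT_FORMATS]
        if unknown:
            raise ValueError(f"unsupported output_formats: {unknown}")
        fields["output_formats"] = formats
    # When output_formats is omitted, the documented default applies.

    if element_formats is not None:
        fields["element_formats"] = json.dumps(element_formats)
    if feature_config is not None:
        fields["feature_config"] = json.dumps(feature_config)
    return fields
```

The returned dictionary can be passed as the `data` argument of `requests.post`, with the file supplied separately through `files`, as in the example at the top of this page.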

Response

200 - application/json

Task submitted successfully

code

integer

Status code. 0 indicates success; for non-zero values, see the error codes.

Example:

0

message

string

Example:

"任务已提交" ("Task submitted")

data

object

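data.task_id

string

Task ID, used when calling the result query endpoint.

Example:

"c5e6c983f28a4e6eb5d6c061343a8642"

data.status

string

Task status at submission time.

Example:

"queuing"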
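
Because a non-zero code signals an error, it is worth checking code before reading data. The following is a minimal sketch based only on the response fields shown on this page; `SubmitError` and `extract_task_id` are hypothetical names, not part of any official client.

```python
from typing import Any, Dict


class SubmitError(RuntimeError):
    """Raised when the submit response carries a non-zero code (hypothetical)."""


def extract_task_id(payload: Dict[str, Any]) -> str:
    """Return data.task_id from a submit response, or raise on failure."""
    code = payload.get("code")
    if code != 0:
        raise SubmitError(
            f"submission failed: code={code}, message={payload.get('message')!r}"
        )
    return payload["data"]["task_id"]
```

In the example at the top of this page, this would replace the direct `response.json()["data"]["task_id"]` lookup, so that a failed submission surfaces the code and message instead of a KeyError.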
