superun & Prompt.to.design Documentation

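This system integrates four mainstream Chinese speech engines to provide speech-to-text (ASR) and text-to-speech (TTS). Each engine is fully integrated and has been tested.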

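Supported Engines

1. Baidu AI Cloud (百度智能雲)
2. iFlytek Open Platform (訊飛開放平台)
3. Volcano Engine (火山引擎)
4. Alibaba Cloud (阿里雲)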

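Required Configuration

Baidu AI Cloud

ASR (Speech-to-Text)

Set the following environment variables:

SUPERUN_BAIDU_API_KEY - API Key
SUPERUN_BAIDU_SECRET_KEY - Secret Key

TTS (Text-to-Speech)

Set the following environment variables:

SUPERUN_BAIDU_API_KEY - API Key
SUPERUN_BAIDU_SECRET_KEY - Secret Key

Voice options:

0 - Du Xiaomei (度小美, female)
1 - Du Xiaoyu (度小宇, male)
3 - Du Xiaoyao (度逍遙, male)
4 - Du Yaya (度丫丫, female)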

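iFlytek Open Platform

ASR (Speech-to-Text)

Set the following environment variables:

SUPERUN_XUNFEI_APP_ID - App ID
SUPERUN_XUNFEI_API_KEY - API Key
SUPERUN_XUNFEI_API_SECRET - API Secret

Technical notes: uses the WebSocket protocol for real-time speech recognition.

TTS (Text-to-Speech)

Set the following environment variables:

SUPERUN_XUNFEI_APP_ID - App ID
SUPERUN_XUNFEI_API_KEY - API Key
SUPERUN_XUNFEI_API_SECRET - API Secret

Voice options:

xiaoyan - iFlytek Xiaoyan (訊飛小燕, female)
xiaoyu - iFlytek Xiaoyu (訊飛小宇, male)
xiaomei - iFlytek Xiaomei (訊飛小美, female)
xiaoqi - iFlytek Xiaoqi (訊飛小琪, male)

Technical notes: uses the WebSocket protocol for speech synthesis.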

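Volcano Engine

ASR (Speech-to-Text)

Set the following environment variables:

SUPERUN_VOLCANO_APP_ID - App ID
SUPERUN_VOLCANO_ACCESS_TOKEN - Access Token
SUPERUN_VOLCANO_SECRET_KEY - Secret Key (used for WebSocket authentication)
SUPERUN_VOLCANO_ASR_CLUSTER - ASR Cluster (optional, default: volcengine_input_common)

Technical notes: uses a binary WebSocket protocol with Gzip compression and chunked transmission.

TTS (Text-to-Speech)

Set the following environment variables:

SUPERUN_VOLCANO_APP_ID - App ID
SUPERUN_VOLCANO_ACCESS_TOKEN - Access Token

Voice options:

BV700_V2_streaming - fresh female voice (清新女聲)
BV001_V2_streaming - general male voice (通用男聲)
BV705_streaming - sweet female voice (甜美女聲)
BV701_V2_streaming - deep, mellow male voice (醇厚男聲)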

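Alibaba Cloud

ASR (Speech-to-Text)

Set the following environment variables:

SUPERUN_ALIYUN_ACCESS_KEY_ID - Access Key ID
SUPERUN_ALIYUN_ACCESS_KEY_SECRET - Access Key Secret
SUPERUN_ALIYUN_APP_KEY - App Key

Technical notes: uses a REST API; requests are authenticated with a token obtained through an HMAC-SHA1-signed request. Limit: at most 60 seconds of audio per recognition request.

TTS (Text-to-Speech)

Set the following environment variables:

SUPERUN_ALIYUN_ACCESS_KEY_ID - Access Key ID
SUPERUN_ALIYUN_ACCESS_KEY_SECRET - Access Key Secret
SUPERUN_ALIYUN_APP_KEY - App Key

Voice options:

aixia - Aixia (艾夏, female)
aiwei - Aiwei (艾偉, male)
aida - Aida (艾達, female)
kenny - Kenny (肯尼, male)

Technical notes: uses a REST API with HMAC-SHA1 signature authentication.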

Configuration

Supabase Edge Functions (production)

Set the environment variables in your Supabase project:

# Baidu
supabase secrets set SUPERUN_BAIDU_API_KEY=your_api_key
supabase secrets set SUPERUN_BAIDU_SECRET_KEY=your_secret_key

# iFlytek
supabase secrets set SUPERUN_XUNFEI_APP_ID=your_app_id
supabase secrets set SUPERUN_XUNFEI_API_KEY=your_api_key
supabase secrets set SUPERUN_XUNFEI_API_SECRET=your_api_secret

# Volcano Engine
supabase secrets set SUPERUN_VOLCANO_APP_ID=your_app_id
supabase secrets set SUPERUN_VOLCANO_ACCESS_TOKEN=your_access_token
supabase secrets set SUPERUN_VOLCANO_SECRET_KEY=your_secret_key
supabase secrets set SUPERUN_VOLCANO_ASR_CLUSTER=volcengine_input_common

# Alibaba Cloud
supabase secrets set SUPERUN_ALIYUN_ACCESS_KEY_ID=your_access_key_id
supabase secrets set SUPERUN_ALIYUN_ACCESS_KEY_SECRET=your_access_key_secret
supabase secrets set SUPERUN_ALIYUN_APP_KEY=your_app_key
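
A missing secret otherwise surfaces only as an opaque vendor error. The sketch below checks the variable names listed above before any API call; `loadCredentials` and its tables are hypothetical, and the environment getter is injected (in an Edge Function it would be `(name) => Deno.env.get(name)`) so the helper stays testable:

```typescript
type EngineId = "baidu" | "xunfei" | "volcano" | "aliyun";
type Service = "asr" | "tts";

// Required secrets per engine and service, taken from the variable lists above.
const REQUIRED_SECRETS: Record<EngineId, Record<Service, string[]>> = {
  baidu: {
    asr: ["SUPERUN_BAIDU_API_KEY", "SUPERUN_BAIDU_SECRET_KEY"],
    tts: ["SUPERUN_BAIDU_API_KEY", "SUPERUN_BAIDU_SECRET_KEY"],
  },
  xunfei: {
    asr: ["SUPERUN_XUNFEI_APP_ID", "SUPERUN_XUNFEI_API_KEY", "SUPERUN_XUNFEI_API_SECRET"],
    tts: ["SUPERUN_XUNFEI_APP_ID", "SUPERUN_XUNFEI_API_KEY", "SUPERUN_XUNFEI_API_SECRET"],
  },
  volcano: {
    asr: ["SUPERUN_VOLCANO_APP_ID", "SUPERUN_VOLCANO_ACCESS_TOKEN", "SUPERUN_VOLCANO_SECRET_KEY"],
    tts: ["SUPERUN_VOLCANO_APP_ID", "SUPERUN_VOLCANO_ACCESS_TOKEN"],
  },
  aliyun: {
    asr: ["SUPERUN_ALIYUN_ACCESS_KEY_ID", "SUPERUN_ALIYUN_ACCESS_KEY_SECRET", "SUPERUN_ALIYUN_APP_KEY"],
    tts: ["SUPERUN_ALIYUN_ACCESS_KEY_ID", "SUPERUN_ALIYUN_ACCESS_KEY_SECRET", "SUPERUN_ALIYUN_APP_KEY"],
  },
};

// Optional settings and their documented defaults.
const OPTIONAL_DEFAULTS: Partial<Record<EngineId, Partial<Record<Service, Record<string, string>>>>> = {
  volcano: { asr: { SUPERUN_VOLCANO_ASR_CLUSTER: "volcengine_input_common" } },
};

// Hypothetical helper: collects the secrets for (engine, service) through getEnv
// and throws a single error that names every missing variable.
function loadCredentials(
  engine: EngineId,
  service: Service,
  getEnv: (name: string) => string | undefined,
): Record<string, string> {
  const creds: Record<string, string> = {};
  const missing: string[] = [];
  for (const name of REQUIRED_SECRETS[engine][service]) {
    const value = getEnv(name);
    if (value) {
      creds[name] = value;
    } else {
      missing.push(name);
    }
  }
  if (missing.length > 0) {
    throw new Error(`Missing secrets for ${engine}/${service}: ${missing.join(", ")}`);
  }
  // Fill optional settings, falling back to their defaults
  const optional = OPTIONAL_DEFAULTS[engine]?.[service] ?? {};
  for (const [name, fallback] of Object.entries(optional)) {
    creds[name] = getEnv(name) ?? fallback;
  }
  return creds;
}
```

Reporting all missing names at once saves a redeploy cycle per forgotten secret.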

Code Architecture

Frontend Components

ASR Module (Speech-to-Text)

// src/components/mobile/ASRModule.tsx
const ASRModule = ({ engine = "baidu" }: ASRModuleProps) => {
  const callASRAPI = async (audioData: string) => {
    const { data, error } = await supabase.functions.invoke('asr-convert', {
      body: {
        engine: engine,
        audioData: audioData,
      }
    });

    // invoke() reports network and function failures through `error`; data is null then
    if (error || !data?.success) {
      console.error("ASR request failed", error ?? data);
      return;
    }

    setResult(data.result.text);
    setMetrics({
      time: Math.round(data.result.duration || 0),
      confidence: Math.round((data.result.confidence || 0) * 100),
      rate: "16k"
    });
  };

  // ... recording and file-upload logic
};

TTS Module (Text-to-Speech)

// src/components/mobile/TTSModule.tsx
const TTSModule = ({ engine = "baidu" }: TTSModuleProps) => {
  const callTTSAPI = async () => {
    const { data, error } = await supabase.functions.invoke('tts-convert', {
      body: {
        engine: engine,
        text: text,
        voice: selectedVoice,
        speed: speed[0],
        volume: volume[0],
      }
    });

    // data is null when invoke() fails, so check error before reading it
    if (error || !data?.success) {
      console.error("TTS request failed", error ?? data);
      return;
    }

    setAudioUrl(data.result.audioUrl);
    setStatus("complete");
  };

  // ... synthesis logic
};

Engine Selector

// src/components/mobile/EngineSelector.tsx
const engines = [
  { id: "baidu", name: "百度", shortName: "BD" },
  { id: "xunfei", name: "訊飛", shortName: "XF" },
  { id: "volcano", name: "火山", shortName: "HS" },
  { id: "aliyun", name: "阿里雲", shortName: "ALI" },
];

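Backend (Supabase Edge Functions)

ASR Conversion Service

File location: supabase/functions/asr-convert/index.ts

Core logic:

Select the engine implementation based on the engine parameter
Read that engine's API credentials from environment variables
Call the engine's ASR API
Return a standardized recognition result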
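
The four steps above amount to a handler table plus a normalization step. This is a minimal illustration, not the service's actual code: `AsrHandler`, `dispatchASR` and `isEngineId` are hypothetical names, each handler is assumed to read its own credentials, and `duration` is assumed to be the elapsed time in milliseconds measured around the handler call:

```typescript
type EngineId = "baidu" | "xunfei" | "volcano" | "aliyun";

// Standardized result returned to the frontend.
interface AsrResult {
  text: string;
  confidence: number; // 0-1
  duration: number;   // assumption: elapsed milliseconds
}

// Output of an engine-specific handler; missing fields get defaults.
type AsrHandler = (audioData: string) => Promise<{ text: string; confidence?: number }>;

// Narrows an untrusted request field to a supported engine ID.
function isEngineId(value: unknown): value is EngineId {
  return value === "baidu" || value === "xunfei" || value === "volcano" || value === "aliyun";
}

async function dispatchASR(
  engine: unknown,
  audioData: string,
  handlers: Record<EngineId, AsrHandler>,
): Promise<AsrResult> {
  // 1. Validate the engine parameter before touching any credentials
  if (!isEngineId(engine)) {
    throw new Error(`Unsupported engine: ${String(engine)}`);
  }
  const started = Date.now();
  // 2-3. The handler reads its credentials and calls the engine's API
  const raw = await handlers[engine](audioData);
  // 4. Normalize to the standard shape
  return {
    text: raw.text.trim(),
    confidence: raw.confidence ?? 0,
    duration: Date.now() - started,
  };
}
```

Keeping validation in one place means an unknown `engine` value is rejected before any secret is read.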

Baidu implementation:

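async function callBaiduASR(apiKey: string, secretKey: string, audioData: string, userId = "anonymous") {
  // 1. Obtain an access token
  const accessToken = await getBaiduAccessToken(apiKey, secretKey);

  // 2. API URL: do not append any query parameters
  const apiUrl = 'https://vop.baidu.com/server_api';

  // audioData is the Base64-encoded WAV file (no "data:" URL prefix)
  const base64Audio = audioData;
  const padding = (base64Audio.match(/=+$/) || [""])[0].length;
  const audioByteLength = Math.floor((base64Audio.length * 3) / 4) - padding;

  // 3. Request body: the token MUST go here
  const requestBody = {
    format: "wav",           // audio format
    rate: 16000,             // sample rate (must be a number)
    channel: 1,              // number of channels
    cuid: userId,            // user identifier (placeholder default)
    token: accessToken,      // ← key point: the token goes in the body
    speech: base64Audio,     // Base64-encoded audio
    len: audioByteLength,    // actual byte length of the WAV file (must be a number)
    // do not send dev_pid
  };

  // 4. Send the request
  const response = await fetch(apiUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(requestBody),
  });
  const result = await response.json();

  // err_no 0 means success; anything else (e.g. 3311) is an error
  if (result.err_no !== 0) {
    throw new Error(`Baidu ASR error ${result.err_no}: ${result.err_msg}`);
  }

  // confidence is a hard-coded placeholder, not read from the response
  return { text: result.result[0], confidence: 0.95 };
}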
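
`getBaiduAccessToken` is not shown in the source. Baidu issues tokens through its OAuth 2.0 client-credentials endpoint (`https://aip.baidubce.com/oauth/2.0/token`), and a token stays valid for about 30 days, so caching it saves one round trip per recognition. A minimal sketch, assuming a module-level cache; the injectable `fetchImpl` and `now` parameters are additions for testability only:

```typescript
interface CachedToken {
  token: string;
  expiresAt: number; // epoch milliseconds
}

// Module-level cache keyed by API key; a warm Edge Function instance can reuse it.
const baiduTokenCache = new Map<string, CachedToken>();

async function getBaiduAccessToken(
  apiKey: string,
  secretKey: string,
  fetchImpl: typeof fetch = fetch,
  now: () => number = Date.now,
): Promise<string> {
  const cached = baiduTokenCache.get(apiKey);
  // Reuse the cached token until one minute before it expires
  if (cached && cached.expiresAt - 60_000 > now()) {
    return cached.token;
  }
  const params = new URLSearchParams({
    grant_type: "client_credentials",
    client_id: apiKey,
    client_secret: secretKey,
  });
  const response = await fetchImpl(`https://aip.baidubce.com/oauth/2.0/token?${params.toString()}`, {
    method: "POST",
  });
  const body = await response.json();
  if (!body.access_token) {
    // Failures come back with "error" / "error_description" fields
    throw new Error(`Baidu token error: ${body.error_description ?? body.error ?? "unknown"}`);
  }
  baiduTokenCache.set(apiKey, {
    token: body.access_token,
    expiresAt: now() + body.expires_in * 1000, // expires_in is in seconds
  });
  return body.access_token;
}
```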

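iFlytek implementation:

async function callXunfeiASR(appId: string, apiKey: string, apiSecret: string, audioData: string): Promise<string> {
  // 1. Build the WebSocket auth URL (HMAC-SHA256 signature); host/path are the iFlytek IAT endpoint
  const wsUrl = await buildWebSocketAuthUrl(host, path, apiKey, apiSecret);

  // 2. Open the WebSocket connection
  const ws = new WebSocket(wsUrl);

  return new Promise((resolve, reject) => {
    let text = "";
    ws.onerror = reject;

    // 3. Send the recognition request only once the socket is open
    ws.onopen = () => {
      ws.send(JSON.stringify({
        common: { app_id: appId },
        business: { language: "zh_cn", domain: "iat", accent: "mandarin" },
        data: { status: 2, format: "audio/L16;rate=16000", audio: audioData }
      }));
    };

    // 4. Receive and parse the results
    ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      if (data.code !== 0) {
        ws.close();
        reject(new Error(`iFlytek ASR error ${data.code}: ${data.message}`));
        return;
      }
      // Parse the recognition result into `text`...
      if (data.data?.status === 2) {  // status 2 marks the final result
        ws.close();
        resolve(text);
      }
    };
  });
}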
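
`buildWebSocketAuthUrl` is not shown either. iFlytek's WebSocket APIs authenticate with an HMAC-SHA256 signature over the `host`, `date` and request line, Base64-encoded and passed as query parameters. A minimal sketch using the Web Crypto API (available in Deno, and so in Supabase Edge Functions); based on iFlytek's public WebAPI, `host`/`path` would be `iat-api.xfyun.cn` and `/v2/iat` for ASR, and `tts-api.xfyun.cn` and `/v2/tts` for TTS, but check them against your console:

```typescript
// The text iFlytek's scheme signs: host, date and request line.
function buildSignatureOrigin(host: string, date: string, path: string): string {
  return `host: ${host}\ndate: ${date}\nGET ${path} HTTP/1.1`;
}

// Base64-encodes raw bytes.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

async function buildWebSocketAuthUrl(
  host: string,
  path: string,
  apiKey: string,
  apiSecret: string,
  date: string = new Date().toUTCString(), // RFC 1123, e.g. "Thu, 01 Aug 2019 01:53:21 GMT"
): Promise<string> {
  const encoder = new TextEncoder();
  // HMAC-SHA256 keyed with the API secret
  const key = await crypto.subtle.importKey(
    "raw",
    encoder.encode(apiSecret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign"],
  );
  const signatureBytes = await crypto.subtle.sign(
    "HMAC",
    key,
    encoder.encode(buildSignatureOrigin(host, date, path)),
  );
  const signature = bytesToBase64(new Uint8Array(signatureBytes));

  // The authorization value is itself Base64-encoded before it goes into the URL
  const authorizationOrigin =
    `api_key="${apiKey}", algorithm="hmac-sha256", headers="host date request-line", signature="${signature}"`;
  const authorization = btoa(authorizationOrigin);

  const query = new URLSearchParams({ authorization, date, host });
  return `wss://${host}${path}?${query.toString()}`;
}
```

The `date` parameter is exposed so a fixed value can be used in tests; iFlytek rejects requests whose date is too far from server time, so production calls should use the default.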

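Volcano Engine implementation:

// Uses the binary WebSocket protocol
async function callVolcanoASR(appId: string, accessToken: string, audioData: string) {
  // 1. Build the WebSocket URL (cluster comes from SUPERUN_VOLCANO_ASR_CLUSTER)
  const wsUrl = `wss://openspeech.bytedance.com/api/v2/asr?appid=${appId}&token=${accessToken}&cluster=${cluster}`;

  // 2. Connect (binaryType set to "arraybuffer") and wait until the socket is open
  const ws = new WebSocket(wsUrl);
  ws.binaryType = "arraybuffer";
  await new Promise((resolve, reject) => {
    ws.onopen = resolve;
    ws.onerror = reject;
  });

  // 5. Register the binary-response handler before sending anything
  ws.onmessage = async (event) => {
    const result = await parseServerResponse(event.data);
    // Parse the recognition result...
  };

  // 3. Send the Full Client Request (binary protocol, Gzip-compressed JSON)
  //    jsonBytes: the UTF-8 encoded JSON request
  const fullRequestMessage = await buildMessage(
    0b0001,  // message_type: full client request
    0b0000,  // flags: not the last packet
    0b0001,  // serialization: JSON
    0b0001,  // compression: Gzip
    jsonBytes
  );
  ws.send(fullRequestMessage);

  // 4. Send the audio. Here a single chunk is flagged as the last packet; when
  //    streaming in chunks, use flags 0b0000 for every chunk except the final one.
  const audioMessage = await buildMessage(
    0b0010,  // message_type: audio only
    0b0010,  // flags: last packet
    0b0000,  // serialization: none (raw audio)
    0b0001,  // compression: Gzip
    audioChunk
  );
  ws.send(audioMessage);
}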
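
`buildMessage` packs each frame into the binary protocol: a 4-byte header (protocol version and header size, message type and flags, serialization and compression, one reserved byte), a 4-byte big-endian payload size, then the payload. The sketch below is one way to write the helper called above; it assumes protocol version 1 with a 4-byte header and gzips through the standard `CompressionStream` API (available in Deno and modern browsers). `packFrame` and `gzip` are hypothetical helper names:

```typescript
const PROTOCOL_VERSION = 0b0001;
const HEADER_SIZE = 0b0001; // in units of 4 bytes, i.e. a 4-byte header

// Gzip-compresses bytes with the standard CompressionStream API.
async function gzip(data: Uint8Array): Promise<Uint8Array> {
  const stream = new Blob([data]).stream().pipeThrough(new CompressionStream("gzip"));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

// Header (4 bytes) + payload size (4 bytes, big-endian) + payload.
function packFrame(
  messageType: number,
  flags: number,
  serialization: number,
  compression: number,
  payload: Uint8Array,
): Uint8Array {
  const frame = new Uint8Array(8 + payload.length);
  frame[0] = (PROTOCOL_VERSION << 4) | HEADER_SIZE;
  frame[1] = (messageType << 4) | flags;
  frame[2] = (serialization << 4) | compression;
  frame[3] = 0x00; // reserved
  new DataView(frame.buffer).setUint32(4, payload.length, false); // big-endian
  frame.set(payload, 8);
  return frame;
}

async function buildMessage(
  messageType: number,
  flags: number,
  serialization: number,
  compression: number,
  payload: Uint8Array,
): Promise<Uint8Array> {
  // compression 0b0001 = Gzip, 0b0000 = none; the size field counts the bytes actually sent
  const body = compression === 0b0001 ? await gzip(payload) : payload;
  return packFrame(messageType, flags, serialization, compression, body);
}
```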

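Alibaba Cloud implementation:

async function callAliyunASR(accessKeyId: string, accessKeySecret: string, appKey: string, audioData: string) {
  // 1. Obtain a token (HMAC-SHA1 signed request)
  const token = await getAliyunToken(accessKeyId, accessKeySecret);

  // Decode the Base64 audio into raw bytes
  const audioBytes = Uint8Array.from(atob(audioData), (c) => c.charCodeAt(0));

  // 2. Send the REST API request
  const response = await fetch('https://nls-gateway-cn-shanghai.aliyuncs.com/stream/v1/asr?appkey=...', {
    method: 'POST',
    headers: {
      'X-NLS-Token': token,
      'Content-Type': 'application/octet-stream'
    },
    body: audioBytes  // binary audio data
  });
  const result = await response.json();

  // status 20000000 means success
  if (result.status !== 20000000) {
    throw new Error(`Aliyun ASR error ${result.status}: ${result.message}`);
  }

  // confidence is a hard-coded placeholder, not read from the response
  return { text: result.result, confidence: 0.94 };
}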
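
`getAliyunToken` is not shown. Alibaba Cloud's intelligent speech service issues tokens through the `CreateToken` action of its RPC-style API, signed with HMAC-SHA1 (signature version 1.0): sorted, percent-encoded parameters are signed with the secret followed by `&`. A minimal sketch, assuming the `nls-meta.cn-shanghai.aliyuncs.com` endpoint and API version `2019-02-28`; verify both against the current Alibaba Cloud documentation:

```typescript
// RFC 3986 encoding as the signature scheme expects: encodeURIComponent
// leaves ! ' ( ) * unescaped, so encode those as well (~ stays as-is).
function percentEncode(value: string): string {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase(),
  );
}

// Canonical query string: keys sorted, each key and value percent-encoded.
function canonicalize(params: Record<string, string>): string {
  return Object.keys(params)
    .sort()
    .map((k) => `${percentEncode(k)}=${percentEncode(params[k])}`)
    .join("&");
}

function buildStringToSign(method: string, params: Record<string, string>): string {
  return `${method}&${percentEncode("/")}&${percentEncode(canonicalize(params))}`;
}

async function hmacSha1Base64(key: string, message: string): Promise<string> {
  const enc = new TextEncoder();
  const cryptoKey = await crypto.subtle.importKey(
    "raw", enc.encode(key), { name: "HMAC", hash: "SHA-1" }, false, ["sign"],
  );
  const sig = new Uint8Array(await crypto.subtle.sign("HMAC", cryptoKey, enc.encode(message)));
  let binary = "";
  for (let i = 0; i < sig.length; i++) binary += String.fromCharCode(sig[i]);
  return btoa(binary);
}

async function getAliyunToken(accessKeyId: string, accessKeySecret: string): Promise<string> {
  const params: Record<string, string> = {
    AccessKeyId: accessKeyId,
    Action: "CreateToken",
    Format: "JSON",
    RegionId: "cn-shanghai",
    SignatureMethod: "HMAC-SHA1",
    SignatureNonce: crypto.randomUUID(),
    SignatureVersion: "1.0",
    Timestamp: new Date().toISOString().replace(/\.\d{3}Z$/, "Z"), // e.g. 2024-01-01T00:00:00Z
    Version: "2019-02-28",
  };
  // The signing key is the secret followed by "&"
  const signature = await hmacSha1Base64(`${accessKeySecret}&`, buildStringToSign("GET", params));
  const url = `https://nls-meta.cn-shanghai.aliyuncs.com/?Signature=${percentEncode(signature)}&${canonicalize(params)}`;
  const body = await (await fetch(url)).json();
  if (!body.Token?.Id) {
    throw new Error(`Aliyun CreateToken failed: ${body.Message ?? "unknown error"}`);
  }
  // Token.ExpireTime (Unix seconds) tells you how long the token can be cached
  return body.Token.Id;
}
```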

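TTS Conversion Service

File location: supabase/functions/tts-convert/index.ts

Core logic:

Select the engine implementation based on the engine parameter
Read that engine's API credentials from environment variables
Map the voice parameter to the engine's voice code
Call the engine's TTS API
Return Base64-encoded audio data

Voice mapping: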

const voiceMapping: Record<string, Record<string, { code: string; name: string }>> = {
  baidu: {
    female_1: { code: "0", name: "度小美" },
    male_1: { code: "1", name: "度小宇" },
    // ...
  },
  xunfei: {
    female_1: { code: "xiaoyan", name: "訊飛小燕" },
    // ...
  },
  volcano: {
    female_1: { code: "BV700_V2_streaming", name: "清新女聲" },
    // ...
  },
  aliyun: {
    female_1: { code: "aixia", name: "艾夏" },
    // ...
  },
};
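
Only part of the mapping is listed above, so a lookup should not pass `undefined` to the vendor API when a voice ID is missing for an engine. A minimal sketch, assuming a hypothetical `resolveVoice` helper that falls back to the engine's first listed voice (the table repeats a subset of the mapping above so the block runs on its own):

```typescript
type VoiceEntry = { code: string; name: string };

// Subset of the mapping above.
const voiceMapping: Record<string, Record<string, VoiceEntry>> = {
  baidu: { female_1: { code: "0", name: "度小美" }, male_1: { code: "1", name: "度小宇" } },
  xunfei: { female_1: { code: "xiaoyan", name: "訊飛小燕" } },
  volcano: { female_1: { code: "BV700_V2_streaming", name: "清新女聲" } },
  aliyun: { female_1: { code: "aixia", name: "艾夏" } },
};

// Returns the engine-specific voice; unknown voice IDs fall back to the engine's first voice.
function resolveVoice(engine: string, voiceId: string): VoiceEntry {
  const voices = voiceMapping[engine];
  if (!voices) throw new Error(`Unknown engine: ${engine}`);
  return voices[voiceId] ?? Object.values(voices)[0];
}
```

Falling back instead of throwing keeps synthesis working when the frontend offers a voice ID that some engines lack; throwing would be the stricter alternative.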

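Baidu implementation:

async function callBaiduTTS(apiKey: string, secretKey: string, text: string, voice: string, speed: number, volume: number) {
  const accessToken = await getBaiduAccessToken(apiKey, secretKey);
  // Map the unified voice ID (e.g. "female_1") to Baidu's per code
  const voiceCode = voiceMapping.baidu[voice]?.code ?? "0";

  const params = new URLSearchParams({
    tex: text,
    tok: accessToken,
    cuid: "superun-tts",  // client identifier (placeholder value)
    ctp: "1",             // client type; the API expects 1
    lan: "zh",
    spd: Math.round(speed * 5).toString(),            // 1.0x → 5
    vol: Math.round((volume / 100) * 15).toString(),  // 0-100% → 0-15
    per: voiceCode,
    aue: "3",  // MP3 format
  });

  const response = await fetch(`https://tsn.baidu.com/text2audio?${params.toString()}`);
  // On failure Baidu returns JSON instead of audio
  const contentType = response.headers.get("content-type") ?? "";
  if (!contentType.startsWith("audio")) {
    throw new Error(`Baidu TTS error: ${await response.text()}`);
  }
  const audioBuffer = await response.arrayBuffer();

  // Convert to Base64
  const audioBase64 = bufferToBase64(audioBuffer);
  return { audioUrl: `data:audio/mp3;base64,${audioBase64}` };
}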
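
`bufferToBase64` is not shown. Passing a whole audio file to `String.fromCharCode(...bytes)` can exceed the JavaScript engine's argument limit for large buffers, so a chunked conversion is safer. A minimal sketch, assuming the helper receives an `ArrayBuffer` as in the calls above:

```typescript
// Converts binary data to Base64 in 32 KB slices so that no single
// String.fromCharCode call receives too many arguments.
function bufferToBase64(buffer: ArrayBuffer): string {
  const bytes = new Uint8Array(buffer);
  const chunkSize = 0x8000; // 32768 bytes per slice
  let binary = "";
  for (let i = 0; i < bytes.length; i += chunkSize) {
    const slice = bytes.subarray(i, i + chunkSize);
    binary += String.fromCharCode(...Array.from(slice));
  }
  return btoa(binary);
}
```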

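iFlytek implementation:

async function callXunfeiTTS(appId: string, apiKey: string, apiSecret: string, text: string, voice: string, speed: number, volume: number): Promise<{ audioUrl: string }> {
  // Uses the WebSocket protocol (host/path: the iFlytek TTS endpoint)
  const wsUrl = await buildWebSocketAuthUrl(host, path, apiKey, apiSecret);
  const voiceCode = voiceMapping.xunfei[voice]?.code ?? "xiaoyan";
  const ws = new WebSocket(wsUrl);
  const audioChunks: string[] = [];

  return new Promise((resolve, reject) => {
    // Receive the audio chunks and merge them
    ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      if (data.code !== 0) {
        ws.close();
        reject(new Error(`iFlytek TTS error ${data.code}: ${data.message}`));
        return;
      }
      if (data.data && data.data.audio) {
        audioChunks.push(data.data.audio);
      }
      if (data.data && data.data.status === 2) {
        // Synthesis finished. Each chunk is Base64-encoded separately, so decode
        // every chunk and re-encode the bytes: joining the strings directly
        // corrupts the audio whenever a chunk ends in "=" padding.
        ws.close();
        const bytes = audioChunks.flatMap((c) => Array.from(atob(c), (ch) => ch.charCodeAt(0)));
        const audioBase64 = bufferToBase64(new Uint8Array(bytes).buffer);
        resolve({ audioUrl: `data:audio/mp3;base64,${audioBase64}` });
      }
    };
    ws.onerror = reject;

    // Send the request only after the connection is open
    ws.onopen = () => {
      ws.send(JSON.stringify({
        common: { app_id: appId },
        business: {
          aue: "lame",    // MP3 format
          tte: "UTF8",    // text encoding
          vcn: voiceCode,
          speed: Math.round(speed * 50),  // 1.0x → 50 (iFlytek range 0-100)
          volume: Math.round(volume),     // iFlytek range is already 0-100
        },
        data: {
          status: 2,
          text: btoa(unescape(encodeURIComponent(text)))
        }
      }));
    };
  });
}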

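Volcano Engine implementation:

async function callVolcanoTTS(appId: string, accessToken: string, text: string, voice: string, speed: number, volume: number) {
  const voiceCode = voiceMapping.volcano[voice]?.code ?? "BV700_V2_streaming";

  const response = await fetch('https://openspeech.bytedance.com/api/v1/tts', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Volcano Engine expects "Bearer;" followed by the token (a semicolon, not a space)
      'Authorization': `Bearer;${accessToken}`
    },
    body: JSON.stringify({
      app: { appid: appId, token: accessToken, cluster: "volcano_tts" },
      user: { uid: "superun" },       // placeholder user ID
      audio: {
        voice_type: voiceCode,
        encoding: "mp3",
        speed_ratio: speed,           // 1.0 = normal speed
        volume_ratio: volume / 100,   // 100% → 1.0
      },
      request: {
        reqid: crypto.randomUUID(),   // unique ID per request
        text: text,
        text_type: "plain",
        operation: "query"            // non-streaming synthesis
      }
    })
  });

  const result = await response.json();
  // code 3000 means success; data holds the Base64-encoded audio
  if (result.code !== 3000) {
    throw new Error(`Volcano TTS error ${result.code}: ${result.message}`);
  }
  return { audioUrl: `data:audio/mp3;base64,${result.data}` };
}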

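Alibaba Cloud implementation:

async function callAliyunTTS(accessKeyId: string, accessKeySecret: string, appKey: string, text: string, voice: string, speed: number, volume: number) {
  const token = await getAliyunToken(accessKeyId, accessKeySecret);
  const voiceCode = voiceMapping.aliyun[voice]?.code ?? "aixia";
  // speech_rate runs from -500 to 500 with 0 as normal speed; map 0.5x/1.0x/2.0x
  // to -500/0/500 by linear interpolation (an approximation)
  const speechRate = Math.round(speed < 1 ? (speed - 1) * 1000 : (speed - 1) * 500);

  const response = await fetch('https://nls-gateway.cn-shanghai.aliyuncs.com/stream/v1/tts', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-NLS-Token': token,
    },
    body: JSON.stringify({
      appkey: appKey,
      text: text,
      voice: voiceCode,
      format: "mp3",
      sample_rate: 16000,
      volume: volume,            // 0-100
      speech_rate: speechRate,
    })
  });

  // On failure Alibaba Cloud returns JSON instead of audio
  const contentType = response.headers.get("content-type") ?? "";
  if (!contentType.startsWith("audio")) {
    throw new Error(`Aliyun TTS error: ${await response.text()}`);
  }
  const audioBuffer = await response.arrayBuffer();
  const audioBase64 = bufferToBase64(audioBuffer);
  return { audioUrl: `data:audio/mp3;base64,${audioBase64}` };
}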

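Common Baidu ASR Errors and Fixes

Error code 3311: param rate invalid

This is the most common error. It usually has one of the following causes:

Problem	Fix
Token in the wrong place	The token must be in the request body, not in the URL query string
Duplicated cuid	Put cuid only in the request body; do not repeat it in the URL
dev_pid is set	Omit dev_pid and let Baidu fall back to its default model
rate has the wrong type	rate must be a number, not a string
len computed incorrectly	len must be the actual byte length of the WAV file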

Computing the len Parameter Correctly

Compute the actual byte count from the Base64 string:

// Compute the actual byte count from the Base64 string
const padding = (base64Audio.match(/=/g) || []).length;
const audioByteLength = Math.floor((base64Audio.length * 3) / 4) - padding;

// Check: audioByteLength should equal blob.size of the WAV file
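
The same calculation as a reusable function. It also strips a `data:...;base64,` prefix (as produced by `FileReader.readAsDataURL`), which would otherwise inflate the count, and only counts `=` at the end of the string. A minimal sketch; `base64ByteLength` is a hypothetical name:

```typescript
// Decoded byte length of a Base64 string, ignoring any data-URL prefix.
function base64ByteLength(input: string): number {
  const comma = input.indexOf(",");
  const b64 = input.startsWith("data:") && comma >= 0 ? input.slice(comma + 1) : input;
  const padding = (b64.match(/=+$/) ?? [""])[0].length;
  return Math.floor((b64.length * 3) / 4) - padding;
}
```

Every 4 Base64 characters encode 3 bytes, and each trailing `=` marks one byte that is not present, which is why the padding count is subtracted.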

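Frontend Audio Processing

1. Recording format

Browsers usually record webm/opus:

const mimeType = "audio/webm;codecs=opus";

2. Resample to 16 kHz (required by Baidu)

Decode the recording first (for example with AudioContext.decodeAudioData), then render it through an OfflineAudioContext at the target rate:

const offlineContext = new OfflineAudioContext(
  1,                    // mono
  targetLength,         // output length in samples (duration in seconds × 16000)
  16000                 // target sample rate
);

3. Convert to 16-bit PCM

const pcm16 = new Int16Array(samples.length);
for (let i = 0; i < samples.length; i++) {
  const s = Math.max(-1, Math.min(1, samples[i]));  // clamp to [-1, 1]
  pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;       // scale to the Int16 range
}

4. Add the WAV header (44 bytes)

const wavHeader = {
  sampleRate: 16000,
  numChannels: 1,
  bitsPerSample: 16,
  byteRate: 32000,      // 16000 * 1 * 16 / 8
  blockAlign: 2,        // 1 * 16 / 8
};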
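
Steps 3 and 4 combine into a single encoder. A minimal sketch, assuming the input is already mono 16 kHz `Float32Array` samples (for example `getChannelData(0)` of the `AudioBuffer` returned by the `OfflineAudioContext`'s `startRendering()`); `encodeWav16kMono` is a hypothetical name, and the header fields match the values listed above:

```typescript
function encodeWav16kMono(samples: Float32Array): Uint8Array {
  const sampleRate = 16000;
  const numChannels = 1;
  const bitsPerSample = 16;
  const blockAlign = (numChannels * bitsPerSample) / 8; // 2
  const byteRate = sampleRate * blockAlign;              // 32000
  const dataSize = samples.length * blockAlign;

  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  // RIFF chunk descriptor
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  // "fmt " sub-chunk
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);           // sub-chunk size for PCM
  view.setUint16(20, 1, true);            // audio format 1 = PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  // "data" sub-chunk
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Float [-1, 1] → 16-bit little-endian PCM, same scaling as step 3
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buffer);
}
```

The returned array's `length` is the value to send as Baidu's `len`.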

Environment Variables

Configure these in Supabase Edge Function Secrets:

# Supabase Edge Function Secrets
SUPERUN_BAIDU_API_KEY=your_Baidu_API_Key
SUPERUN_BAIDU_SECRET_KEY=your_Baidu_Secret_Key

Where to get them: Baidu AI Cloud console → Speech Technology → Create Application

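Debugging Checklist

When you hit error 3311, check the following in order:

✅ Is the token in the request body (not a URL parameter)?
✅ Is rate a number (typeof rate === 'number')?
✅ Does len equal the actual size of the WAV file?
✅ Has the dev_pid parameter been removed?
✅ Is the sample rate in the WAV header 16000?
✅ Is the audio between 0.5 and 60 seconds long?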
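
Most of this checklist can be automated before the request leaves the Edge Function. A minimal sketch, assuming the request body has the shape used in the Baidu implementation above and that `speech` holds plain Base64 WAV data; `checkBaiduAsrRequest` is a hypothetical helper that returns a list of problems instead of throwing:

```typescript
interface BaiduAsrBody {
  format: string;
  rate: unknown;
  channel: number;
  cuid: string;
  token?: unknown;
  speech: string;
  len: unknown;
  dev_pid?: unknown;
}

function checkBaiduAsrRequest(body: BaiduAsrBody, url = "https://vop.baidu.com/server_api"): string[] {
  const problems: string[] = [];
  if (typeof body.token !== "string" || body.token === "") problems.push("token missing from request body");
  if (new URL(url).searchParams.has("token")) problems.push("token must not be in the URL");
  if (typeof body.rate !== "number") problems.push("rate must be a number");
  else if (body.rate !== 16000) problems.push("rate should be 16000");
  if ("dev_pid" in body) problems.push("remove dev_pid");

  // Actual byte count of the WAV data encoded in `speech`
  const padding = (body.speech.match(/=+$/) ?? [""])[0].length;
  const byteLength = Math.floor((body.speech.length * 3) / 4) - padding;
  if (typeof body.len !== "number") problems.push("len must be a number");
  else if (body.len !== byteLength) problems.push(`len is ${body.len}, expected ${byteLength}`);

  // Sample rate stored in the WAV header (bytes 24-27, little-endian)
  const header = atob(body.speech.slice(0, 60)); // 60 Base64 chars → first 45 bytes
  if (header.length >= 28) {
    const headerRate = header.charCodeAt(24) | (header.charCodeAt(25) << 8) |
      (header.charCodeAt(26) << 16) | (header.charCodeAt(27) << 24);
    if (headerRate !== 16000) problems.push(`WAV header sample rate is ${headerRate}`);
  }

  // 16 kHz mono 16-bit audio is 32000 bytes per second after the 44-byte header
  const seconds = (byteLength - 44) / 32000;
  if (seconds < 0.5 || seconds > 60) problems.push(`audio is ${seconds.toFixed(2)} s; expected 0.5-60 s`);
  return problems;
}
```

Returning every problem at once mirrors the checklist, so a single log line shows all causes of a 3311.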

Complete Request Example

Correct ✓:

{
  format: "wav",
  rate: 16000,          // number
  channel: 1,
  cuid: "user_001",
  token: "24.xxx...",   // in the request body
  speech: "UklGR...",   // Base64
  len: 63404            // number: actual byte count
}

Wrong ✗:

{
  format: "wav",
  rate: "16000",        // ← wrong: string
  channel: 1,
  cuid: "user_001",
  dev_pid: 1737,        // ← wrong: do not send
  speech: "UklGR...",
  len: "63404"          // ← wrong: string
}
// URL: ?token=xxx      // ← wrong: do not put the token in the URL

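Technical Notes

ASR (Speech-to-Text)

Unified audio format: all engines use WAV at a 16 kHz sample rate, mono
Base64 encoding: the frontend converts audio to Base64 before sending it to the backend
Protocol differences:
- Baidu, Alibaba Cloud: REST API
- iFlytek, Volcano Engine: WebSocket protocol
Standardized results: every engine returns { text, confidence, duration }

TTS (Text-to-Speech)

Voice mapping: the frontend uses unified voice IDs (female_1, male_1, etc.) and the backend maps them to each engine's actual voice code
Parameter conversion:
- Speed: the frontend range is 0.5-2.0x; each engine converts it to its own range
- Volume: the frontend range is 0-100%; each engine converts it to its own range
Output format: all engines return Base64-encoded MP3 audio
Protocol differences:
- Baidu, Alibaba Cloud, Volcano Engine: REST API
- iFlytek: WebSocket protocol (multiple audio chunks must be received)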
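
The per-engine conversions can be collected in one place. A minimal sketch: the Baidu and Volcano Engine formulas match the implementations in this document; iFlytek volume is passed through because its range is already 0-100; the Alibaba Cloud speed mapping assumes `speech_rate` runs from -500 to 500 with 0 as normal speed and interpolates linearly between 0.5x, 1.0x and 2.0x, which is an approximation. `convertTtsParams` is a hypothetical name:

```typescript
// Frontend ranges: speed 0.5-2.0 (x), volume 0-100 (%).
interface EngineParams {
  speed: number;
  volume: number;
}

const clamp = (v: number, lo: number, hi: number) => Math.min(hi, Math.max(lo, v));

function convertTtsParams(engine: string, speed: number, volume: number): EngineParams {
  const s = clamp(speed, 0.5, 2.0);
  const v = clamp(volume, 0, 100);
  switch (engine) {
    case "baidu": // spd/vol on Baidu's scale; 1.0x → 5, 100% → 15
      return { speed: Math.round(s * 5), volume: Math.round((v / 100) * 15) };
    case "xunfei": // 0-100 scale; 1.0x → 50
      return { speed: Math.round(s * 50), volume: Math.round(v) };
    case "volcano": // ratios; 1.0 = normal
      return { speed: s, volume: v / 100 };
    case "aliyun": // speech_rate -500..500 (0 = normal), volume 0-100
      return { speed: Math.round(s < 1 ? (s - 1) * 1000 : (s - 1) * 500), volume: Math.round(v) };
    default:
      throw new Error(`Unknown engine: ${engine}`);
  }
}
```

Clamping first means out-of-range slider values can never produce parameters a vendor API rejects.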

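Testing Recommendations

API credentials: make sure every environment variable is configured correctly
Audio formats: test audio files in different formats (WAV, MP3, M4A)
Duration limits: pay particular attention to Alibaba Cloud's 60-second limit
Error handling: test network errors, API errors, and other failure cases
Concurrency: test multiple users using different engines at the same time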

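Notes

Cost control: each engine has its own billing rules; monitor API call volume
Rate limits: each engine limits call frequency; avoid exceeding it
Audio size: consider capping uploaded audio files (e.g. at 10 MB)
Timeouts: set a reasonable timeout on WebSocket connections (e.g. 30 seconds)
Error logging: log detailed error information to simplify troubleshooting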
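
For the timeout recommendation, a small wrapper lets any of the WebSocket calls fail after a fixed delay instead of hanging the Edge Function. A minimal sketch; `withTimeout` is a hypothetical helper, and closing the socket on timeout is left to the caller through the optional `onTimeout` callback:

```typescript
// Rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number, onTimeout?: () => void): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      onTimeout?.(); // e.g. () => ws.close()
      reject(new Error(`Timed out after ${ms} ms`));
    }, ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}
```

Usage: `const text = await withTimeout(callXunfeiASR(appId, apiKey, apiSecret, audioData), 30_000);`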
