Springboot+Chatgpt+Whisper api#

本文通过springboot将两个OpenAI API的其中两个，ChatGPT和Whisper API集成在一起，方便后续的开发。

OpenAI API#

OpenAI是人工智能和机器学习工具及API的领先提供商之一。它训练了尖端的语言模型，擅长理解和生成文本。OpenAI API提供对这些模型的访问，并可用于解决几乎涉及处理语言的任何任务。它可以执行包括但不限于以下任务：

Content generation 内容生成
Summarization 总结
Classification, categorization, and sentiment analysis
分类、归类和情感分析
Data extraction 数据提取
Translation 翻译
Transcription 转录
Image generation 图像生成

我们可以通过任何语言发出HTTP请求与API进行交互。在本篇文章中，我们将使用Java构建一个Spring Boot微服务来集成Chatgpt API和Whisper API语音转文字功能。

集成chatgpt-whisper-spring-boot#

首先我们打开IDEA新建一个springboot项目

然后往其中的pom.xml文件添加依赖

1
<dependency>
2
            <groupId>org.springframework.boot</groupId>
3
            <artifactId>spring-boot-starter-web</artifactId>
4
        </dependency>
5
        <dependency>
6
            <groupId>org.springframework.cloud</groupId>
7
            <artifactId>spring-cloud-starter-openfeign</artifactId>
8
            <version>4.0.1</version>
9
        </dependency>
10
        <dependency>
11
            <groupId>org.projectlombok</groupId>
12
            <artifactId>lombok</artifactId>
13
            <optional>true</optional>
14
        </dependency>
15
        <dependency>
16
            <groupId>org.springframework.boot</groupId>
17
            <artifactId>spring-boot-configuration-processor</artifactId>
18
            <optional>true</optional>
19
</dependency>

Spring Cloud OpenFeign#

对于第三方REST API集成，有许多可以用的工具，这里我们选择Spring Cloud OpenFeign，因为这个对于Spring Boot 3支持方面比较好。

接下来我们打开 application.yml，往其中添加下面的内容：

1
openai-service:
2
  api-key: xxxxxxxxxx
3
  gpt-model: gpt-3.5-turbo
4
  audio-model: whisper-1
5
  http-client:
6
    read-timeout: 60000
7
    connect-timeout: 60000
8
  urls:
9
    base-url: https://chat.579878700.xyz/v1
10
    chat-url: /chat/completions
11
    create-transcription-url: /audio/transcriptions

其中 api-key请填写自己的api key，如果您还没有API密钥，请到https://platform.openai.com/account/api-keys 以生成API密钥。实际开发中这种配置肯定要放在配置中心里面，但是我们今天先简单的写个demo。

gpt-model ： gpt-3.5-turbo ，功能最强大的GPT-3.5模型，针对聊天进行了优化，成本仅为 text-davinci-003 的1/10，它可以执行任何语言任务，质量更好，输出更长，指令一致。

audio-model ： whisper-1 ，通用语音识别模型。它是在一个大型的多样化音频数据集上进行训练的，也是一个多任务模型，可以执行多语言语音识别以及语音翻译和来自世界各地大约100种不同语言的音频文件的语言识别。

read-time和 connect-time可以根据实际需求更改，这里我使用中文来回答，因此我设置的大一些，避免回答返回时间太长导致超时问题。

base-url这里我填写的是我使用CF进行反代的地址，因为一些原因，openai api在中国不能直接访问，所以使用这个方法。如果不需要，请改成 https://api.openai.com/v1

urls ：根据OpenAI API文档，所有API URL均以 [https://api.openai.com/v1](https://api.openai.com/v1) 开头。我们将其定义为 base-url 。对于聊天功能， chat-url 被定义为 /chat/completions 。对于音频转录， create-transcription-url 被定义为 /audio/transcriptions 。

我们使用 /chat/completions 继续使用 gpt-3.5-turbo 模型。

Feign Client#

现在配置已经完成，让我们继续编写声明式Feign客户端，我们需要两个方法： chat 和 createTranscription ，调用它们各自的OpenAI API端点URL。你可以随意命名这些方法，为了简单起见，让我们继续使用 chat 和 createTranscription 。

1
@FeignClient(
2
        name = "openai-service",
3
        url = "${openai-service.urls.base-url}",
4
        configuration = OpenAIClientConfig.class
5
)
6
public interface OpenAIClient {
7
    @PostMapping(value = "${openai-service.urls.chat-url}", headers = {"Content-Type=application/json"})
8
    ChatGPTResponse chat(@RequestBody ChatGPTRequest chatGPTRequest);
9

10
    @PostMapping(value = "${openai-service.urls.create-transcription-url}", headers = {"Content-Type=multipart/form-data"})
11
    WhisperTranscriptionResponse createTranscription(@ModelAttribute WhisperTranscriptionRequest whisperTranscriptionRequest);
12
}

Models#

通过严格遵循API文档，我们先定义请求/响应类。请参见下面使用 gpt-3.5-turbo 模型的示例请求和响应类。注意，使用 text-davinci-003 完成的请求/响应略有不同，有关详细信息，请参阅API文档。

1
@Data
2
@Builder
3
@AllArgsConstructor
4
@NoArgsConstructor
5
public class ChatGPTRequest implements Serializable {
6
    private String model;
7
    private List<Message> messages;
8
}

1
@Data
2
public class ChatGPTResponse implements Serializable {
3
    private String id;
4
    private String object;
5
    private String model;
6
    private LocalDate created;
7
    private List<Choice> choices;
8
    private Usage usage;
9
}

使用 Whisper 模型的音频转录请求/响应示例：

1
@Data
2
@Builder
3
@AllArgsConstructor
4
@NoArgsConstructor
5
public class TranscriptionRequest implements Serializable {
6
    private MultipartFile file;
7
}

1
@Data
2
public class WhisperTranscriptionResponse implements Serializable {
3
    private String text;
4
}

Service#

下面是服务层类 OpenAIClientService ，构造模型并调用Feignclient以触发对OpenAI API的调用。

1
@Service
2
@RequiredArgsConstructor
3
public class OpenAIClientService {
4

5
    private final OpenAIClient openAIClient;
6
    private final OpenAIClientConfig openAIClientConfig;
7

8
    private final static String ROLE_USER = "user";
9

10
    public ChatGPTResponse chat(ChatRequest chatRequest){
11
        Message message = Message.builder()
12
                .role(ROLE_USER)
13
                .content(chatRequest.getQuestion())
14
                .build();
15
        ChatGPTRequest chatGPTRequest = ChatGPTRequest.builder()
16
                .model(openAIClientConfig.getModel())
17
                .messages(Collections.singletonList(message))
18
                .build();
19
        return openAIClient.chat(chatGPTRequest);
20
    }
21

22
    public WhisperTranscriptionResponse createTranscription(TranscriptionRequest transcriptionRequest){
23
        WhisperTranscriptionRequest whisperTranscriptionRequest = WhisperTranscriptionRequest.builder()
24
                .model(openAIClientConfig.getAudioModel())
25
                .file(transcriptionRequest.getFile())
26
                .build();
27
        return openAIClient.createTranscription(whisperTranscriptionRequest);
28
    }
29
}

请注意，除了必需参数之外，我们还可以在请求正文中传递多个可选参数来指示 OpenAI API。例如，对于/chat/completions，我们可以传入temperature参数，它告诉 OpenAI API 使用什么采样温度，介于 0 和 2 之间。较高的值将使输出更具随机性和风险性，而较低的值会使它更集中和确定性。temperature如果我们不指定，默认值为 1。请参阅OpenAI API 文档，了解我们可以在请求正文中传递的所有可能的参数。

REST Controller REST#

最后，让我们看看REST Controller层的 OpenAIClientController ，里面的两个客户端：

chat （ /api/v1/chat ）：POST调用，使用JSON格式的消息，接受一个问题，并输出响应内容。
createTranscription （ /api/v1/transcription ）：POST调用，使用multipart/form-data，因为我们要上传音频文件，如 m4a和 mp3等音频文件，API将把音频文件转录为文本。

1
@RestController
2
@RequiredArgsConstructor
3
@RequestMapping(value = "/api/v1")
4
public class OpenAIClientController {
5

6
    private final OpenAIClientService openAIClientService;
7

8
    @PostMapping(value = "/chat", consumes = MediaType.APPLICATION_JSON_VALUE)
9
    public ChatGPTResponse chat(@RequestBody ChatRequest chatRequest){
10
        return openAIClientService.chat(chatRequest);
11
    }
12

13
    @PostMapping(value = "/transcription", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
14
    public WhisperTranscriptionResponse createTranscription(@ModelAttribute TranscriptionRequest transcriptionRequest){
15
        return openAIClientService.createTranscription(transcriptionRequest);
16
    }
17
}

Authentication 认证#

OpenAI API调用需要传入一个带有API密钥的请求头“ Authorization “。让我们声明一个 RequestInterceptor bean来处理它，参见 OpenAIClientConfig 类的示例代码片段：

1
@Bean
2
    public RequestInterceptor apiKeyInterceptor() {
3
        return request -> request.header("Authorization", "Bearer " + apiKey);
4
    }

最后运行项目#

用Postman测试一下

下面是 /chat 客户端的示例请求/响应截图：

下面是 /transcription 的请求/响应：

感觉对中文的识别准确度还是可以的。

使用Whisper模型的音频转录或翻译，文件上传目前限制为25 MB，并支持以下输入文件类型：mp3, mp4, mpeg, mpga, m4a, wav, webm

Whisper型号的定价目前设定为0.006美元/分钟。

使用Whisper API进行音频翻译#

除了音频转录，Whisper API还提供音频翻译端点 /audio/translations 。我也测试了这个特性。然而，我发现，尽管它说可以把音频翻译成英语，但我录了一个中文短语，并在应用程序中调用了 /audio/translations api，结果发现回应仍然是中文。我也用日语测试了这个api，它完整的翻译成了英语。这应该是一个bug。

总结#

与任何第三方REST API一样，将OpenAI API集成到Springboot中非常简单。

因此使用api开发自己的人工智能程序会变的非常便捷。