A Brief Look at the Audio Recording and Playback Flow in WebRTC

2023-12-23 17:47:39

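Preface

This article is based on the PineAppRtc project github.com/thfhongfeng…

In WebRTC, audio recording and playback are encapsulated internally. Normally we do not need to care about them and can simply use the library as is.

Recently, however, I had a requirement to transmit our own audio data, which meant exposing these internal interfaces. That required studying the source code, and this article is the result.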

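Audio engines

WebRTC actually ships more than one audio engine. One is implemented in the native layer with OpenSL ES; another is implemented in the Java layer with the Android API.

Note that the Java-layer engine lives in audio_device_java.jar, under the package org.webrtc.voiceengine. The latest upstream WebRTC code also contains a package named org.webrtc.audio, which appears to replace the older one.

The version used in the PineAppRtc project, however, only has org.webrtc.voiceengine.

The OpenSL ES engine is used by default, but calling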

WebRtcAudioManager.setBlacklistDeviceForOpenSLESUsage(true /* enable */);

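disables it, so that the Java-layer engine is used instead.

How do we expose these classes? We can copy the source of this package into our project and delete the jar, after which the code can be modified directly.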

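Sending data (recording)

In audio_device_java.jar, the WebRtcAudioRecord class is responsible for recording.

This class and the functions below are called automatically by the WebRTC native layer, so we do not need to worry about where the arguments come from; knowing how to use them is enough.

First, the constructor: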

WebRtcAudioRecord(long nativeAudioRecord) { 
    this.nativeAudioRecord = nativeAudioRecord; 
    ... 
}

nativeAudioRecord is important: it is the handle that the later native calls require.

Next, the init function:

private int initRecording(int sampleRate, int channels) {
    if (this.audioRecord != null) {
        this.reportWebRtcAudioRecordInitError("InitRecording called twice without StopRecording.");
        return -1;
    } else {
        int bytesPerFrame = channels * 2;
        int framesPerBuffer = sampleRate / 100;
        this.byteBuffer = ByteBuffer.allocateDirect(bytesPerFrame * framesPerBuffer);
        this.emptyBytes = new byte[this.byteBuffer.capacity()];
        this.nativeCacheDirectBufferAddress(this.byteBuffer, this.nativeAudioRecord);
        ...
        return framesPerBuffer;
    }
}

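The two parameters are the sample rate and the channel count (1 for mono, 2 for stereo). Both matter: they are the values WebRTC settles on after the earlier socket-based negotiation. We can also change them, as described later.

Note that the ByteBuffer capacity must not be changed arbitrarily, because the native layer validates it. The capacity must be exactly (sample rate / 100 * channel count * 2) bytes, that is, 10 ms of 16-bit audio, so data is delivered 100 times per second.

If the size is changed, the native layer crashes with: Check failed: frames_per_buffer_ == audio_parameters_.frames_per_10ms_buffer() (xxx vs. xxx)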
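
The size rule above can be written down as a small helper. This is only a sketch of the arithmetic: the class name AudioBufferSize and its methods are hypothetical and are not part of WebRTC.

```java
// Hypothetical helper (not part of WebRTC) that mirrors the buffer-size
// arithmetic used in initRecording: 10 ms of 16-bit PCM.
class AudioBufferSize {
    // 16-bit samples, which is where the "channels * 2" in initRecording comes from.
    static final int BYTES_PER_SAMPLE = 2;

    // Number of frames in a 10 ms buffer (sampleRate / 100).
    static int framesPer10Ms(int sampleRate) {
        return sampleRate / 100;
    }

    // Capacity in bytes that the direct ByteBuffer is expected to have.
    static int bufferSizeBytes(int sampleRate, int channels) {
        if (sampleRate <= 0 || (channels != 1 && channels != 2)) {
            throw new IllegalArgumentException("unsupported audio format");
        }
        int bytesPerFrame = channels * BYTES_PER_SAMPLE;
        return bytesPerFrame * framesPer10Ms(sampleRate);
    }
}
```

For example, 16 kHz mono gives 320 bytes per buffer and 48 kHz stereo gives 1920 bytes.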

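The most important call is nativeCacheDirectBufferAddress. It receives the ByteBuffer together with nativeAudioRecord, and both are used later.

After nativeCacheDirectBufferAddress, the function goes on to initialize the AudioRecord and related objects.

Now look at startRecording: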

private boolean startRecording() {
    ...
    if (this.audioRecord.getRecordingState() != 3) {
        ...
    } else {
        this.audioThread = new WebRtcAudioRecord.AudioRecordThread("AudioRecordJavaThread");
        this.audioThread.start();
        return true;
    }
}

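Once the AudioRecord reports that it is recording (the literal 3 is the value of AudioRecord.RECORDSTATE_RECORDING), a thread is started. Here is what that thread does: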

public void run() {
    ...
    while(this.keepAlive) {
        int bytesRead = WebRtcAudioRecord.this.audioRecord.read(WebRtcAudioRecord.this.byteBuffer, WebRtcAudioRecord.this.byteBuffer.capacity());
        if (bytesRead == WebRtcAudioRecord.this.byteBuffer.capacity()) {
            ...
            if (this.keepAlive) {
                WebRtcAudioRecord.this.nativeDataIsRecorded(bytesRead, WebRtcAudioRecord.this.nativeAudioRecord);
            }
        } else {
            ...
        }
    }
    ...
}

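After reading data from the AudioRecord, the thread calls nativeDataIsRecorded.

Notice that the read fills the ByteBuffer created earlier, while nativeDataIsRecorded receives only the byte count and nativeAudioRecord. The native layer already knows where the buffer is, because it was registered through nativeCacheDirectBufferAddress.

So to send our own data instead of microphone input, we need to: obtain nativeAudioRecord (from the constructor); call nativeCacheDirectBufferAddress to register the buffer; then loop, writing data into the ByteBuffer and calling nativeDataIsRecorded once per write to send it.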
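
The feeding loop described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the project's code: CustomAudioFeeder is a hypothetical class, and the onChunk callback is a stand-in for nativeDataIsRecorded(bytes, nativeAudioRecord), which is only available inside the WebRTC class. A short final chunk is zero-padded here so that the buffer is always completely filled; real code would also need to pace the loop at one buffer per 10 ms.

```java
import java.nio.ByteBuffer;
import java.util.function.IntConsumer;

// Hypothetical sketch: pushes custom PCM data through a fixed-size direct buffer.
class CustomAudioFeeder {
    /**
     * Copies pcm into buffer one full buffer at a time and invokes onChunk after
     * each copy. onChunk stands in for nativeDataIsRecorded (an assumption made
     * so the sketch runs outside WebRTC). Returns the number of chunks delivered.
     */
    static int feed(byte[] pcm, ByteBuffer buffer, IntConsumer onChunk) {
        int chunkSize = buffer.capacity();
        int chunks = 0;
        for (int offset = 0; offset < pcm.length; offset += chunkSize) {
            int n = Math.min(chunkSize, pcm.length - offset);
            buffer.clear();
            buffer.put(pcm, offset, n);
            // Zero-pad a short final chunk so the buffer is always full.
            while (buffer.hasRemaining()) {
                buffer.put((byte) 0);
            }
            // Signal that one full buffer is ready to be sent.
            onChunk.accept(chunkSize);
            chunks++;
        }
        return chunks;
    }
}
```

With a 320-byte buffer (16 kHz mono), 800 bytes of input are delivered as three chunks, the last one padded with silence.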

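Receiving data (playback)

In audio_device_java.jar, WebRtcAudioTrack is responsible for playback.

This class and the functions below are likewise called automatically by the WebRTC native layer, so we do not need to worry about where the arguments come from; knowing how to use them is enough.

Again, start with the constructor: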

WebRtcAudioTrack(long nativeAudioTrack) {
    ...
    this.nativeAudioTrack = nativeAudioTrack;
    ...
}

nativeAudioTrack is just as important and plays the same role as nativeAudioRecord above.

Next, the init function:

private boolean initPlayout(int sampleRate, int channels) {
    ...
    int bytesPerFrame = channels * 2;
    this.byteBuffer = ByteBuffer.allocateDirect(bytesPerFrame * (sampleRate / 100));
    this.emptyBytes = new byte[this.byteBuffer.capacity()];
    this.nativeCacheDirectBufferAddress(this.byteBuffer, this.nativeAudioTrack);
    ...
    return true;
}

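The sample rate and channel count work the same way as above. Here too a ByteBuffer is created and passed to nativeCacheDirectBufferAddress.

As with recording, the capacity of this ByteBuffer must not be changed arbitrarily, or the native layer will crash.

Now look at the start function: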

private boolean startPlayout() {
    ...
    if (this.audioTrack.getPlayState() != 3) {
        ...
    } else {
        this.audioThread = new WebRtcAudioTrack.AudioTrackThread("AudioTrackJavaThread");
        this.audioThread.start();
        return true;
    }
}

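This also starts a thread once the AudioTrack is playing (the literal 3 is the value of AudioTrack.PLAYSTATE_PLAYING). Inside the thread: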

public void run() {
    ...
    for(int sizeInBytes = WebRtcAudioTrack.this.byteBuffer.capacity(); this.keepAlive; WebRtcAudioTrack.this.byteBuffer.rewind()) {
        WebRtcAudioTrack.this.nativeGetPlayoutData(sizeInBytes, WebRtcAudioTrack.this.nativeAudioTrack);
        ...
        int bytesWritten;
        if (WebRtcAudioUtils.runningOnLollipopOrHigher()) {
            bytesWritten = this.writeOnLollipop(WebRtcAudioTrack.this.audioTrack, WebRtcAudioTrack.this.byteBuffer, sizeInBytes);
        } else {
            bytesWritten = this.writePreLollipop(WebRtcAudioTrack.this.audioTrack, WebRtcAudioTrack.this.byteBuffer, sizeInBytes);
        }
        ...
    }
}

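The logic is much the same as for recording, except that here nativeGetPlayoutData is called first so that the native layer writes the received data into the ByteBuffer, and the data is then played through a write function (both write functions end up calling AudioTrack.write).

So if we want to handle the received data ourselves, we only need to call nativeGetPlayoutData here and then read the data from the ByteBuffer. The code that follows it can be removed.

To summarize, it mirrors recording: obtain nativeAudioTrack from the constructor, create a ByteBuffer and pass it to nativeCacheDirectBufferAddress, then call nativeGetPlayoutData in a loop to fetch data for processing.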
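
The receive loop can be sketched in the same spirit. This is a hedged illustration only: PlayoutCollector is a hypothetical class, and fillBuffer is a stand-in for nativeGetPlayoutData(sizeInBytes, nativeAudioTrack), which in the real class makes the native layer write decoded audio into the registered ByteBuffer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.function.IntConsumer;

// Hypothetical sketch: collects received audio instead of playing it.
class PlayoutCollector {
    /**
     * Asks fillBuffer to fill the buffer, then copies the bytes out, for the
     * given number of rounds. fillBuffer stands in for nativeGetPlayoutData
     * (an assumption made so the sketch runs outside WebRTC).
     */
    static byte[] collect(ByteBuffer buffer, IntConsumer fillBuffer, int rounds) {
        int sizeInBytes = buffer.capacity();
        byte[] chunk = new byte[sizeInBytes];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < rounds; i++) {
            // The native side would write one buffer of audio here.
            fillBuffer.accept(sizeInBytes);
            // Read the whole buffer from the start.
            buffer.rewind();
            buffer.get(chunk);
            out.write(chunk, 0, sizeInBytes);
            // Reset the position for the next round, as the original loop does.
            buffer.rewind();
        }
        return out.toByteArray();
    }
}
```

In real use the loop would run while keepAlive is true rather than for a fixed number of rounds, and the copied bytes would be handed to whatever consumes the audio.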

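Sample rate, channel count, and other settings

These parameters are agreed on by the two peers. Presumably one side sends the parameters it supports, the other side picks a suitable one from those it also supports and sends it back, and both sides then process data with that choice.

Can we intervene in this process? For example, the two sides may have more than one option in common, and we may not want the one that is selected automatically. What then?

audio_device_java.jar contains two more classes, WebRtcAudioManager and WebRtcAudioUtils.

Some settings can be changed in these two classes, for example:

Sample rate

In WebRtcAudioManager: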

private int getNativeOutputSampleRate() {
//        if (WebRtcAudioUtils.runningOnEmulator()) {
//            Logging.d("WebRtcAudioManager", "Running emulator, overriding sample rate to 8 kHz.");
//            return 8000;
//        } else if (WebRtcAudioUtils.isDefaultSampleRateOverridden()) {
//            Logging.d("WebRtcAudioManager", "Default sample rate is overriden to " + WebRtcAudioUtils.getDefaultSampleRateHz() + " Hz");
//            return WebRtcAudioUtils.getDefaultSampleRateHz();
//        } else {
//            int sampleRateHz;
//            if (WebRtcAudioUtils.runningOnJellyBeanMR1OrHigher()) {
//                sampleRateHz = this.getSampleRateOnJellyBeanMR10OrHigher();
//            } else {
//                sampleRateHz = WebRtcAudioUtils.getDefaultSampleRateHz();
//            }
//
//            Logging.d("WebRtcAudioManager", "Sample rate is set to " + sampleRateHz + " Hz");
//            return sampleRateHz;
//        }
    return 16000;
}

Comment out the original code and return the sample rate we want directly.

Channels

Also in WebRtcAudioManager:

public static synchronized boolean getStereoOutput() {
    return useStereoOutput;
}

public static synchronized boolean getStereoInput() {
    return useStereoInput;
}

The return values of these two methods directly determine the channel count:

private void storeAudioParameters() {
    this.outputChannels = getStereoOutput() ? 2 : 1;
    this.inputChannels = getStereoInput() ? 2 : 1;
    this.sampleRate = this.getNativeOutputSampleRate();
    this.hardwareAEC = isAcousticEchoCancelerSupported();
    this.hardwareAGC = false;
    this.hardwareNS = isNoiseSuppressorSupported();
    this.lowLatencyOutput = this.isLowLatencyOutputSupported();
    this.lowLatencyInput = this.isLowLatencyInputSupported();
    this.proAudio = this.isProAudioSupported();
    this.outputBufferSize = this.lowLatencyOutput ? this.getLowLatencyOutputFramesPerBuffer() : getMinOutputFrameSize(this.sampleRate, this.outputChannels);
    this.inputBufferSize = this.lowLatencyInput ? this.getLowLatencyInputFramesPerBuffer() : getMinInputFrameSize(this.sampleRate, this.inputChannels);
}

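The code above shows other settings as well, which can be modified in the same way if needed.

Summary

This article only briefly analyzed the recording and playback flow, so that we know where to start and how to send our own audio and obtain the remote side's audio data. How to rework the classes and what to do with the data afterwards is left to the reader.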

码农公寓
