AudioAPI
Audio in the browser
Introduction
The purpose of this module is to provide an effective (and simple) Audio API through Gears.
Requirements
Scope
The scope of this document is limited to items (1) and (2) above. Items (3) and (4) are briefly discussed.
High level Design
The Audio API is modeled on the HTMLAudioElement in the HTML5 specification. Audio playback, transform, mixing and recording all happen on the CPU, since CPUs today are fast and basic audio mixing and processing use very little CPU time.
Integer-only encoding needs to be supported so that this can work on mobile devices, where floating-point emulation is far slower than real time. The only open and free codec known to support integer-only encoding seems to be Speex. If royalty and license issues cause trouble with other codecs, Speex can be the fallback, since most recorded audio will be human speech.
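As a rough illustration of why CPU-side mixing is cheap and compatible with an integer-only constraint, the sketch below mixes two signed 16-bit PCM buffers using integer arithmetic only, clamping the result instead of letting it wrap around. The function name MixS16 and the sum-and-clamp strategy are assumptions made for this example; they are not part of the Gears design.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the Gears API): mixes two signed 16-bit
// PCM buffers sample by sample using integer arithmetic only.
std::vector<int16_t> MixS16(const std::vector<int16_t>& a,
                            const std::vector<int16_t>& b) {
  const std::size_t n = std::max(a.size(), b.size());
  std::vector<int16_t> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    // A missing sample in the shorter buffer is treated as silence (0).
    const int32_t sa = i < a.size() ? a[i] : 0;
    const int32_t sb = i < b.size() ? b[i] : 0;
    // Sum in 32 bits, then clamp to the 16-bit range instead of wrapping.
    const int32_t sum = sa + sb;
    out[i] = static_cast<int16_t>(
        std::min<int32_t>(INT16_MAX, std::max<int32_t>(INT16_MIN, sum)));
  }
  return out;
}
```

Clamping trades a little distortion on loud passages for the absence of wrap-around clicks; a real mixer might scale the inputs instead.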
JS interface
Audio Playback
Media class (borrowed from HTML5 spec):
We model the API on HTMLAudioElement in the HTML5 spec. See http://www.whatwg.org/specs/web-apps/current-work/#media5 for an explanation of the members of this class. We include an additional method to get a handle to the Blob, and an additional metadata field.
Media class
{
// error state
readonly attribute MediaError error;
// network state
attribute DOMString src;
readonly attribute DOMString currentSrc;
const unsigned short EMPTY = 0;
const unsigned short LOADING = 1;
const unsigned short LOADED_METADATA = 2;
const unsigned short LOADED_FIRST_FRAME = 3;
const unsigned short LOADED = 4;
readonly attribute unsigned short networkState;
readonly attribute float bufferingRate;
readonly attribute TimeRanges buffered;
void load();
// ready state
const unsigned short DATA_UNAVAILABLE = 0;
const unsigned short CAN_SHOW_CURRENT_FRAME = 1;
const unsigned short CAN_PLAY = 2;
const unsigned short CAN_PLAY_THROUGH = 3;
readonly attribute unsigned short readyState;
readonly attribute boolean seeking;
// playback state
attribute float currentTime;
readonly attribute float duration;
readonly attribute boolean paused;
attribute float defaultPlaybackRate;
attribute float playbackRate;
readonly attribute TimeRanges played;
readonly attribute TimeRanges seekable;
readonly attribute boolean ended;
attribute boolean autoplay;
void play();
void pause();
// looping
attribute float start;
attribute float end;
attribute float loopStart;
attribute float loopEnd;
attribute unsigned long playCount;
attribute unsigned long currentLoop;
// cue ranges
void addCueRange(in DOMString className, in float start, in float end,
in boolean pauseOnExit,
in VoidCallback enterCallback, in VoidCallback exitCallback);
void removeCueRanges(in DOMString className);
// controls
attribute boolean controls;
attribute float volume;
attribute boolean muted;
// load from blob
void loadBlob(in Blob blob);
// metadata
attribute map metadata;
};
Events from Media class:
The media class also fires the following events (HTML5 does not yet exactly define some of these events or the handling framework). Section '3.2.9.12 - Event summary' in http://www.whatwg.org/specs/web-apps/current-work/ explains what these events are and when they are fired. Handlers of these events can decide what to do on event notifications: begin, progress, loadedmetadata, loadedfirstframe, load, abort, error, emptied, stalled, play, pause, waiting, timeupdate, ended, dataunavailable, canshowcurrentframe, canplay, canplaythrough, ratechange, durationchange, volumechange
Audio class : Media
The additional properties below are sound channel specific properties.
// an object can be got from
// - google.gears.factory.create('beta.audio')
Audio class : Media
{
// TODO: prefer int (number of channels) over enumeration ?
// Channel type
const unsigned short MONO = 1;
const unsigned short STEREO = 2;
readonly attribute short channelType;
// current amplitude of left channel [0 to 1]
readonly attribute float leftPeak;
// current amplitude of right channel [0 to 1]
readonly attribute float rightPeak;
// transform array attributes follow:
// how much of left input goes to the left output
const unsigned short LEFT_TO_LEFT = 1;
// how much of left input goes to the right output
const unsigned short LEFT_TO_RIGHT = 2;
// how much of right input goes to the left output
const unsigned short RIGHT_TO_LEFT = 3;
// how much of right input goes to the right output
const unsigned short RIGHT_TO_RIGHT = 4;
// bass control [1-10]
const unsigned short BASS = 5;
// treble control [1-10]
const unsigned short TREBLE = 6;
// An array specifying the transform to be applied to this audio.
// Array has a list of 'attribute:numeric' value pairs.
// Example. [RIGHT_TO_RIGHT:12, BASS:15]
attribute Array transform;
};
Audio Recording
AudioRecorder class
// an object of this class can be obtained from
// google.gears.factory.create('beta.audiorecorder')
AudioRecorder class
{
// ---- error state ----
readonly attribute AudioRecorderError error;
// ---- recording state ----
// says whether recorder is currently recording or not
readonly attribute boolean recording;
// says whether recorder is paused or not
readonly attribute boolean paused;
// the amount of sound detected by the microphone
// 0 - no sound detected to 100 - maximum sound detected
readonly attribute int activityLevel;
// specifies the length (in milliseconds) of the audio recorded
readonly attribute float duration;
// number of channels, currently can be 1 (mono) or 2 (stereo)
attribute int numberOfChannels;
// sample rate for the recording
attribute float sampleRate;
// sample type for the recording, possible values need to be defined
// signed 16 bit little endian linear PCM
const unsigned short S16_LE = 0;
attribute short sampleFormat;
// audio file type (container and codec), possible values need to be defined
attribute string type;
void record();
void pause();
void unpause();
void stop();
// ---- controls ----
// 0.0 - silent to 1.0 - loudest
attribute float volume;
attribute boolean muted;
// the amount of sound required to activate the microphone
// 0 - capture even minutest sound to 100 - capture only loudest sound
attribute int silenceLevel;
// ---- cue ranges ----
// provides ability to set callbacks at specific points in playback time.
// similar to API in Audio class. Look at HTML5 spec for explanation.
void addCueRange(in DOMString className, in float start, in float end,
in boolean pauseOnExit,
in VoidCallback enterCallback, in VoidCallback exitCallback);
void removeCueRanges(in DOMString className);
// ---- access blob ----
// returns handle to the blob object containing the audio data
Blob getBlob();
};
AudioRecorderError class
AudioRecorderError class
{
// could not record with the specified recording state (channelType, ...)
const unsigned short AUDIO_RECORDER_ERR_ENCODE = 1;
// there is a problem accessing the device, or there is no device
const unsigned short AUDIO_RECORDER_ERR_DEVICE = 2;
readonly attribute unsigned short code;
};
Events from AudioRecorder class:
The AudioRecorder class fires the following events, under the stated preconditions:
Description of the various attributes
TODO: Describe other attributes also.
Default values for the various attributes
Recording the audio
The recording attribute indicates whether the audio recorder is recording. The paused attribute indicates whether the audio recorder is paused. The muted attribute indicates whether the audio recorder is capturing silence. The audio data captured by the audio recorder follows the recording state (channelType, sampleRate, bitsPerSample, format) and the recording controls (volume, muted).
record()
pause()
unpause()
stop()
Privacy
It must be clear to users when an application is using the AudioRecorder API. We could implement one or both of the following UI elements:
Use cases
We want to be able to support the following use cases with respect to recording:
Others (borrowed from HTML5 spec)
interface TimeRanges {
readonly attribute unsigned long length;
float start(in unsigned long index);
float end(in unsigned long index);
};
interface VoidCallback {
void handleEvent();
};
Deviations from HTML5
Sample code
Simple example - 2D playback flow:
var audio = google.gears.factory.create('beta.audio');
audio.src = 'http://blahblahblob.com/sampleaudio.wav';
audio.load();
audio.play();
// if needed, a handle to the blob can be obtained by invoking getBlob().
var blob = audio.getBlob();
Simple example - 2D record flow:
var recorder = google.gears.factory.create('beta.audiorecorder');
recorder.record(); //asynchronous call
...
recorder.stop();
// handle to the blob containing the recorded data can be obtained
// after the recorder is stopped, the blob may then be uploaded to
// a server.
var blob = recorder.getBlob();
Advanced topics
Thoughts on 3D positioned audio
For 3D, it is essential to take advantage of hardware, and also to provide a more sophisticated abstraction than that of regular audio. To support 3D audio, we can provide a minimal JS wrapper on top of the OpenAL 1.1 API. We could add 3D position and velocity information to the Audio class, and make it similar to an OpenAL source. This should be an easy extension of the above API.
Audio3D class : Audio
An Audio3D object is similar to the notion of an OpenAL source. The current model assumes the AL_SOURCE_RELATIVE model, where all positions and velocities of sources are relative to the listener. This API needs to be extended to an absolute model too (this would be very useful in most games).
// an object can be got from
// - google.gears.factory.create('beta.audio3D')
Audio3D class : Audio
{
// position of the source - in 3D space
attribute float xposition;
attribute float yposition;
attribute float zposition;
// velocity of the source - in 3D space
attribute float xspeed;
attribute float yspeed;
attribute float zspeed;
};
Simple example - 3D sound play (Bill versus Mike)
// Bill is on the left, moving towards Mike
var audio1 = google.gears.factory.create('beta.audio3D');
audio1.src = 'http://blahblahblob.com/Bill.wav';
audio1.xposition = -50;
audio1.yposition = 10;
audio1.xspeed = 20;
audio1.load();
// Mike is on the right, moving towards Bill
var audio2 = google.gears.factory.create('beta.audio3D');
audio2.src = 'http://blahblahblob.com/Mike.wav';
audio2.load();
audio2.xposition = 50;
audio2.yposition = 10;
audio2.xspeed = -25;
// mixing automatically happens when play (being asynchronous) is invoked - One can play these audio sources in separate event handlers.
audio1.play();
audio2.play();
Editing
A utility class is added to expose features like editing.
interface MediaUtils
{
// extracts segment @copyStartTime to @copyEndTime of the @sourceBlob
// and inserts that into @insertTime of @destBlob
// supported only on raw uncompressed media
void insert(in Blob sourceBlob,
in Blob destBlob,
in int insertTime,
in int copyStartTime,
in int copyEndTime);
// deletes segment @startTime to @endTime in the @sourceBlob
// supported only on raw uncompressed media
void delete(in Blob sourceBlob,
in int startTime,
in int endTime);
// when passed a list of audioObjects, this method queues up play requests and
// fires them in one go - to avoid phasing effects.
// This method can be used to synchronize play of multiple objects.
void play(in Array mediaObjects);
// returns currently active/playing media objects.
Array getActiveMedia();
};
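On raw, uncompressed audio, the insert and delete operations above amount to plain buffer splicing. The sketch below assumes 16-bit mono samples held in a std::vector and positions given as sample indices. The document does not specify the unit of insertTime, copyStartTime or copyEndTime, so the names InsertSegment and DeleteSegment and their index-based parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<int16_t> Samples;

// Hypothetical helper: copies src[copy_start, copy_end) and inserts it into
// dest at insert_pos. All positions are sample indices (an assumption).
void InsertSegment(const Samples& src, Samples* dest, std::size_t insert_pos,
                   std::size_t copy_start, std::size_t copy_end) {
  assert(copy_start <= copy_end && copy_end <= src.size());
  assert(insert_pos <= dest->size());
  // Copy the segment first so the call is also safe when src and dest alias.
  const Samples segment(src.begin() + copy_start, src.begin() + copy_end);
  dest->insert(dest->begin() + insert_pos, segment.begin(), segment.end());
}

// Hypothetical helper: removes samples[start, end) in place.
void DeleteSegment(Samples* samples, std::size_t start, std::size_t end) {
  assert(start <= end && end <= samples->size());
  samples->erase(samples->begin() + start, samples->begin() + end);
}
```

Compressed formats cannot be spliced this way without decoding first, which is presumably why the interface restricts both operations to raw uncompressed media.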
interface AudioUtils : MediaUtils
{
// no new members
};
C++ design
Threads:
Consider a single GearsAudio object created from the GearsFactory. Over time, this might end up in 3 parallel threads of execution. They are listed below, along with a high level description of their broad functionality.
Class Design:
Please note that only the important fields are described here. Class members less crucial to understanding how the various elements interplay are omitted for brevity.
1. MediaData class:
A MediaData object holds the audio buffer data, as well as state and properties that are shared concurrently by the various threads. This is also the protocol object passed to the player's callback function (referred to as userData in the portaudio documentation). This is a RefCounted object, as it is shared by multiple threads. Hence, all threads should access this object via scoped_refptrs. Note that, for a given GearsAudio object, a single MediaData object exists in memory for the lifetime of that GearsAudio object. The MediaData object is shared by the GearsMedia object in the main thread, the NetworkMediaRequest object in the network thread and the PaPlayer object in the player thread. Each of these objects holds a scoped_refptr to the shared MediaData object.
class MediaData : public RefCounted {
private:
// The lock used for accessing or updating properties of this object in a thread safe way.
Mutex media_data_lock_;
// The media buffer. This is initially NULL.
// Either the network thread or the main thread (in case of load from blob)
// writes to this.
scoped_ptr<std::vector<uint8>> media_buffer_;
public:
// Method to append new data (possibly available from the stream) to the media buffer.
// Acquires media_data_lock_.
// Reads values in new_data and appends them to media_buffer_.
void AppendToMediaBuffer(std::vector<uint8>* new_data);
// Method to reset buffer data.
// Acquires media_data_lock_ and sets media_buffer_ to NULL
void ResetMediaBuffer();
// Method to get data present in the media buffer.
// Acquires media_data_lock_.
// Copies 'length' values starting at start_pos of media_buffer_ into the
// returned vector, as long as the index into the buffer is valid.
// If the index exceeds the length of media_buffer_, the remaining values
// are filled with zeros until 'length' values are present.
std::vector<uint8>* GetMediaBufferData(int start_pos, int length);
// ....
// other common properties accessible by all threads
// for instance, last_error_.
};
2. GearsMedia class:
Abstracts HTML5 media, of which we are only implementing audio for now. GearsAudio naturally extends GearsMedia. GearsMedia handles all the Media functionality mentioned in the HTML5 spec. Since this functionality would be common to audio and video, no changes are expected here if we add a GearsVideo class later.
class GearsMedia {
public:
scoped_refptr<MediaData> media_data_;
protected:
virtual void play() = 0;
// The asynchronous request for media data
scoped_ptr<NetworkMediaRequest> network_media_request_;
// other properties required to be exposed as specified in the spec...
};
3. GearsAudio class:
This is the object created when passing 'beta.audio' to the factory.
class GearsAudio
: public GearsMedia,
public ModuleImplBaseClassVirtual {
private:
// instantiated with a PaPlayer object for now through the PlayerFactory
scoped_ptr<PlayerInterface> player_;
// play method is delegated as below. Similarly for other methods/properties of the player,
// such as volume, playback rate, playback position (seek), etc..
virtual void play() {
player_->play();
}
};
4. NetworkMediaRequest class:
This class takes care of asynchronously loading data from the network into the media_buffer_ field of the MediaData object. It also implements the HttpRequest::Listener interface and provides implementations of the DataAvailable() and ReadyStateChanged() methods. Whenever new data comes in, it is extracted from the response body and appended to media_data_ through the AppendToMediaBuffer() method. This class holds a scoped_refptr to the media_data, which is initialized on creation.
class NetworkMediaRequest : HttpRequest::Listener {
public:
NetworkMediaRequest(std::string16 url, MediaData* media_data);
virtual ~NetworkMediaRequest();
// calls open of the httprequest asynchronously.
void AsyncOpen();
private:
HttpRequest* media_request_;
MediaListener* listener_;
scoped_refptr<MediaData> media_data_;
// Listener implementation
void DataAvailable(HttpRequest * source);
void ReadyStateChanged(HttpRequest *source);
};
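The way the network thread and the player thread share the media buffer can be sketched with standard C++ primitives. This is a minimal sketch under stated assumptions, not the Gears implementation: std::mutex and std::vector stand in for the Gears Mutex, scoped_ptr and uint8 types, the class name MediaBufferSketch and its method names are hypothetical, and reference counting is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the buffer-related part of MediaData.
class MediaBufferSketch {
 public:
  // Called by the producer (e.g. the network thread) as data arrives.
  void Append(const std::vector<uint8_t>& new_data) {
    std::lock_guard<std::mutex> lock(mutex_);
    buffer_.insert(buffer_.end(), new_data.begin(), new_data.end());
  }

  // Drops all buffered data.
  void Reset() {
    std::lock_guard<std::mutex> lock(mutex_);
    buffer_.clear();
  }

  // Called by the consumer (e.g. the player callback). Returns exactly
  // `length` bytes starting at `start_pos`; positions past the end of the
  // buffer are filled with zeros, which plays back as silence.
  std::vector<uint8_t> Read(std::size_t start_pos, std::size_t length) {
    std::lock_guard<std::mutex> lock(mutex_);
    std::vector<uint8_t> out(length, 0);
    if (start_pos < buffer_.size()) {
      const std::size_t available =
          std::min(length, buffer_.size() - start_pos);
      std::copy(buffer_.begin() + start_pos,
                buffer_.begin() + start_pos + available, out.begin());
    }
    return out;
  }

 private:
  std::mutex mutex_;             // guards buffer_
  std::vector<uint8_t> buffer_;  // media bytes received so far
};
```

Returning a zero-padded copy keeps the lock held only for the duration of the copy, so a slow download shows up as silence rather than as a blocked player callback.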
5. PlayerInterface class:
The abstract player interface. The GearsAudio class is insulated from how the player works. It just holds a scoped_ptr to an object that implements PlayerInterface and makes calls such as play(), pause(), etc. The concrete player instance takes care of the actual implementation.
class PlayerInterface {
public:
virtual void play() = 0;
virtual void pause() = 0;
// .... and other such methods and properties that the player should implement.
// eg., seek(), volume, player state (current playback position), playbackRate, etc...
// This is called by the player factory when the player instance is created.
// It does an addref to the media_data object of GearsMedia.
void Init(MediaData* media_data) {
media_data_.reset(media_data);
}
private:
scoped_refptr<MediaData> media_data_;
};
6. PlayerFactory class:
The factory singleton class that returns the instance of the player that we want to use. Right now, the only possible value is "portaudio". Later, new players based on different libraries can be added here.
class PlayerFactory {
public:
PlayerInterface* GetPlayerInstance(std::string16 player_name, MediaData* media_data) {
PlayerInterface* my_player = NULL;
// instantiate the correct player.
// if (player_name is 'portaudio') {
//   my_player = new PaPlayer();
// }
// AddRef to media_data
my_player->Init(media_data);
return my_player;
}
// private singleton constructor and public getInstance method for the PlayerFactory class.
};
7. PaManager class:
This is a portaudio specific module. A singleton class that does two things:
class PaManager {
public:
// Player and recorder threads need to register with the manager.
// When first thread registers, PaManager calls Pa_Initialize()
void RegisterAudioThread();
// When last thread unregisters, PaManager calls Pa_Terminate().
// Both these methods use a lock to update thread_count.
void UnregisterAudioThread();
// Thread safe wrapper for portaudio close stream.
PaError Pa_SafeCloseStream( PaStream *stream ) {
MutexLock locker(&pa_lock_);
return Pa_CloseStream(stream);
}
// ....
// Similarly define safe methods for other stream methods such as
// OpenDefaultStream, StartStream, StopStream, etc...
private:
Mutex pa_lock_;
// Also have private constructor and a public getInstance method for the
// singleton PaManager instance
};
8. PaPlayer class:
This is a portaudio specific module. It represents the portaudio based audio player.
class PaPlayer : public PlayerInterface {
public:
// PlayerFactory is a friend. See below.
friend class PlayerFactory;
// methods inherited from the interface...
// These use the safe methods on the PaManager
// virtual void play();
// virtual void pause();
// The portaudio callback that does the actual work.
// Returns a portaudio callback result such as paContinue or paComplete.
int PaCallback(<params>) {
// code to copy audio buffer into portaudio's 'out' buffer.
}
// This method has a static signature (as required by portaudio).
// We delegate this call to the non-static method above using
// the object handle
static int paCallback(<params>, void *userData) {
PaPlayer *player = static_cast<PaPlayer*>(userData);
return player->PaCallback(<params>);
}
private:
// private constructor that only PlayerFactory can access.
PaPlayer() {
// RegisterAudioThread() with global pa_manager
}
~PaPlayer() {
// UnregisterAudioThread() with global pa_manager
}
};
Locking semantics:
The media_data:
media_data_ is not NULL whenever a player thread or a network thread (or, in general, whoever needs to read or write media_buffer_) is executing, because they reference media_data_ through a scoped_refptr and hence have addRef()-ed it. (In other words, whenever PaCallback executes, media_data_ cannot be NULL. This can in fact be asserted in the callback function.)
Semantics for locking media_buffer:
How they all work together:
Main thread:
Network thread (NetworkMediaRequest object):
Player thread:
Awesome!
I love where this can go.
What's the current state of this API?
It seems to be exactly what's needed to make browser-based games written in Javascript much more exciting and immersive. If I'm correct, there is no way to play sounds in browser-based games currently unless those games are written in Flash (or I assume a Java applet).
I don't think it's something that should generally be added to web pages since sounds and songs playing on web pages tend to be unexpected and annoying. However, it's better to have the capability than not and web designers can decide how to judiciously add sounds to their pages.
We're excited about this API but confused: is this operational? What steps would be/are needed to use this to record audio and save the result to our servers?
Thanks!
Have you looked at the outfox project?
http://code.google.com/p/outfox
It currently supports sounds and speech from JS. It would be great if Gears picked up support for both.
Yeah this project looks pretty interesting, but it's clearly not ready for prime-time yet (it's not part of Gears yet) which is too bad. Guess I'll have to rely on some Flash based version.
How cool wouldn't be if u could create a nationale song for your 'country' in games like travian?
I'm making a webgame my self (swedish only for the moment) and i would really like to have capabilities to play dynamic genarated sound.
ogg FTW :)
I play Travian (US5 aj00200) There isn't any place where sound could be used, but maybe in other games.
Is this project dead or alive?
Please include low-latency, full-duplex record/playback to support softphone and video applications.