StreamNotes  
Luke's notes on RTMP streaming
Updated Nov 8, 2007 by king.sel...@gmail.com

Introduction

Notes on different parts of RTMP and streaming, in no particular order at the moment.

The information in this doc is based on my observations, so its assumptions should be tested and challenged. Quite a bit is still unknown or not fully understood.

Low Level Protocol

The first part of this document details the low level structure of the RTMP protocol.

Handshake

TODO: handshake description

Headers

Packet headers have the following structure, with sizes ranging from 1 to 14 bytes. Square brackets denote optional parts.

HeaderType:2, ChannelId:6, [ChannelIdExtra:8|16,] [Timer:24/unsigned, [Size:24, DataType:8, [StreamId:32/little]]]

Header Type

The header type is encoded in the first 2 bits. Different types denote different header sizes.

Type            Size       Value
--------------------------------
New             12 Bytes   00
Same Stream Id   8 Bytes   01
Timer Change     4 Bytes   10
Continue         1 Byte    11

Note: Size does not include possible extra 1-2 bytes taken by larger channel ids.
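The table above maps directly onto Erlang bit syntax. The following is a minimal sketch of splitting the first byte and looking up the header size; the module name, function names and atoms are made up for illustration and are not taken from ems.

```erlang
-module(rtmp_header_type).
-export([decode_first_byte/1, header_size/1]).

%% Split the first header byte into the 2-bit header type and the
%% 6-bit channel id. Returns {TypeName, ChannelId, Rest}.
decode_first_byte(<<Type:2, ChannelId:6, Rest/binary>>) ->
    {type_name(Type), ChannelId, Rest}.

%% Type values as listed in the table.
type_name(2#00) -> new;
type_name(2#01) -> same_stream_id;
type_name(2#10) -> timer_change;
type_name(2#11) -> continue.

%% Header size in bytes, excluding any extra channel id bytes.
header_size(new)            -> 12;
header_size(same_stream_id) -> 8;
header_size(timer_change)   -> 4;
header_size(continue)       -> 1.
```

For example, the byte 16#C4 (binary 11000100) decodes as a continue header on channel 4.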

Channel Id

The remaining 6 bits of the first byte store the channel id, giving a range of 0-63. Ids 0 and 1 are reserved to signal higher channel ids which need more bytes to store. If the value is 0 then the id is larger than 63 and takes one additional byte ( 64 + extra:8 ). If the value is 1 then the id is larger than 319 and takes two additional bytes ( 64 + extra:16 ).
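A sketch of the channel id decoding described here. These notes do not record the byte order of the 16 bit extra; little-endian is assumed below and should be checked against packet captures. The module and function names are hypothetical.

```erlang
-module(rtmp_channel_id).
-export([decode/1]).

%% Decode the header type and (possibly extended) channel id.
%% Returns {HeaderType, ChannelId, Rest}.
decode(<<Type:2, 0:6, Extra:8, Rest/binary>>) ->
    %% Marker 0: one extra byte, ids 64..319.
    {Type, 64 + Extra, Rest};
decode(<<Type:2, 1:6, Extra:16/little, Rest/binary>>) ->
    %% Marker 1: two extra bytes (byte order is an assumption).
    {Type, 64 + Extra, Rest};
decode(<<Type:2, Id:6, Rest/binary>>) when Id >= 2 ->
    %% Ids 2..63 fit in the first byte.
    {Type, Id, Rest}.
```

A truncated header (a marker without its extra bytes) matches no clause and fails with function_clause, which a real reader would want to turn into a "need more data" result.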

Timer

Timer stores the packet time stamp in a 3 byte unsigned medium integer. For new headers the time stamp is absolute, otherwise it's relative.
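As a sketch, the absolute/relative rule could be applied like this. It assumes "relative" means relative to the previous time stamp seen on the same channel, which is my reading rather than something confirmed here; the names are hypothetical.

```erlang
-module(rtmp_timer).
-export([read/1, timestamp/3]).

%% Read the 3 byte unsigned timer field (big-endian, the bit syntax default).
read(<<Timer:24/unsigned, Rest/binary>>) ->
    {Timer, Rest}.

%% Work out the packet time stamp from the header type, the timer field
%% and the previous time stamp on the channel.
timestamp(new, Timer, _Previous) ->
    Timer;                               % absolute
timestamp(Type, Delta, Previous)
  when Type =:= same_stream_id; Type =:= timer_change ->
    Previous + Delta.                    % relative (assumed: to previous)
```

Continue headers carry no timer field, so they are deliberately left out of timestamp/3.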

Size

RTMP body size. If the body is larger than the chunk size (which defaults to 128 bytes), it is split into multiple packets. The following chunks will often have a 1 byte continue header.
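A minimal sketch of the splitting, assuming the chunk size counts body bytes only (headers excluded). The function names are made up for illustration.

```erlang
-module(rtmp_chunker).
-export([split/2, continue_header/1]).

%% Split Body into pieces of at most ChunkSize bytes, in order.
split(Body, ChunkSize) when is_binary(Body), ChunkSize > 0 ->
    split(Body, ChunkSize, []).

split(Body, ChunkSize, Acc) when byte_size(Body) =< ChunkSize ->
    lists:reverse([Body | Acc]);
split(Body, ChunkSize, Acc) ->
    <<Chunk:ChunkSize/binary, Rest/binary>> = Body,
    split(Rest, ChunkSize, [Chunk | Acc]).

%% 1 byte continue header for channel ids that fit in the first byte.
continue_header(ChannelId) when ChannelId >= 2, ChannelId =< 63 ->
    <<2#11:2, ChannelId:6>>.
```

With the default chunk size of 128, a 300 byte body gives three pieces of 128, 128 and 44 bytes.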

Data Type

A single byte is used to store the packet data type. 7 of the 20 types are unknown.

Chopped from ems.hrl

%% RTMP data 
-define(RTMP_TYPE_CHUNK_SIZE,     1).
%-define(RTMP_TYPE_UNKNOWN,       2).
-define(RTMP_TYPE_BYTES_READ,     3).
-define(RTMP_TYPE_PING,           4).
-define(RTMP_TYPE_BW_SERVER,      5).
-define(RTMP_TYPE_BW_CLIENT,      6).
%-define(RTMP_TYPE_UNKNOWN,       7).
-define(RTMP_TYPE_AUDIO,          8).
-define(RTMP_TYPE_VIDEO,          9).
%-define(RTMP_TYPE_UNKNOWN,      10).
%-define(RTMP_TYPE_UNKNOWN,      11).
%-define(RTMP_TYPE_UNKNOWN,      12).
%-define(RTMP_TYPE_UNKNOWN,      13).
%-define(RTMP_TYPE_UNKNOWN,      14).
-define(RTMP_FLEX_STREAM_SEND,   15).
-define(RTMP_FLEX_SHARED_OBJECT, 16).
-define(RTMP_FLEX_MESSAGE,       17).
-define(RTMP_TYPE_NOTIFY,        18).
-define(RTMP_TYPE_META_DATA,     18).
-define(RTMP_TYPE_SHARED_OBJECT, 19).
-define(RTMP_TYPE_INVOKE,        20).
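For logging or dispatch it can help to turn the type byte into an atom. The sketch below repeats the values from the list above as plain integers so that it stands alone; the atom names are invented for illustration, and the unknown types are passed through tagged rather than rejected.

```erlang
-module(rtmp_types).
-export([name/1]).

%% Map a data type byte to a descriptive atom.
name(1)  -> chunk_size;
name(3)  -> bytes_read;
name(4)  -> ping;
name(5)  -> bw_server;
name(6)  -> bw_client;
name(8)  -> audio;
name(9)  -> video;
name(15) -> flex_stream_send;
name(16) -> flex_shared_object;
name(17) -> flex_message;
name(18) -> notify;                 % also carries metadata
name(19) -> shared_object;
name(20) -> invoke;
name(Type) when is_integer(Type), Type >= 0, Type =< 255 ->
    {unknown, Type}.                % 2, 7, 10-14 and anything above 20
```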

Stream Id

The last part of the header is the stream id, which is stored as a little-endian integer. When the client calls createStream the server responds with the next available stream id. I think this is used by the client to direct the media packets to the correct NetStream object. On the server it's used to map the incoming packet to the stream object.
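Putting the fields together, a full 12 byte "New" header could be parsed with a single pattern. This sketch only handles channel ids that fit in the first byte (2-63); the map keys and the error atom are hypothetical.

```erlang
-module(rtmp_new_header).
-export([parse/1]).

%% Parse a 12 byte "New" header with a one-byte channel id.
%% Timer, Size and DataType are big-endian; StreamId is little-endian.
parse(<<2#00:2, ChannelId:6, Timer:24, Size:24, DataType:8,
        StreamId:32/little, Rest/binary>>) when ChannelId >= 2 ->
    Header = #{channel_id => ChannelId,
               timer      => Timer,
               size       => Size,
               data_type  => DataType,
               stream_id  => StreamId},
    {ok, Header, Rest};
parse(_Other) ->
    %% Other header types, extended channel ids or short input.
    {error, unsupported_header}.
```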

Channels

One of the interesting features of RTMP is that it multiplexes data over multiple channels. Each stream is given a block of 5 channels, three of which have a known use: data, video and audio.

Channel Id, Use

0 Medium (1 byte extra) Channel Id
1 Large (2 byte extra) Channel Id
2 Used for pings, stream bytes read, etc.
3 Used for invoke calls.

After this, 5 channels are used for each stream.

4 Stream 1, Data
5 Stream 1, Video
6 Stream 1, Audio
7 Stream 1, Unknown
8 Stream 1, Unknown

9 Stream 2, Data
...
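The layout above can be written as a small mapping. This sketch assumes the pattern of 5 channels per stream continues unchanged for every later stream, which the table only shows for streams 1 and 2; the names are hypothetical.

```erlang
-module(rtmp_channels).
-export([describe/1, channel_for/2]).

%% Describe what a channel id is used for.
describe(0) -> medium_channel_id_marker;
describe(1) -> large_channel_id_marker;
describe(2) -> control;              % pings, stream bytes read, etc.
describe(3) -> invoke;
describe(Ch) when is_integer(Ch), Ch >= 4 ->
    Stream = (Ch - 4) div 5 + 1,
    {stream, Stream, role((Ch - 4) rem 5)}.

role(0) -> data;
role(1) -> video;
role(2) -> audio;
role(_) -> unknown.

%% Inverse mapping for the three known roles.
channel_for(Stream, Role) when is_integer(Stream), Stream >= 1 ->
    4 + (Stream - 1) * 5 + offset(Role).

offset(data)  -> 0;
offset(video) -> 1;
offset(audio) -> 2.
```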

TODO: Improve the mapping of packet types to channel ids.

Chunk Size (1)

Packets over a configured size (defaults to 128 bytes) are split into chunks, each with its own header.

Unknown (2)

This type is unknown. There was once a report that the player was sending this packet. I asked for more info but nothing was forthcoming. Perhaps we will find out what it's for one day.

Stream Bytes Read (3)

Stream bytes read packets are sent by the client or server when receiving data. When the client is publishing, the server must send these packets at regular intervals, otherwise the client will close the connection. I think the default for the Flash Player is to send one every 125000 bytes.
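One way to decide when a bytes read packet is due is to keep a running total and compare it with the last reported value. This is a sketch only: the 125000 default is the guess from the paragraph above, the state tuple is invented for illustration, and building the actual packet is left out.

```erlang
-module(rtmp_bytes_read).
-export([new/0, new/1, received/2]).

%% State is {TotalBytes, LastReportedTotal, Interval}.
new() ->
    new(125000).                     % assumed default interval

new(Interval) when is_integer(Interval), Interval > 0 ->
    {0, 0, Interval}.

%% Record N newly received bytes.
%% Returns {report, Total, State} when a bytes read packet is due,
%% otherwise {ok, State}.
received(N, {Total, Last, Interval}) when is_integer(N), N >= 0 ->
    NewTotal = Total + N,
    case NewTotal - Last >= Interval of
        true  -> {report, NewTotal, {NewTotal, NewTotal, Interval}};
        false -> {ok, {NewTotal, Last, Interval}}
    end.
```

The caller would send a type 3 packet carrying Total whenever {report, Total, State} comes back.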

Pings (Protocol Signals) Packets (4)

Historically we called them pings, since the first one we discovered was like the standard ping. However in addition to this they also signal the stream state and buffer length. Not all the types are known.

Server (5) & Client (6) Bandwidth

There are two types for setting bandwidth for upstream and downstream.

Audio (8) & Video (9) Frames

Audio and video frames are sent with timestamps down channels 4 and above. Encoded in the body of the packet is the raw frame (flv tag) data. The frame contains its own header which tells us what kind of audio or video frame it is. Certain types of frames can be dropped to compensate for slow network connections.

TODO: more info on frame types

Unknown (10-14)

These types are unknown.

Flex Types (15-17)

These types are reserved for Flex usage. Not needed for our implementation.

Notify / MetaData (18)

Notify is used for calls which do not need or expect a reply. This makes it useful for sending events to the client.

Net Status Events

The server sends back status events that the client listens to. You can think of these as a standard layer over the top of the basic RTMP protocol. They do not affect media output on the client directly. For the full list see: ems.hrl.

TODO: describe structure.

Meta Data

Stream metadata is sent down the data channel of the stream to the onMetaData method.

Shared Object (19)

Shared objects are used to connect multiple clients to the same object. Any updates made to the properties of the object are sent to all the clients. Also there is a facility to send method calls to all the listening clients. Shared objects have many applications: Chat rooms, multiplayer games, real time data displays, etc.

TODO: List shared object event types.

Invoke (20)

Invoke is used for calls which expect a response.

TODO: describe invoke, args, result handler, etc.

Higher Level Protocol

Now that we have covered the low level protocol we can move on to the higher level commands and sequence of events.

App Commands

Clients call methods when they connect and disconnect. We cannot rely on disconnect being called, since it will not be when the network connection is dropped. In order to detect dropped connections we can ping the clients at regular intervals to check they are still responding.

  * connect

  * disconnect

TODO: connect params

Most of the other commands are stream related (see below). Anything else is a custom call to the application using NC.call.

Stream Commands

Stream commands (play, pause, seek, etc) are sent to the application. It's the application's responsibility to pass these on to the stream. Think of this as a chain of responsibility. e.g.

RTMP => App [=> PlayList] [=> ServerStream [=> StreamProvider]]

legend: => message passing, [] optional

createStream() % reserve a stream id (next in sequence)
  -> nextStreamID()

deleteStream(StreamID) % remove the stream and release id

%% following funs know stream id from header

publish(Name, Mode) % modes: record, append, live 

releaseStream(Name) % release a stream (assumed for published streams)

play(Name, Position, Length, FlagFlushPlayList) % all apart from name optional

pause(Flag, Position) % toggle pause, optionally jump to position

seek(Position) % jump to nearest keyframe

receiveAudio(Flag) % toggle audio

receiveVideo(Flag) % toggle video

closeStream() % pass command to stream
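A sketch of how createStream and deleteStream might track ids. It assumes ids start at 1 and that a released id is not handed out again; neither is confirmed by these notes. All names are hypothetical.

```erlang
-module(rtmp_stream_ids).
-export([new/0, create_stream/1, delete_stream/2]).

%% State holds the next id in sequence and the ids currently in use.
new() ->
    #{next => 1, active => []}.      % assumed: ids start at 1

%% Reserve the next stream id. Returns {StreamId, NewState}.
create_stream(#{next := Id, active := Active} = State) ->
    {Id, State#{next := Id + 1, active := [Id | Active]}}.

%% Release a stream id (assumed: released ids are not reused).
delete_stream(Id, #{active := Active} = State) ->
    State#{active := lists:delete(Id, Active)}.
```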

Stream Commands Responses

I'm just using a made up notation for now, but we can switch to Erlang.

TODO: error responses. TODO: check using debug proxy.

pause

signal(PING_STREAM_PLAYBUFFER_CLEAR)
status(NS_PAUSE_NOTIFY)

resume

signal(PING_STREAM_RESET)
signal(PING_STREAM_CLEAR)
status(NS_UNPAUSE_NOTIFY)

play

status(NS_PLAY_RESET)
status(NS_PLAY_START)
frame(KeyFrame)

stop

signal(PING_STREAM_PLAYBUFFER_CLEAR)
status(NS_PLAY_COMPLETE) if end of playlist
status(NS_PLAY_STOP)

seek

signal(PING_STREAM_PLAYBUFFER_CLEAR)
signal(PING_STREAM_RESET)
signal(PING_STREAM_CLEAR)
status(NS_SEEK_NOTIFY)
status(NS_PLAY_START)
frame(KeyFrame) or frame(BlankAudio)
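As a first step towards Erlang, the sequences above could be kept as data. In this sketch the atoms are lower-case stand-ins for the names used above (the real definitions would presumably come from ems.hrl), and the tuple shapes are invented for illustration.

```erlang
-module(rtmp_stream_responses).
-export([responses/1]).

%% Ordered list of responses the server sends for a stream command.
responses(pause) ->
    [{signal, ping_stream_playbuffer_clear},
     {status, ns_pause_notify}];
responses(resume) ->
    [{signal, ping_stream_reset},
     {signal, ping_stream_clear},
     {status, ns_unpause_notify}];
responses(play) ->
    [{status, ns_play_reset},
     {status, ns_play_start},
     {frame, key_frame}];
responses({stop, EndOfPlaylist}) ->
    %% ns_play_complete is only sent at the end of the playlist.
    Complete = case EndOfPlaylist of
                   true  -> [{status, ns_play_complete}];
                   false -> []
               end,
    [{signal, ping_stream_playbuffer_clear}] ++ Complete ++
        [{status, ns_play_stop}];
responses(seek) ->
    [{signal, ping_stream_playbuffer_clear},
     {signal, ping_stream_reset},
     {signal, ping_stream_clear},
     {status, ns_seek_notify},
     {status, ns_play_start},
     {frame, {one_of, [key_frame, blank_audio]}}].
```

Keeping the sequences as plain lists makes them easy to compare against a debug proxy capture.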

Architecture Discussion

Push VS Pull

Provider -> Subscriber

FLV Indexing

