rfc9626xml2.original.xml   rfc9626.xml 
<?xml version="1.0" encoding="UTF-8"?> <?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE rfc >
<?rfc compact="yes"?>
<?rfc subcompact="yes"?>
<?rfc iprnotified="no" ?>
<?rfc strict="yes"?>
<?rfc symrefs="yes"?>
<?rfc toc="yes"?>
<?rfc tocdepth="4"?>
<rfc category="exp" docName="draft-ietf-avtext-framemarking-16" ipr="trust200902 <!DOCTYPE rfc [
" submissionType="IETF"> <!ENTITY nbsp "&#160;">
<!ENTITY zwsp "&#8203;">
<!ENTITY nbhy "&#8209;">
<!ENTITY wj "&#8288;">
]>
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" category="exp" docName="draft-ie
tf-avtext-framemarking-16" number="9626" consensus="true" updates="" obsoletes="
" ipr="trust200902" submissionType="IETF" symRefs="true" tocInclude="true" tocDe
pth="4" version="3" xml:lang="en">
<front> <front>
<title abbrev="Video Frame Marking">Video Frame Marking RTP Header Extension </title> <title abbrev="Video Frame Marking">Video Frame Marking RTP Header Extension </title>
<seriesInfo name="RFC" value="9626"/>
<author fullname="Mo Zanaty" initials="M" surname="Zanaty"> <author fullname="Mo Zanaty" initials="M" surname="Zanaty">
<organization>Cisco Systems</organization> <organization>Cisco Systems</organization>
<address> <address>
<postal> <postal>
<street>170 West Tasman Drive</street> <street>170 West Tasman Drive</street>
<city>San Jose</city> <city>San Jose</city>
<region>CA</region> <region>CA</region>
<code>95134</code> <code>95134</code>
<country>US</country> <country>United States of America</country>
</postal> </postal>
<email>mzanaty@cisco.com</email> <email>mzanaty@cisco.com</email>
</address> </address>
</author> </author>
<author initials="E." surname="Berger" fullname="Espen Berger"> <author initials="E." surname="Berger" fullname="Espen Berger">
<organization>Cisco Systems</organization> <organization>Cisco Systems</organization>
<address> <address>
<email>espeberg@cisco.com</email> <email>espeberg@cisco.com</email>
</address> </address>
</author> </author>
<author fullname="Suhas Nandakumar" initials="S" surname="Nandakumar"> <author fullname="Suhas Nandakumar" initials="S" surname="Nandakumar">
<organization>Cisco Systems</organization> <organization>Cisco Systems</organization>
<address> <address>
<postal> <postal>
<street>170 West Tasman Drive</street> <street>170 West Tasman Drive</street>
<city>San Jose</city> <city>San Jose</city>
<region>CA</region> <region>CA</region>
<code>95134</code> <code>95134</code>
<country>US</country> <country>United States of America</country>
</postal> </postal>
<email>snandaku@cisco.com</email> <email>snandaku@cisco.com</email>
</address> </address>
</author> </author>
<date day="04" month="March" year="2024"/> <date month="August" year="2024"/>
<area>Applications</area> <area>WIT</area>
<keyword>Internet-Draft</keyword> <workgroup>avtcore</workgroup>
<!-- [rfced] Please insert any keywords (beyond those that appear in
the title) for use on https://www.rfc-editor.org/search. -->
<keyword>example</keyword>
<abstract> <abstract>
<t>This document describes a Video Frame Marking RTP header extension used to <t>This document describes a Video Frame Marking RTP header extension used to
convey information about video frames that is critical for error recovery convey information about video frames that is critical for error recovery
and packet forwarding in RTP middleboxes or network nodes. It is most and packet forwarding in RTP middleboxes or network nodes. It is most
useful when media is encrypted, and essential when the middlebox or node useful when media is encrypted and essential when the middlebox or node
has no access to the media decryption keys. It is also useful for has no access to the media decryption keys. It is also useful for
codec-agnostic processing of encrypted or unencrypted media, while it also codec-agnostic processing of encrypted or unencrypted media, while it also
supports extensions for codec-specific information.</t> supports extensions for codec-specific information.</t>
</abstract> </abstract>
</front> </front>
<middle> <middle>
<section anchor="intro">
<section title="Introduction" anchor="intro"> <name>Introduction</name>
<t>Many widely deployed RTP <xref target="RFC3550" /> topologies <t>Many widely deployed RTP <xref target="RFC3550"/> topologies
<xref target="RFC7667" /> used in modern voice and video <xref target="RFC7667"/> used in modern voice and video
conferencing systems include a centralized component that acts as an RTP switch. conferencing systems include a centralized component that acts as an RTP switch.
It receives voice and video streams from each participant, which may be encrypted using It receives voice and video streams from each participant, which may be encrypted using
SRTP <xref target="RFC3711" />, or extensions that provide participants wi Secure Real-time Transport Protocol (SRTP) <xref target="RFC3711"/> or ext
th ensions that provide participants with
private media <xref target="RFC8871" /> private media <xref target="RFC8871"/>
via end-to-end encryption where the switch has no access to media decryption keys. via end-to-end encryption where the switch has no access to media decryption keys.
The goal is to provide a set of streams back to The goal is to provide a set of streams back to
the participants which enable them to render the right media content. In a the participants, which enable them to render the right media content. For
simple video configuration, for example, the goal will be that each partic example, in a
ipant simple video configuration, the goal will be that each participant
sees and hears just the active speaker. In that case, the goal of the switch is to sees and hears just the active speaker. In that case, the goal of the switch is to
receive the voice and video streams from each participant, determine the active receive the voice and video streams from each participant, determine the active
speaker based on energy in the voice packets, possibly using the client-to-mixer speaker based on energy in the voice packets, possibly using the client-to-mixer
audio level RTP header extension <xref target="RFC6464" />, and select the audio level RTP header extension <xref target="RFC6464"/>, and select the
corresponding video corresponding video
stream for transmission to participants; see <xref target="rtpswitch" /> stream for transmission to participants; see <xref target="rtpswitch"/>.
.</t> </t>
<t>In this document, an "RTP switch" is used as shorthand for the terms
<t>In this document, an "RTP switch" is used as a common short term for th
e terms
"switching RTP mixer", "source projecting middlebox", "switching RTP mixer", "source projecting middlebox",
"source forwarding unit/middlebox" and "video switching MCU" as "source forwarding unit/middlebox" and "video switching Multipoint Control
discussed in <xref target="RFC7667" />.</t> Unit (MCU)", as
discussed in <xref target="RFC7667"/>.</t>
<figure title="RTP switch" anchor="rtpswitch"><artwork><![CDATA[ <figure anchor="rtpswitch">
<name>RTP Switch</name>
<artwork><![CDATA[
+---+ +------------+ +---+ +---+ +------------+ +---+
| A |<---->| |<---->| B | | A |<---->| |<---->| B |
+---+ | | +---+ +---+ | | +---+
| RTP | | RTP |
+---+ | Switch | +---+ +---+ | Switch | +---+
| C |<---->| |<---->| D | | C |<---->| |<---->| D |
+---+ +------------+ +---+ +---+ +------------+ +---+
]]>
</artwork></figure>
<t>In order to properly support switching of video streams, the RTP switch t ]]></artwork>
ypically needs </figure>
<t>In order to properly support the switching of video streams, the RTP sw
itch typically needs
some critical information about video frames in order to start and stop forwarding streams. some critical information about video frames in order to start and stop forwarding streams.
<list style="symbols"> </t>
<t>Because of inter-frame dependencies, it should ideally switch video s <ul>
treams at a point <li>
<!--[rfced] Please review whether "e.g." in the following should
instead be "i.e.":
Original:
Because of inter-frame dependencies, it should ideally switch video
streams at a point where the first frame from the new speaker can be
decoded by recipients without prior frames, e.g. switch on an
intra-frame.
-->
<t>Because of inter-frame dependencies, it should ideally switch video
streams at a point
where the first frame from the new speaker can be decoded by recipients without prior where the first frame from the new speaker can be decoded by recipients without prior
frames, e.g switch on an intra-frame.</t> frames, e.g., switch on an intra-frame.</t>
<t>In many cases, the switch may need to drop frames in order to realize </li>
congestion control <li>
techniques, and needs to know which frames can be dropped with minimal i <t>In many cases, the switch may need to drop frames in order to reali
mpact to video quality.</t> ze congestion control
<t>For scalable streams with dependent layers, the switch may need to se techniques, and it needs to know which frames can be dropped with minima
lectively forward l impact to video quality.</t>
</li>
<li>
<t>For scalable streams with dependent layers, the switch may need to
selectively forward
specific layers to specific recipients due to recipient bandwidth or decoder limits.</t> specific layers to specific recipients due to recipient bandwidth or decoder limits.</t>
</list> </li>
</t> </ul>
<t>Furthermore, it is highly desirable to do this in a payload format-agno
<t>Furthermore, it is highly desirable to do this in a payload format-agno stic way that is not
stic way which is not
specific to each different video codec. specific to each different video codec.
Most modern video codecs share common concepts around frame types and other critical information Most modern video codecs share common concepts around frame types and other critical information
to make this codec-agnostic handling possible.</t> to make this codec-agnostic handling possible.</t>
<t>It is also desirable to be able to do this for SRTP without requiring the video switch to <t>It is also desirable to be able to do this for SRTP without requiring the video switch to
decrypt the packets. SRTP will encrypt the RTP payload format contents and consequently this decrypt the packets. SRTP will encrypt the RTP payload format contents; consequently, this
data is not usable for the switching function without decryption, which may not even data is not usable for the switching function without decryption, which may not even
be possible in the case of end-to-end encryption of private media be possible in the case of end-to-end encryption of private media
<xref target="RFC8871" />.</t> <xref target="RFC8871"/>.</t>
<t>By providing meta-information about the RTP streams outside the encrypted media payload, an <t>By providing meta-information about the RTP streams outside the encrypted media payload, an
RTP switch can do codec-agnostic selective forwarding without decrypting the payload. RTP switch can do codec-agnostic selective forwarding without decrypting the payload.
This document specifies the necessary meta-information in an RTP header extension. This document specifies the necessary meta-information in an RTP header extension.
</t> </t>
</section>
<section title="Key Words for Normative Requirements">
<t>
The key words &quot;MUST&quot;, &quot;MUST NOT&quot;, &quot;REQUIRED&quo
t;,
&quot;SHALL&quot;, &quot;SHALL NOT&quot;, &quot;SHOULD&quot;, &quot;SHOU
LD NOT&quot;, &quot;RECOMMENDED&quot;, &quot;NOT RECOMMENDED&quot;, &quot;MAY&q
uot;, and
&quot;OPTIONAL&quot; in this document are to be interpreted as described
in
BCP 14 <xref target="RFC2119" /> <xref target="RFC8174" /> when, and on
ly when, they
appear in all capitals, as shown here. </t>
</section> </section>
<section>
<name>Requirements Language</name>
<section title="Frame Marking RTP Header Extension"> <t>
<t>This specification uses RTP header extensions as defined in <xref targe The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>",
t="RFC8285" />. A subset of "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL NOT</bcp14>
",
"<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>",
"<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
"<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to
be
interpreted as described in BCP&nbsp;14 <xref target="RFC2119"/> <xref
target="RFC8174"/> when, and only when, they appear in all capitals, as
shown here.
</t>
</section>
<section>
<name>Frame Marking RTP Header Extension</name>
<t>This specification uses RTP header extensions as defined in <xref targe
t="RFC8285"/>. A subset of
meta-information from the video stream is provided as an RTP header extension to allow an RTP switch meta-information from the video stream is provided as an RTP header extension to allow an RTP switch
to do generic selective forwarding of video streams encoded with potentially different video codecs.</t> to do generic selective forwarding of video streams encoded with potentially different video codecs.</t>
<t>The Frame Marking RTP header extension is encoded
<t>The Frame Marking RTP header extension is encoded using the one-byte header or two-byte header as described in <xref target
using the one-byte header or two-byte header as described in <xref target ="RFC8285"/>.
="RFC8285" />. The one-byte header format is used for examples in this document.
The one-byte header format is used for examples in this memo.
The two-byte header format is used when other two-byte header extensions The two-byte header format is used when other two-byte header extensions
are present in the same RTP packet, since mixing one-byte and two-byte extensions are present in the same RTP packet since mixing one-byte and two-byte extensions
is not possible in the same RTP packet.</t> is not possible in the same RTP packet.</t>
<t>This extension is only specified for Source (not Redundancy) RTP Stream
<t>This extension is only specified for Source (not Redundancy) RTP Stre s
ams <xref target="RFC7656"/> that carry video payloads.
<xref target="RFC7656" /> that carry video payloads.
It is not specified for audio payloads, nor is it specified for Redundancy RTP Streams. It is not specified for audio payloads, nor is it specified for Redundancy RTP Streams.
The (separate) specifications for Redundancy RTP Streams often include The (separate) specifications for Redundancy RTP Streams often include
provisions for recovering any header extensions that were part of the original source packet. provisions for recovering any header extensions that were part of the original source packet.
Such provisions can be followed to recover the Frame Marking RTP header extension of the Such provisions can be followed to recover the Frame Marking RTP header extension of the
original source packet. original source packet.
Source packet frame markings may be useful when generating Redundancy RTP Streams; Source packet frame markings may be useful when generating Redundancy RTP Streams;
for example, the I (Independent Frame) and D (Discardable Frame) bits, for example, the I (Independent Frame) and D (Discardable Frame) bits,
defined in <xref target="mandatory-scalable" />, defined in <xref target="mandatory-scalable"/>,
can be used to generate extra or no redundancy, respectively, can be used to generate extra or no redundancy, respectively,
and redundancy schemes with source blocks can align source block boundaries with and redundancy schemes with source blocks can align source block boundaries with
independent frame boundaries as marked by the I bit. independent frame boundaries as marked by the I bit.
</t> </t>
<t>A frame, in the context of this specification, is the set of RTP pack <t>A frame, in the context of this specification, is the set of RTP packet
ets s
with the same RTP timestamp from a specific RTP synchronization source with the same RTP timestamp from a specific RTP Synchronization Source
(SSRC). (SSRC).
A frame within a layer is the set of RTP packets with the same RTP timestamp, SSRC, A frame within a layer is the set of RTP packets with the same RTP timestamp, SSRC,
Temporal ID (TID), and Layer ID (LID).</t> Temporal ID (TID), and Layer ID (LID).</t>
<section anchor="mandatory-scalable">
<section title="Long Extension for Scalable Streams" anchor="mandatory-sca <name>Long Extension for Scalable Streams</name>
lable"> <t>The following RTP header extension is <bcp14>RECOMMENDED</bcp14> for
<t>The following RTP header extension is RECOMMENDED for scalable streams scalable streams.
. It <bcp14>MAY</bcp14> also be used for non-scalable streams, in which
It MAY also be used for non-scalable streams, in which case TID, LID case the TID, LID, and TL0PICIDX <bcp14>MUST</bcp14> be 0 or omitted.
and TL0PICIDX MUST be 0 or omitted. The ID is assigned per <xref target="RFC8285"/>.
The ID is assigned per <xref target="RFC8285" />, The length is encoded as follows:</t>
and the length is encoded as L=2 which indicates 3 octets of data whe <ul>
n nothing is omitted, <li>L=2 to indicate 3 octets of data when nothing is omitted,</li>
or L=1 for 2 octets when TL0PICIDX is omitted, or L=0 for 1 octet whe <li>L=1 for 2 octets when TL0PICIDX is omitted, or</li>
n both LID and TL0PICIDX are omitted.</t> <li>L=0 for 1 octet when both the LID and TL0PICIDX are omitted.</li
<figure> ></ul>
<artwork><![CDATA[ <artwork><![CDATA[
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ID=? | L=2 |S|E|I|D|B| TID | LID | TL0PICIDX | | ID=? | L=2 |S|E|I|D|B| TID | LID | TL0PICIDX |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
or or
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ID=? | L=1 |S|E|I|D|B| TID | LID | (TL0PICIDX omitted) | ID=? | L=1 |S|E|I|D|B| TID | LID | (TL0PICIDX omitted)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
or or
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ID=? | L=0 |S|E|I|D|B| TID | (LID and TL0PICIDX omitted) | ID=? | L=0 |S|E|I|D|B| TID | (LID and TL0PICIDX omitted)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork></figure> ]]></artwork>
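The length encoding above can be sketched in code. The following is a minimal illustration, not a reference implementation: the function name `pack_long_frame_marking` and its parameters are hypothetical, and the check that the ID falls in 1..14 reflects the one-byte header rules of RFC 8285 rather than anything stated in this section. The L field is derived from how many trailing fields are present.

```python
from typing import Optional


def pack_long_frame_marking(
    ext_id: int,
    start: bool,
    end: bool,
    independent: bool,
    discardable: bool,
    base_layer_sync: bool,
    tid: int,
    lid: Optional[int] = None,
    tl0picidx: Optional[int] = None,
) -> bytes:
    """Build a one-byte-header extension element (hypothetical helper)."""
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte header IDs are limited to 1..14")
    if not 0 <= tid <= 7:
        raise ValueError("TID is a 3-bit field")
    if lid is None and tl0picidx is not None:
        raise ValueError("TL0PICIDX must be omitted when LID is omitted")

    # First data octet: S, E, I, D, B flags followed by the 3-bit TID.
    flags = (
        (int(start) << 7)
        | (int(end) << 6)
        | (int(independent) << 5)
        | (int(discardable) << 4)
        | (int(base_layer_sync) << 3)
        | tid
    )
    data = [flags]
    for field in (lid, tl0picidx):
        if field is not None:
            if not 0 <= field <= 255:
                raise ValueError("LID and TL0PICIDX are 8-bit fields")
            data.append(field)

    # L is the number of data octets minus one: L=2, L=1, or L=0.
    length_field = len(data) - 1
    return bytes([(ext_id << 4) | length_field] + data)
```

Passing only the flags and TID yields the L=0 form, adding `lid` yields L=1, and adding both `lid` and `tl0picidx` yields L=2.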
<t>The following information is extracted from the media payload and sen
t in the Frame Marking RTP header extension.
</t>
<dl newline="true">
<t>The following information are extracted from the media payload and se <dt>S: Start of Frame (1 bit)</dt><dd><bcp14>MUST</bcp14> be 1 in th
nt in the Frame Marking RTP header extension. e first packet in a frame
<list style='symbols'> within a layer; otherwise, <bcp14>MUST</bcp14> be 0.</dd>
<t>S: Start of Frame (1 bit) - MUST be 1 in the first packet in a fr
ame <dt>E: End of Frame (1 bit)</dt><dd><bcp14>MUST</bcp14> be 1 in the
within a layer; otherwise MUST be 0.</t> last packet in a frame
<t>E: End of Frame (1 bit) - MUST be 1 in the last packet in a frame within a layer; otherwise, <bcp14>MUST</bcp14> be 0.
within a layer; otherwise MUST be 0. Note that the RTP header marker bit <bcp14>MAY</bcp14> be used t
Note that the RTP header marker bit MAY be used to infer the las o infer the last packet of the highest enhancement layer in payload formats with
t packet of the highest enhancement layer, in payload formats with such semantic such semantics.</dd>
s.</t>
<t>I: Independent Frame (1 bit) - MUST be 1 for a frame within a la <dt>I: Independent Frame (1 bit)</dt><dd><bcp14>MUST</bcp14> be 1 f
yer that can be or a frame within a layer that can be
decoded independent of temporally prior frames, e.g. intra-frame, decoded independent of temporally prior frames, e.g., intra-frame
VPX keyframe, , VPX keyframe,
H.264 IDR <xref target="RFC6184" />, H.264 Instantaneous Decoding Refresh (IDR) <xref target="RFC6184"
H.265 IDR/CRA/BLA/RAP <xref target="RFC7798" />; />, or
otherwise MUST be 0. H.265 IDR / Clean Random Access (CRA) / Broken Link Access (BLA)
/ Random Access Point (RAP) <xref target="RFC7798"/>;
otherwise, <bcp14>MUST</bcp14> be 0.
Note that this bit only signals temporal independence, so it can be Note that this bit only signals temporal independence, so it can be
1 in spatial or quality enhancement layers that depend on temporally 1 in spatial or quality enhancement layers that depend on temporally
co-located layers but not temporally prior frames.</t> co-located layers but not temporally prior frames.</dd>
<t>D: Discardable Frame (1 bit) - MUST be 1 for a frame within a lay
er the sender knows can be discarded,
and still provide a decodable media stream; otherwise MUST be 0.
</t>
<t>B: Base Layer Sync (1 bit) - When TID is not 0, this MUST be 1 if
the sender knows this frame within a layer only depends
on the base temporal layer; otherwise MUST be 0. When TID is 0 o
r if no scalability is used, this MUST be 0.</t>
<t>TID: Temporal ID (3 bits) - Identifies the temporal layer/sub-lay
er encoded,
starting with 0 for the base layer, and increasing with higher te
mporal fidelity.
If no scalability is used, this MUST be 0. It is implicitly 0 in
the short extension format.</t>
<t>LID: Layer ID (8 bits) - Identifies the spatial and quality layer
encoded,
starting with 0 for the base layer, and increasing with higher fi
delity.
If no scalability is used, this MUST be 0 or omitted to reduce le
ngth.
When omitted, TL0PICIDX MUST also be omitted. It is implicitly 0
in the short extension format
or when omitted in the long extension format.</t>
<t>TL0PICIDX: Temporal Layer 0 Picture Index (8 bits) - When TID is
0 and LID is 0, this is a cyclic counter labeling
base layer frames. When TID is not 0 or LID is not 0,
this indicates a dependency on the given index, such that this fr
ame within this layer
depends on the frame with this label in the layer with TID 0 and
LID 0.
If no scalability is used, or the cyclic counter is unknown, this
MUST be omitted to reduce length.
Note that 0 is a valid index value for TL0PICIDX.</t>
</list>
</t>
<t>The layer information contained in TID and LID convey useful aspects o <dt>D: Discardable Frame (1 bit)</dt><dd><bcp14>MUST</bcp14> be 1 fo
f the layer structure that r a frame within a layer the sender knows can be discarded
and still provide a decodable media stream; otherwise, <bcp14>MU
ST</bcp14> be 0. </dd>
<dt>B: Base Layer Sync (1 bit)</dt><dd>When the TID is not 0, this <
bcp14>MUST</bcp14> be 1 if the sender knows this frame within a layer only depen
ds
on the base temporal layer; otherwise, <bcp14>MUST</bcp14> be 0.
When the TID is 0 or if no scalability is used, this <bcp14>MUST</bcp14> be 0.
</dd>
<dt>TID: Temporal ID (3 bits)</dt><dd>Identifies the temporal layer/
sub-layer encoded,
starting with 0 for the base layer and increasing with higher tem
poral fidelity.
If no scalability is used, this <bcp14>MUST</bcp14> be 0. It is i
mplicitly 0 in the short extension format.
</dd>
<dt>LID: Layer ID (8 bits)</dt><dd>Identifies the spatial and qualit
y layer encoded,
starting with 0 for the base layer and increasing with higher fid
elity.
If no scalability is used, this <bcp14>MUST</bcp14> be 0 or omitt
ed to reduce length.
When the LID is omitted, TL0PICIDX <bcp14>MUST</bcp14> also be om
itted. It is implicitly 0 in the short extension format
or when omitted in the long extension format.</dd>
<dt>TL0PICIDX: Temporal Layer 0 Picture Index (8 bits)</dt><dd>When
the TID is 0 and the LID is 0, this is a cyclic counter labeling
base layer frames. When the TID is not 0 or the LID is not 0,
this indicates a dependency on the given index, such that this frame within this layer
depends on the frame with this label in the layer with TID 0 and LID 0.
If no scalability is used, or the cyclic counter is unknown, TL0P
ICIDX <bcp14>MUST</bcp14> be omitted to reduce length.
Note that 0 is a valid index value for TL0PICIDX.</dd>
</dl>
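The field definitions above can be read back from the extension data with a few bit operations. This is a minimal sketch under stated assumptions: the `FrameMarking` class and `parse_frame_marking` function are hypothetical names, and the input is assumed to be only the 1 to 3 data octets that follow the ID/L octet. An omitted LID is reported as 0 (its implicit value), while an omitted TL0PICIDX is reported as `None`, because 0 is a valid index.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameMarking:
    start: bool             # S bit
    end: bool               # E bit
    independent: bool       # I bit
    discardable: bool       # D bit
    base_layer_sync: bool   # B bit
    tid: int                # 3-bit Temporal ID
    lid: int = 0            # implicitly 0 when omitted
    tl0picidx: Optional[int] = None  # None when omitted; 0 is a valid index


def parse_frame_marking(data: bytes) -> FrameMarking:
    """Decode the extension data octets (hypothetical helper)."""
    if not 1 <= len(data) <= 3:
        raise ValueError("expected 1, 2, or 3 octets of extension data")
    flags = data[0]
    return FrameMarking(
        start=bool(flags & 0x80),
        end=bool(flags & 0x40),
        independent=bool(flags & 0x20),
        discardable=bool(flags & 0x10),
        base_layer_sync=bool(flags & 0x08),
        tid=flags & 0x07,
        lid=data[1] if len(data) >= 2 else 0,
        tl0picidx=data[2] if len(data) == 3 else None,
    )
```

For example, the three octets `0xA2 0x01 0x07` decode to S=1, I=1, TID=2, LID=1, and TL0PICIDX=7.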
<t>The layer information contained in the TID and LID conveys useful aspects of the layer structure that
can be utilized in selective forwarding.</t> can be utilized in selective forwarding.</t>
<t>Without further information about the layer structure, <t>Without further information about the layer structure,
these TID/LID identifiers can only be used for relative priority of layers these TID/LID identifiers can only be used for relative priority of layers
and implicit dependencies between layers. and implicit dependencies between layers.
They convey a layer hierarchy with TID=0 and LID=0 identifying the base layer. They convey a layer hierarchy with TID = 0 and LID = 0 identifying the base layer.
Higher values of TID identify higher temporal layers with higher frame rates. Higher values of TID identify higher temporal layers with higher frame rates.
Higher values of LID identify higher spatial and/or quality layers with higher resolutions and/or bitrates. Higher values of LID identify higher spatial and/or quality layers with higher resolutions and/or bitrates.
Implicit dependencies between layers assume that a layer with a given Implicit dependencies between layers assume that a layer with a given
TID/LID MAY depend TID/LID <bcp14>MAY</bcp14> depend
on layer(s) with the same or lower TID/LID, but MUST NOT depend on lay on a layer or layers with the same or lower TID/LID, but they <bcp14>M
er(s) with higher TID/LID. UST NOT</bcp14> depend on a layer or layers with higher TID/LID.
</t><t> </t>
<t>
With further information, With further information,
for example, possible future RTCP SDES items that convey full layer st for example, possible future RTCP source description (SDES) items that
ructure information, it may convey full layer structure information, it may
be possible to map these TIDs and LIDs to specific absolute frame rate be possible to map these TIDs and LIDs to specific absolute frame rate
s, resolutions and bitrates, s, resolutions, bitrates, and explicit dependencies between layers.
as well as explicit dependencies between layers. Such additional layer information may be useful for forwarding decisio
Such additional layer information may be useful for forwarding decisio ns in the RTP switch
ns in the RTP switch,
but is beyond the scope of this memo. The relative layer information is still useful but is beyond the scope of this memo. The relative layer information is still useful
for many selective forwarding decisions even without such additional layer information. for many selective forwarding decisions, even without such additional layer information.
</t> </t>
</section> </section>
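The implicit dependency rule in this section suggests one way an RTP switch might use the relative layer information. The sketch below is an assumption-laden illustration, not behavior mandated by this document: the function name `should_forward`, the per-recipient caps `max_tid` and `max_lid`, and the `congested` flag are all hypothetical. It relies only on the stated rule that a layer never depends on a layer with a higher TID/LID, so dropping everything above a cap leaves the remaining layers decodable.

```python
def should_forward(
    tid: int,
    lid: int,
    discardable: bool,
    max_tid: int,
    max_lid: int,
    congested: bool = False,
) -> bool:
    """Decide whether to forward a frame to one recipient (hypothetical policy)."""
    # Layers above the recipient's caps are never needed by the layers
    # at or below the caps, so they can be dropped safely.
    if tid > max_tid or lid > max_lid:
        return False
    # Under congestion, frames marked with the D bit can be dropped while
    # still leaving a decodable stream.
    if congested and discardable:
        return False
    return True
```

A switch serving a bandwidth-limited recipient could, for instance, call this with `max_tid=1` and `max_lid=0` to forward only the two lowest temporal layers of the base spatial layer.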
<section anchor="mandatory-non-scalable">
<section title="Short Extension for Non-Scalable Streams" anchor="mandator <name>Short Extension for Non-scalable Streams</name>
y-non-scalable"> <t>The following RTP header extension is <bcp14>RECOMMENDED</bcp14> for
<t>The following RTP header extension is RECOMMENDED for non-scalable str non-scalable streams.
eams.
It is identical to the shortest form of the extension for scalable streams, It is identical to the shortest form of the extension for scalable streams,
except the last four bits (B and TID) are replaced with zeros. except the last four bits (B and TID) are replaced with zeros.
It MAY also be used for scalable streams if the sender has limited or no It <bcp14>MAY</bcp14> also be used for scalable streams if the sender has limited or no
information about stream scalability. information about stream scalability.
The ID is assigned per <xref target="RFC8285" />, The ID is assigned per <xref target="RFC8285"/>;
and the length is encoded as L=0 which indicates 1 octet of data.</t> the length is encoded as L=0, which indicates 1 octet of data.</t>
<artwork><![CDATA[
<figure>
<artwork><![CDATA[
0 1 0 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ID=? | L=0 |S|E|I|D|0 0 0 0| | ID=? | L=0 |S|E|I|D|0 0 0 0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork></figure> ]]></artwork>
<t>The following information is extracted from the media payload and sen
<t>The following information are extracted from the media payload and se t in the Frame Marking RTP header extension.
nt in the Frame Marking RTP header extension.
<list style='symbols'>
<t>S: Start of Frame (1 bit) - MUST be 1 in the first packet in a fr
ame; otherwise MUST be 0.</t>
<t>E: End of Frame (1 bit) - MUST be 1 in the last packet in a frame
; otherwise MUST be 0.
SHOULD match the RTP header marker bit in payload formats with such
semantics for marking end of frame.</t>
<t>I: Independent Frame (1 bit) - MUST be 1 for frames that can be
decoded independent of temporally prior frames, e.g. intra-frame,
VPX keyframe,
H.264 IDR <xref target="RFC6184" />,
H.265 IDR/CRA/BLA/IRAP <xref target="RFC7798" />;
otherwise MUST be 0. </t>
<t>D: Discardable Frame (1 bit) - MUST be 1 for frames the sender kn
ows can be discarded,
and still provide a decodable media stream; otherwise MUST be 0.
</t>
<t>The remaining (4 bits) - are reserved/fixed values and not used f
or non-scalable streams;
they MUST be set to 0 upon transmission and ignored upon reception
.</t>
</list>
</t> </t>
</section>
<section title="Layer ID Mappings for Scalable Streams"> <dl newline="true">
<t> This section maps the specific Layer ID information contained in speci <dt>S: Start of Frame (1 bit)</dt><dd><bcp14>MUST</bcp14> be 1 in th
fic scalable codecs to the generic LID and TID fields. </t> e first packet in a frame; otherwise, <bcp14>MUST</bcp14> be 0.</dd>
<t> Note that non-scalable streams have no Layer ID information and thus n
o mappings. </t>
<section title="VP9 LID Mapping"> <dt>E: End of Frame (1 bit)</dt><dd><bcp14>MUST</bcp14> be 1 in the
<t> The VP9 <xref target="I-D.ietf-payload-vp9" /> last packet in a frame; otherwise, <bcp14>MUST</bcp14> be 0.
<bcp14>SHOULD</bcp14> match the RTP header marker bit in payload for
mats with such semantics for marking end of frame.</dd>
<dt>I: Independent Frame (1 bit)</dt><dd><bcp14>MUST</bcp14> be 1 f
or frames that can be
decoded independent of temporally prior frames, e.g., intra-frame
, VPX keyframe,
H.264 IDR <xref target="RFC6184"/>, or
H.265 IDR/CRA/BLA/IRAP <xref target="RFC7798"/>;
otherwise, <bcp14>MUST</bcp14> be 0. </dd>
<dt>D: Discardable Frame (1 bit)</dt><dd><bcp14>MUST</bcp14> be 1 fo
r frames the sender knows can be discarded
and still provide a decodable media stream; otherwise, <bcp14>MU
ST</bcp14> be 0. </dd>
<dt>The remaining (4 bits)</dt><dd>These are reserved/fixed values a
nd not used for non-scalable streams;
they <bcp14>MUST</bcp14> be set to 0 upon transmission and ignored
upon reception.</dd>
</dl>
</section>
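The short extension can be sketched in the same way. This is a minimal illustration with hypothetical names (`pack_short_frame_marking`, `parse_short_frame_marking`); the ID range check of 1..14 is an assumption based on the one-byte header rules of RFC 8285. The packer always writes zeros in the four reserved bits, and the parser ignores them on reception, as required above.

```python
from typing import Tuple


def pack_short_frame_marking(
    ext_id: int, start: bool, end: bool, independent: bool, discardable: bool
) -> bytes:
    """Build the 2-octet short form: ID/L octet plus one data octet."""
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte header IDs are limited to 1..14")
    # S, E, I, D occupy the top four bits; the low four bits are set to 0.
    flags = (
        (int(start) << 7)
        | (int(end) << 6)
        | (int(independent) << 5)
        | (int(discardable) << 4)
    )
    # L=0 indicates a single octet of data.
    return bytes([(ext_id << 4) | 0, flags])


def parse_short_frame_marking(data: bytes) -> Tuple[bool, bool, bool, bool]:
    """Return (S, E, I, D) from the single data octet, ignoring reserved bits."""
    if len(data) != 1:
        raise ValueError("the short extension carries exactly 1 octet of data")
    flags = data[0]
    return (
        bool(flags & 0x80),
        bool(flags & 0x40),
        bool(flags & 0x20),
        bool(flags & 0x10),
    )
```

Because the reserved bits are ignored by the parser, a receiver written this way tolerates senders that fail to zero them.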
<section>
<name>LID Mappings for Scalable Streams</name>
<t> This section maps the specific Layer ID (LID) information contained
in specific scalable codecs to the generic LID and TID fields. </t>
<t> Note that non-scalable streams have no LID information; thus, they h
ave no mappings. </t>
<section>
<name>VP9 LID Mapping</name>
<t> The VP9 <xref target="RFC9628"/>
Spatial Layer ID (SID, 3 bits) and Temporal Layer ID (TID, 3 bits) Spatial Layer ID (SID, 3 bits) and Temporal Layer ID (TID, 3 bits)
in the VP9 payload descriptor are mapped to the generic LID and TID fields in the VP9 payload descriptor are mapped to the generic LID and TID fields
in the header extension as shown in the following figure.</t> in the header extension as shown in the following figure.</t>
<artwork><![CDATA[
<figure>
<artwork><![CDATA[
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ID=? | L=2 |S|E|I|D|B| TID |0|0|0|0|0| SID | TL0PICIDX | | ID=? | L=2 |S|E|I|D|B| TID |0|0|0|0|0| SID | TL0PICIDX |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork></figure> ]]></artwork>
<t> The S bit <bcp14>MUST</bcp14> match the B bit in the VP9 payload descriptor.</t>
<t> The E bit <bcp14>MUST</bcp14> match the E bit in the VP9 payload descriptor.</t>
<t> The I bit <bcp14>MUST</bcp14> match the inverse of the P bit in the VP9 payload descriptor.</t>
<!--[rfced] Should "field" or some other noun follow
"refresh_frame_flags" in this sentence? Or is this referring to
the flags (as the verb "are" is plural)?

Original:
   The D bit MUST be 1 if the refresh_frame_flags in the VP9 payload
   uncompressed header are all 0, otherwise it MUST be 0.
-->
<t> The D bit <bcp14>MUST</bcp14> be 1 if the refresh_frame_flags in the VP9 payload uncompressed header are all 0; otherwise, it <bcp14>MUST</bcp14> be 0.</t>
<t> The B bit <bcp14>MUST</bcp14> be 0 if the TID is 0; if the TID is not 0, it <bcp14>MUST</bcp14> match the U bit in the VP9 payload descriptor. Note: when using temporally nested scalability structures as recommended in <xref target="scalable-structures"/>, the B bit and VP9 U bit will always be 1 if the TID is not 0 since it is always
possible to switch up to a higher temporal layer in such nested structures.</t>
<t>The TID, SID, and TL0PICIDX <bcp14>MUST</bcp14> match the correspondingly named fields in the VP9 payload descriptor,
with SID aligned in the least significant 3 bits of the 8-bit LID field and zeros
in the most significant 5 bits.</t>
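<t>As an illustration of the VP9 mapping rules, the following sketch builds the three data octets of the header extension from already-parsed VP9 fields. The function name and its parameters are hypothetical; the flag arguments are assumed to be integers 0 or 1, and refresh_frame_flags is assumed to be available as an 8-bit integer mask.</t>
<sourcecode type="python"><![CDATA[
```python
def vp9_frame_marking(b, e, p, u, tid, sid, tl0picidx, refresh_frame_flags):
    """Return the 3 data octets (flags+TID, LID, TL0PICIDX) for a VP9 packet."""
    s_bit = b                                    # S matches the VP9 B bit
    e_bit = e                                    # E matches the VP9 E bit
    i_bit = 0 if p else 1                        # I is the inverse of the P bit
    d_bit = 1 if refresh_frame_flags == 0 else 0 # D is 1 when all flags are 0
    b_bit = 0 if tid == 0 else u                 # B is 0 for TID 0, else the U bit
    first = (
        (s_bit << 7) | (e_bit << 6) | (i_bit << 5)
        | (d_bit << 4) | (b_bit << 3) | (tid & 0x7)
    )
    lid = sid & 0x7                              # SID in the low 3 bits, zeros above
    return bytes([first, lid, tl0picidx & 0xFF])
```
]]></sourcecode>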
</section>
<section>
<name>H265 LID Mapping</name>
<t> The H265 <xref target="RFC7798"/> LayerID (6 bits) and TID (3 bits)
from the Network Abstraction Layer (NAL) unit header are mapped to the generic LID and TID fields
in the header extension as shown in the following figure.</t>
<artwork><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ID=? |  L=2  |S|E|I|D|B| TID |0|0|  LayerID  |   TL0PICIDX   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
<!--[rfced] [*AD] We see several (similar) sentences like the example
below where it might be difficult for the reader to correctly
understand what part(s) of the sentence the keyword MUST applies
to. We wonder if a rewrite may be helpful to the reader,
possibly using a list... Please see the example below (again,
other similar instances exist in the document) and let us know if
an update like one of the following might work.

Original:
   The D bit MUST be 1 when the NAL unit header NRI field is 0, or an
   aggregation packet or fragmentation unit encapsulating only NAL units
   with NRI=0, otherwise it MUST be 0.

Perhaps A (the "when" clause applies to both the D bit being set to 1 or NRI=0):

   When the NAL unit header NRI field is 0, the D bit MUST be either 1 or
   an aggregation packet or fragmentation unit encapsulating only NAL
   units with NRI=0. When the NAL unit header NRI field is not set to 0,
   the D bit MUST be 0.

Perhaps B (the "when" clause only applies to the D bit being 0):
   The D bit MUST be:
   - 1 when the NAL unit header NRI field is 0,
   - an aggregation packet or fragmentation unit encapsulating only NAL units
     with NRI=0, or
   - 0.
-->
<t>The S and E bits <bcp14>MUST</bcp14> match the correspondingly named bits in PACI:PHES:TSCI payload structures.</t>
<t>The I bit <bcp14>MUST</bcp14> be 1 when the NAL unit type is 16-23 (inclusive) or 32-34 (inclusive), or an aggregation packet or fragmentation unit encapsulating any of these types; otherwise, it <bcp14>MUST</bcp14> be 0. These ranges cover intra (IRAP) frames as well as
critical parameter sets (Video Parameter Set (VPS), Sequence Parameter Set (SPS), Picture Parameter Set (PPS)).</t>
<t>The D bit <bcp14>MUST</bcp14> be 1 when the NAL unit type is 0, 2, 4, 6, 8, 10, 12, 14, 38, or an aggregation packet or fragmentation unit encapsulating only these types; otherwise, it <bcp14>MUST</bcp14> be 0. These ranges cover non-reference frames as well as filler data.</t>
<t>The B bit cannot be determined reliably from simple inspection of payload headers; therefore, it is determined by implementation-specific means. For example, internal codec interfaces may provide information to set this reliably.</t>
<t>The TID and LayerID <bcp14>MUST</bcp14> match the correspondingly named fields in the H265 NAL unit header,
with LayerID aligned in the least significant 6 bits of the 8-bit LID field and zeros
in the most significant 2 bits.</t>
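<t>The I and D bit rules for H265 reduce to set membership tests on the NAL unit types carried in a packet. The sketch below is illustrative only: the function names are hypothetical, and nal_types is assumed to be a list holding the type of a single NAL unit, the types inside an aggregation packet, or the type of the NAL unit being fragmented.</t>
<sourcecode type="python"><![CDATA[
```python
# NAL unit types 16-23 (IRAP) and 32-34 (VPS, SPS, PPS) set the I bit.
H265_INDEPENDENT_TYPES = set(range(16, 24)) | set(range(32, 35))
# NAL unit types for non-reference frames and filler data set the D bit.
H265_DISCARDABLE_TYPES = {0, 2, 4, 6, 8, 10, 12, 14, 38}


def h265_i_bit(nal_types):
    """1 if any carried NAL unit type is in the independent set, else 0."""
    return 1 if any(t in H265_INDEPENDENT_TYPES for t in nal_types) else 0


def h265_d_bit(nal_types):
    """1 only if every carried NAL unit type is discardable, else 0."""
    if not nal_types:
        return 0
    return 1 if all(t in H265_DISCARDABLE_TYPES for t in nal_types) else 0


def h265_lid(layer_id):
    """LayerID in the low 6 bits of the LID octet, zeros in the top 2 bits."""
    return layer_id & 0x3F
```
]]></sourcecode>
<t>Note the asymmetry: the I bit uses "any" while the D bit uses "only", so a mixed aggregation packet can set I but never D.</t>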
</section>
<section>
<name>H264 Scalable Video Coding (SVC) LID Mapping</name>
<t> The following shows H264-SVC <xref target="RFC6190"/> Layer encoding information (3 bits for
spatial/dependency layer, 4 bits for quality layer, and 3 bits for temporal layer) mapped to the generic LID and TID fields.</t>
<t>The S, E, I, and D bits <bcp14>MUST</bcp14> match the correspondingly named bits in Payload Content Scalability Information (PACSI) payload structures.</t>
<t>The I bit <bcp14>MUST</bcp14> be 1 when the NAL unit type is 5, 7, 8, 13, 15,
or an aggregation packet or fragmentation unit encapsulating any of these types; otherwise, it <bcp14>MUST</bcp14> be 0. These ranges cover intra (IDR) frames as well as
critical parameter sets (SPS/PPS variants).</t>
<t>The D bit <bcp14>MUST</bcp14> be 1 when the NAL unit header nal_ref_idc (NRI) field is 0, or an aggregation packet or fragmentation unit encapsulating only NAL units with NRI=0; otherwise, it <bcp14>MUST</bcp14> be 0.
The NRI=0 condition signals non-reference frames.</t>
<t>The B bit cannot be determined reliably from simple inspection of payload headers; therefore, it is determined by implementation-specific means. For example, internal codec interfaces may provide information to set this reliably.</t>
<artwork><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ID=? |  L=2  |S|E|I|D|B| TID |0| DID |  QID  |   TL0PICIDX   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
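<t>The LID octet layout in the figure above (one zero bit, 3 bits of DID, 4 bits of QID) can be sketched as follows. The helper names are hypothetical, and the range checks are an illustrative design choice rather than a requirement of this document.</t>
<sourcecode type="python"><![CDATA[
```python
def svc_lid(did, qid):
    """Pack DID (3 bits) and QID (4 bits) into the LID octet; top bit is 0."""
    if not 0 <= did <= 7:
        raise ValueError("DID must fit in 3 bits")
    if not 0 <= qid <= 15:
        raise ValueError("QID must fit in 4 bits")
    return (did << 4) | qid


def split_svc_lid(lid):
    """Recover (DID, QID) from a LID octet, ignoring the top bit."""
    return (lid >> 4) & 0x7, lid & 0xF
```
]]></sourcecode>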
</section>
<section>
<name>H264 Advanced Video Coding (AVC) LID Mapping</name>
<t> The following shows the header extension for H264 (AVC) <xref target="RFC6184"/> that contains
only temporal layer information.</t>
<t> The S bit <bcp14>MUST</bcp14> be 1 when the timestamp in the RTP header differs from the timestamp
in the prior RTP sequence number from the same SSRC; otherwise, it <bcp14>MUST</bcp14> be 0.</t>
<t> The E bit <bcp14>MUST</bcp14> match the M bit in the RTP header.</t>
<t>The I bit <bcp14>MUST</bcp14> be 1 when the NAL unit type is 5, 7, or 8,
or an aggregation packet or fragmentation unit encapsulating any of these types;
otherwise, it <bcp14>MUST</bcp14> be 0. These ranges cover intra (IDR) frames as well as
critical parameter sets (SPS/PPS).</t>
<t>The D bit <bcp14>MUST</bcp14> be 1 when the NAL unit header NRI field is 0,
or an aggregation packet or fragmentation unit encapsulating only
NAL units with NRI=0; otherwise, it <bcp14>MUST</bcp14> be 0.
The NRI=0 condition signals non-reference frames.</t>
<t>The B bit cannot be determined reliably from simple inspection of payload headers; therefore, it is determined by implementation-specific means. For example, internal codec interfaces may provide information to set this reliably.</t>
<artwork><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ID=? |  L=2  |S|E|I|D|B| TID |0|0|0|0|0|0|0|0|   TL0PICIDX   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
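<t>For H264 (AVC), the S, E, I, and D bits can all be derived from RTP header fields and NAL unit headers. The sketch below is a hedged illustration: the function name and parameters are hypothetical, prev_timestamp is assumed to be the timestamp of the packet with the prior sequence number from the same SSRC (or None for the first packet, which this sketch treats as a frame start), and nal_types and nri_values are assumed to list the values of every NAL unit carried in the packet.</t>
<sourcecode type="python"><![CDATA[
```python
def avc_flags(timestamp, prev_timestamp, marker, nal_types, nri_values):
    """Return (S, E, I, D) for one H264 (AVC) RTP packet."""
    # S: the RTP timestamp changed relative to the prior packet.
    s_bit = 1 if timestamp != prev_timestamp else 0
    # E: mirrors the RTP marker (M) bit.
    e_bit = 1 if marker else 0
    # I: any carried NAL unit is IDR (5), SPS (7), or PPS (8).
    i_bit = 1 if any(t in (5, 7, 8) for t in nal_types) else 0
    # D: every carried NAL unit has NRI equal to 0.
    d_bit = 1 if nri_values and all(n == 0 for n in nri_values) else 0
    return s_bit, e_bit, i_bit, d_bit
```
]]></sourcecode>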
</section>
<section>
<name>VP8 LID Mapping</name>
<t> The following shows the header extension for VP8 <xref target="RFC7741"/> that contains
only temporal layer information.</t>
<t> The S bit <bcp14>MUST</bcp14> match the correspondingly named bit in the VP8 payload descriptor when PID=0; otherwise, it <bcp14>MUST</bcp14> be 0.</t>
<t> The E bit <bcp14>MUST</bcp14> match the M bit in the RTP header. </t>
<t> The I bit <bcp14>MUST</bcp14> match the inverse of the P bit in the VP8 payload header.</t>
<t> The D bit <bcp14>MUST</bcp14> match the N bit in the VP8 payload descriptor.</t>
<!-- [rfced] Please review whether any of the notes in this document
should be in the <aside> element. It is defined as "a container for
content that is semantically less important or tangential to the
content that surrounds it" (https://authors.ietf.org/en/rfcxml-vocabulary#aside).
-->
<t> The B bit <bcp14>MUST</bcp14> match the Y bit in the VP8 payload descriptor. Note: when using temporally nested scalability structures as recommended in <xref target="scalable-structures"/>, the B bit and VP8 Y bit will always be 1 if the TID is not 0 since it is always
possible to switch up to a higher temporal layer in such nested structures.</t>
<t>The TID and TL0PICIDX <bcp14>MUST</bcp14> match the correspondingly named fields in the VP8 payload descriptor. </t>
<artwork><![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ID=? |  L=2  |S|E|I|D|B| TID |0|0|0|0|0|0|0|0|   TL0PICIDX   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]></artwork>
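<t>The VP8 mapping can be sketched in the same way as the other codecs. The function name and parameters below are hypothetical; all flag arguments are assumed to be integers 0 or 1 taken from the already-parsed VP8 payload descriptor, VP8 payload header, and RTP header.</t>
<sourcecode type="python"><![CDATA[
```python
def vp8_frame_marking(s, pid, marker, p, n, y, tid, tl0picidx):
    """Return the 3 data octets (flags+TID, zero LID, TL0PICIDX) for VP8."""
    s_bit = s if pid == 0 else 0   # S only counts for the first partition
    e_bit = marker                 # E matches the RTP M bit
    i_bit = 0 if p else 1          # I is the inverse of the P bit
    d_bit = n                      # D matches the N bit
    b_bit = y                      # B matches the Y bit
    first = (
        (s_bit << 7) | (e_bit << 6) | (i_bit << 5)
        | (d_bit << 4) | (b_bit << 3) | (tid & 0x7)
    )
    return bytes([first, 0x00, tl0picidx & 0xFF])
```
]]></sourcecode>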
</section>
<section>
<name>Future Codec LID Mapping</name>
<t>The RTP payload format specification for future video codecs <bcp14>SHOULD</bcp14> include a section describing
the LID mapping and TID mapping for the codec.</t>
</section>
</section>
<section>
<name>Signaling Information</name>
<t>The URI for declaring this header extension in an extmap attribute is
"urn:ietf:params:rtp-hdrext:framemarking". It does not contain any
extension attributes. </t>
<t>An example attribute line in SDP:</t>
<artwork><![CDATA[
   a=extmap:3 urn:ietf:params:rtp-hdrext:framemarking
]]></artwork>
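<t>A receiver needs the negotiated extension ID before it can locate frame marking data in RTP packets. The following minimal sketch scans SDP text for the extmap line shown above; the function name is hypothetical, and the parsing assumes the general extmap syntax of <xref target="RFC8285"/> (an ID, an optional "/direction" suffix, then the URI).</t>
<sourcecode type="python"><![CDATA[
```python
FRAME_MARKING_URI = "urn:ietf:params:rtp-hdrext:framemarking"


def frame_marking_id(sdp):
    """Return the extmap ID for frame marking, or None if not declared."""
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("a=extmap:"):
            continue
        fields = line[len("a=extmap:"):].split()
        if len(fields) >= 2 and fields[1] == FRAME_MARKING_URI:
            # Drop an optional "/sendonly"-style direction suffix.
            return int(fields[0].split("/")[0])
    return None
```
]]></sourcecode>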
</section>
<section>
<name>Usage Considerations</name>
<!--[rfced] May we update this sentence as follows for the ease of the
reader? Note that the introductory "when" phrase mentions a
single frame while the recommendation mentions plural frames:
please consider if further updates are necessary.

Original:
   When an RTP switch needs to discard a received video frame due to
   congestion control considerations, it is RECOMMENDED that it
   preferably drop frames marked with the D (Discardable) bit set, or the
   highest values of TID and LID, which indicate the highest temporal and
   spatial/quality enhancement layers, since those typically have fewer
   dependencies on them than lower layers.

Perhaps A:
   When an RTP switch needs to discard a received video frame due to
   congestion control considerations, it is RECOMMENDED that it drop:
   - frames marked with the D (Discardable) bit set, or
   - frames with the highest values of TID and LID (which indicate the
     highest temporal and spatial/quality enhancement layers) since those
     typically have fewer dependencies on them than lower layers.

Perhaps B (to update the sg/pl switch):
   When an RTP switch needs to discard received video frames due to
   congestion control considerations, it is RECOMMENDED that it drop:
   - frames marked with the D (Discardable) bit set, or
   - frames with the highest values of TID and LID (which indicate the
     highest temporal and spatial/quality enhancement layers) since those
     typically have fewer dependencies on them than lower layers.
-->
<t>The header extension values <bcp14>MUST</bcp14> represent what is already in the RTP payload.</t>
<t> When an RTP switch needs to discard a received video frame due to congestion control considerations,
it is <bcp14>RECOMMENDED</bcp14> that it preferably drop frames marked with the D (Discardable) bit set,
or the highest values of TID and LID, which indicate the highest temporal and spatial/quality enhancement layers, since those typically have fewer dependencies on them than lower layers.</t>
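<t>One possible way for an RTP switch to apply the drop recommendation above is to rank candidate frames with a sort key. This is only a sketch under stated assumptions: the function names and the dictionary keys "d", "tid", and "lid" are hypothetical, and the tie-breaking order (D bit first, then highest TID, then highest LID) is one reasonable choice that the text above does not mandate.</t>
<sourcecode type="python"><![CDATA[
```python
def drop_priority(frame):
    """Sort key: frames that sort first are the preferred ones to drop."""
    return (0 if frame["d"] else 1, -frame["tid"], -frame["lid"])


def pick_frame_to_drop(frames):
    """Return the frame an RTP switch would prefer to discard."""
    return min(frames, key=drop_priority)
```
]]></sourcecode>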
<!--[rfced] Please clarify what "and forward the same" means in this text.
Original:
When an RTP switch wants to forward a new video stream to a receiver,
it is RECOMMENDED to select the new video stream from the first
switching point with the I (Independent) bit set in all spatial
layers and forward the same.
-->
<t> When an RTP switch wants to forward a new video stream to a receiver, it is <bcp14>RECOMMENDED</bcp14> to
select the new video stream from the first switching point with the I (Independent) bit set in all spatial layers and forward the same.
<!--[rfced] How may we update this text to more easily illustrate the
1:1 mapping between initialism and expansion?

Original:
   ... source to generate a switching point by sending Full Intra
   Request (RTCP FIR) as defined in [RFC5104]...

Perhaps:
   ... source to generate a switching point by sending RTCP Full Intra
   Request (FIR) as defined in [RFC5104]...
-->
An RTP switch can request that a media source generate a switching point by sending
Full Intra Request (RTCP FIR) as defined in <xref target="RFC5104"/>, for example. </t>
<section>
<name>Relation to Layer Refresh Request (LRR)</name>
<t>Receivers can use the Layer Refresh Request (LRR) <xref target="RFC9627"/>
RTCP feedback message
to upgrade to a higher layer in scalable encodings. The TID/LID values
and formats used in LRR messages <bcp14>MUST</bcp14> correspond to the same values and formats
specified in <xref target="mandatory-scalable"/>.
</t>
<!--[rfced] In the following, are "layer" and "refreshes" redundant
with what LRR stands for? Please let us know if any updates are
necessary.

Original:
   Because frame marking can only be used with temporally-nested
   streams, temporal-layer LRR refreshes are unnecessary for frame-
   marked streams.

As expanded it would be:
   Because frame marking can only be used with temporally nested
   streams, temporal-layer Layer Refresh Request (LRR) refreshes are
   unnecessary for frame-marked streams.
-->
<t>Because frame marking can only be used with temporally nested streams,
temporal-layer LRR refreshes are unnecessary for frame-marked streams.
Other refreshes can be detected based on the I bit being set for the specific spatial layers.
</t>
</section>
<section anchor="scalable-structures">
<name>Scalability Structures</name>
<t>The LID and TID information is most useful for fixed scalability structures,
such as nested hierarchical temporal layering structures, where each temporal
layer only references lower temporal layers or the base temporal layer.
The LID and TID information is less useful, or even not useful at all,
for complex, irregular scalability structures that do not conform to common,
fixed patterns of inter-layer dependencies and referencing structures.
Therefore, it is <bcp14>RECOMMENDED</bcp14> to use LID and TID information for
RTP switch forwarding decisions only in the case of temporally nested
scalability structures, and it is <bcp14>NOT RECOMMENDED</bcp14> for other
(more complex or irregular) scalability structures.</t>
</section>
</section>
</section>
<section>
<name>Security and Privacy Considerations</name>
<t>In "<xref target="RFC3711" format="title"/>" <xref target="RFC3711"/>, RTP header extensions are
authenticated and optionally encrypted <xref target="RFC9335"/>.
When unencrypted header extensions are used, some metadata is
exposed and visible to middleboxes on the network path,
while encrypted media data and metadata in encrypted header extensions are not exposed.</t>
<t>The primary utility of this specification is for RTP switches to make proper media forwarding decisions.
RTP switches are the SRTP peers of endpoints, so they can access encrypted header extensions,
but not end-to-end encrypted private media payloads. Other middleboxes on the network path can only access
unencrypted header extensions since they are not SRTP peers.</t>
<t>RTP endpoints that negotiate this extension should consider whether:
</t>
<ul><li>this video frame marking metadata
needs to be exposed to the SRTP peer only, in which case the header extension can be encrypted; or</li>
<li>other middleboxes on the network path also need this metadata, for example, to optimize packet drop decisions
that minimize media quality impacts, in which case the header extension can be unencrypted, if the endpoint
accepts the potential privacy leakage of this metadata.</li>
</ul>
<t>
For example, it would be possible to determine
keyframes and their frequency in unencrypted header extensions. This information can often be obtained via
statistical analysis of encrypted data. For example, keyframes are usually much larger than other frames,
so frame size alone can leak this in the absence of any unencrypted metadata. However, unencrypted metadata
provides a reliable signal rather than a statistical probability; so endpoints should take that into consideration
to balance the privacy leakage risk against the potential benefit of optimized media delivery when deciding
whether to negotiate and encrypt this header extension.</t>
</section>
<section>
<name>Acknowledgements</name>
<t>Many thanks to Bernard Aboba, Jonathan Lennox, Stephan Wenger, Dale Worley, and Magnus Westerlund for their inputs.</t>
</section>
<section>
<name>IANA Considerations</name>
<t>This document defines a new extension URI listed in the "RTP Compact Header Extensions" subregistry of the
"Real-Time Transport Protocol (RTP) Parameters" registry, according to the following data:</t>
<t>Extension URI: urn:ietf:params:rtp-hdrext:framemarking </t>
<t>Description: Frame marking information for video streams </t>
<t>Contact: mzanaty@cisco.com </t>
<t>Reference: RFC 9626</t>
</section>
</middle>
<back>
<references>
<!-- [rfced] Would you like the references to be alphabetized or left
in their current order?
-->
<name>References</name>
<references>
<name>Normative References</name>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8285.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6184.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6190.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7741.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7798.xml"/>
</references>
<references>
<name>Informative References</name>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7656.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7667.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6464.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3550.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3711.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5104.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8871.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9335.xml"/>
<!-- [I-D.ietf-avtext-lrr]; Companion document -->
<reference anchor="RFC9627" target="https://www.rfc-editor.org/info/rfc9627">
<front>
<title>The Layer Refresh Request (LRR) RTCP Feedback Message</title>
<author initials="J." surname="Lennox" fullname="Jonathan Lennox">
<organization>Vidyo, Inc.</organization>
</author>
<author initials="D." surname="Hong" fullname="Danny Hong">
<organization>Vidyo, Inc.</organization>
</author>
<author initials="J." surname="Uberti" fullname="Justin Uberti">
<organization>Google, Inc.</organization>
</author>
<author initials="S." surname="Holmer" fullname="Stefan Holmer">
<organization>Google, Inc.</organization>
</author>
<author initials="M." surname="Flodman" fullname="Magnus Flodman">
<organization>Google, Inc.</organization>
</author>
<date month="August" year="2024" />
</front>
<seriesInfo name="RFC" value="9627" />
<seriesInfo name="DOI" value="10.17487/RFC9627"/>
</reference>
<!-- [I-D.ietf-payload-vp9]; Companion document -->
<reference anchor="RFC9628" target="https://www.rfc-editor.org/info/rfc9628">
<front>
<title>RTP Payload Format for VP9 Video</title>
<author initials="J." surname="Uberti" fullname="Justin Uberti">
<organization>Google, Inc.</organization>
</author>
<author initials="S." surname="Holmer" fullname="Stefan Holmer">
<organization>Google, Inc.</organization>
</author>
<author initials="M." surname="Flodman" fullname="Magnus Flodman">
<organization>Google, Inc.</organization>
</author>
<author initials="D." surname="Hong" fullname="Danny Hong">
<organization>Google, Inc.</organization>
</author>
<author initials="J." surname="Lennox" fullname="Jonathan Lennox">
<organization>8x8, Inc. / Jitsi</organization>
</author>
<date month="August" year="2024" />
</front>
<seriesInfo name="RFC" value="9628"/>
<seriesInfo name="DOI" value="10.17487/RFC9628"/>
</reference>
</references>
</references> </references>
<section numbered="false">
<name>Acknowledgements</name>
<t>Many thanks to <contact fullname="Bernard Aboba"/>, <contact fullname="Jonathan Lennox"/>, <contact fullname="Stephan Wenger"/>, <contact fullname="Dale Worley"/>, and <contact fullname="Magnus Westerlund"/> for their inputs.</t>
</section>
<!-- [rfced] We had the following questions related to abbreviations
used throughout the document.
a) Please note that we have expanded these abbreviations as follows on
first use. Please let us know any objections.
MCU - Multipoint Control Unit (per RFC 7667)
SRTP - Secure Real-time Transport Protocol
IDR - Instantaneous Decoding Refresh (per RFC 6184)
SDES - source description
NAL - Network Abstraction Layer
CRA - Clean Random Access
BLA - Broken Link Access
RAP - Random Access Point
AVC - Advanced Video Coding (per RFC 6184)
SVC - Scalable Video Coding (per RFC 6190)
PACSI - Payload Content Scalability Information
NRI - Network Remote Identification
VPS - Video Parameter Set
SPS - Sequence Parameter Set
PPS - Picture Parameter Set
b) Please clarify if/how we may expand the following abbreviations:
VPX
PACI - is this intentionally different from PACSI?
c) Should "intra (IDR)" frames instead be "IDR intra-frames"? This
formation occurs twice in this document.
d) Please note that the following similar abbreviations appear to be
differently treated with regard to punctuation:
H264 (AVC)
H264-SVC
We have expanded the abbreviations on first use, but please let us
know if/how these should be made uniform with regard to parens and
hyphenation.
See also our question regarding H264 vs. H.264.
e) We note that in Section 3.3.2, "LayerID" is used. Later, in Figure
8, we see "LayerId" (lowercase d). May these be made consistent? If
so, which is preferred? Further, could these actually be made "LID"
instead (we see TID in both figures in question, which seems similar)?
Please review our related cluster-wide AQ prior to responding.
-->
<!--[rfced] We had the following questions related to terminology used
throughout the document.
a) Two questions about the header extension:
Should this RTP header extension appear using "Video" throughout? We
see both of the following forms.
Video Frame Marking RTP header extension vs. Frame Marking RTP header extension
Secondly, in the Abstract, we see:
Original:
This document describes a Video Frame Marking RTP header extension
used to convey information about video frames that is critical for
error recovery and packet forwarding in RTP middleboxes or network
nodes.
Is the use of the indefinite article "a" intentional ("a Video Frame
Marking RTP header extension")? This seems (possibly) contradictory
with the capitalization of the proper noun and use in Section 3 (are
there more types of Video Frame Marking RTP header extensions?).
Please review.
-->
<!-- [rfced] Please review the "Inclusive Language" portion of the
online Style Guide
<https://www.rfc-editor.org/styleguide/part2/#inclusive_language>
and let us know if any changes are needed. Updates of this
nature typically result in more precise language, which is
helpful for readers.
Note that our script did not flag any words in particular, but this
should still be reviewed as a best practice.
-->
</back> </back>
</rfc> </rfc>
 End of changes. 95 change blocks. 
432 lines changed or deleted 771 lines changed or added
