The work in this thesis focuses on the transport of panoramic and omnidirectional video, i.e., video with wide-angle coverage of up to 360°. Transmitting such formats places stringent requirements on content generation, the transmission chain, and the receiver side: the high resolutions involved would lead to high bitrates if the content were transmitted as traditional video, and the immersive nature of the applications requires low-latency reactions.
The work discusses aspects of panoramic video streaming and omnidirectional video streaming, also known as 360° video streaming. In general, the techniques and solutions presented in this thesis aim to reduce the transmitted bitrate of panoramic or omnidirectional video compared to traditional streaming methods without reducing the effective video quality.
Chapter 1 gives an overview of panoramic video streaming and 360° video streaming and summarizes the state of the art. The main idea is to divide the content into several tiles and to take their relevance into account for transmission. In panoramic video streaming, only the tiles that lie within the region of interest (RoI) of the user are transmitted. In 360° video streaming, tiles are available at several resolutions, and all tiles of the omnidirectional video are transmitted, with the quality of each tile chosen based on the viewing orientation of the user.
Tiles within the field of view of the user are transmitted in high resolution, while the rest are transmitted in low resolution. Both tile-based panoramic video streaming and tile-based 360° video streaming also require a quick reaction to user interaction in order to adapt the subset and/or quality of the downloaded tiles.
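The two tile selection ideas summarized above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the pixel-rectangle RoI, and the simplification of a 360° video to equal-width yaw columns are all assumptions made for illustration.

```python
def tiles_in_roi(roi, tile_w, tile_h, cols, rows):
    """Return (col, row) indices of tiles that overlap a pixel RoI (x, y, w, h).

    Hypothetical helper: assumes a regular grid of cols x rows tiles.
    """
    x, y, w, h = roi
    first_col = max(0, x // tile_w)
    last_col = min(cols - 1, (x + w - 1) // tile_w)
    first_row = max(0, y // tile_h)
    last_row = min(rows - 1, (y + h - 1) // tile_h)
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


def tile_qualities(yaw_deg, fov_deg, cols):
    """Label each of `cols` equal-width yaw columns 'high' or 'low'.

    Assumption: only the horizontal viewing direction (yaw) is considered.
    """
    tile_span = 360.0 / cols
    qualities = []
    for c in range(cols):
        center = (c + 0.5) * tile_span
        # Shortest angular distance between tile center and viewing direction,
        # which handles the wraparound at 0°/360°.
        diff = abs((center - yaw_deg + 180.0) % 360.0 - 180.0)
        # The tile overlaps the field of view if the centers are closer than
        # half the field of view plus half the tile width.
        overlaps = diff < (fov_deg + tile_span) / 2.0
        qualities.append("high" if overlaps else "low")
    return qualities
```

In the panoramic case all non-selected tiles are simply not requested, whereas in the 360° case every tile is requested and only the quality label differs.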
Chapter 2 presents a parametric model for deriving the optimal tile sizes into which a panoramic video is split. The model is derived from the spatio-temporal characteristics of a specific video sequence. Two metrics, the spatial activity and the temporal activity, are used to predict the efficiency penalty incurred by splitting the content into tiles. Based on this prediction, the model determines the most efficient tile sizes for a streaming service, i.e., the tile sizes that minimize the transmitted bitrate.
Chapter 3 describes a bitstream processing method that combines independently encoded bitstreams into a single bitstream, thereby enabling the use of a single hardware decoder for tile-based streaming. For this process to work, the original bitstreams must fulfil a set of constraints described in the chapter.
The constraints discussed in the chapter apply to the High Efficiency Video Coding (HEVC) standard and its layered extensions, Scalable High Efficiency Video Coding (SHVC) and Multiview High Efficiency Video Coding (MV-HEVC). With the described technique, so-called open GOP coding configurations can be used in the case of SHVC, which allows for more efficient coding of the video content for interactive panoramic video services.
Chapter 4 presents an algorithm developed for streaming interactive panoramic video over HTTP. The chapter focuses on a Dynamic Adaptive Streaming over HTTP (DASH) client that reacts to user interaction with very low latency and adapts its decision on which tiles to download to the RoI of the user. Typically, DASH rate-adaptation algorithms rely on a client-side buffer of several seconds, which helps to overcome throughput variations in the network. Such a solution is not possible in the described scenario, since changes in the RoI of a user need to be reflected quickly in the downloaded tiles. Chapter 4 therefore describes a DASH rate-adaptation algorithm that works with the required small buffers.
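To make the small-buffer constraint concrete, the following Python sketch shows one generic way a client could choose a bitrate without relying on a large buffer: a conservative, throughput-based selection. This is not the algorithm of Chapter 4; the function name, the harmonic-mean estimate, and the safety factor of 0.8 are assumptions chosen for illustration.

```python
def select_bitrate(bitrates, throughput_samples, safety=0.8):
    """Pick the highest bitrate that fits a conservative throughput estimate.

    Hypothetical heuristic, not the thesis algorithm. All values share one
    unit, e.g. kbit/s.
    """
    # The harmonic mean weights low samples more strongly than the arithmetic
    # mean, so a single throughput dip lowers the estimate noticeably.
    estimate = len(throughput_samples) / sum(1.0 / t for t in throughput_samples)
    # With only a small buffer there is little room to absorb an
    # overestimate, so part of the estimate is kept as headroom.
    budget = safety * estimate
    candidates = [b for b in sorted(bitrates) if b <= budget]
    # Fall back to the lowest available bitrate if nothing fits the budget.
    return candidates[-1] if candidates else min(bitrates)
```

For example, with available bitrates of 1000, 2500 and 5000 kbit/s and recent throughput samples of 4000, 4000 and 2000 kbit/s, the estimate is about 3000 kbit/s, the budget about 2400 kbit/s, and the sketch selects 1000 kbit/s.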
Chapter 5 describes a method for optimizing the random access point (RAP) period with which a tiled 360° video is encoded. On the one hand, a shorter RAP period allows the downloaded tiles to be adapted more quickly to user interaction. On the other hand, the shorter the RAP period, the lower the coding efficiency of the video. A model is derived that determines the RAP period with which a given 360° video should be encoded so that the transmitted bitrate is minimized while ensuring that most users watch the high-resolution video for most of the streaming session.
Chapter 6 provides an analysis of the impact of the end-to-end delay on the visual fidelity of the content watched by viewers. The chapter describes a prediction algorithm that improves the performance of the tile-based streaming system and maintains its gains for up to 1 second of end-to-end delay in the transmission chain. The developed algorithm consists of a prediction model based on the current viewing orientation of a user combined with a velocity-based unequal quality distribution mechanism.
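The basic idea of predicting the viewing orientation across the end-to-end delay can be sketched in Python as a linear extrapolation of the yaw angle. This is only an illustration under stated assumptions, not the prediction model of Chapter 6: the function name is hypothetical, only yaw is considered, and the angular velocity is estimated from just two samples.

```python
def predict_yaw(yaw_prev, yaw_now, dt, delay):
    """Extrapolate the yaw angle `delay` seconds ahead.

    Hypothetical helper: yaw_prev and yaw_now are yaw angles in degrees,
    sampled dt seconds apart. Returns (predicted_yaw, angular_velocity).
    """
    # Signed shortest angular difference, so that a move from 355° to 5°
    # counts as +10° and not as -350°.
    delta = (yaw_now - yaw_prev + 180.0) % 360.0 - 180.0
    velocity = delta / dt  # degrees per second
    predicted = (yaw_now + velocity * delay) % 360.0
    return predicted, velocity
```

A client could request high-resolution tiles around the predicted yaw instead of the current one. The returned velocity could additionally steer how quality is spread over the tiles, but the mechanism used for that in the thesis is not specified in this abstract.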
Thesis
Doctoral dissertation
2021
Technische Universität Berlin
Target audience
For secondary school and higher education
For adult education
For professional use and research
Dimensions
Height: 21 cm
Width: 14.8 cm
ISBN-13
978-3-96729-146-9 (9783967291469)