Fragmented MP4 (fMP4) Support for HLS v4+: Let’s Shed Some Light on it
Roughly two years ago Apple announced Support for Fragmented MP4s (fMP4) over HLS, and it has been hyped ever since. As covered in our previous Article outlining the different HLS Versions, fMP4 Support for HLS can be achieved starting from HLS v4 or later, depending on the fMP4 flavour (see below).
So why so much hype about it and what does it mean specifically for the Industry?
In particular, how can you technically prepare and play back HLS based on fMP4? Just like years ago, when Support for HLS v4 was announced, we embarked on a journey to bring you, thirsty Reader, consolidated technical feedback on all this.
We decided to write this excursus in the form of a Q&A between a fictional Client and us, in order to keep it technical and useful for anyone out there wanting to learn more about this topic.
As of today, no single place (that we could find) on the Internet contains an extensive, technical, non-proprietary discussion about fMP4 over HLS so until that happens, here you will find a tasty appetizer.
First, a short recap on what HLS V4 introduces:
- Audio and Video can be specified separately (i.e. they no longer need to be muxed together), introducing rendition groups.
- Introducing byte-ranges to access the content from a single file.
- Allowing special playlists containing only I-frames (i.e. access points).
These three main points, if you are familiar with HLS v3, do away with some (if not most) of HLS’ main constraints: the traditional issue of having to multiplex the same clip once for every combination of audio track and video rendition; the need to have a long list of different .ts chunks (this is especially relevant to this Article); the rigidity of pre-created I-frame playlists.
Ready? Steady? Go!
- We have an application that V4 (or higher) could address, thanks to the ability to play sections of a clip using the EXT-X-BYTERANGE parameter, but I wanted to know what the support for this would be amongst players.
- Also, the first HLS implementations supported TS containers and later ones support MP4. I’m not really clear on this, nor on the difference between MP4 and fMP4 and how to create the latter from the former.
1. EXT-X-BYTERANGE and Player Support
It’s very hard to jot down a list of Players, Browsers and OSes supporting a specific HLS version, given how many there are out there and how often all of them are updated. Of course, if you have a specific player in mind for your use case, we can happily dig this up for you; as things stand, we believe it is fair to say that Support for HLS v4 is pretty broad. Apple (macOS v10.12 or later, iOS 10 or later, and tvOS 10 or later, to be more specific) finally also opened up to supporting EXT-X-BYTERANGE in an HLS Manifest (EXT-X-BYTERANGE is likely the main reason why you would upgrade from HLS v3 to v4). Therefore, for as long as the .m3u8 Manifest is written correctly, we would bet that 6 HLS Players out of 10 out there support it (and that’s including open source ones). After all, HLS v4 is “just” 8 years old 🙂 …
2. fMP4 VS MP4 VS TS and what this means for the Industry
This point is tightly intertwined with the one above, because in order to support fMP4s on HLS the Version has to be at least 4. Why? Well, because in order to use fMP4s you are going to use EXT-X-BYTERANGE (with a single .mp4) and/or EXT-X-MAP (with a fragmented mp4). A little background first: up until now, Mpeg-DASH had an edge over HLS, as you could use the former to interpret the latter (as well as other formats) but not vice versa; now, well, you can also use HLS to interpret Mpeg-DASH chunks (for as long as they use MP4 containers).
As to the fMP4 VS TS part, the advantage is pretty clear: you don’t have to multiplex data over and over again (you can, and depending on the approach you actually should, split video tracks and audio tracks just like with Mpeg-DASH), and you can maximize your cache hit ratio (on CDNs, for instance) because the same data is interoperable across HLS and Mpeg-DASH.
As to the fMP4 VS MP4 part instead, there literally is no difference! fMP4 is a Marketing-friendly term for nothing else than a .mp4 (Part 12 of the MPEG-4 standard) declared as fragmented.
By saying this we mean that there are two main approaches to packaging – and playing back – fragmented MP4s on HLS:
a) Single .mp4 file
See the manifest snippet below:
Please disregard the Version 7, as Version 4 is sufficient for the above parameters; what we do here is use the same single .mp4 file “main.mp4” and play it back as chunks thanks to EXT-X-BYTERANGE. Intelligent use of byte ranges and offsets tricks the player into thinking these are all separate chunks, when in reality it’s the same mp4 (which, if the player doesn’t support HLS v4, could be used for Progressive Download, a.k.a. “Pseudostreaming”, as a fallback mechanism).
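As a minimal sketch of what such a playlist could look like, the Python below assembles a single-file media playlist. Only the file name “main.mp4” comes from the discussion above; the durations, lengths and offsets are made-up values, and the function name is ours rather than part of any packager.

```python
def build_single_file_playlist(uri, segments, target_duration, version=7, map_range=None):
    """Build a VOD media playlist whose segments are byte ranges of one file.

    segments:  list of (duration_seconds, length_bytes, offset_bytes)
    map_range: optional (length_bytes, offset_bytes) of the initialization data
    """
    lines = [
        "#EXTM3U",
        f"#EXT-X-VERSION:{version}",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    if map_range is not None:
        length, offset = map_range
        # The initialization section lives in the same file as the media.
        lines.append(f'#EXT-X-MAP:URI="{uri}",BYTERANGE="{length}@{offset}"')
    for duration, length, offset in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        # EXT-X-BYTERANGE is written as <length>@<offset>.
        lines.append(f"#EXT-X-BYTERANGE:{length}@{offset}")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


# Illustrative values only: 720 bytes of init data, then two 6-second chunks.
playlist = build_single_file_playlist(
    "main.mp4",
    [(6.0, 100000, 720), (6.0, 95000, 100720)],
    target_duration=6,
    map_range=(720, 0),
)
print(playlist)
```

Every segment entry repeats the same URI; only the byte range changes from one entry to the next.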
b) Chunked .mp4 file
See this other manifest snippet now:
Here, the .m4s files are the actual fragmented mp4 chunks, and “init.mp4” (much smaller than the mp4 in scenario a)) contains the information needed to parse them. There’s a great Article on how to package these with ffmpeg, and we recommend reading it for a deeper dive.
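For comparison, here is a similar sketch for the chunked flavour. The names “init.mp4” and “fileSequenceN.m4s” follow the ones used in this discussion; the durations are invented and the helper is hypothetical.

```python
import math


def build_segmented_playlist(init_uri, segments, version=7):
    """Build a VOD media playlist for separate .m4s chunks.

    segments: list of (uri, duration_seconds) in playback order
    """
    # Rounding the longest chunk up keeps every EXTINF within the target duration.
    target = max(math.ceil(duration) for _, duration in segments)
    lines = [
        "#EXTM3U",
        f"#EXT-X-VERSION:{version}",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-PLAYLIST-TYPE:VOD",
        # The player fetches this once and uses it to parse every chunk.
        f'#EXT-X-MAP:URI="{init_uri}"',
    ]
    for uri, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


chunks = [(f"fileSequence{i}.m4s", d) for i, d in enumerate([6.0, 6.0, 4.5])]
segmented_playlist = build_segmented_playlist("init.mp4", chunks)
print(segmented_playlist)
```

Compared with scenario a), there are no byte ranges at all: each chunk is its own file and EXT-X-MAP names the separate initialization file.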
I have a few more questions:
- For HLS, I was under the impression that the media had to be either TS or fMP4. Your first example below shows support for a single MP4 file.
- How are the BYTERANGE values calculated? These would have to point to the start of an i-Frame, correct?
- For the fMP4 version below, you refer to an INIT.MP4 file – what does this look like? Are the fileSequenceX.m4s files each little MP4 files in their own right or are they pointers to a single MP4 file?
- I created a DASH file using AWS ElasticTranscoder and was expecting to see some kind of playlist or pointers but the XML just contains all the files of different bitrates at full duration – am I missing something? It still seems to work, it switches from one profile to another.
- Our application is that we have a system that archives off-air recordings for multiple channels in 5-minute chunks as MP4 files at a single bitrate.
There is a client application that runs on Internet Explorer using Windows Media. The application is legacy and needs to continue to be supported but we also need to be able to produce ABR formats for other devices to allow access on phones etc. In the legacy application, the user can select a time range to view. The requested media is presented as an ASX manifest where each entry contains a chunk URL, STARTTIME and DURATION (both expressed as “hh:mm:ss.sss”). One of our issues is to be able to identify STARTTIME to coincide with an i-Frame but, if we assume we can do that, how would we be able to convert these values to BYTERANGE values for HLS?
1) You are correct in that media can be TS or fMP4 (.m4s, just like with Mpeg-DASH); however, when packaging an fMP4 stream with ffmpeg, if you set the -hls_flags option to single_file, the result will be a single .mp4 file with precalculated byteranges and offsets. In the case of the .m4s chunks you need the “init.mp4” from our previous example to find the information on how to parse all the .m4s files; in the single-file example you will see that the #EXT-X-MAP is still there, but thanks to EXT-X-BYTERANGE each chunk simply points to a sub-range of that same file. For as long as you encode the files correctly, we don’t see any conflict arising in either scenario, and you can opt for either one based on your needs.
2) A packager in this scenario would take care of it for you; very basically, if you set the chunks’ duration to 5 seconds (for instance), the packager works out a length and an offset for every chunk based on it: chunk 1 has length x1 at offset 0, chunk 2 has length x2 at offset x1, chunk 3 has length x3 at offset x1 + x2, and so on, where each length is the amount of data needed for 5s worth of media and therefore varies from chunk to chunk.
3) This tiny init.mp4 file isn’t an mp4 file in its own right, as it doesn’t contain the media but rather the information on how to play the media (contained in the .m4s files). The .m4s files are also not mp4s in their own right: unlike .ts chunks, they contain the media but no information on how to parse it (that is what init.mp4 is for). They “point” to init.mp4 only insofar as the player needs its information to parse them.
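To make the two packaging options concrete, here is a hedged sketch that only builds (and does not run) an ffmpeg command line for each of them. The input and output names are placeholders, and “-c copy” is our assumption that you want to repackage without re-encoding.

```python
import shlex


def ffmpeg_hls_fmp4_args(source, playlist, segment_seconds=5, single_file=False):
    """Return an ffmpeg argument list that packages `source` as HLS with fMP4.

    With single_file=True the media ends up in one file addressed via byte
    ranges; otherwise separate .m4s chunks plus an init file are written.
    """
    args = [
        "ffmpeg",
        "-i", source,
        "-c", "copy",                   # assumption: repackage only, no re-encode
        "-f", "hls",
        "-hls_segment_type", "fmp4",    # fMP4 chunks instead of MPEG-TS
        "-hls_time", str(segment_seconds),
        "-hls_playlist_type", "vod",
    ]
    if single_file:
        args += ["-hls_flags", "single_file"]
    args.append(playlist)
    return args


# Placeholder file names, shown as they could be pasted into a shell.
print(shlex.join(ffmpeg_hls_fmp4_args("recording.mp4", "out/main.m3u8", single_file=True)))
print(shlex.join(ffmpeg_hls_fmp4_args("recording.mp4", "out/chunked.m3u8")))
```

Because the streams are copied rather than re-encoded, chunks can only be cut at keyframes that already exist in the source, so actual chunk durations may differ from the requested 5 seconds.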
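The bookkeeping a packager does here can be sketched in a few lines. The chunk sizes and the 720-byte initialization section are invented for illustration.

```python
def byteranges(chunk_sizes, first_offset=0):
    """Turn consecutive chunk sizes (bytes) into EXT-X-BYTERANGE values.

    Each value is "<length>@<offset>"; a chunk starts where the previous
    one ends, so the offset is a running sum of the earlier lengths.
    """
    ranges = []
    offset = first_offset
    for size in chunk_sizes:
        ranges.append(f"{size}@{offset}")
        offset += size
    return ranges


# Three chunks of differing size, stored right after 720 bytes of init data.
print(byteranges([1000, 1200, 900], first_offset=720))
# ['1000@720', '1200@1720', '900@2920']
```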
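One way to see the difference for yourself is to list the top-level boxes of each file. The sketch below walks the standard box layout (a 4-byte big-endian size followed by a 4-byte type); the sample bytes are synthetic stand-ins built by a hypothetical make_box helper, not real media.

```python
import struct


def top_level_boxes(data):
    """Return [(type, size), ...] for the top-level boxes in an MP4 byte string."""
    boxes = []
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if pos + 16 > len(data):
                break
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the data.
            size = len(data) - pos
        if size < header:
            break  # malformed box, stop rather than loop forever
        boxes.append((box_type.decode("ascii", "replace"), size))
        pos += size
    return boxes


def make_box(box_type, payload=b""):
    """Hypothetical helper: build a box with an 8-byte header, for the demo only."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic stand-ins: an init file carries ftyp + moov, a chunk moof + mdat.
fake_init = make_box(b"ftyp", b"iso5" + bytes(4)) + make_box(b"moov", bytes(32))
fake_chunk = make_box(b"moof", bytes(16)) + make_box(b"mdat", bytes(100))

print(top_level_boxes(fake_init))
print(top_level_boxes(fake_chunk))
```

Run against real files, a chunk may also start with boxes such as styp or sidx before the moof; what matters is that the moov describing the tracks only appears in the init file.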
4) Could you share the commands/options used and ideally also a public location where to check this stream ourselves? We’d be glad to doublecheck this for you.
5) The 5-minute MP4 chunks in the legacy application, and the fact that you also want to transcode these off-air recordings (for ABR purposes), make us think, in simple terms, that you are also open to re-encoding (or at least transmuxing) them. If this is the case, then it’s easier. We don’t see why you wouldn’t use Software like ffmpeg to also generate playlists and i-Frame playlists of these chunked .mp4s (if we read between the lines correctly, you would lean towards a Solution resulting in a single .mp4 file).
The Software in question will automatically, and with no room for error, produce the EXT-X-BYTERANGE values and Offsets you need, which you can then merge into Master Playlists (which will reference all byteranges and offsets from all the 5-minute playlists, just like the ASX manifest does today).
Bear in mind that I-Frame Playlists on HLS V4+ have several Features and, especially with fMP4, you can produce a customized one (whereas with ts Chunks it’s generated automatically when you package them) and define custom target durations, or even integrate so-called “trick play” and use sync samples only. That is to say, making the STARTTIME coincide with an i-frame is entirely in your hands when/if you re-package your source .mp4s.
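As a rough sketch of the STARTTIME/DURATION conversion, the code below picks the chunks of a single-file playlist that overlap a requested time window. The chunk table is invented, and we assume it has already been read from the playlist the packager produced.

```python
def parse_timecode(text):
    """Convert "hh:mm:ss.sss" (as used in the ASX entries) to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def chunks_for_window(chunks, start, duration):
    """Select the chunks overlapping [start, start + duration).

    chunks: list of (duration_seconds, length_bytes, offset_bytes) in
            playback order, e.g. taken from EXTINF / EXT-X-BYTERANGE pairs.
    """
    end = start + duration
    selected = []
    chunk_start = 0.0
    for chunk_duration, length, offset in chunks:
        chunk_end = chunk_start + chunk_duration
        if chunk_end > start and chunk_start < end:
            selected.append((chunk_duration, length, offset))
        chunk_start = chunk_end
    return selected


# Made-up table: four 5-second chunks of 1000 bytes each.
table = [(5.0, 1000, 0), (5.0, 1000, 1000), (5.0, 1000, 2000), (5.0, 1000, 3000)]
window = chunks_for_window(
    table,
    parse_timecode("00:00:07.500"),
    parse_timecode("00:00:06.000"),
)
print(window)
# [(5.0, 1000, 1000), (5.0, 1000, 2000)]
```

The selection is chunk-granular: playback starts at the beginning of the first selected chunk (5s here, not 7.5s), which is exactly why chunk boundaries need to line up with i-frames when you package the source.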
You asked about the commands I used for the ElasticTranscoder: I used a Lambda function to do this. See the Python code. I wanted to be able to use the same Lambda function for different uses, so I created different JSON files containing the commands and manually change which one is passed to the create_job() call sent to ElasticTranscoder.
The ones of interest in this discussion are the HLSOutputs and DASHOutputs (dict variables), but as you can see I just used presets for the different bitrates. In addition to this, I set up different transcode pipelines in ElasticTranscoder and different buckets for the input and output video; the input video buckets are what trigger the Lambda function, so the transcode pipeline ID needs to be changed as well. I’m sure the code could be a lot more elegant, but I was just playing around. Please also see the HLS and DASH manifests created by Elastic Transcoder.
Oh and I also burned in the bitrate into the different profiles so I could see which one is being served. It seems to work for HLS as it starts with the lower bitrate and changes after the first segment but for DASH I always only seem to get the highest profile (maybe this is how DASH works?).
First of all, thanks for the exhaustive information; we’ll kick this all off by confirming that DASH players, unlike HLS ones, typically run a raw bandwidth check on the client side at startup time and then, based on that and on what is declared in the Manifest file, start with the appropriate bitrate. It looks like you had enough bandwidth, when testing on your end, to start the stream with the higher or highest bitrates.
Now down to the Python script and manifest files: we like the Python script and the overall serverless approach to your transcoding pipeline, and we appreciate that you used the available presets when packaging the demo HLS and DASH channel. This is why you ended up producing an HLS playlist with .ts chunks instead of fMP4s.
In theory, all you should have to do is create a custom preset, forking off from the preset that you used and specifying fmp4 instead of ts as the container. As always, however, the devil lies in the details: it would appear that Elastic Transcoder by AWS doesn’t support doing this at present, whereas for AWS Elemental MediaConvert we have come across mixed reports on whether it supports HLS with fMP4 or not.
We would encourage you to try this out on your end, as things may well have changed since the last contributions to these threads; to be honest, even AWS Elemental MediaPackage announced Support for CMAF well over a year ago, so we think that this should work by now. What’s more, by switching to HLS using MP4 containers, you should receive a single file for all media segments, as you are probably looking forward to.
It’s a jungle out there when it comes to the Software to run for encoding HLS with MP4, so please also bear in mind options like FFmpeg, which has historically supported fMP4 for HLS.
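A simplified sketch of that startup decision is shown below. Real players use more elaborate heuristics; the safety factor and the bitrate ladder are assumptions made for illustration.

```python
def pick_startup_bitrate(bitrates_bps, measured_bps, safety=0.8):
    """Pick the highest advertised bitrate that fits the measured bandwidth.

    safety leaves some headroom (hypothetical value); if nothing fits,
    fall back to the lowest bitrate in the ladder.
    """
    budget = measured_bps * safety
    affordable = [b for b in bitrates_bps if b <= budget]
    return max(affordable) if affordable else min(bitrates_bps)


ladder = [400_000, 1_200_000, 3_000_000]  # made-up bitrate ladder

print(pick_startup_bitrate(ladder, 10_000_000))  # 3000000: plenty of bandwidth
print(pick_startup_bitrate(ladder, 1_000_000))   # 400000: only the lowest fits
```

With a fast connection, as in your test, such a player goes straight to the top of the ladder and has no reason to switch afterwards.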
Well, what a read that was, wasn’t it? And yet we have barely scratched the surface of fMP4 Support for HLS.
We presented a short “Executive” overview of why to even look into fMP4 over HLS in these “early” days: its main advantages over HLS V3 and below, the different approaches to encoding it and playing it back, the main differences versus traditional TS chunks, how to make this work with existing, legacy, mp4 based streaming Applications, and how to deal with custom I-Frame playlists. We also covered what an AWS based serverless application for it would look like, the current state of support for it in AWS’ official Products (AWS Elastic Transcoder as well as AWS Elemental MediaPackage / MediaConvert), and an EC2 + open source ffmpeg based approach; finally, we endeavored to provide you with links to the most disparate authoritative sources our feedback is based on.
Of course, when you set out to do that in a single Article, you end up with a “10000 feet high” overview of such a broad topic, but please bear in mind that this is our daily bread and butter; so, if this Article sparked some interest in you, we heartily invite you to contact us and let us advise you further.