By: --- (---.delete@this.redheron.com), June 16, 2022 7:41 pm
Room: Moderated Discussions
Wes Felter (wmf.delete@this.felter.org) on June 16, 2022 4:57 pm wrote:
> Anon (lkasdfj.delete@this.fjdksalf.com) on June 16, 2022 3:06 pm wrote:
>
> > For those scratching your heads, this dude's an armchair quarterback who believes the main reason
> > Apple's chip architects put in many hardware codec blocks was to accelerate single stream encode/decode
> > to a higher degree than a single block could on its own. He just *knows* it must be possible to
> > split the work into multiple threads so that multiple codec "cores" can collaborate on it - some
> > software codecs are multithreaded so obviously that must be possible with any hardware codec design
> > too. Therefore, according to him, if Apple's system can't do that, it must be a terrible and embarrassing
> > failure of communication between hardware and software departments.
>
> It is possible to split video on GOP boundaries and encode in parallel,
> even over a cluster. For example, here's what Netflix does:
>
> "The media content is broken up into smaller chunks. Each of the chunk is a portion of the
> video, usually about 30 seconds to a couple of minutes in duration. In a massively parallel
> way, we encode these video chunks independently on our servers. Once all the chunks have finished
> the encoding process, they're reassembled to become a single encoded video asset."
>
> Maybe this can't be hidden under the existing VideoToolbox API so it probably requires app
> changes. And maybe those changes haven't been made which (outside of Final Cut) isn't Apple's
> fault. But I can understand why customers might be confused or frustrated if they are waiting
> on a single-stream encode workload which isn't faster on higher-end Macs.
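As a toy illustration of that GOP-split scheme (every name here is made up for illustration; this is not VideoToolbox or Netflix's actual pipeline, just the shape of the idea):

```python
from concurrent.futures import ThreadPoolExecutor

def encode_chunk(chunk):
    # Stand-in for a real encoder. The key property is that each chunk
    # is a closed GOP: it references no frames outside itself, so it can
    # be encoded with no knowledge of the other chunks.
    return [f"enc({frame})" for frame in chunk]

def parallel_encode(frames, gop_size):
    # Split on GOP boundaries so the chunks are independent.
    chunks = [frames[i:i + gop_size] for i in range(0, len(frames), gop_size)]
    # Encode the chunks in parallel (threads here; Netflix fans the
    # chunks out to a server farm, but the shape is the same).
    with ThreadPoolExecutor() as pool:
        encoded = list(pool.map(encode_chunk, chunks))
    # Reassemble, in order, into a single encoded stream.
    return [f for chunk in encoded for f in chunk]

print(parallel_encode([f"f{i}" for i in range(10)], gop_size=4))
```

The constraint this sketch hides is the interesting part: the split only works if chunks really are independent, which is exactly the property a rate-control or quality-optimizing encoder may not want to give up.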
Even something as apparently simple and basic as video encoding remains under substantial construction, because Apple are doing so much that is novel.
For example, a substantial part of video encoding (certainly, even still, for all the MPEG codecs; I would guess so also for ProRes) is knowing the optical flow between frames, something that has traditionally been expensive to compute. If you know the optical flow reliably, you can in turn use it to drive many other traditionally expensive decisions, like how to break up the tree of blocks into sub-blocks in a frame.
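The classical (pre-neural-net) way encoders approximate this is block-matching motion estimation: for each block in the current frame, search the previous frame for the best-matching block and record the offset. A minimal exhaustive-search sketch (function names are mine; real encoders use far cleverer search patterns, and Apple's net is an ML replacement for exactly this kind of work):

```python
def sad(block_a, block_b):
    # Sum of absolute differences between two equal-sized blocks:
    # the standard cheap cost metric for block matching.
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def best_motion_vector(prev, cur, bx, by, bsize, search):
    # Exhaustive search: find the offset (dx, dy) into the previous
    # frame that best matches the block at (bx, by) in the current one.
    block = [row[bx:bx + bsize] for row in cur[by:by + bsize]]
    h, w = len(prev), len(prev[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if not (0 <= x <= w - bsize and 0 <= y <= h - bsize):
                continue  # candidate block falls outside the frame
            cand = [row[x:x + bsize] for row in prev[y:y + bsize]]
            cost = sad(block, cand)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best
```

Even this toy version is O(search²) SAD evaluations per block, which is why reliable, cheap flow is such a big deal: once you have it, decisions like sub-block partitioning come almost for free.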
Now, something that was clear if you watched the right WWDC videos is that Apple have a new Optical Flow net (actually, I believe it's version 3 of this particular net) that's substantially better than what went before.
The point is, simply claiming "the video can be broken up via scheme X because that's what company Y does" is not helpful. It may be true that the video can be broken up that way, but there may also be very good reasons why Apple is doing things differently.
For example, the Netflix scheme you describe may require vast amounts of physical DRAM, to an extent that is acceptable to Netflix (and to Apple, for their internal encoding) but is considered unacceptable, at least right now, for the bulk of their users.
Alternatively, it may simply be that Apple are working through these things one step at a time, on a schedule of what they believe is most important (probably informed by customer discussion) rather than what the internet believes is important.
If very interesting things can be done for encoding by using the NPU, that is clearly worth pursuing, but it may also mean that other things worth doing (like making full use of multiple media encode engines in all circumstances) have to wait a year or two. Apple aren't *just* trying to figure out all the ways to accelerate ProRes; they're also getting ready for VVC; they're constantly working on the quality front (HDR and higher frame rates); they're working on the capture side (especially capture quality, so as to help amateurs); and they're working on the networking side (eg faster random access).
It's fine to say "they should do better", but I don't see anyone else running any faster in this space. HandBrake, for example, is a great app I use frequently, but I don't see them exploring how the NPU (or even the GPU) could be used to improve encoding; they're great at endlessly refining something that's already been done, not so great at exploring completely new ways of doing things.
Things just take time, especially when there's a lot of original work going on.
In other contexts (like the GPU) one sees the same thing. Concepts were added to Metal two or three years ago, and it is only this year that we see higher-level APIs exploiting those additions so as to make them available to non-specialists.