By: Dummond D. Slow (mental.delete@this.protozoa.us), June 16, 2022 7:16 pm
Room: Moderated Discussions
Wes Felter (wmf.delete@this.felter.org) on June 16, 2022 4:57 pm wrote:
> Anon (lkasdfj.delete@this.fjdksalf.com) on June 16, 2022 3:06 pm wrote:
>
> > For those scratching your heads, this dude's an armchair quarterback who believes the main reason
> > Apple's chip architects put in many hardware codec blocks was to accelerate single stream encode/decode
> > to a higher degree than a single block could on its own. He just *knows* it must be possible to
> > split the work into multiple threads so that multiple codec "cores" can collaborate on it - some
> > software codecs are multithreaded so obviously that must be possible with any hardware codec design
> > too. Therefore, according to him, if Apple's system can't do that, it must be a terrible and embarrassing
> > failure of communication between hardware and software departments.
>
> It is possible to split video on GOP boundaries and encode in parallel,
> even over a cluster. For example, here's what Netflix does:
>
> "The media content is broken up into smaller chunks. Each of the chunk is a portion of the
> video, usually about 30 seconds to a couple of minutes in duration. In a massively parallel
> way, we encode these video chunks independently on our servers. Once all the chunks have finished
> the encoding process, they're reassembled to become a single encoded video asset."
>
> Maybe this can't be hidden under the existing VideoToolbox API so it probably requires app
> changes. And maybe those changes haven't been made which (outside of Final Cut) isn't Apple's
> fault. But I can understand why customers might be confused or frustrated if they are waiting
> on a single-stream encode workload which isn't faster on higher-end Macs.
Note that this might require prohibitive amounts of memory to hold the decoded video. If you had 10-bit 4:2:2 YUV (16 bits per luma sample, 16 bits per chroma U and V sample), decoded video consumes almost 8 GB per 240-frame (10-second) GOP at 4K. And to use 4 engines in parallel, you would need four such buffers.
Forget using "couple of minutes" long GOPs this way. Such long GOPs would be a major mess for editing video anyway; actually, any non-intra video with GOPs is already trouble for that.
It's true that GOPs that long are probably less common in pro/broadcast encoding. But even so, you don't want to consume that many gigabytes on Apple M1/M2 devices that have only 8-24 GB of total RAM. Intra-only formats, though? Doable there, I guess.
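For anyone who wants to check the arithmetic, here's a rough back-of-the-envelope sketch (my assumptions: 3840x2160 "4K", 24 fps, 10-bit samples padded to 16-bit words, one decoded GOP buffered per engine; real frame-buffer layouts, strides and working sets will differ):

# Rough memory estimate for buffering decoded 10-bit 4:2:2 4K video per GOP.
width, height = 3840, 2160
bytes_per_sample = 2                 # 10-bit samples stored in 16-bit words
luma = width * height                # full-resolution luma plane
chroma = (width // 2) * height       # each chroma plane is half width in 4:2:2
frame_bytes = (luma + 2 * chroma) * bytes_per_sample

fps = 24
gop_frames = 10 * fps                # 240-frame, 10-second GOP
gop_bytes = frame_bytes * gop_frames

engines = 4                          # four encode engines, one GOP buffer each
print(f"per frame:   {frame_bytes / 1e6:.1f} MB")
print(f"per GOP:     {gop_bytes / 1e9:.2f} GB")
print(f"4 engines:   {engines * gop_bytes / 1e9:.2f} GB vs. 8-24 GB total RAM")

That comes out to roughly 33 MB per frame, just under 8 GB per 10-second GOP, and about 32 GB for four GOPs in flight at once.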