Parallel video encoding

By: Dummond D. Slow (mental.delete@this.protozoa.us), June 17, 2022 7:45 am
Room: Moderated Discussions
--- (---.delete@this.redheron.com) on June 16, 2022 11:03 pm wrote:
> Dummond D. Slow (mental.delete@this.protozoa.us) on June 16, 2022 10:08 pm wrote:
> > --- (---.delete@this.redheron.com) on June 16, 2022 7:41 pm wrote:
> > > Wes Felter (wmf.delete@this.felter.org) on June 16, 2022 4:57 pm wrote:
> > > > Anon (lkasdfj.delete@this.fjdksalf.com) on June 16, 2022 3:06 pm wrote:
> > > >
> > > > > For those scratching your heads, this dude's an armchair quarterback who believes the main reason
> > > > > Apple's chip architects put in many hardware codec blocks was to accelerate single stream encode/decode
> > > > > to a higher degree than a single block could on its own. He just *knows* it must be possible to
> > > > > split the work into multiple threads so that multiple codec "cores" can collaborate on it - some
> > > > > software codecs are multithreaded so obviously that must be possible with any hardware codec design
> > > > > too. Therefore, according to him, if Apple's system can't do that, it must be a terrible and embarrassing
> > > > > failure of communication between hardware and software departments.
> > > >
> > > > It is possible to split video on GOP boundaries and encode in parallel,
> > > > even over a cluster. For example, here's what Netflix does:
> > > >
> > > > "The media content is broken up into smaller chunks. Each of the chunk is a portion of the
> > > > video, usually about 30 seconds to a couple of minutes in duration. In a massively parallel
> > > > way, we encode these video chunks independently on our servers. Once all the chunks have finished
> > > > the encoding process, they're reassembled to become a single encoded video asset."
> > > >
> > > > Maybe this can't be hidden under the existing VideoToolbox API so it probably requires app
> > > > changes. And maybe those changes haven't been made which (outside of Final Cut) isn't Apple's
> > > > fault. But I can understand why customers might be confused or frustrated if they are waiting
> > > > on a single-stream encode workload which isn't faster on higher-end Macs.
> > >
> > > Even something apparently as simple and basic as video encoding remains under
> > > substantial construction because Apple are doing so much that is novel.
> > > For example a substantial part of video encoding (certainly, even still, for all the MPEG codecs; I would
> > > guess so also for ProRes) is knowing the optical flow between frames, something that has traditionally been
> > > expensive to compute. If you know optical flow reliably, you
> > > can in turn use this to drive many other traditionally
> > > expensive decisions like how to break up the tree of blocks to sub-blocks in a frame.
> > > Now, something that was clear if you viewed the right WWDC videos, is that
> > > Apple have a new Optical Flow net (actually I believe it's version 3 of this
> > > particular net) that's substantially better than what went before.
> > >
> >
> > Note that optical flow is not actually useful here. It can serve as a predictor of movement
> > for further search refinement, but that is not very useful and does not save much time, because
> > it is the refinement that eats the massive amounts of expensive analysis. Coarse motion search
> > is easy (after all, you can do it in a low-res pre-pass); it is the quarter-pel and eighth-pel decisions
> > and other refinement and mode decisions where the search space of encoder decisions explodes.
> >
> > Why? Because for high quality compression, you don't need the "true" motion vectors.
> > You need to find vectors (and other prediction modes, like skips/no vectors, intra prediction...)
> > that will result in the combination of lowest amount of residual data at the best resulting
> > quality. Key to that is doing rate-distortion optimization (RDO) for mode decision.
> >
> > As encoder researchers/developers found out, true motion vectors are not the same as compression-optimal
> > vectors. If you search for motion vectors outside of the encoding loop and then just give them
> > to the encoder to use, you will get an awful compression/quality combination.
>
> I understand what you are saying, but I have never read a claim that "the true optical flow
> vectors are not [close to] optimal", only the much weaker claim that "what the compression
> cares about is vectors that give small deltas, regardless of real optical flow".
> Does a paper exist discussing this exact point --
> - calculate perfect optical flow
> - see how far "compression optimal" vectors differ from it
> - see how much time is saved by using it as a starting point
> ?
>
> A similar point holds for the block/subblock refinement. On the one hand you can say that it's all in
> what gives the fewest bits. On the other hand, you can use figure/ground segmentation to calculate a
> best block approximation to the moving semantic content, and once again ask the question as to how well
> that does compared to either existing heuristics or (clearly impractical generally) total search.
>
> Do you have an actual reference for that "As encoder researchers/developers found out, true motion
> vectors are not the same as compression-optimal vectors" or is it more a rough feeling?
> I am genuinely curious about this, because 20 years ago there was a belief that semantic segmentation
> of images was basically the holy grail of optimal video encoding, and optical flow plus figure ground
> segmentation (as delivered by neural net) is basically what they were talking about.
>

My sources are old-time x264 developers: discussions in their dev IRC channel, the mailing list, blog posts about new features and the like. Sadly, a lot of that has disappeared into the 404 abyss over the years or is no longer easy to find. A long IRC log of #x264dev was available from http://akuvian.org/src/x264/; some searching there might turn up relevant discussions.
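
To make the quoted point about "true" versus compression-optimal vectors concrete, here is a toy sketch, not taken from x264 or any real encoder. It scores 1-D candidate vectors with a Lagrangian cost J = D + lambda * R, where D is a plain SAD and R is only the bit length of the vector difference from a predicted vector, coded as signed Exp-Golomb. The pixel values, the `best_vector` helper, the search range and the lambda values are all made up for illustration; a real encoder would also count residual bits and measure distortion after transform and quantization.

```python
def se_golomb_bits(v):
    """Bit length of a signed Exp-Golomb code for integer v (0 costs 1 bit)."""
    k = 2 * abs(v) - (1 if v > 0 else 0)        # map signed value to code number
    return 2 * ((k + 1).bit_length() - 1) + 1   # 2*floor(log2(k+1)) + 1


def sad(block, ref, offset):
    """Sum of absolute differences between block and ref[offset:offset+len(block)]."""
    return sum(abs(b - ref[offset + i]) for i, b in enumerate(block))


def best_vector(block, ref, pos, predicted_mv, lam, search_range=3):
    """Return the candidate vector minimising J = SAD + lam * bits(mv - predicted_mv)."""
    def cost(mv):
        return sad(block, ref, pos + mv) + lam * se_golomb_bits(mv - predicted_mv)
    return min(range(-search_range, search_range + 1), key=cost)


# Hypothetical low-contrast reference row; the block really moved by +3 pixels.
REF = [50, 50, 51, 50, 50, 51, 50, 52, 51, 50, 50, 50]
BLOCK = [52, 51, 50, 50]   # identical to REF[7:11], i.e. true motion is +3
POS = 4                    # block position in the current frame

true_like = best_vector(BLOCK, REF, POS, predicted_mv=0, lam=0)  # distortion only
rd_choice = best_vector(BLOCK, REF, POS, predicted_mv=0, lam=2)  # distortion + rate
print(true_like, rd_choice)
```

With lambda set to 0 the search picks the true displacement, since it gives a perfect match. Once the vector bits are charged, the cheaper-to-code zero vector wins even though its match is worse. That is the kind of divergence meant above, in miniature.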