For Chester: Nvidia’s RTX 4090 Launch

By: --- (---.delete@this.redheron.com), September 28, 2022 5:57 pm
Room: Moderated Discussions
Chester, I have no idea what's wrong with your setup but when I try to comment (either via Safari or via Chrome) I always get the complaint "Nonce verification failed".

Since I can't post there, and since it took some time to write this and it may be of interest, I'll post here. Even David may find the essential point of interest insofar as it touches on ML (tech) vs "ML" (advertising buzzword).

---------------------

Do we have any details as to EXACTLY how ML/AI is used in this sort of upscaling?

I'd like to compare with the one case I do know something about, namely Apple.
Apple makes a big deal about image fusion in their camera, for example introducing the brand name Photonic Engine for this operation in the iPhone14 class phones.

I want to contrast the journalist claims about Photonic Engine with Apple's claims:
Journalist: "Photonic Engine leverages hardware inside the iPhone 14, iPhone 14 Plus, iPhone 14 Pro, and iPhone 14 Pro Max and applies some machine learning and iOS 16 software magic."
Apple: "Then we added the all-new Photonic Engine, our game-changing image pipeline. It allows Deep Fusion — which merges the best pixels from multiple exposures into one phenomenal photo — to happen earlier in the process on uncompressed images.
This preserves much more data to deliver brighter, more lifelike colors and beautifully detailed textures in less light than ever."

Note the missing words from the Apple claims...

Let's go back earlier. In 2019 with the iPhone 11 Apple introduced Deep Fusion which WAS claimed (in some vague, unspecified way) to make use of the Neural Engine.
But Apple has backed off from this claim in their copy for the current design.

If we look through the patent stack (the most recent filing is https://patents.google.com/patent/US20220253972A1) and compare it to the equivalent patent from two years earlier, the primary difference is just what Apple said: the earlier patent starts processing on a sub-sampled image, while the newest version starts on the RAW image. So the patents roughly match what Apple has focused on.

But what do the patents actually say? If you look at them, they are more or less "traditional" image processing, though in the wavelet rather than the Fourier domain. The most interesting aspect for our purposes is that if we want to fuse two images, we do so by:
- dividing the image into tiles
- finding equivalent keypoints in each tile
- finding a warp that maps each tile to its correspondent such that the keypoints align
- fusing the warped image2 with image1.
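The steps above can be sketched in a few lines. To be clear, this is my own toy reconstruction, not Apple's pipeline: real per-tile warps would be affine or projective and the blend would be quality-weighted, but a least-squares translation per tile shows the shape of the algorithm.

```python
import numpy as np

def estimate_shift(kp1, kp2):
    """Least-squares translation mapping keypoints kp2 onto kp1.

    A real implementation would fit an affine or homography warp per
    tile; a pure translation keeps the sketch short.
    """
    return np.mean(kp1 - kp2, axis=0)

def warp_translate(tile, shift):
    """Shift a tile by an integer (dy, dx); np.roll stands in for a
    proper resampling warp."""
    dy, dx = np.round(shift).astype(int)
    return np.roll(np.roll(tile, dy, axis=0), dx, axis=1)

def fuse_tiles(tile1, tile2, kp1, kp2):
    """Warp tile2 onto tile1 using matched keypoints, then blend.

    A production fuser would weight the blend by per-pixel quality
    (sharpness, noise) rather than averaging 50/50.
    """
    shift = estimate_shift(kp1, kp2)
    warped = warp_translate(tile2, shift)
    return 0.5 * tile1 + 0.5 * warped
```

Run per tile over the whole image and you have the skeleton of multi-frame fusion; everything interesting hides in how good the keypoint matches are.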

That's fine (and certainly not trivial, and nothing to complain about; the warping surely helps improve the quality of the fused images, since either the camera or the elements in the scene move slightly between successive shots). But there's not much scope there for AI/ML.
Or is there?
How EXACTLY are those keypoints chosen/detected?

As far as I can tell, what's happening is that the images run through (again somewhat "traditional") vision processing in parallel with image capture, and it is this processing stage that chooses keypoints (probably starting with object detection, then perhaps matching edges between frames and taking keypoints as the maximally mismatching parts of edges, or something along those lines).
Possibly this is improved by using neural net based object detection, rather than simpler edge detection?
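For concreteness, here's what a "traditional" keypoint detector looks like — a classic Harris corner response, the kind of thing a neural object detector would be replacing or seeding. Whether Apple actually uses Harris corners is pure speculation on my part; this just illustrates the non-ML baseline.

```python
import numpy as np

def harris_response(img, k=0.05):
    """Harris corner response: large where the local structure tensor
    has two strong eigenvalues (a corner), negative along edges."""
    gy, gx = np.gradient(img.astype(float))

    def box(a, r=2):
        # Crude box filter to aggregate the structure tensor locally.
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
        return out / (2 * r + 1) ** 2

    ixx, iyy, ixy = box(gx * gx), box(gy * gy), box(gx * gy)
    det = ixx * iyy - ixy * ixy
    trace = ixx + iyy
    return det - k * trace * trace

def top_keypoints(img, n=8):
    """Return the (y, x) coordinates of the n strongest corners."""
    r = harris_response(img)
    idx = np.argsort(r, axis=None)[-n:]
    return np.column_stack(np.unravel_index(idx, r.shape))
```

A learned detector plays the same role — emit (y, x) locations worth matching — it's just (hopefully) better at picking points that are stable across frames.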

The reason I go through all this is because
(a) it's pretty damn interesting, isn't it? :-)
(b) it's very different, IMHO, from what we think of as AI/ML. It's not exactly a lie to say that AI/ML is involved, but it's also somewhat misleading, at least compared to what we think of as AI/ML. The important point is finding good, informative matching key features, not any sort of "recognition" or "aesthetic judgement based on scanning trillions of photos" or whatever.
(c) my GUESS is that the sort of spatial and temporal upscaling described in the article operates in essentially the same way: find keypoints, then find a warping function (now called optical flow). The goal is no longer to warp tile A onto tile B; it's to find that warping function and then apply it at half strength, so that we interpolate to a time halfway between the two frames.
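The "apply the flow at half strength" idea is easy to show. Again, this is my sketch of the concept, not anyone's shipping interpolator: given a dense flow field mapping frame A to frame B, sample frame A half a flow step behind each output pixel.

```python
import numpy as np

def half_warp(frame_a, flow):
    """Warp frame_a forward by half of a dense flow field (dy, dx per
    pixel) to approximate the frame at t = 0.5 between two inputs.

    Real temporal upscalers also warp the second frame backward, blend
    the two, and handle occlusions; this shows only the bare idea.
    """
    h, w = frame_a.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour backward warp: each output pixel samples
    # frame_a half a flow step "behind" it, clipped at the borders.
    src_y = np.clip(np.round(ys - flow[..., 0] / 2).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs - flow[..., 1] / 2).astype(int), 0, w - 1)
    return frame_a[src_y, src_x]
```

Apply the same flow at full strength and you get spatial reprojection of the previous frame instead of interpolation — which is roughly the difference between frame generation and temporal upscaling.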
Once again very interesting tech, but somewhat different from what we think of as AI/ML, the primary primitive of interest again being object detection as a better version of edge detection.

So, given all this, I think it would be great if we could learn what everyone (Apple and Google on the camera side, nV and AMD and Intel on the DLSS side) is actually doing when they talk about AI/ML.

[People who follow this stuff will remember that in the late 1990s, MPEG4 had multiple codecs described in the spec; the basic block-based motion compensation we all know about, of course, but also object-based compression and warping-based compression. It will be interesting to see if, now that the tech to do this sort of computation in real-time is available, we see some sort of attempt to return to codecs based on those ideas.]