For Chester: Nvidia’s RTX 4090 Launch

By: ---, September 29, 2022 9:37 pm
Room: Moderated Discussions
Chester on September 29, 2022 2:22 pm wrote:
> --- on September 28, 2022 6:57 pm wrote:
> > Chester, I have no idea what's wrong with your setup but when I try to comment (either
> > via Safari or via Chrome) I always get the complaint "Nonce verification failed".
> That's weird, I've seen other people post comments. Cheese seemed to have no trouble approving those.
> > Since I can't post there, and since it took some time to write this and it may
> > be of interest, I'll post here. Even David may find the essential point of interest
> > insofar as it touches on ML (tech) vs "ML" (advertising buzzword).
> >
> > ---------------------
> >
> > Do we have any details as to EXACTLY how ML/AI is used in this sort of upscaling?
> Nope, Nvidia has been tight-lipped as usual about exactly how DLSS works. As far as
> what they've said, it's a machine learning model. Inputs are an optical flow field,
> a previous full res frame, and a current frame rendered at reduced resolution. Outputs
> are a full res current frame, and for DLSS3, an intermediate frame as well.
> Idk how exactly they trained it. Probably off reference full resolution frames

I think you're way too optimistic here, and that nV are doing (in essence) the same thing as Apple. There may be a better patent, but look at
which is essentially a spatial upscaling patent. This reads as very familiar because it's essentially the same thing as Apple's upscaling (in their Display Controller). Basically:
- upscale using traditional technology
- use some AI magic to make it better.

OK, what EXACTLY is that AI magic? Neither clarifies that part, but the essentials seem to be (once again) object detection: once you have detected objects and edges, you can be selective in how you blend the material available. In the case of nV this is the current upsampled frame and one (or more) earlier frames.
[I forget what the inputs are in the case of Apple's Display Controller, but probably similar. Alternatively, rather than an earlier frame, you could create a second, modified, upscaled frame, eg via sharpening it, then use the neural net to choose between the "traditionally up-filtered" baseline and the sharpened image, presumably the latter giving better edges and certain types of textures.]

In other words, what the AI is giving you is blending factors for each pixel, not any sort of imagery. To quote from the patent:
"In at least one embodiment, this upscaled image 110 can be provided as input to a neural network 112 to determine one or more blending factors or blending weights. In at least one embodiment, this neural network can also determine at least some filtering to be applied when reconstructing or blending a current image with a prior image."

This ain't nothing. But, like I said for fusion, it also ain't what most people have in mind when they hear "AI-assisted upsampling"...
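
To make the "blending factors, not imagery" point concrete, here is a toy sketch in Python. Everything in it is my own illustration, not anything from the patent or from DLSS: the function names are made up, the upscaler is plain nearest-neighbour, and stand_in_weights is a crude hand-written rule sitting where the neural network would be. The only thing it is meant to show is the shape of the pipeline: upscale traditionally, get a weight per pixel, blend with the earlier frame.

```python
# Illustrative sketch only. All names are hypothetical; the weight rule below
# is a hand-written placeholder for whatever the real network computes.

def nearest_upscale(img, factor):
    """Traditional (non-ML) upscaling: repeat each pixel factor x factor times."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def stand_in_weights(current_up, previous, threshold=0.2):
    """Placeholder for the network: one blending weight per pixel.

    Where the two frames disagree strongly (motion, disocclusion) trust the
    current frame (weight 1.0); elsewhere lean on the earlier frame (0.1).
    """
    return [
        [1.0 if abs(c - p) > threshold else 0.1 for c, p in zip(crow, prow)]
        for crow, prow in zip(current_up, previous)
    ]

def blend(current_up, previous, weights):
    """Per-pixel blend: w = 1 keeps the current frame, w = 0 keeps the earlier one."""
    return [
        [w * c + (1.0 - w) * p for c, p, w in zip(crow, prow, wrow)]
        for crow, prow, wrow in zip(current_up, previous, weights)
    ]

# A 1x2 low-res frame and a 2x4 earlier full-res frame (grayscale, 0..1).
low_res = [[0.0, 1.0]]
prev_full = [[0.0, 0.0, 1.0, 0.2],
             [0.0, 0.0, 1.0, 1.0]]

cur_up = nearest_upscale(low_res, 2)
w = stand_in_weights(cur_up, prev_full)
out = blend(cur_up, prev_full, w)
```

Note that the "AI" part never produces a pixel value. It only decides, per pixel, how much of each already-existing image to keep, which is the distinction I'm trying to draw.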

Yeah, yeah, I know this hits all your triggers, from Apple to patent exploration. And I don't care enough to get into a fight about it. But I do think there is less here than meets the eye (across the entire industry, from Apple to nV).

Don't get me wrong: I don't think this is a scandal! AI genuinely helps in lots of non-obvious ways in places like speech synthesis, and if "this one weird trick" with blending improves upscaling (better edges, sharper textures) hey, go for it. In a sense it's the fault of us, the public, for expecting that "AI in the context of imagery must mean something like DALL-E".
But I also think that when we, the public, learn something, even if it's not the full story, about what exactly AI means in each context, we might as well try to inform our fellow citizens.

Most helpful would be if any tech journalists reading this who get a chance to talk to anyone (Apple [good luck!], nV, AMD, Intel, even ARM or QC) tried pushing on this seriously, asking eg whether a paper exists, or whether my/your understanding of the patent trail is defective in some serious way.

> > I'd like to compare with the one case I do know something about, namely Apple.
> > Apple makes a big deal about image fusion in their camera, for example introducing
> > the brand name Photonic Engine for this operation in the iPhone 14 class phones.
> I think DLSS and Photonic Engine are different enough that no comparison makes sense. Nvidia is
> trying to upscale and generate new frames in real time. Apple is post processing camera images.
> > I want to contrast the journalist claims about Photonic Engine with Apple's claims:
> > Journalist: "Photonic Engine leverages hardware inside the iPhone 14, iPhone 14 Plus, iPhone
> > 14 Pro, and iPhone 14 Pro Max and applies some machine learning and iOS 16 software magic."
> > Apple: "Then we added the all-new Photonic Engine, our game-changing image pipeline.
> > It allows Deep Fusion — which merges the best pixels from multiple exposures into
> > one phenomenal photo — to happen earlier in the process on uncompressed images.
> > This preserves much more data to deliver brighter, more lifelike colors
> > and beautifully detailed textures in less light than ever."
> Looks like a lot of marketing speak about a decades old technique of stacking images to
> reduce noise and increase dynamic range. Working "on uncompressed images" (raw files)
> is typical too, because you lose dynamic range once the image is processed to JPG.
> > But what do the patents actually say? If you look at them, they are more or less "traditional"
> > image processing, though in the wavelet rather than the Fourier domain. The most interesting
> > aspect for our purposes is that if we want to fuse two images, we do so by:
> > - dividing the image into tiles
> > - finding equivalent keypoints in each tile
> > - finding a warp that maps each tile to its correspondent such that the keypoints align
> > - fusing the warped image2 with image1.
> Yeah, if you want to stack images, you want to find corresponding keypoints and transform
> the images so they line up. Nearly every modern cell phone seems to have some sort of
> image stacking mode, and they all seem to work well in most situations. Maybe they're
> using machine learning and the NPU to do alignment. Or maybe not. I don't know.
> Maybe doing it in tiles saves processing power or improves cache efficiency.
> Probably relevant to a phone where battery life is a high priority.
> > The reason I go through all this is because
> > (a) it's pretty damn interesting, isn't it? :-)
> It is, but it's also completely unrelated to what Nvidia is doing with DLSS.
> > (b) it's very different, IMHO, from what we think of as AI/ML. It's not exactly a lie to say that
> > AI/ML is involved, but it's also somewhat misleading, at least compared to what we think of as
> > AI/ML. The important point is finding good, informative matching key features, not any sort of
> > "recognition" or "aesthetic judgement based on scanning trillions of photos" or whatever.
> > (c) my GUESS is that the sort of spatial and temporal upscaling described in the article essentially
> > operates in the same way: find keypoints, find a warping function (now called optical flow;
> If you stretch it, they're kind of related in that both optical flow and matching keypoints involve finding
> matching features across images. But beyond that it's very different. If you're stacking images and matching
> keypoints, you don't care about creating velocity vectors to predict where an image feature will move to
> next. All you care about is aligning the images so you can stack them without obvious ghosting.
> In cell phones, your output image has to stand up to close scrutiny and only needs to be generated
> fast enough to prevent users from complaining. Obvious artifacts and ghosting need to be avoided
> at all costs, because they could trash an entire image. With NV DLSS, you need to get the image out
> in a matter of milliseconds, because time taken generating the image adds latency and nobody likes
> that when gaming. Artifacts should be minimized, but images only stay on screen for a few milliseconds
> before being replaced by an actually rendered frame. So, occasional artifacts (and early DLSS3 videos
> do show them) are probably acceptable as long as they don't happen too often.
> If you want to compare Apple's Photonic Engine to something, I think Google's Pixel lineup
> is a good start. They definitely have several image stacking modes (Night Sight, HDR+), and
> do a very good job of aligning images even if you aren't steady with holding the phone.
