By: Adrian (a.delete@this.acm.org), September 30, 2022 7:30 am
Room: Moderated Discussions
Philippe (phil995511.delete@this.gmail.com) on September 30, 2022 5:14 am wrote:
> The NVIDIA Linux driver 515.65 and 515.76 are not compatible with Debian 11 on kernel
> 5.19.x !? We've been waiting for a fix for weeks, so what is Nvidia doing ?!?
>
> Can't wait for their drivers to finally be integrated into the kernel, after
> all these years of bothering to manage their proprietary drivers...
>
> https://developer.nvidia.com/blog/nvidia-releases-open-source-gpu-kernel-modules/
Are you sure about 515.76 ?
When it was released more than a week ago, it was said that it should be compatible with Linux kernel 6.0, although that cannot be certain, because kernel 6.0 has not been released yet.
I have been using 515.65 for a long time, because an upgrade to it was necessary for installing the latest CUDA, and with older long-term Linux kernels it works very well.
I certainly want any privileged program that I use, such as a device driver, to be open source. Nevertheless, NVIDIA has provided high-quality device drivers for Linux and FreeBSD for about 20 years, even if in closed-source form. Among other hardware producers, only Intel has provided a similar amount of high-quality support work.
When I have enough time, I would prefer a worse open-source device driver for Intel or AMD hardware, which I can debug and improve myself. For GPUs, however, that would require more work than for any other peripheral, so when you need something that just works out of the box, NVIDIA has frequently been the only solution.
I also have a lot of sympathy for the difficulties faced by anyone who has to maintain an out-of-tree device driver against the random changes made by the kernel developers at each kernel release, when header files are moved or renamed, definitions are moved from one header to another, and structure members or function parameters are added or deleted.
This Linux device driver maintenance work could have been trivial, except that the kernel developers do not provide any documentation useful for those who need to migrate a device driver between kernel versions.
So whenever a new kernel version is released, you attempt to compile the device driver, but that fails.
After some work you identify the causes. For various kinds of moves and renames, it is relatively easy to determine how to change the device driver.
However, when structure members have been added or deleted, or function parameters have been added or deleted, it is much more difficult to discover what has to be done so that your device driver will work as before.
With a little searching, you can find in the Linux kernel mailing lists the message containing the offending patch that changed the structures or functions. That message usually says nothing about the rationale for the change, much less about what must be done to convert code that used the old structures or functions into code that uses the new ones.
If the commit message provides no clues, additional searches through the kernel mailing lists may find some discussions between developers about the proposed changes, which might clarify the reason for them, but I have never found any useful instructions for device driver migration.
I very much prefer a Linux kernel that is improved continuously and is not frozen to preserve legacy mistakes. However, I believe that for any kernel change that breaks existing device drivers there should always be written instructions explaining how to convert the old drivers to work with new kernels. For example, they should specify what values must be put in new structure members or new function parameters to obtain the previous behavior, or what to do when structure members or function parameters disappear (sometimes those may simply be deleted from the device driver, but other times some new functions might need to be invoked to get the old behavior).
Until now, I have never seen such Linux kernel documentation. If it exists, it is well hidden.
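A minimal sketch of how an out-of-tree driver typically absorbs a changed function signature, assuming a made-up kernel function that gained a parameter in 5.19. Everything named `demo_*` or `DEMO_*` is hypothetical and exists only so that the example compiles in user space; a real module would compare `LINUX_VERSION_CODE` against `KERNEL_VERSION()` from `<linux/version.h>`. The value passed for the new parameter is exactly the piece of information that a migration note would need to supply.

```c
/* Stand-in for the kernel's KERNEL_VERSION() macro (hypothetical name). */
#define DEMO_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Version we pretend to build against; a real module would use
 * LINUX_VERSION_CODE instead of this hypothetical macro. */
#ifndef DEMO_TARGET_VERSION
#define DEMO_TARGET_VERSION DEMO_KERNEL_VERSION(5, 19, 0)
#endif

/* Hypothetical kernel API: older kernels take only a length, newer
 * kernels add a flags parameter. Returns a page count, or -1 when the
 * flags are not supported. */
#if DEMO_TARGET_VERSION >= DEMO_KERNEL_VERSION(5, 19, 0)
static int demo_map(unsigned long len, unsigned int flags)
{
    return flags == 0 ? (int)(len / 4096) : -1;
}
#else
static int demo_map(unsigned long len)
{
    return (int)(len / 4096);
}
#endif

/* Compatibility wrapper: the rest of the driver calls only this, so
 * the version check lives in one place. */
static int compat_demo_map(unsigned long len)
{
#if DEMO_TARGET_VERSION >= DEMO_KERNEL_VERSION(5, 19, 0)
    /* Assumption: flags == 0 reproduces the pre-change behavior. */
    return demo_map(len, 0);
#else
    return demo_map(len);
#endif
}
```

The wrapper itself is mechanical. The hard part, as described above, is knowing that `0` is the right value for the new argument, and that is what has to be dug out of the mailing lists today.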
Topic | Posted By | Date |
---|---|---|
NVIDIA Linux driver and Kernel 5.19 & 6.0 | Philippe | 2022/09/30 05:14 AM |
NVIDIA Linux driver and Kernel 5.19 & 6.0 | Maxwell | 2022/09/30 07:06 AM |
NVIDIA Linux driver and Kernel 5.19 & 6.0 | Marcus | 2022/09/30 07:20 AM |
NVIDIA Linux driver and Kernel 5.19 & 6.0 | Adrian | 2022/09/30 07:30 AM |
NVIDIA Linux driver and Kernel 5.19 & 6.0 | Marcus | 2022/10/01 03:12 AM |
NVIDIA Linux driver and Kernel 5.19 & 6.0 | Adrian | 2022/10/01 06:14 AM |
NVIDIA Linux driver and Kernel 5.19 & 6.0. Survival for Nvidia | Björn Ragnar Björnsson | 2022/10/01 07:21 PM |