Installing the CUDA Toolkit on Ubuntu
Lately, my interest in machine learning and artificial intelligence has revived. When I was at university, I followed some courses and specialisations in this field, but during my career I have hardly ever used any of it. Back in those years, complex neural nets and genetic algorithms took days to build, mainly because we didn’t have the computing power for them. Nowadays, things have changed, and such models can be built relatively quickly on a commodity graphics card.
To give my old workstation and gaming PC a new lease of life, why not try to employ its NVIDIA GT218 for some experiments? Sure, it isn’t a high-end card by today’s standards, but at least it might be fun to try. The only problem is: you can’t write arbitrary code and just run it on a graphics card. Luckily, NVIDIA distributes the CUDA Toolkit, which lets you do exactly that.
If you’re up for a journey, continue reading… Otherwise, skip to the end.
Preparations
Before even thinking of installing something, I had to make sure my machine was running a supported operating system. The machine had an old version of Ubuntu installed, and since I have had good experiences with Ubuntu, I upgraded it to the latest Long Term Support (LTS) release: 16.04.2 at the time of writing. I chose an LTS release because it will keep receiving security patches, and because it is officially supported by NVIDIA.
Downloading
After that, I headed to the CUDA Toolkit download page and made the following choices:
| Option | Value |
|---|---|
| Operating System | Linux |
| Architecture | x86_64 |
| Distribution | Ubuntu |
| Version | 16.04 |
| Installer Type | deb (network) |
I chose the deb (network) installer since it is the smallest download and it configures the APT repositories for you. Should NVIDIA decide to release updates to the toolkit, I hope this approach will make them easier to get. The deb (local) installer downloads everything upfront, after which you have to install a separate patch; it would probably work just fine, but I prefer the network-based approach.
Installing
Installing the toolkit is pretty straightforward, and it is listed on the download page as well:
```bash
sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
```
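If you first want to confirm that the repository was registered correctly, a quick sanity check (my own addition, not part of NVIDIA’s instructions) is to ask APT where the `cuda` meta-package would come from:

```bash
# Shows the candidate version of the meta-package and which repository
# provides it; the NVIDIA repository should appear in the listing.
apt-cache policy cuda
```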
Disabling Nouveau
If you want to use CUDA, you cannot use the open-source Nouveau drivers for NVIDIA graphics cards.
To blacklist them (meaning the Linux kernel will never load them), I created a file at `/etc/modprobe.d/blacklist-nouveau.conf` and put the following in it:
```
blacklist nouveau
options nouveau modeset=0
```
After that, you need to make sure the initial RAM disk image is also updated: `sudo update-initramfs -u`.
Finally, I did a reboot, just to be sure, but I don’t think it is really necessary.
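After the reboot, you can verify that the blacklist actually took effect; this check is my own addition:

```bash
# If Nouveau was blacklisted successfully, this prints nothing.
lsmod | grep nouveau
```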
Trying it out (the hard part)
Now comes the hardest part: trying to get it all to work.
See whether `nvcc` works properly
The CUDA Toolkit comes with the NVIDIA CUDA Compiler, or `nvcc` for short. Let’s see if it works. Running `nvcc -V` told me it wasn’t installed, but that I could get it by installing the `nvidia-cuda-toolkit` package.
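With the compiler in place, a minimal smoke test (my own, not one of NVIDIA’s samples) is to compile and run a trivial kernel. Compilation should succeed even if the driver side is still broken; actually running the binary exercises the same runtime path that `deviceQuery` uses later on:

```bash
# Write a trivial CUDA program, compile it with nvcc, and run it.
cat > hello.cu <<'EOF'
#include <cstdio>

__global__ void kernel() { }  // does nothing; we only test the launch

int main() {
    kernel<<<1, 1>>>();
    cudaError_t err = cudaDeviceSynchronize();
    printf("kernel launch: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
EOF
nvcc hello.cu -o hello && ./hello
```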
See if the compiler can actually compile code
I don’t feel like writing C code for the graphics card myself, but that isn’t necessary either. The CUDA Toolkit comes with some sample code, which can be copied to a directory of your choice by running the `cuda-install-samples-8.0.sh` script, found in `/usr/local/cuda-8.0/bin/`.
You need to give it a target directory to copy the samples to; I chose `.`, i.e. my home directory. I then `cd`’ed into the newly created `NVIDIA_CUDA-8.0_Samples` folder and issued `make`.
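Put together, and assuming the default install locations of the 8.0 packages, the whole sequence looks roughly like this:

```bash
# Copy the bundled sample sources into the home directory and build them all.
/usr/local/cuda-8.0/bin/cuda-install-samples-8.0.sh ~
cd ~/NVIDIA_CUDA-8.0_Samples
make
```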
After a long wait (skipped here for brevity), I got:

```
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/bin/ld: cannot find -lnvcuvid
collect2: error: ld returned 1 exit status
Makefile:381: recipe for target 'cudaDecodeGL' failed
make[1]: *** [cudaDecodeGL] Error 1
make[1]: Leaving directory '/home/maarten/NVIDIA_CUDA-8.0_Samples/3_Imaging/cudaDecodeGL'
Makefile:52: recipe for target '3_Imaging/cudaDecodeGL/Makefile.ph_build' failed
make: *** [3_Imaging/cudaDecodeGL/Makefile.ph_build] Error 2
```
Too bad: still no luck!
Specify where to find `libnvcuvid`
I had no clue where the libraries would be installed, so I called my old friend `find` to the rescue. Issuing `find /usr/lib/ -name "*nvcuvid*"` revealed that a couple of files named `libnvcuvid.so*` lived in `/usr/lib/nvidia-375/`. Now that is something I could use: `LIBRARY_PATH=/usr/lib/nvidia-375/ make`.
Another long wait, and then finally:
```
make[1]: Leaving directory '/home/maarten/NVIDIA_CUDA-8.0_Samples/7_CUDALibraries/simpleCUFFT'
Finished building CUDA samples
```
Hurray!
But it’s easy to forget specifying this, so I updated my `~/.bashrc` and added the following (as per the documentation):

```bash
export LD_LIBRARY_PATH=/usr/lib/nvidia-375
```
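One caveat I should note: the dynamic loader consults `LD_LIBRARY_PATH` at run time, whereas the linker uses `LIBRARY_PATH` at build time, so strictly speaking you may want to export both. A sketch:

```bash
# Build-time linking (this is what fixed the missing -lnvcuvid) ...
export LIBRARY_PATH=/usr/lib/nvidia-375${LIBRARY_PATH:+:$LIBRARY_PATH}
# ... and run-time loading of the same libraries.
export LD_LIBRARY_PATH=/usr/lib/nvidia-375${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
```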
Does the sample code work?
The above `make` command will produce quite a few binaries in `./bin/x86_64/linux/release` (relative to the working directory). An interesting one is `deviceQuery`, which gave the following output:
```
./bin/x86_64/linux/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL
```
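Error code 30 is `cudaErrorUnknown`, which doesn’t say much by itself. A first thing to check (my own debugging step) is whether the device nodes exist at all:

```bash
# On a working setup this lists /dev/nvidia0, /dev/nvidiactl and friends;
# if nothing shows up, the fix below applies.
ls -l /dev/nvidia*
```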
Fix device files
According to the documentation, the device files that enable communication between the CUDA Driver and the kernel-mode portion of the NVIDIA Driver can sometimes fail to be created, because the system prevents setuid binaries from running. The guide also provides a fix for that: a custom script that should be run after boot. Maybe, if I find some time, I’ll create a nice init script for it, but for now, this should do:
```bash
#!/bin/bash

/sbin/modprobe nvidia

if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`

  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done

  mknod -m 666 /dev/nvidiactl c 195 255
else
  exit 1
fi

/sbin/modprobe nvidia-uvm

if [ "$?" -eq 0 ]; then
  # Find out the major device number used by the nvidia-uvm driver.
  D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`
  mknod -m 666 /dev/nvidia-uvm c $D 0
else
  exit 1
fi
```
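To run this at boot without writing a proper init script, one option (a sketch; the path and file name are my own) is to hook it into `/etc/rc.local`:

```bash
# Install the script somewhere on root's PATH ...
sudo install -m 755 cuda-devices.sh /usr/local/sbin/cuda-devices.sh
# ... and add a line like the following to /etc/rc.local, before the
# final 'exit 0':
#   /usr/local/sbin/cuda-devices.sh
```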
At first, the script didn’t work: the very first step (`/sbin/modprobe nvidia`) failed with `modprobe: ERROR: could not insert 'nvidia_375': No such device`.
Strange, since I am pretty sure the NVIDIA card is there!
On the NVIDIA DevTalk forum I found a post that helped out: it turned out there are a few other kernel modules that might interfere with the NVIDIA driver, in particular bbswitch.
According to the package manager, it’s an “Interface for toggling the power on NVIDIA Optimus video cards”.
Since my card isn’t an Optimus card, I figured I could safely remove the package using `sudo aptitude remove bbswitch-dkms`. Unfortunately, a reboot didn’t solve the problem either.
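To see which of the suspect modules were actually loaded at that point, a quick look at `lsmod` (again my own addition) helps:

```bash
# Lists the loaded kernel modules related to the NVIDIA driver or bbswitch.
lsmod | grep -e nvidia -e bbswitch
```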
Look at kernel module loading
Searching for the error messages I had found so far (`cudaGetDeviceCount returned 30` and `could not insert 'nvidia_375': No such device`) made me think something might be wrong with the kernel drivers. So I tried to troubleshoot why the kernel modules couldn’t be loaded. Running `sudo modprobe --force-modversion nvidia-375-uvm` gave an interesting message:

```
could not insert 'nvidia_375_uvm': Exec format error
```

Running it again without `--force-modversion` gave:

```
could not insert 'nvidia_375_uvm': Unknown symbol in module, or unknown parameter (see dmesg)
```
Now that’s interesting; I checked `dmesg` to see what I could find there. It displayed tons of identical messages:
```
[ 680.572990] NVRM: The NVIDIA GeForce 210 GPU installed in this system is
               NVRM: supported through the NVIDIA 340.xx Legacy drivers. Please
               NVRM: visit http://www.nvidia.com/object/unix.html for more
               NVRM: information. The 375.66 NVIDIA driver will ignore
               NVRM: this GPU. Continuing probe...
```
The link points to the NVIDIA Unix Driver Archive; is this a hint that my graphics card is indeed getting old? Anyway, I decided to give this a try and installed the 340.xx drivers using `sudo aptitude install nvidia-340`.
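As an aside, Ubuntu ships a helper that can suggest a matching driver for the detected hardware, assuming the `ubuntu-drivers-common` package is installed:

```bash
# Prints the detected GPU together with the recommended driver package.
ubuntu-drivers devices
```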
Now that’s a disappointment: `aptitude` suggests removing `cuda-8-0`. So the driver that supports my graphics card is too old for CUDA 8?
Downgrade NVIDIA drivers, CUDA Toolkit and GCC
On a non-NVIDIA page, I found some kind of compatibility matrix, which told me that the 340.xx driver should match CUDA 6.5. So, as a final attempt, I downloaded an older version (6.5) of the CUDA Toolkit from the appropriate page. I uninstalled all previously installed CUDA packages using `aptitude`, up to the point where I could issue `sudo aptitude install nvidia-340` without being greeted by a lot of conflicts.
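Roughly, the cleanup looked like this (a sketch from memory; the exact set of packages to purge may differ on your machine):

```bash
# Remove the CUDA 8 stack and the 375 driver, then install the legacy driver.
sudo aptitude purge cuda-8-0 nvidia-375
sudo aptitude install nvidia-340
```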
Next, I installed CUDA 6.5 using `sudo aptitude install cuda-samples-6-5`, which happily installed some tooling as well. Then I copied the samples to my home directory using `/usr/local/cuda-6.5/bin/cuda-install-samples-6.5.sh .`.
The default GCC on Ubuntu 16.04 (version 5) is too new for CUDA 6.5: according to the 6.5 documentation, I would need either GCC 4.6 or 4.8. Fortunately, on Ubuntu 16.04 it is still possible to install GCC 4.8 with `sudo aptitude install gcc-4.8 g++-4.8`.
Once they are installed, we need to tell `make` to use them: `GCC=g++-4.8 make`.
Another long wait before all samples are compiled.
Then running the `deviceQuery` sample yields:
```
./bin/x86_64/linux/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce 210"
  CUDA Driver Version / Runtime Version          6.5 / 6.5
  CUDA Capability Major/Minor version number:    1.2
  Total amount of global memory:                 511 MBytes (536150016 bytes)
  ( 2) Multiprocessors, ( 8) CUDA Cores/MP:      16 CUDA Cores
  GPU Clock rate:                                1402 MHz (1.40 GHz)
  Memory Clock rate:                             400 Mhz
  Memory Bus Width:                              64-bit
  Maximum Texture Dimension Size (x,y,z)         1D=(8192), 2D=(65536, 32768), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(8192), 512 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(8192, 8192), 512 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           512
  Max dimension size of a thread block (x,y,z):  (512, 512, 64)
  Max dimension size of a grid size    (x,y,z):  (65535, 65535, 1)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = GeForce 210
Result = PASS
```
Hurray! It finally works! I don’t even seem to need the script to fix device files anymore.
Conclusions
Time for a wrap-up. What did I learn? Two major things:
- Read the docs! A lot of mistakes and experiments could have been avoided by first reading the installation guide and other docs.
- Check support! Make sure the versions of the CUDA Toolkit, the NVIDIA driver, and your graphics card actually match each other.
TL;DR
If you have a somewhat older card, first check the legacy NVIDIA driver listing to see which driver version is the latest to support your graphics card. Then consult the Rogue Wave TotalView documentation for an unofficial compatibility matrix to find out which version of the CUDA Toolkit that driver supports. Finally, follow the installation instructions for that version of the CUDA Toolkit.
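For my GeForce 210, the whole dance condensed to roughly the following; the samples directory name is my assumption, mirroring the 8.0 layout:

```bash
# After adding the CUDA 6.5 repository from NVIDIA's archive page:
sudo aptitude install nvidia-340        # legacy driver for the GeForce 210
sudo aptitude install cuda-samples-6-5  # pulls in the CUDA 6.5 tooling
sudo aptitude install gcc-4.8 g++-4.8   # host compiler supported by 6.5
/usr/local/cuda-6.5/bin/cuda-install-samples-6.5.sh .
cd NVIDIA_CUDA-6.5_Samples              # assumed directory name
GCC=g++-4.8 make
```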