This post shows how a non-root user can set up a deep learning environment on a remote CUDA server, using PyTorch as the example.
CUDA driver
Check the version of CUDA already installed on the server with `nvcc -V`.
The output here shows version 11.4, but that doesn't matter. Next, check the driver version with `nvidia-smi`. It determines the latest CUDA version that we can install in the conda virtual environment.
Tips:
The CUDA Version shown here, 12.2, is the highest CUDA version that the driver installed on the server supports, so 12.2 is the newest version we can install next. The CUDA version already installed on this machine is 11.4.
Because we are a non-root user, we can't change the installed driver version, but we can install a different CUDA version inside a conda virtual environment.
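The rule in the tip above can be checked mechanically. The following is a minimal sketch, not an official tool: it pulls the `CUDA Version` field out of `nvidia-smi` text and compares it with the toolkit version you plan to install. The function names and the sample header fragment are illustrative assumptions.

```python
import re


def driver_max_cuda(smi_text):
    """Return the (major, minor) CUDA version found in nvidia-smi text, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_fits_driver(toolkit, smi_text):
    """True if the wanted toolkit version does not exceed what the driver supports."""
    limit = driver_max_cuda(smi_text)
    if limit is None:
        raise ValueError("no 'CUDA Version' field found")
    wanted = tuple(int(part) for part in toolkit.split(".")[:2])
    return wanted <= limit


# Illustrative fragment of the header only. On the server, pass the real text,
# e.g. subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
sample = "CUDA Version: 12.2"
print(toolkit_fits_driver("11.8", sample))  # True: 11.8 is within 12.2
print(toolkit_fits_driver("12.4", sample))  # False: 12.4 exceeds 12.2
```

Comparing tuples of integers rather than raw strings keeps the check correct for two-digit minor versions.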
Conda
When configuring Python-related environments as a non-root user, I strongly recommend using the conda package manager for dependency management (in fact, it's also recommended for those using R).
I'm installing Miniconda here (the minimal version), but you can also install Anaconda (the full version).
This blog has covered related content before.
PyTorch
You can browse past PyTorch releases at Pytorch History Release and pick a version that fits your own needs (e.g., [Colossal-AI](https://colossalai.org/zh-Hans/docs/get_started/installation/) requires PyTorch >= 1.11 and PyTorch <= 2.1). I ended up with pytorch==2.1.0 and CUDA==11.8.
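A version constraint like the one above is easy to misjudge by eye, since `1.11` is newer than `1.9`. Below is a small sketch of a range check. The helper names are hypothetical, and it assumes the upper bound `<= 2.1` is meant at major.minor granularity, so that 2.1.0 qualifies.

```python
def parse_version(text, parts=2):
    """Turn '2.1.0' into (2, 1); a build suffix such as '+cu118' is dropped."""
    core = text.split("+")[0]
    return tuple(int(piece) for piece in core.split(".")[:parts])


def in_range(version, lowest, highest):
    """Inclusive check on major.minor only."""
    return parse_version(lowest) <= parse_version(version) <= parse_version(highest)


print(in_range("2.1.0", "1.11", "2.1"))  # True
print(in_range("2.2.0", "1.11", "2.1"))  # False
print("1.9" > "1.11")  # True: plain string comparison orders these wrongly
```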
Thinking:
As for why PyTorch doesn't publish a separate build for every CUDA version: in this issue, one of PyTorch's developers explains that pytorch-cuda113 can be used on cuda114. In other words, the 11.3 build already works with 11.4, so there is no need to release it again.
Check availability
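A minimal sketch of such an availability check, using standard PyTorch calls. The function name and the dictionary layout are illustrative choices, and the import is guarded so the script also runs where PyTorch is missing.

```python
def torch_cuda_report():
    """Collect basic facts about the installed PyTorch build and GPU visibility."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


for key, value in torch_cuda_report().items():
    print(f"{key}: {value}")
```

If `cuda_available` is `False` while `built_for_cuda` shows a version, the usual suspects are a CPU-only login node or a driver too old for that build.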
CUDA
At Nvidia-Cuda, look for the CUDA version that matches the installed PyTorch build, in my case 11.8.
At this point, running `nvcc -V` shows that the installed version is now 11.8.
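The same check can be scripted. This sketch assumes the `release X.Y` wording that `nvcc -V` prints. The sample line is illustrative rather than copied from this server, and the helper names are hypothetical.

```python
import re
import shutil
import subprocess


def nvcc_release(text):
    """Extract the 'release X.Y' field from nvcc -V output as a string, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def active_nvcc_release():
    """Run whichever nvcc is first on PATH; return None if there is none."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True)
    return nvcc_release(result.stdout)


# Illustrative line in the shape nvcc prints; the real output has more lines.
sample_line = "Cuda compilation tools, release 11.8, V11.8.89"
print(nvcc_release(sample_line))
print(active_nvcc_release())
```

With the conda environment activated, `shutil.which("nvcc")` should resolve to a path inside that environment rather than to the system-wide 11.4 install.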
CUDA installed using conda
Extra
gcc clang g++ clang++
xformers
transformers & datasets & accelerate