Xmake Getting Started Tutorial 7: Developing and Building CUDA Programs
xmake is a lightweight, modern C/C++ project build tool based on Lua. Its main features are an easy-to-use syntax, easy project maintenance, and a consistent build experience across platforms.
In this article, we will explain in detail how to build CUDA programs with xmake and how to mix-compile them with C/C++ code.
Preparing the Environment
First, we need to install the CUDA Toolkit provided by NVIDIA. For instructions and installation details, please refer to the official documentation: CUDA Toolkit Documentation.
After downloading and installing the CUDA SDK: on macOS it is installed to /Developer/NVIDIA/CUDA-x.x by default, on Windows the corresponding SDK directory can be found through the CUDA_PATH environment variable, and on Linux it is installed to /usr/local/cuda by default.
Usually, xmake detects the default CUDA installation automatically without any manual setup; just run the xmake command and the build completes. Of course, if the SDK cannot be found, we can also specify the CUDA SDK directory manually:
$ xmake f --cuda=/usr/local/cuda-9.1/

Or use the xmake g/global command to switch to global settings, so you do not have to reconfigure it every time you switch compilation modes:
$ xmake g --cuda=/usr/local/cuda-9.1/

If you want to test xmake's detection support for the current CUDA environment, you can run directly:
$ xmake l detect.sdks.find_cuda
{
    linkdirs = {
        "/Developer/NVIDIA/CUDA-10.2/lib/stubs",
        "/Developer/NVIDIA/CUDA-10.2/lib"
    },
    bindir = "/Developer/NVIDIA/CUDA-10.2/bin",
    sdkdir = "/Developer/NVIDIA/CUDA-10.2",
    includedirs = {
        "/Developer/NVIDIA/CUDA-10.2/include"
    }
}

You can also help contribute related detection code (find_cuda.lua) to improve xmake's detection mechanism.
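If you want to use the detection result inside your own build script, the same finder can also be imported from xmake.lua. The following is a minimal sketch under that assumption; the on_load hook and the print call are only there to dump the detected table:

target("test")
    set_kind("binary")
    add_files("src/*.cu")
    on_load(function (target)
        -- import the built-in CUDA SDK finder and print whatever it detects
        import("detect.sdks.find_cuda")
        local cuda = find_cuda()
        if cuda then
            print(cuda)
        end
    end)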
Creating a Project
Next, we can create an empty project to try things out quickly. xmake ships with a CUDA project template; just specify the corresponding language to create a CUDA project:
$ xmake create -l cuda test
create test ...
[+]: xmake.lua
[+]: src/main.cu
[+]: .gitignore
create ok!

The CUDA project created by default is a minimal CUDA-based hello world project. Its source tree looks like this:
├── src
│   └── main.cu
└── xmake.lua

We can also take a brief look at the content of xmake.lua:
-- define target
target("test")
    set_kind("binary")
    add_files("src/*.cu")

    -- generate SASS code for the SM architecture of the current host
    add_cugencodes("native")

    -- generate PTX code for the virtual architecture to guarantee compatibility
    add_cugencodes("compute_30")

As you can see, apart from adding the .cu source files, the only difference from an ordinary C/C++ project is the add_cugencodes() call, which sets the gencodes required by CUDA. We will explain this in detail below.
Compiling the Project
After the project is created, simply run xmake to build it:
$ xmake
[00%]: ccache compiling.release src/main.cu
[99%]: devlinking.release test_gpucode.cu.o
[100%]: linking.release test

Note: starting from v2.2.7, xmake enables the device-link build behavior by default. That is, an additional device-link step is added to the compilation process:
[100%]: devlinking.release test_gpucode.cu.o

According to NVIDIA, the main advantage of enabling device-link (linking device code separately) is that it gives your application a more traditional code structure, especially in C++: you keep the existing project structure unchanged, retain control over each build and link step, can enable GPU code quickly, and can mix it with host compilation.
For details, please refer to NVIDIA's official description: Separate Compilation and Linking of CUDA C++ Device Code.

If you want to disable the device-link build behavior, you can turn it off via add_values("cuda.devlink", false).
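To make the benefit concrete, here is a minimal sketch of the kind of structure device-link enables; the target names and file layout are made up for illustration. The device code is built into an ordinary static library, and a plain C++ binary depends on it through xmake's add_deps. The commented-out add_values line shows where you would disable device-link if you do not want this behavior.

-- device code compiled into a regular static library
target("gpucode")
    set_kind("static")
    add_files("src/*.cu")
    add_cugencodes("native")
    -- uncomment to opt out of the default device-link step:
    -- add_values("cuda.devlink", false)

-- host-only C++ code linked against the CUDA library
target("app")
    set_kind("binary")
    add_files("src/*.cpp")
    add_deps("gpucode")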
Of course, we can also try to run this CUDA program directly:
$ xmake run

Project Settings
xmake provides several interfaces for tuning how a CUDA target is built. For example, when the value passed to add_cugencodes() is native, xmake automatically detects the gencode matching the CUDA device on the current host. The main interfaces are described below.
add_cuflags
This interface is mainly used to add compilation options for .cu code. If we need more customized flag settings, we can call add_cuflags to pass raw compilation options directly, just like add_cxflags for C/C++.
For example:
add_cuflags("-gencode arch=compute_30,code=sm_30")add_culdflags
add_culdflags

This interface is mainly used to add CUDA device-link options. As mentioned above, since v2.2.7 xmake builds CUDA programs with a device-link step by default; if you need to set link flags for that stage, use this interface. The final program link uses ldflags and does not call nvcc: it goes directly through gcc/clang or another C/C++ linker, so flags for the separate device-link stage have to be set here instead.
add_culdflags("-gencode arch=compute_30,code=sm_30")add_cugencodes
The add_cugencodes() interface is essentially a simplified wrapper around the -gencode arch=compute_xx,code=... compilation and device-link flag settings. The mapping between its parameter values and the actual flags is as follows:
- compute_xx --> `-gencode arch=compute_xx,code=compute_xx`
- sm_xx --> `-gencode arch=compute_xx,code=sm_xx`
- sm_xx,sm_yy --> `-gencode arch=compute_xx,code=[sm_xx,sm_yy]`
- compute_xx,sm_yy --> `-gencode arch=compute_xx,code=sm_yy`
- compute_xx,sm_yy,sm_zz --> `-gencode arch=compute_xx,code=[sm_yy,sm_zz]`
- native --> match the fastest CUDA device on the current host; e.g. for a Tesla P100, `-gencode arch=compute_60,code=sm_60` will be added; if no available device is found, no `-gencode` flags will be added

For example:
add_cugencodes("sm_30")is equivalent to
add_cuflags("-gencode arch=compute_30,code=sm_30")
add_culdflags("-gencode arch=compute_30,code=sm_30")Isn't the above more concise? This is actually an auxiliary interface for simplified settings.
And if we set the native value, xmake will automatically detect the CUDA device on the current host, then quickly match its corresponding gencode settings, and automatically append them to the entire build process.
For example, if our host's current GPU is a Tesla P100 and can be automatically detected by xmake, then the following setting:
add_cugencodes("native")is equivalent to:
add_cugencodes("sm_60")CUDA/C/C++ Mixed Compilation
For mixed compilation, we just need to also add the corresponding C/C++ source files through the add_files interface. Isn't it simple?
target("test")
set_kind("binary")
add_files("src/*.cu")
add_files("src/*.c", "src/*.cpp")
add_cugencodes("native")Compilation Settings
When nvcc compiles the C/C++ host code inside .cu files, it actually calls the C/C++ compiler of the host environment: by default gcc/g++ on Linux, clang/clang++ on macOS, and cl.exe on Windows. If you want nvcc to use another compiler, for example clang as the default C/C++ compiler on Linux, you need to pass the --ccbin= parameter. For this, see: compiler-ccbin.
xmake supports this as well: just configure xmake f --cu-ccbin=clang to switch nvcc to another host compiler.
There are two other compilation parameters related to CUDA. Let me briefly introduce them:
$ xmake f --cu=nvcc --cu-ld=nvcc

Among them, --cu is used to set the compiler for .cu code. The default is nvcc, but clang now also supports compiling .cu code, so you can switch the setting and try it. --cu-ld is used to set the device-link linker, while the final overall program link still uses --ld.