This project simulates multiple moving cylindrical objects in a unified flow. The fluid behavior and the interaction with the cylindrical solid objects are described by the Lattice-Boltzmann Method (LBM) and a modified bounce-back rule. The programs are written in C and OpenCL to support CPU/GPU offloading, with improved processing speed and memory management.
Before you start using this simulator, take a look at this note: What's the physics of this LBM simulation?
A high-resolution simulation result of the three-cylinder example above:
- Video: 3cir_1080p.mp4
- Device: NVIDIA GeForce GTX 660
- Resolution: 1920x1080 pixels.
- Compute time: 5637.89 seconds for 10000 iterations.
- Build from Source
- Quick Start
- Start a Simulation Step by Step
- Design and Analyze an Experiment
- Troubleshooting
Most C programs are written in ISO C. However, some of the environment configuration is troublesome with `clang` when compiling the OpenCL kernel program; `glibc` is recommended instead. The OpenCL driver depends on the platform you have, so check your OS documentation for the driver packages needed. On Arch Linux they are:
Runtime
- OpenCL (for the C binary)
  - Intel GPU: `intel-compute-runtime`
  - Intel CPU: `intel-opencl-runtime` (AUR)
  - Nvidia GPU: `opencl-nvidia`
  - AMD GPU: `opencl-mesa`
  - AMD CPU: Not supported anymore.
- Tools (for the shell scripts)
  - `time`: Linux built-in, GNU version also works.
  - `gnuplot`: For data analysis and visualization.
  - `ffmpeg`: For MP4 video generation.
  - `clinfo`: Useful for inspecting the platform and device properties of the system.
Development
- ICD loader: `ocl-icd`
- Headers: `opencl-headers`
Be advised that the OpenCL target version needs to be defined in your host program as `#define CL_TARGET_OPENCL_VERSION 300`, where `300` stands for version 3.0.0. Additionally, the force calculation in `simulate_ocl.cl` requires C11 atomic operations on 64-bit integers, so you will need to check whether the device extension `cl_khr_int64_atomics` is available on your desired platform.
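One way to check for the extension is to scan the output of `clinfo`. The sketch below is only an illustration: `has_int64_atomics` is a hypothetical helper, and matching on the prefix `cl_khr_int64` (drivers typically report `cl_khr_int64_base_atomics` and `cl_khr_int64_extended_atomics`) is an assumption. It also looks at the whole listing at once, so it does not tell you which device provides the extension.

```shell
# Read an extension listing (e.g. the output of `clinfo`) on stdin and report
# whether any 64-bit integer atomics extension is mentioned.
# Assumption: matching on the prefix "cl_khr_int64" is good enough.
has_int64_atomics() {
    if grep -q 'cl_khr_int64'; then
        echo "int64 atomics: available"
    else
        echo "int64 atomics: missing"
        return 1
    fi
}

# Demo with a made-up extension string; on a real system use:
#   clinfo | has_int64_atomics
echo "cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics" | has_int64_atomics
```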
The programs can be built and installed with:
$> cd LBM_CYMB/src
$> make
$> make install
`make install` will copy all binaries and the OpenCL kernel source file into `LBM_CYMB/bin`. Clean all binaries with `make clean` if you want a fresh build.
The project directory contains:
./
├── bin
├── exp_sets
├── img
├── physics.md
├── README.bak
├── README.md
├── schedule
├── simulator
├── speed_test
└── src
4 directories, 6 files
Three Bash wrapper scripts (`simulator`, `schedule` and `speed_test`) are provided for different procedures:
To start an example simulation:
$ ./simulator exp_sets/example
[Parameters]: (* config value)
<Stratage>
1. Similarity for the Reynolds number
2. Spectify CL for CT
<Dimensionless>
* Mach number(MA): 0.300000
Reynolds number: 236.514634
Grid Reynolds number: 0.131397
<Lattice unit>
* Collision frequency(CF): 0.550000
Kinematic viscosity: 1.318182
* BCV D: 0.300000 Ux: 0.100000 Uy: 0.000000
* Size nx: 800 ny: 600
Speed of sound(Csl): 0.333333
<Conversion factors>
* Length(CL): 1.000000 (m/lattice space)
Time(CT): 0.001698 (secs/time step)
* Density(CD): 1.225000kg/m^3
Mass: 1.225000kg
Force: 1274490.000000kg*m/s^2
Spring constant: 1274490.000000kg/s^2
Damping constant: 1249.500000kg/s
<SI unit>
Kinematic viscosity: 448.181818m^2/s
Size width: 800.000000m height: 600.000000m
* Speed of sound(CS): 340.000000m/s
BCV D: 0.367500kg/m^3 Ux: 102.000000m/s Uy: 0.000000m/s
<Dirty tricks>
* REFUEL_RTO: 0.500000
* EAT_RTO: 0.050000
<Objects>
[spring] [damping] [mass] [Nau_freq] [Nau_cyc]
0: 127449.000000kg/s^2 12495.000000kg/s 1225.000000kg 1.623380Hz 0.615999s
1: 127449.000000kg/s^2 12495.000000kg/s 1225.000000kg 1.623380Hz 0.615999s
2: 127449.000000kg/s^2 12495.000000kg/s 1225.000000kg 1.623380Hz 0.615999s
Simulate......3/100
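The `Nau_freq` and `Nau_cyc` columns in the `<Objects>` table are consistent with the undamped natural frequency f = sqrt(k/m) / (2π) and its period 1/f. The sketch below reproduces them from the spring constant and mass of one cylinder; `natural_freq` is a hypothetical helper, and whether the simulator computes the values exactly this way is an assumption.

```shell
# Undamped natural frequency f = sqrt(k/m) / (2*pi) and period T = 1/f.
# natural_freq is a hypothetical helper, not part of the simulator.
natural_freq() {
    awk -v k="$1" -v m="$2" 'BEGIN {
        pi = atan2(0, -1)
        f = sqrt(k / m) / (2 * pi)
        printf "%.6f %.6f\n", f, 1 / f
    }'
}

# Spring constant (kg/s^2) and mass (kg) of cylinder 0 in the output above.
natural_freq 127449 1225   # prints "1.623380 0.615999"
```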
The `example` directory collects the environment setup for the simulation. Visualized data will be generated in `example/output` in MP4 format, while the kinetic parameters of the cylinders, such as speeds and accelerations, will be recorded in `data`.
File | Description |
---|---|
output/0.mp4 | Density matrix |
output/1.mp4 | Speed matrix in x axis |
output/2.mp4 | Speed matrix in y axis |
data | Kinetics data of cylinders |
If anything unexpected happens, please refer to Troubleshooting.
If multiple simulations need to be performed, write a list file such as `3c.sch`:
$ cat 3c.sch
exp_sets/3c_480p
exp_sets/3c_720p
exp_sets/3c_1080p
and the sequence can be initiated with:
$ ./schedule 3c.sch
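Before starting a long sequence, it can help to confirm that every entry in the list points at a usable setup. The sketch below is a hypothetical pre-flight check (`check_schedule` is not part of the project), and treating the presence of `default.conf` as the sign of a valid setup is an assumption based on the setup layout described in this README.

```shell
# Print every entry of a schedule list that is not a directory containing
# default.conf. check_schedule is a hypothetical helper.
check_schedule() {
    status=0
    while IFS= read -r dir || [ -n "$dir" ]; do
        [ -z "$dir" ] && continue            # skip empty lines
        if [ ! -f "$dir/default.conf" ]; then
            echo "missing setup: $dir"
            status=1
        fi
    done < "$1"
    return $status
}

# Possible usage (not executed here):
#   check_schedule 3c.sch && ./schedule 3c.sch
```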
Benchmarking is sometimes necessary before an experiment. Choose an experiment setup, and all available devices and work-group sizes will be tested.
$ ./speed_test exp_sets/example
...
...
...
2/0/80/60... N/A
2/0/40/30... N/A
2/0/32/24... 4.17
2/0/20/15... 4.31
2/0/16/12... 4.12
2/0/8/6... 5.10
2/0/4/3... 9.24
completed!
Results:
0/0/16/12 17.74
0/0/8/6 17.82
0/0/4/3 22.50
2/0/16/12 33.20
2/0/32/24 33.67
2/0/20/15 35.24
2/0/8/6 43.44
2/0/4/3 85.32
1/0/20/15 95.82
1/0/8/6 97.87
1/0/4/3 98.80
1/0/16/12 98.81
1/0/32/24 99.74
1/0/40/30 100.69
1/0/80/60 103.15
1/0/100/75 104.41
`N/A` means that the setup is not accepted by the OpenCL driver. According to the test above, the following setup gives the best performance:
PLATFORM 0
DEVICE 0
WORK_ITEM_0 16
WORK_ITEM_1 12
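Picking the fastest entry from a long result list can be automated. The sketch below assumes each result line has the form `platform/device/work0/work1 seconds`, which is inferred from the example above; `best_setup` is a hypothetical helper, not part of `speed_test`.

```shell
# Read "platform/device/wg0/wg1 seconds" lines on stdin, skip entries without
# a numeric time (such as N/A), and print the fastest one as config lines.
best_setup() {
    awk '$2 ~ /^[0-9.]+$/ && (best == "" || $2 + 0 < best) {
             best = $2 + 0
             key = $1
             sub(/\.+$/, "", key)      # drop the trailing dots of progress lines
         }
         END {
             if (key == "") exit 1
             split(key, f, "/")
             printf "PLATFORM %s\nDEVICE %s\nWORK_ITEM_0 %s\nWORK_ITEM_1 %s\n", f[1], f[2], f[3], f[4]
         }'
}

# Three lines taken from the results above.
printf '%s\n' "2/0/16/12 33.20" "0/0/16/12 17.74" "0/0/8/6 17.82" | best_setup
```

The printed lines can be pasted into `default.conf` once the result has been checked for correctness.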
Use `clinfo -l` to find out exactly what the devices are:
Platform #0: Intel(R) OpenCL HD Graphics
`-- Device #0: Intel(R) UHD Graphics 620 [0x3ea0]
Platform #1: Intel(R) OpenCL
`-- Device #0: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
Platform #2: NVIDIA CUDA
`-- Device #0: NVIDIA GeForce MX150
However, be advised that even if an experiment setup can be executed, the actual result may not be correct (for example, when the flow speed exceeds the maximum allowed by the Lattice-Boltzmann assumptions). Please refer to Strange Video Output.
The following steps can be adapted to your own needs; feel free to play around with them.
To decide on a good resolution for the simulation, consider:
- The fluid phenomenon you are studying.

  The higher the resolution, the more accurate the simulation. However, there is no simple calculation that tells you what resolution is sufficient; the only approach is benchmarking.

- The device you are using.

  Every device has its own preferred work-group size multiple, which is affected by the number of compute units and the size of the cache. Check the value with `clinfo`. In most cases an Intel CPU prefers a multiple of 128, while a GPU prefers a multiple of 32. A simple approach is to follow common screen resolutions, since that is what GPUs are designed for, although this is not guaranteed to be the best choice.
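To get a feeling for which work-group sizes fit a given resolution, the candidates can be enumerated. The sketch below lists sizes that divide the grid evenly, keep its aspect ratio and stay within a maximum work-group size, and marks those whose total is a multiple of the preferred multiple. `wg_candidates` is a hypothetical helper; the limits 256 and 32 are placeholder values, and how `speed_test` actually chooses its candidates is not documented here.

```shell
# wg_candidates NX NY MAX_GROUP_SIZE PREFERRED_MULTIPLE
# Print "w0 w1 mark" for every work-group size that divides the grid evenly.
wg_candidates() {
    awk -v nx="$1" -v ny="$2" -v max="$3" -v mult="$4" 'BEGIN {
        for (k = 1; k <= nx; k++) {
            if (nx % k || ny % k) continue       # k must divide both sides
            w0 = nx / k; w1 = ny / k
            if (w0 * w1 > max) continue          # exceeds the device limit
            printf "%d %d %s\n", w0, w1, ((w0 * w1) % mult == 0 ? "preferred" : "-")
        }
    }'
}

# 800x600 grid with placeholder limits; prints:
#   16 12 preferred
#   8 6 -
#   4 3 -
wg_candidates 800 600 256 32
```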
Inside an experiment setup directory we have:
.
├── a.bc
├── a.nd
├── data
├── default.conf
├── log
└── output
    ├── 0.mp4
    ├── 1.mp4
    └── 2.mp4
1 directory, 8 files
To perform an experiment, you will need to tweak these files:
Name | Description |
---|---|
a.bc | Boundary condition file |
a.nd | Number Density file |
default.conf | Main configuration file |
The following files are used to collect and analyze data:
Name | Description |
---|---|
data | Kinetics data for cylinders |
log | Running conditions and status |
output/ | Result videos or ND files |
The number density (ND) matrix describes the particle densities at different velocities and positions. A D2Q9 ND file of size 480x270 is formatted as:
480 270 9
[m0q0] [m0q1] [m0q2] ...
[m1q0] [m1q1] [m1q2] ...
[m2q0] [m2q1] [m2q2] ...
...
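A quick structural check of an ND file can catch truncated or mismatched inputs early. The sketch below assumes that the header holds `nx ny nq` and that every following row holds the `nq` values of one lattice node, so that there are `nx * ny` rows in total; this reading of the format and the helper name `nd_check` are assumptions.

```shell
# Check that an ND file has a "nx ny nq" header followed by nx*ny rows of nq
# values each. One row per lattice node is an assumption of this sketch.
nd_check() {
    awk 'NR == 1 { nx = $1; ny = $2; nq = $3; next }
         NF != nq { printf "line %d: expected %d values, got %d\n", NR, nq, NF; bad = 1 }
         END {
             rows = NR - 1
             if (rows != nx * ny) { printf "expected %d rows, got %d\n", nx * ny, rows; bad = 1 }
             if (!bad) print "ok"
             exit bad
         }' "$1"
}

# Demo on a tiny made-up 2x1 grid with 3 values per node.
demo=$(mktemp)
printf '%s\n' "2 1 3" "0.1 0.2 0.3" "0.1 0.2 0.3" > "$demo"
nd_check "$demo"    # prints "ok"
rm -f "$demo"
```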
Use `nd_gen` to make an initial ND file with the following options:
Usage: nd_gen [OPTION]
Otions:
-x #, Numer of column
-y #, Number of row
-d #, Density (Default: 0.3)
-i #, Ux (Default: 0.12)
-j #, Uy (Default: 0)
-h , Print this help page
Example:
nd_gen -x 100 -y 30 -d 0.3 -i 0.1 -j 0 > exp_sets/example/a.nd
Boundary conditions include the following classes:
- BCV: Unified fluid that surrounds the simulation area.
- CY: Cylinder object.
For example:
bc_no 2
bc_nq 2
BCV {
dnt 1
ux 0.0
uy 0.0
}
CY {
spring 0.1
damp 0
mass 1000
rad 20
force 0 0
acc 0 0
vel 0 0
dsp 10 0
pos 250 70
}
CY {
spring 0.1
damp 0
mass 1000
rad 20
force 0 0
acc 0 0
vel 0 0
dsp 10 0
pos 250 150
}
Parameter | Description |
---|---|
bc_no | Number of objects |
bc_nq | Kinematic dimension of objects, usually 2 in 2D simulation. |
dnt | Density |
ux | Macro velocity in x axis |
uy | Macro velocity in y axis |
spring | Spring constant of the cylinder, in lattice unit |
damp | Damping constant of the cylinder, in lattice unit |
mass | Mass of the cylinder, in lattice unit |
rad | Radius of the cylinder, in lattice unit |
force | Initial value of the applied force on the cylinder, in lattice unit |
acc | Initial value of the acceleration of the cylinder, in lattice unit |
vel | Initial value of the velocity of the cylinder, in lattice unit |
dsp | Initial value of the displacement of the cylinder, in lattice unit |
pos | Initial value of the position of the cylinder, in lattice unit |
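A mismatch between `bc_no` and the number of cylinder blocks is an easy mistake to make when editing the file by hand. The sketch below compares the two; `bc_check` is a hypothetical helper, and the idea that `bc_no` counts only the `CY` blocks is inferred from the example above, where `bc_no` is 2 and two `CY` blocks follow the `BCV` block.

```shell
# Compare bc_no with the number of "CY {" blocks in a boundary condition file.
bc_check() {
    awk '$1 == "bc_no" { want = $2 }
         $1 == "CY" && $2 == "{" { have++ }
         END {
             printf "bc_no=%d cylinders=%d\n", want, have
             exit (want != have)
         }' "$1"
}

# Demo on a shortened, made-up file with two cylinders.
demo=$(mktemp)
printf '%s\n' "bc_no 2" "bc_nq 2" "BCV {" "dnt 1" "}" "CY {" "rad 20" "}" "CY {" "rad 20" "}" > "$demo"
bc_check "$demo"    # prints "bc_no=2 cylinders=2"
rm -f "$demo"
```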
The following parameters can be set in a configuration file:
Parameter | Description | Default |
---|---|---|
LOOP | Number of iterations of `SKP`. | 1 |
SKP | Number of fluid propagation time steps in each loop. | 1 |
CF | Collision frequency in Lattice unit | 1 |
CS | Speed of sound in SI unit. | 340 |
CL | Dimensional value of length | 1 |
CD | Dimensional value of Density | 1 |
MA | Mach number(Dimensionless) | 0.1 |
IS_MP4 | Make mp4 videos or not. | 0 |
IS_SAVE_DATA | Save final ND matrix or not. | 0 |
IS_FILE_OUTPUT | Save ND matrix for every loops or not. | 0 |
ND_FILE | ND filename | NULL |
BC_FILE | BC filename | NULL |
OUTPUT_DIR | Output directory | NULL |
PROGRAM_FILE | OpenCL source filename | NULL |
REFUEL_RTO | Refuel ratio | 0.8 |
EAT_RTO | Eat ratio | 0.01 |
LOG_FILE | Log filename | NULL |
DATA_FILE | Data filename | NULL |
IS_PAR_PRINT | Print log to stdout or not. | 1 |
IS_PROGRESS_PRINT | Print progress or not | 1 |
PL_MAX_D | Maximum density value for the jet colormap plotting | 0.5 |
PL_MAX_UX | Maximum x velocity value for the jet colormap plotting | 0.1 |
PL_MAX_UY | Maximum y velocity value for the jet colormap plotting | 0.1 |
PLATFORM | Working platform for OpenCL | 0 |
DEVICE | Working device for OpenCL | 0 |
WORK_ITEM_0 | Work-group size in dimension 0 | 1 |
WORK_ITEM_1 | Work-group size in dimension 1 | 1 |
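When scripting around experiments it is handy to read a single parameter back from a configuration file. The sketch below assumes the file holds one `KEY value` pair per line, separated by whitespace, as in the `PLATFORM 0` lines shown in this README; `conf_get` is a hypothetical helper.

```shell
# conf_get KEY FILE: print the value of KEY, or fail if KEY is not present.
# Assumes one whitespace-separated "KEY value" pair per line.
conf_get() {
    awk -v key="$1" '$1 == key { print $2; found = 1; exit }
                     END { if (!found) exit 1 }' "$2"
}

# Demo on a made-up configuration fragment.
demo=$(mktemp)
printf '%s\n' "LOOP 100" "SKP 10" "PLATFORM 2" > "$demo"
conf_get PLATFORM "$demo"    # prints "2"
rm -f "$demo"
```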
- Platform & Device

  A platform is a specific OpenCL implementation, e.g. Intel, AMD or NVIDIA CUDA. A device is the actual processor that performs the calculation, such as `Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz` or `NVIDIA GeForce MX150`.

- Work-group & Work-item

  A work-group is processed by a single compute unit of the device. A CPU device prefers a larger work-group size, while a GPU prefers the opposite. The number of work-items in a work-group is limited; check `Max work group size` in the output of `clinfo`. For a multi-dimensional work-group, the total number of work-items cannot exceed `Max work group size`, and the work-items in each dimension have their own limit; check `Max work item sizes` for the values.
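The two limits can be checked by hand before running a benchmark. The sketch below applies exactly the two rules stated above to a two-dimensional work-group; `wg_fits` is a hypothetical helper, and the limits in the demo are placeholder numbers, so take the real ones from `clinfo`.

```shell
# wg_fits W0 W1 MAX_GROUP_SIZE MAX_ITEM_0 MAX_ITEM_1
# Succeed only if the work-group respects the total and per-dimension limits.
wg_fits() {
    total=$(( $1 * $2 ))
    if [ "$total" -gt "$3" ]; then
        echo "too many work-items: $total > $3"
        return 1
    fi
    if [ "$1" -gt "$4" ] || [ "$2" -gt "$5" ]; then
        echo "dimension limit exceeded"
        return 1
    fi
    echo "fits"
}

# Placeholder limits, not taken from a real device.
wg_fits 16 12 256 256 256            # prints "fits"
wg_fits 32 24 256 256 256 || true    # prints "too many work-items: 768 > 256"
```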
Use the script `speed_test` to choose the best environment setup. If the chosen work-group size exceeds the maximum work-group size allowed by the device, the result will not be printed.
Run `simulator` with the desired experiment setup:
$> ./simulator exp_sets/1920x1080
If multiple experiments are going to be launched at once, use `schedule` with a plain text file that lists the experiment setup directories, like:
$> cat exp_grp1
exp_sets/480x270_skp10
exp_sets/480x270_skp20
exp_sets/480x270_skp30
exp_sets/480x270_skp40
exp_sets/480x270_skp50
And launch with:
$> ./schedule exp_grp1
The program generates four types of results.
Usually the file path defined in `DATA_FILE` is used to collect all kinds of text-based data. By default, `BC_print` prints the kinetic parameters of all objects in the following format:
#[obj][time][force][acc][vel][dsp][pos]
0 0.000000 10946785.783500 0.000000 2958.925152 0.000000 58.211965 0.000000 0.000000 0.000000 180.000000 135.000000
1 0.000000 10946785.783500 0.000000 2958.925152 0.000000 58.211965 0.000000 0.000000 0.000000 300.000000 135.000000
0 0.019608 8164637.838000 0.000000 2128.069005 0.000000 100.661622 0.000000 1.141411 0.000000 181.141411 135.000000
1 0.019608 8164637.838000 0.000000 2128.069005 0.000000 100.661622 0.000000 1.141411 0.000000 301.141411 135.000000
...
You can change this output to any format you want by modifying the `BC_print` function in `lbmcl.c`.
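With the default format, the time series of a single object can be pulled out with a short filter. The sketch below assumes twelve whitespace-separated columns per row (object, time, then x and y components of force, acceleration, velocity, displacement and position), which is a reading of the header line above; `obj_series` is a hypothetical helper.

```shell
# obj_series FILE OBJ COL: print "time value" rows for one object.
# Assumed columns: 1 obj, 2 time, 3-4 force, 5-6 acc, 7-8 vel, 9-10 dsp, 11-12 pos.
obj_series() {
    awk -v obj="$2" -v col="$3" '!/^#/ && $1 == obj { print $2, $col }' "$1"
}

# Demo with shortened, made-up rows in the assumed 12-column layout.
demo=$(mktemp)
printf '%s\n' \
    "#[obj][time][force][acc][vel][dsp][pos]" \
    "0 0.000000 1.0 0.0 2.0 0.0 3.0 0.0 0.000000 0.0 180.000000 135.0" \
    "1 0.000000 1.0 0.0 2.0 0.0 3.0 0.0 0.000000 0.0 300.000000 135.0" \
    "0 0.019608 1.0 0.0 2.0 0.0 3.0 0.0 1.141411 0.0 181.141411 135.0" > "$demo"
obj_series "$demo" 0 11    # x position of object 0 over time
rm -f "$demo"
```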
If you set `IS_MP4` to `1` in `default.conf`, the macroscopic matrices of density and of the velocities along the x and y axes are rendered as PPM images in every loop and streamed to `ffmpeg` through a pipe. Since the resolution of an experiment may be very large (something like 1920x1080), the disk usage would easily exceed 50GB if all images were saved. Streaming through a pipe minimizes the number of matrices stored during the process and compresses the output into MP4 format.
The maximum and minimum values for the jet colormaps are defined as:
- [Density] min: 0 max: PL_MAX_D
- [Ux] min:-1 * PL_MAX_UX max: PL_MAX_UX
- [Uy] min:-1 * PL_MAX_UY max: PL_MAX_UY
Sometimes we want to inspect a phenomenon where the fluid is in its steady state, and it may take 20k+ iterations to reach that state. In this situation you can run a simulation to the steady state once with `IS_SAVE_DATA` set to `1` and save the final ND matrix as `fin.nd`, which can then be used as the initial ND matrix `a.nd` in future experiments.
With `IS_FILE_OUTPUT` set to `1`, the program saves the ND matrix in every loop. As mentioned before, the disk usage will be massive, so this is not a good idea unless you are running a small test with a small number of iterations.
Here I will show you a working example of designing and analyzing an experiment. Let's say I am curious about the behavior of two cylinders aligned along the flow direction, with different separations between them, as shown below:
The value we want to control is the initial separation between the two cylinders; everything else stays the same.
First, I need to generate all the experiment setups. Create a new directory `2c_offset` in `exp_sets`.
2c_offset/
├── cal_offset
├── mkexp_2c
├── offset
├── offset.c
├── plot_offset.p
└── tmp
    ├── a.bc
    ├── a.nd
    ├── data
    ├── default.conf
    ├── log
    └── output
2 directories, 10 files
While `tmp` stores the template of the flow condition as usual, `cal_offset`, `mkexp_2c`, `offset` and `plot_offset.p` will be explained later.
Tweak the parameters in this template to fit the fluid environment, such as `BCV`, `LOOP` and `a.nd`. Benchmark with `speed_test` to select the best platform and device configuration, and check that the fluid speed does not exceed three tenths of the speed of sound by inspecting the MP4 videos. Next, replace the value we want to control (the separation in this example) with macros such as `C1X` and `C2X`:
$> cat a.bc
bc_no 2
bc_nq 2
BCV {
dnt 0.3
ux 0.1
uy 0.0
}
CY {
spring 0.15
damp 1
mass 3000
rad 20
force 0 0
acc 0 0
vel 0 0
dsp 0 0
pos C1X 135
}
CY {
spring 0.15
damp 1
mass 3000
rad 20
force 0 0
acc 0 0
vel 0 0
dsp 0 0
pos C2X 135
}
Now we can write a small script named `mkexp_2c` that automatically generates all the experiment setups by replacing the macros we just created:
#! /bin/bash
# Generate one experiment setup per separation value from the template in tmp/.
name="2c_offset"
dir=$(pwd)
mid="240"
printf "" > 2c.sch
for i in $(seq 40 10 160); do
    rm -r "$dir/$name-$i" 2> /dev/null
    cp -r "$dir/tmp" "$dir/$name-$i"
    # Place the two cylinders symmetrically around x = mid.
    sed -i "s/C1X/$((mid-i))/" "$dir/$name-$i/a.bc"
    sed -i "s/C2X/$((mid+i))/" "$dir/$name-$i/a.bc"
    printf "create exp_sets/$name-$i\n"
    printf "$dir/$name-$i\n" >> 2c.sch
done
This script will also generate a schedule file named `2c.sch`, so you can start all of the simulations easily with:
./schedule 2c.sch
By default, all of the kinetic data are saved in `data` in each setup directory. Write a small script named `cal_offset` to collect the desired column from all of them and generate a plot script `plot_offset.p` for Gnuplot:
#! /bin/bash
# Collect one column from every setup's data file and write a Gnuplot script.
col="9"
pl="plot_offset.p"
echo "#! /usr/bin/gnuplot" > $pl
chmod a+x $pl
echo "set grid" >> $pl
echo "plot \\" >> $pl
ls | grep 2c_offset- | while read dir; do
    offset=$(echo $dir | sed 's/2c_offset-//')
    # Skip the header line, then turn each pair of rows into one offset value.
    cut -f $col -d ' ' $dir/data | tail -n+2 | ./offset > .xoff_$offset
    echo "'.xoff_$offset' w l, \\" >> $pl
done
echo "" >> $pl
echo "pause mouse close" >> $pl
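Here I also wrote a small C program `offset` that calculates the offsets from the values streamed out of `data`:
#include <stdio.h>

/* Read pairs of values (cylinder 0, then cylinder 1) from stdin
   and print their difference for each pair. */
int main(void){
    double a, b;
    while(scanf("%lf %lf", &a, &b) == 2){
        printf("%lf\n", b - a);
    }
    return 0;
}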
Now we can execute the scripts and generate a chart that shows how the distance between the two cylinders varies for different initial offsets.
$> cd exp_sets
$> ./cal_offset
$> ./plot_offset.p
The data represent the variation of the offset between the two cylinders along the selected axis.
If you encounter something like this:
selected platform not exist: p4
real 0.44
user 0.23
sys 0.08
This means that the platform specified in `exp_sets/example/default.conf` does not exist. Check the available platforms with `clinfo -l`; information like the following should be shown:
Platform #0: Intel(R) OpenCL HD Graphics
`-- Device #0: Intel(R) UHD Graphics 620 [0x3ea0]
Platform #1: Intel(R) OpenCL
`-- Device #0: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
Platform #2: NVIDIA CUDA
`-- Device #0: NVIDIA GeForce MX150
If `NVIDIA CUDA` is the desired one, change the value of `PLATFORM` to `2` in the configuration file `exp_sets/example/default.conf`:
..
PLATFORM 2
..
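The same change can be made from the command line. The sketch below rewrites an existing `KEY value` line in place or appends one if the key is missing; `conf_set` is a hypothetical helper, it assumes the whitespace-separated format shown above, and it relies on GNU `sed -i`.

```shell
# conf_set KEY VALUE FILE: replace the line starting with KEY, or append it.
conf_set() {
    if grep -q "^$1[[:space:]]" "$3"; then
        sed -i "s/^$1[[:space:]].*/$1 $2/" "$3"
    else
        printf '%s %s\n' "$1" "$2" >> "$3"
    fi
}

# Demo on a made-up configuration fragment.
demo=$(mktemp)
printf '%s\n' "PLATFORM 4" "DEVICE 0" > "$demo"
conf_set PLATFORM 2 "$demo"
cat "$demo"    # prints "PLATFORM 2" followed by "DEVICE 0"
rm -f "$demo"
```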
If the video breaks up at the edge of a cylinder, the possible reasons are listed below.
This is caused by a lack of GPU memory in each work-group. The forces acting on the cylinders are collected in `long` variables across work-groups, and the force is calculated at the end of each loop. Therefore, reducing the amount of force that needs to be collected in each loop will solve the problem.
- Reduce the resolution in `a.nd`.
- Reduce the size of the cylinder in `a.bc`.
- Reduce the `SKP` skip iterations in `default.conf`.
- Reduce `FC_OFFSET` in `lbmcl.c` and `simulate_ocl.cl`.
Even if the unified flow specified in `a.bc` does not exceed Mach 1, the resulting flow speed after a collision may. This usually happens with a high-resolution ND matrix, since the lattice flow speed is larger than in a lower-resolution one. If an object moves too fast, the speed of the colliding flow may also exceed the limit of the Lattice-Boltzmann distribution. The following adjustments, which slow the cylinders down, may resolve the problem.
- Increase the mass of the cylinder.
- Increase the damping ratio of the cylinder.
- Reduce the speed of the unified flow.
- Increase the collision frequency.
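A rough way to judge whether a lattice velocity is in the safe range is to express it as a Mach number. The sketch below divides the speed by the lattice speed of sound and flags anything above three tenths, the limit suggested earlier in this README. `mach_check` is a hypothetical helper, and the default of 0.333333 is simply the `Csl` value printed by the simulator in the Quick Start output.

```shell
# mach_check UX UY [CSL]: print the lattice Mach number and a verdict.
# CSL defaults to the Csl value shown in the simulator output (an assumption
# that your setup prints the same value).
mach_check() {
    awk -v ux="$1" -v uy="$2" -v csl="${3:-0.333333}" 'BEGIN {
        ma = sqrt(ux * ux + uy * uy) / csl
        printf "%.3f %s\n", ma, (ma > 0.3 ? "too fast" : "ok")
    }'
}

mach_check 0.05 0       # prints "0.150 ok"
mach_check 0.20 0.05    # prints "0.618 too fast"
```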