Hi, today I will show you some measurement results for my Ph.D. I am working on the first publication about DarkNet on OpenCL. You can find the source code of this project at https://github.com/sowson/darknet. The IEEE publication has to be consistent and concise, so I cannot put too many graphics and big tables in it, but I do have a public blog, so I can post them here. First things first: the battle heroes come on stage.
My workstation uses 2x NVIDIA Titan RTX 24 GB GDDR6 or 2x XFX AMD Radeon VII 16 GB HBM2, and I did the measurements on Ubuntu 18.04. First, I would like to show you a comparison of the backpropagation part of the training process. Truth be told, I asked the community several times to measure the performance and compare the OpenCL versions. It did not happen, so I decided to invest in GPUs from AMD and make the comparison myself. Now I will show you the mentioned comparison of the backpropagation timings.
Now let me show you the backpropagation of the last convolutional layer only, but with all sub-kernels inside, to give you the option to compare and choose the best GPU for DarkNet on OpenCL.
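If you are curious how such per-kernel timings can be collected, here is a minimal sketch using OpenCL profiling events. It assumes a command queue created with CL_QUEUE_PROFILING_ENABLE; the kernel handle and work size are placeholders standing in for whichever backpropagation sub-kernel is being measured, so treat this as an illustration of the technique, not the exact instrumentation in the repository.

```c
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>

/* Sketch: time one kernel launch via OpenCL profiling events.
   Assumes the queue was created with CL_QUEUE_PROFILING_ENABLE.
   "kernel" and "global_size" are hypothetical placeholders for a
   backpropagation sub-kernel and its NDRange. */
double time_kernel_ms(cl_command_queue queue, cl_kernel kernel,
                      size_t global_size)
{
    cl_event evt;
    cl_ulong start = 0, end = 0;

    clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                           &global_size, NULL, 0, NULL, &evt);
    clWaitForEvents(1, &evt); /* block until the kernel finishes */

    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START,
                            sizeof(start), &start, NULL);
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,
                            sizeof(end), &end, NULL);
    clReleaseEvent(evt);

    /* profiling timestamps are in nanoseconds */
    return (double)(end - start) * 1e-6;
}
```

Summing such per-event durations over a whole backward pass is what lets you break the timing down sub-kernel by sub-kernel, as in the chart above.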
Lovely result from AMD, right? But only with CLBlast instead of clBLAS. It looks like AMD has to fix that basic linear algebra subprograms library; otherwise, it makes no sense to use it. The last thing to mention is that I am comparing top mainstream GPUs from NVIDIA and AMD, and the AMD card, I believe thanks to its HBM2 VRAM, performs wonderfully.
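For context, here is roughly what the backend swap looks like at the call-site level: a single-precision GEMM expressed through the CLBlast C API, which offers a drop-in entry point comparable to clBLAS's clblasSgemm. The buffer handles, matrix sizes, and queue below are placeholders of my choosing, not the exact code in the repository.

```c
#include <clblast_c.h>

/* Sketch: row-major SGEMM (C = alpha*A*B + beta*C) via CLBlast.
   A is MxK, B is KxN, C is MxN; all three are cl_mem device buffers.
   The queue and buffers are hypothetical, for illustration only. */
void gemm_clblast(cl_command_queue queue,
                  size_t M, size_t N, size_t K,
                  cl_mem A, cl_mem B, cl_mem C)
{
    cl_event event = NULL;
    CLBlastStatusCode status = CLBlastSgemm(
        CLBlastLayoutRowMajor,
        CLBlastTransposeNo, CLBlastTransposeNo,
        M, N, K,
        1.0f,        /* alpha */
        A, 0, K,     /* A buffer, offset, leading dimension */
        B, 0, N,
        0.0f,        /* beta */
        C, 0, N,
        &queue, &event);
    if (status == CLBlastSuccess) {
        clWaitForEvents(1, &event);
        clReleaseEvent(event);
    }
}
```

Since convolutional layers spend most of their time in exactly this kind of GEMM, the choice of BLAS backend dominates the end-to-end numbers you see above.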
Regarding the IEEE publication, I am working on it; it will be a lovely story about the journey of DarkNet on OpenCL, with many viewpoints, measurements, results, conclusions, and more, so stay tuned. Thanks for reading!
p ;).