Hi, today I will show you some measurement results for my PhD. I am working on the first publication about DarkNet on OpenCL; you can find the source code of this project at https://github.com/sowson/darknet. The IEEE publication has to be concise and focused, so I cannot put too many graphics and big tables in it… but wait, I have a public blog. So I can post them here. First things first: the battle heroes come on stage.
My workstation uses either 2x NVIDIA Titan RTX 24 GB GDDR6 or 2x XFX AMD Radeon VII 16 GB HBM2, and I did the measurements on Ubuntu 18.04. First, I would like to show you a comparison of back propagation, part of the training process. Truth be told, I asked the community several times to measure the performance and compare the OpenCL versions. It did not happen, so I decided to invest in GPUs from AMD and make the comparison myself. Below is the mentioned comparison of back propagation timings.
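If you would like to reproduce such timings yourself, here is a minimal sketch of how a single OpenCL kernel can be timed with event profiling. It is not the instrumentation used in the repository: the queue is assumed to be created with CL_QUEUE_PROFILING_ENABLE, and the kernel and work size are placeholders.

```c
// Minimal sketch: timing one OpenCL kernel via event profiling.
// Assumes a command queue created with CL_QUEUE_PROFILING_ENABLE.
#include <CL/cl.h>

double time_kernel_ms(cl_command_queue queue, cl_kernel kernel,
                      size_t global_size)
{
    cl_event evt;
    cl_ulong start = 0, end = 0;

    clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                           &global_size, NULL, 0, NULL, &evt);
    clWaitForEvents(1, &evt);  // block until the kernel finishes

    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START,
                            sizeof(start), &start, NULL);
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,
                            sizeof(end), &end, NULL);
    clReleaseEvent(evt);

    return (double)(end - start) * 1e-6;  // nanoseconds -> milliseconds
}
```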
Now let me show you the back propagation of the last convolutional layer only, but with all its sub kernels, to give you the option to compare and choose the best GPU for DarkNet on OpenCL.
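To break a layer down into its sub kernels like this, per-kernel timings can be accumulated into named buckets over a run. The sketch below is only illustrative; the kernel names and helper functions are hypothetical, not the actual instrumentation in the repository.

```c
// Illustrative sketch: accumulating per-sub-kernel timings by name.
#include <stdio.h>
#include <string.h>

#define MAX_KERNELS 64

typedef struct {
    const char *name;   // sub-kernel name, e.g. "im2col", "gemm", "col2im"
    double total_ms;    // accumulated GPU time
    long   calls;       // number of invocations
} kernel_stat;

static kernel_stat stats[MAX_KERNELS];
static int n_stats = 0;

void record_kernel(const char *name, double ms)
{
    for (int i = 0; i < n_stats; ++i) {
        if (strcmp(stats[i].name, name) == 0) {
            stats[i].total_ms += ms;
            stats[i].calls++;
            return;
        }
    }
    if (n_stats < MAX_KERNELS) {
        stats[n_stats].name = name;
        stats[n_stats].total_ms = ms;
        stats[n_stats].calls = 1;
        n_stats++;
    }
}

void print_kernel_stats(void)
{
    for (int i = 0; i < n_stats; ++i)
        printf("%-16s %10.2f ms over %ld calls\n",
               stats[i].name, stats[i].total_ms, stats[i].calls);
}
```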
A very nice result for AMD, right? But only with CLBlast instead of clBLAS. It looks like AMD has to fix its basic linear algebra subprograms library (clBLAS); otherwise it makes no sense to use it. The last thing to mention is that I am comparing top mainstream GPUs from NVIDIA and AMD, and I believe that, thanks to its HBM2 VRAM, the AMD card performs super nicely.
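For context, a GEMM call routed through CLBlast looks roughly like the sketch below. This is a minimal illustration of the library swap, not DarkNet's actual gemm wrapper; the buffers and the row-major, non-transposed layout are assumptions.

```c
// Minimal sketch: C = A * B via CLBlast's C API (assumed buffers/layout).
#include <clblast_c.h>

void gemm_clblast(cl_command_queue queue,
                  cl_mem A, cl_mem B, cl_mem C,
                  size_t M, size_t N, size_t K)
{
    cl_event event = NULL;

    // C = 1.0 * A * B + 0.0 * C, row-major, no transposition
    CLBlastStatusCode status = CLBlastSgemm(
        CLBlastLayoutRowMajor,
        CLBlastTransposeNo, CLBlastTransposeNo,
        M, N, K,
        1.0f,
        A, 0, K,   // lda = K for row-major, non-transposed A
        B, 0, N,   // ldb = N
        0.0f,
        C, 0, N,   // ldc = N
        &queue, &event);

    if (status == CLBlastSuccess) {
        clWaitForEvents(1, &event);  // wait for the GEMM to complete
        clReleaseEvent(event);
    }
}
```

The equivalent clblasSgemm call takes the same mathematical arguments, so swapping the backend is mostly a matter of which library dispatches the kernel, which is why the performance gap points at the library rather than the hardware.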
Regarding the IEEE publication, I am working on it; it will be a nice story about the journey of DarkNet on OpenCL, with many viewpoints, measurements, results, conclusions and more, so stay tuned. Thanks for reading!
p ;).