DarkNet in OpenCL

Hi! Today I would like to announce a new update to my GitHub fork at https://github.com/sowson/darknet. The fork is an advanced port of the DarkNet CNN from CUDA to OpenCL. I tested it on macOS with Sonnet's Breakaway Puck RX 570 eGPU, and on my GreenPC it also supports the Intel Iris GPU. It supports OpenCV 3, and there are several use cases for it: Yolo3, Yolo2, Yolo1 and CIFAR-10 all work fine, including the demo from a webcam and from MP4 videos. The overall performance is quite nice. I achieved about 20 FPS on Yolo2, so as far as I know, it is the fastest DarkNet in OpenCL on the planet. I rewrote most of the BLAS kernels from scratch, and for some of them I used an auto-tuner of my own design. And there is one more thing. For training, I replaced the pseudo-random image selection with a permutation-set solution. It means that when you draw n pictures from a set of n pictures, you get each picture exactly once. The implementation is trivial, but it is a game changer for training.
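The permutation-set idea can be sketched as a Fisher-Yates shuffle over image indices. This is a minimal sketch, not the code from the fork: the `perm_sampler` type and its functions are hypothetical names, and it assumes images are addressed by an integer index 0..n-1 and that `rand()` is an acceptable random source.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sampler state (not the fork's actual data structure):
   a shuffled permutation of 0..n-1 plus a cursor into it. */
typedef struct {
    int *order; /* permutation of image indices */
    int n;      /* number of training images */
    int next;   /* position of the next index to hand out */
} perm_sampler;

/* Fisher-Yates shuffle. rand() % (i + 1) has a slight modulo bias,
   which is acceptable for a sketch. */
static void shuffle(int *a, int n)
{
    for (int i = n - 1; i > 0; --i) {
        int j = rand() % (i + 1);
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}

/* Fills order with 0..n-1 and shuffles it. Returns 0 on success, -1 if
   allocation fails. */
static int perm_sampler_init(perm_sampler *s, int n)
{
    assert(n > 0);
    s->order = malloc((size_t)n * sizeof *s->order);
    if (!s->order) return -1;
    for (int i = 0; i < n; ++i) s->order[i] = i;
    s->n = n;
    s->next = 0;
    shuffle(s->order, n);
    return 0;
}

/* Returns the next image index. Every index appears exactly once per
   pass of n draws; when a pass is exhausted, a fresh permutation starts. */
static int perm_sampler_next(perm_sampler *s)
{
    if (s->next == s->n) {
        shuffle(s->order, s->n);
        s->next = 0;
    }
    return s->order[s->next++];
}

static void perm_sampler_free(perm_sampler *s)
{
    free(s->order);
    s->order = NULL;
}
```

For comparison, drawing with `rand() % n` (sampling with replacement) leaves each image unseen in a run of n draws with probability (1 - 1/n)^n, which approaches 1/e, so for large n roughly 37% of the images are skipped in any such run while others repeat.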

[Image: Yolo2 webcam demo]

In the previous post, I described how to install it on macOS and/or CentOS GNU/Linux; that information is still current. Below, I share the commands for all test cases of this OpenCL port, which I made thanks to the very smart people who share many versions on GitHub.

As a reminder, here is where my training stood in May 2018.

And now you may use all of the test case commands below.

Yolo1

# Yolo1 Test on built-in GPU
./darknet yolo test cfg/yolov1.cfg ../weights/yolov1.weights data/dog.jpg

# Yolo1 Demo from 1st WebCam on Computer on eGPU
./darknet yolo demo cfg/yolov1.cfg ../weights/yolov1.weights -i 1 -c 0

# Yolo1 Demo from MP4 Movie on eGPU
./darknet yolo demo cfg/yolov1.cfg ../weights/yolov1.weights ../movies/movie.mp4 -i 1

# Yolo1 Train from ../train folder on eGPU
../darknet/darknet yolo train yolov1.cfg voc.data extraction.conv.weights -i 1

Yolo2

# Yolo2 Test on built-in GPU
./darknet detect cfg/yolov2.cfg ../weights/yolov2.weights data/dog.jpg

# Yolo2 Demo from 1st WebCam on Computer on eGPU
./darknet detector demo cfg/coco.data cfg/yolov2.cfg ../weights/yolov2.weights -i 1 -c 0

# Yolo2 Demo from MP4 Movie on eGPU
./darknet detector demo cfg/coco.data cfg/yolov2.cfg ../weights/yolov2.weights ../movies/movie.mp4 -i 1

# Yolo2 Train from ../train folder on eGPU
../darknet/darknet detector train voc.data yolo-voc.2.0.cfg darknet19_448.conv.23 -i 1

Yolo3

# Yolo3 Test on built-in GPU
./darknet detect cfg/yolov3.cfg ../weights/yolov3.weights data/dog.jpg

# Yolo3 Demo from 1st WebCam on Computer on eGPU
./darknet detector demo cfg/coco.data cfg/yolov3.cfg ../weights/yolov3.weights -i 1 -c 0

# Yolo3 Demo from MP4 Movie on eGPU
./darknet detector demo cfg/coco.data cfg/yolov3.cfg ../weights/yolov3.weights ../movies/movie.mp4 -i 1

# Yolo3 Train from ../train folder on eGPU
../darknet/darknet detector train voc.data yolov3-voc.cfg darknet53.conv.74 -i 1

CIFAR-10

# CIFAR-10 training from ../cifar folder on CPU
../darknet/darknet classifier train cfg/cifar.data cfg/cifar_small.cfg -nogpu

# CIFAR-10 training from ../cifar folder on built-in GPU
../darknet/darknet classifier train cfg/cifar.data cfg/cifar_small.cfg

# CIFAR-10 training from ../cifar folder on eGPU
../darknet/darknet classifier train cfg/cifar.data cfg/cifar_small.cfg -i 1

# CIFAR-10 validation test from ../cifar folder on eGPU (_test cfg has batch=1)
../darknet/darknet classifier valid cfg/cifar.data cfg/cifar_small_test.cfg backup/cifar_small.backup -i 1

# CIFAR-10 test on built-in GPU (_test cfg has batch=1)
../darknet/darknet classifier predict cfg/cifar.data cfg/cifar_small_test.cfg backup/cifar_small.backup data/cifar/train/35728_automobile.png

# CIFAR-10 test on built-in GPU (_test cfg has batch=1)
../darknet/darknet classifier predict cfg/cifar.data cfg/cifar_small_test.cfg backup/cifar_small.backup data/cifar/test/6298_cat.png

# CIFAR-10 test on built-in GPU (_test cfg has batch=1)
../darknet/darknet classifier predict cfg/cifar.data cfg/cifar_small_test.cfg backup/cifar_small.backup data/cifar/test/4882_frog.png

# CIFAR-10 test on built-in GPU (_test cfg has batch=1)
../darknet/darknet classifier predict cfg/cifar.data cfg/cifar_small_test.cfg backup/cifar_small.backup data/cifar/test/2568_truck.png

# CIFAR-10 test on built-in GPU (_test cfg has batch=1)
../darknet/darknet classifier predict cfg/cifar.data cfg/cifar_small_test.cfg backup/cifar_small.backup data/cifar/test/5238_bird.png

Enjoy!

p ;).

33 Replies to “DarkNet in OpenCL”

  1. @piotr.sowa,
    I would like to know whether I can use this method on a Raspberry Pi 3 with the help of a Movidius Neural Compute Stick for real-time object recognition.
    Thanks in advance

  2. @Hashir, that has not been my priority for a while… but I am happy to announce that Yolo3-spp is now supported 😀

    • @piotr.sowa
      Thanks for your valuable comment. Also, how can I run yolo3-spp on the Pi? Is it the same as yolo3 or tiny-yolo? What does spp mean?
      Thank you in advance

      • @Hashir, for you, I added an RPI option to the build options. Please clone my repo, then edit the top of the Makefile: optionally disable OpenCV with OPENCV=0, and enable RPI=1 on your RPi. Then install VC4CL, build darknet with make, and let me know how it goes, OK? I need your help because I do not have a free RPi for these tests right now…

  3. @piotr.sowa, after successfully installing VC4CL and downloading the darknet repo from your GitHub, I got two errors when running it.
    1) I followed the steps from your previous comment, that is, I set GPU=0, GPU_FAST=0, OPENCV=0 and RPI=1. After running make, I got the following error, and the same happens with GPU, GPU_FAST and OPENCV set to 1:
    ibdarknet.a -o darknet -lm -lpthread libdarknet.a
    make: warning: Clock skew detected. Your build may be incomplete.
    2) Besides that, I ran the command mentioned above from your darknet folder:
    ./darknet yolo test cfg/yolov1.cfg ../weights/yolov1.weights data/dog.jpg
    After running it, I got another error:

    pi@raspberrypi:~/Downloads/darknet-master $ ./darknet yolo test cfg/yolov1.cfg ../weights/yolov1.weights data/dog.jpg
    layer filters size input output
    0 conv 64 7 x 7 / 2 448 x 448 x 3 -> 224 x 224 x 64 0.944 BFLOPs
    1 max 2 x 2 / 2 224 x 224 x 64 -> 112 x 112 x 64
    2 conv 192 3 x 3 / 1 112 x 112 x 64 -> 112 x 112 x 192 2.775 BFLOPs
    3 max 2 x 2 / 2 112 x 112 x 192 -> 56 x 56 x 192
    4 conv 128 1 x 1 / 1 56 x 56 x 192 -> 56 x 56 x 128 0.154 BFLOPs
    5 conv 256 3 x 3 / 1 56 x 56 x 128 -> 56 x 56 x 256 1.850 BFLOPs
    6 conv 256 1 x 1 / 1 56 x 56 x 256 -> 56 x 56 x 256 0.411 BFLOPs
    7 conv 512 3 x 3 / 1 56 x 56 x 256 -> 56 x 56 x 512 7.399 BFLOPs
    8 max 2 x 2 / 2 56 x 56 x 512 -> 28 x 28 x 512
    9 conv 256 1 x 1 / 1 28 x 28 x 512 -> 28 x 28 x 256 0.206 BFLOPs
    10 conv 512 3 x 3 / 1 28 x 28 x 256 -> 28 x 28 x 512 1.850 BFLOPs
    11 conv 256 1 x 1 / 1 28 x 28 x 512 -> 28 x 28 x 256 0.206 BFLOPs
    12 conv 512 3 x 3 / 1 28 x 28 x 256 -> 28 x 28 x 512 1.850 BFLOPs
    13 conv 256 1 x 1 / 1 28 x 28 x 512 -> 28 x 28 x 256 0.206 BFLOPs
    14 conv 512 3 x 3 / 1 28 x 28 x 256 -> 28 x 28 x 512 1.850 BFLOPs
    15 conv 256 1 x 1 / 1 28 x 28 x 512 -> 28 x 28 x 256 0.206 BFLOPs
    16 conv 512 3 x 3 / 1 28 x 28 x 256 -> 28 x 28 x 512 1.850 BFLOPs
    17 conv 512 1 x 1 / 1 28 x 28 x 512 -> 28 x 28 x 512 0.411 BFLOPs
    18 conv 1024 3 x 3 / 1 28 x 28 x 512 -> 28 x 28 x1024 7.399 BFLOPs
    19 max 2 x 2 / 2 28 x 28 x1024 -> 14 x 14 x1024
    20 conv 512 1 x 1 / 1 14 x 14 x1024 -> 14 x 14 x 512 0.206 BFLOPs
    21 conv 1024 3 x 3 / 1 14 x 14 x 512 -> 14 x 14 x1024 1.850 BFLOPs
    22 conv 512 1 x 1 / 1 14 x 14 x1024 -> 14 x 14 x 512 0.206 BFLOPs
    23 conv 1024 3 x 3 / 1 14 x 14 x 512 -> 14 x 14 x1024 1.850 BFLOPs
    24 conv 1024 3 x 3 / 1 14 x 14 x1024 -> 14 x 14 x1024 3.699 BFLOPs
    25 conv 1024 3 x 3 / 2 14 x 14 x1024 -> 7 x 7 x1024 0.925 BFLOPs
    26 conv 1024 3 x 3 / 1 7 x 7 x1024 -> 7 x 7 x1024 0.925 BFLOPs
    27 conv 1024 3 x 3 / 1 7 x 7 x1024 -> 7 x 7 x1024 0.925 BFLOPs
    28 Segmentation fault

    Please help me solve this.
    Thanks in advance
    Regards
    Hashir

  4. @Hashir, please try GPU=1 GPU_FAST=1 RPI=1 and set all the rest to 0. Then open the file “src/opencl.c”, find the line with CL_DEVICE_TYPE_GPU and try changing it to CL_DEVICE_TYPE_ACCELERATOR. Then use Yolo2-Tiny instead of Yolo1 and send me the output of the detection test, OK? It looks like I forgot that the device there is not a GPU but an ACCELERATOR. Please let me know how it goes; I am very interested in the result of your work :). Thanks!

  5. Here is my test on the CPU. I think we need the GPU, because the CPU is too slow. However, I failed to install VC4CL.

    root@raspberrypi:~/darknet# ./darknet detect cfg/yolov2.cfg ../weights/yolov2.weights data/dog.jpg
    layer filters size input output
    0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BFLOPs
    1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32
    2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BFLOPs
    3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64
    4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BFLOPs
    5 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BFLOPs
    6 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BFLOPs
    7 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
    8 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
    9 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
    10 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
    11 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
    12 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
    13 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
    14 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
    15 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
    16 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
    17 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
    18 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
    19 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
    20 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
    21 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
    22 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
    23 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BFLOPs
    24 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BFLOPs
    25 route 16
    26 conv 64 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 64 0.044 BFLOPs
    27 reorg / 2 26 x 26 x 64 -> 13 x 13 x 256
    28 route 27 24
    29 conv 1024 3 x 3 / 1 13 x 13 x1280 -> 13 x 13 x1024 3.987 BFLOPs
    30 conv 425 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 425 0.147 BFLOPs
    31 detection
    mask_scale: Using default ‘1.000000’
    Loading weights from ../weights/yolov2.weights…Done!
    data/dog.jpg: Predicted in 157.359320 seconds.
    dog: 81%
    truck: 74%
    bicycle: 83%

    p ;).

  6. @piotr.sowa, I set GPU=1 GPU_FAST=1 RPI=1 and all the rest to 0, and also changed CL_DEVICE_TYPE_GPU to CL_DEVICE_TYPE_ACCELERATOR in opencl.c. After that, I re-ran make from the darknet-master directory, but unfortunately I got an error:

    pi@raspberrypi:~/Downloads/darknet-master $ make
    make: Warning: File ‘gemm.c’ has modification time 357508 s in the future
    gcc -Iinclude/ -Isrc/ -DGPU -DOPENCL -DRPI -DGPU_FAST -Wall -Wno-unknown-pragmas -Wno-unused-variable -Wfatal-errors -fPIC -O2 -DGPU -DOPENCL -DRPI -I/usr/include/ -I/usr/local/include/ -DGPU_FAST -c ./src/gemm.c -o obj/gemm.o
    ./src/gemm.c:170:20: fatal error: clBLAS.h: No such file or directory
    #include "clBLAS.h"
    ^
    compilation terminated.
    Makefile:113: recipe for target ‘obj/gemm.o’ failed
    make: *** [obj/gemm.o] Error 1

    Thanks in advance
    Regards
    Hashir

  7. @piotr.sowa, after running the yolov2-tiny version I got output, but the prediction was entirely different from what is in the image. I would also like to know whether YOLO versions other than tiny YOLO can run on the Pi.
    My output is given below:

    pi@raspberrypi:~/Downloads/darknet-master $ ./darknet detect cfg/yolov2-tiny-voc.cfg yolov2-tiny-voc.weights data/horses.jpg
    layer filters size input output
    0 conv 16 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 16 0.150 BFLOPs
    1 max 2 x 2 / 2 416 x 416 x 16 -> 208 x 208 x 16
    2 conv 32 3 x 3 / 1 208 x 208 x 16 -> 208 x 208 x 32 0.399 BFLOPs
    3 max 2 x 2 / 2 208 x 208 x 32 -> 104 x 104 x 32
    4 conv 64 3 x 3 / 1 104 x 104 x 32 -> 104 x 104 x 64 0.399 BFLOPs
    5 max 2 x 2 / 2 104 x 104 x 64 -> 52 x 52 x 64
    6 conv 128 3 x 3 / 1 52 x 52 x 64 -> 52 x 52 x 128 0.399 BFLOPs
    7 max 2 x 2 / 2 52 x 52 x 128 -> 26 x 26 x 128
    8 conv 256 3 x 3 / 1 26 x 26 x 128 -> 26 x 26 x 256 0.399 BFLOPs
    9 max 2 x 2 / 2 26 x 26 x 256 -> 13 x 13 x 256
    10 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BFLOPs
    11 max 2 x 2 / 1 13 x 13 x 512 -> 13 x 13 x 512
    12 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
    13 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BFLOPs
    14 conv 125 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 125 0.043 BFLOPs
    15 detection
    mask_scale: Using default ‘1.000000’
    Loading weights from yolov2-tiny-voc.weights…Done!
    data/horses.jpg: Predicted in 38.437360 seconds.
    traffic light: 75%

  8. @Hashir, please see my output below, but before you try the same, please clone my repo again and then set RPI=1 in the Makefile. The test failed for me, but it did use OpenCL. I think even Yolo2-Tiny is too big for the RPi, but you may try to train CIFAR-10 and test it… One more thing: after the GPU is detected, it takes a few minutes to build all the OpenCL kernels, so be patient. If you need to test without the GPU, use the “-nogpu” parameter. Thanks!

    root@raspberrypi:~/darknet# ./darknet detect cfg/yolov2-tiny.cfg ../weights/yolov2-tiny.weights data/dog.jpg -thersh .1
    Device ID: 0
    Device name: VideoCore IV GPU
    Device vendor: Broadcom
    Device opencl availability: OpenCL 1.2 VC4CL 0.4
    Device opencl used: 0.4
    Device double precision: NO
    Device max group size: 12
    Device address bits: 32
    layer filters size input output
    0 could not push array to device. error: CL_OUT_OF_RESOURCES
    could not push array to device. error: CL_INVALID_MEM_OBJECT
    could not push array to device. error: CL_OUT_OF_RESOURCES
    could not push array to device. error: CL_INVALID_MEM_OBJECT
    could not push array to device. error: CL_OUT_OF_RESOURCES
    could not push array to device. error: CL_INVALID_MEM_OBJECT
    could not push array to device. error: CL_OUT_OF_RESOURCES
    could not push array to device. error: CL_INVALID_MEM_OBJECT
    conv 16 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 16 0.150 BFLOPs
    1 could not push array to device. error: CL_OUT_OF_RESOURCES
    could not push array to device. error: CL_INVALID_MEM_OBJECT
    could not push array to device. error: CL_OUT_OF_RESOURCES
    could not push array to device. error: CL_INVALID_MEM_OBJECT
    could not push array to device. error: CL_OUT_OF_RESOURCES
    could not push array to device. error: CL_INVALID_MEM_OBJECT

    p ;).

  9. With the latest commit, you should be able to run this computation. Unfortunately, it is slow and the detections are currently wrong, but maybe VC4CL will improve soon. Fingers crossed.

    root@raspberrypi:~/cifar# ../darknet/darknet classifier predict cfg/cifar.data cfg/cifar_small_test.cfg ../weights/cifar_small.weights data/cifar/test/4882_frog.png
    Device ID: 0
    Device name: VideoCore IV GPU
    Device vendor: Broadcom
    Device opencl availability: OpenCL 1.2 VC4CL 0.4
    Device opencl used: 0.4
    Device double precision: NO
    Device max group size: 12
    Device address bits: 32
    layer filters size input output
    0 conv 32 3 x 3 / 1 28 x 28 x 3 -> 28 x 28 x 32 0.001 BFLOPs
    1 max 2 x 2 / 2 28 x 28 x 32 -> 14 x 14 x 32
    2 conv 16 1 x 1 / 1 14 x 14 x 32 -> 14 x 14 x 16 0.000 BFLOPs
    3 conv 64 3 x 3 / 1 14 x 14 x 16 -> 14 x 14 x 64 0.004 BFLOPs
    4 max 2 x 2 / 2 14 x 14 x 64 -> 7 x 7 x 64
    5 conv 32 1 x 1 / 1 7 x 7 x 64 -> 7 x 7 x 32 0.000 BFLOPs
    6 conv 128 3 x 3 / 1 7 x 7 x 32 -> 7 x 7 x 128 0.004 BFLOPs
    7 conv 64 1 x 1 / 1 7 x 7 x 128 -> 7 x 7 x 64 0.001 BFLOPs
    8 conv 10 1 x 1 / 1 7 x 7 x 64 -> 7 x 7 x 10 0.000 BFLOPs
    9 avg 7 x 7 x 10 -> 10
    10 softmax 10
    Loading weights from ../weights/cifar_small.weights…Done!
    data/cifar/test/4882_frog.png: Predicted in 1.647542 seconds.
    16.96%: dog
    16.44%: deer

    p ;).

  10. I would like to try it with a Raspberry Pi, but I’m confused as to how to do it. Are there instructions as to what to get and build?

    • @PeterQuinn, please look at the Makefile; there are instructions at the top: install VC4CL, set OPENCV=0 and RPI=1, save the file, run make, and have fun :-).

      • Thanks. Lots of steps (and recursive instructions), but installing VC4CL was reasonably straightforward.
        I have darknet compiled, but it appears to hang:

        pi@raspi3:~/darknet $ sudo ./darknet detect cfg/yolov3-tiny.cfg yolov3-tiny.weights data/dog.jpg
        Device ID: 0
        Device name: VideoCore IV GPU
        Device vendor: Broadcom
        Device opencl availability: OpenCL 1.2 VC4CL 0.4
        Device opencl used: 0.4
        Device double precision: NO
        Device max group size: 12
        Device address bits: 32

        Any ideas?

        • Oh. After a long wait (20 minutes?) it continues:

          layer filters size input output
          10 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BFLOPs
          11 max 2 x 2 / 1 13 x 13 x 512 -> 13 x 13 x 512
          12 could not push array to device. error: CL_OUT_OF_RESOURCES
          could not push array to device. error: CL_INVALID_MEM_OBJECT
          could not push array to device. error: CL_OUT_OF_RESOURCES

  11. One more update: I increased the memory available to the GPU to 768 MB, and now it finishes loading the network. It still fails later, though:

    terminate called after throwing an instance of ‘std::out_of_range’
    what(): vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
    Aborted

    Any ideas?

    • @PeterQuinn, for now only the “-nogpu” switch works for me. I checked with the VC4CL author: VC4CL is still under development and does not work well yet, even with extended memory, because, for example, the log, sqrt and pow functions are not implemented yet, and they are critical for this to work. I also updated the source code, so please pull the latest version; it may help. The error you posted happens from time to time and is not deterministic.

  12. Thank you. I do not have an ASUS Tinker Board; if it supports OpenCL, it should work, maybe with a small Makefile change. And yes, I considered CLBlast; however, it failed on the Intel Iris GPU, and I do not want that.

  13. @piotr.sowa Hey, would I be able to run your OpenCL implementation of YOLOv3 on an FPGA? Do I need to make any changes?

  14. Hello piotr.sowa. Your project is really cool, and I want to run your code on my computer. I built it with OpenCV=OFF GPU=ON GPU_FAST=ON CPU=OFF, leaving the other options at their defaults, but when I run darknet it shows what is in the image below.
    http://i2.tiimg.com/692688/62e5261f35719984.png (sorry, I don’t know how to upload images into this text box)
    My platform is Ubuntu 18 desktop x64 with an AMD RX480 GPU and an Intel E5 2670 CPU. I installed the Boost library with apt-get and built clBLAS with BUILD_TEST=OFF, leaving the other options at their defaults.
    Could you give me some information about this issue? Thank you very much.

    • Sorry, I cannot access this screenshot. However, for an AMD-based platform you have to change some lines in the Makefile; to do so, look for the “cuda” statements. Thanks!

      • Piotr
        Thank you for the quick reply.
        Let me check it. If I have a further question, I will ping you.
        Thanks in Advance.
        Jin

  15. Hello Piotr Sowa.
    I’m trying to make your port work properly with an Intel UHD Graphics 630 GPU on Linux. With the dog photo and YoloV3-tiny, the prediction takes 8 seconds, while on an Intel i5 CPU it takes 1 second. Checking the GPU usage, I noticed a very small workload; it seems to be using only one core of the GPU.
    Any ideas about this issue? Thanks in advance, and thanks for this port!

  17. Hi @piotr.sowa
    I’m trying to build your model on a DE1-SoC board. Is there a guide for doing that?
    Thank you in advance!
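Several of the Raspberry Pi runs in the comments above fail with CL_OUT_OF_RESOURCES while pushing arrays to the VideoCore IV, and giving the GPU more memory helped. A rough per-layer size estimate makes that plausible. The sketch below is a hypothetical cost model, not code from the fork: it assumes float32 buffers and groups = 1, uses the same multiply-add convention as the BFLOPs column in darknet's layer listing, and counts only the output and weight buffers, so real memory use is higher.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model for one darknet-style convolutional layer.
   Assumes groups = 1 and float32 buffers. Only the output activations
   and the weights are counted; workspace, deltas and biases are not,
   so the byte counts are lower bounds. */

/* Billions of floating-point operations: one multiply and one add per
   multiply-accumulate, hence the factor 2. */
static double conv_bflops(int out_w, int out_h, int in_c, int filters, int size)
{
    assert(out_w > 0 && out_h > 0 && in_c > 0 && filters > 0 && size > 0);
    return 2.0 * filters * size * size * in_c * out_w * out_h / 1e9;
}

/* Bytes for the output activations of one batch. */
static size_t conv_output_bytes(int out_w, int out_h, int filters, int batch)
{
    return (size_t)out_w * out_h * filters * batch * sizeof(float);
}

/* Bytes for the filter weights. */
static size_t conv_weight_bytes(int in_c, int filters, int size)
{
    return (size_t)filters * in_c * size * size * sizeof(float);
}
```

For yolov2-tiny, conv_bflops(416, 416, 3, 16, 3) gives about 0.1495 and conv_bflops(13, 13, 1024, 1024, 3) about 3.1898, matching the 0.150 and 3.190 BFLOPs in the log above. The first layer's output alone is 11,075,584 bytes (about 10.6 MiB), and the weights of a 3 x 3 convolution from 512 to 1024 channels, which is layer 12 in both yolov2-tiny and yolov3-tiny, take 18,874,368 bytes (18 MiB). The yolov3-tiny log above starts failing at that layer; this is consistent with, but does not prove, a device memory limit.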
