Hi, today I would like to announce that my GitHub fork at https://github.com/sowson/darknet has a new update. The fork is an advanced port of the DarkNet CNN from CUDA to OpenCL. I tested it on macOS with the Sonnet eGPU named Breakaway RX 570 Puck and on my GreenPC. It also supports the Intel Iris GPU and OpenCV 3, and there are several use cases for it. The Yolo3, Yolo2, Yolo1 and CIFAR-10 solutions work fine, and so does the demo, both from a webcam and from MP4 videos. The overall performance is quite nice: I achieved about 20 FPS on Yolo2, so as far as I know, it is the fastest DarkNet in OpenCL on the planet. I rewrote most of the BLAS kernels from scratch, and for some of them I used my own auto-tuner idea. And there is one more thing. For training, I replaced the pseudo-random image selection with a permutation-set solution. It means that when you draw a picture n times from a set of n pictures, you get each picture exactly once. The implementation is trivial, but it is a training game changer.
In the previous post I described how to install it on macOS and/or CentOS GNU/Linux; that is still current and up to date. Below I want to share with you the commands for all the test cases of this OpenCL port, which I made thanks to the very smart people who share many versions on GitHub.
A training session, to remind you where I was in May 2018.
And now you may use all of the test case commands below.
Yolo1
# Yolo1 Test on built-in GPU
./darknet yolo test cfg/yolov1.cfg ../weights/yolov1.weights data/dog.jpg
# Yolo1 Demo from 1st WebCam on Computer on eGPU
./darknet yolo demo cfg/yolov1.cfg ../weights/yolov1.weights -i 1 -c 0
# Yolo1 Demo from MP4 Movie on eGPU
./darknet yolo demo cfg/yolov1.cfg ../weights/yolov1.weights ../movies/movie.mp4 -i 1
# Yolo1 Train from ../train folder on eGPU
../darknet/darknet yolo train yolov1.cfg voc.data extraction.conv.weights -i 1
Yolo2
# Yolo2 Test on built-in GPU
./darknet detect cfg/yolov2.cfg ../weights/yolov2.weights data/dog.jpg
# Yolo2 Demo from 1st WebCam on Computer on eGPU
./darknet detector demo cfg/coco.data cfg/yolov2.cfg ../weights/yolov2.weights -i 1 -c 0
# Yolo2 Demo from MP4 Movie on eGPU
./darknet detector demo cfg/coco.data cfg/yolov2.cfg ../weights/yolov2.weights ../movies/movie.mp4 -i 1
# Yolo2 Train from ../train folder on eGPU
../darknet/darknet detector train voc.data yolo-voc.2.0.cfg darknet19_448.conv.23 -i 1
Yolo3
# Yolo3 Test on built-in GPU
./darknet detect cfg/yolov3.cfg ../weights/yolov3.weights data/dog.jpg
# Yolo3 Demo from 1st WebCam on Computer on eGPU
./darknet detector demo cfg/coco.data cfg/yolov3.cfg ../weights/yolov3.weights -i 1 -c 0
# Yolo3 Demo from MP4 Movie on eGPU
./darknet detector demo cfg/coco.data cfg/yolov3.cfg ../weights/yolov3.weights ../movies/movie.mp4 -i 1
# Yolo3 Train from ../train folder on eGPU
../darknet/darknet detector train voc.data yolov3-voc.cfg darknet53.conv.74 -i 1
CIFAR-10
# CIFAR-10 training from ../cifar folder on CPU
../darknet/darknet classifier train cfg/cifar.data cfg/cifar_small.cfg -nogpu
# CIFAR-10 training from ../cifar folder on built-in GPU
../darknet/darknet classifier train cfg/cifar.data cfg/cifar_small.cfg
# CIFAR-10 training from ../cifar folder on eGPU
../darknet/darknet classifier train cfg/cifar.data cfg/cifar_small.cfg -i 1
# CIFAR-10 validation test from ../cifar folder on eGPU (_test cfg has batch=1)
../darknet/darknet classifier valid cfg/cifar.data cfg/cifar_small_test.cfg backup/cifar_small.backup -i 1
# CIFAR-10 test on built-in GPU (_test cfg has batch=1)
../darknet/darknet classifier predict cfg/cifar.data cfg/cifar_small_test.cfg backup/cifar_small.backup data/cifar/train/35728_automobile.png
# CIFAR-10 test on built-in GPU (_test cfg has batch=1)
../darknet/darknet classifier predict cfg/cifar.data cfg/cifar_small_test.cfg backup/cifar_small.backup data/cifar/test/6298_cat.png
# CIFAR-10 test on built-in GPU (_test cfg has batch=1)
../darknet/darknet classifier predict cfg/cifar.data cfg/cifar_small_test.cfg backup/cifar_small.backup data/cifar/test/4882_frog.png
# CIFAR-10 test on built-in GPU (_test cfg has batch=1)
../darknet/darknet classifier predict cfg/cifar.data cfg/cifar_small_test.cfg backup/cifar_small.backup data/cifar/test/2568_truck.png
# CIFAR-10 test on built-in GPU (_test cfg has batch=1)
../darknet/darknet classifier predict cfg/cifar.data cfg/cifar_small_test.cfg backup/cifar_small.backup data/cifar/test/5238_bird.png
Enjoy!
p ;).
@piotr.sowa,
I would like to know whether I can use this method on a Raspberry Pi 3 with the help of a Movidius neural stick for real-time object recognition.
Thanks in advance
@Hashir, it has not been my priority for some time… but I am happy to announce that Yolo3-spp is now supported 😀
@piotr.sowa
Thanks for your valuable comment. Also, how can I run yolo3-spp on the Pi? Is it the same as yolo3 or tiny-yolo? What is meant by spp?
Thank you in advance
@Hashir, for you, I added an RPI option to the build options. Please clone my repo, then edit the top of the Makefile: disable OpenCV with OPENCV=0 (optional) and enable RPI=1 on your RPi. Then please install VC4CL, build darknet with make, and let me know how it goes, OK? I need your help because I do not have a free RPi for these tests right now…
@piotr.sowa, after successfully installing VC4CL and downloading the darknet repo from your GitHub, I got two errors:
1) I followed the steps you mentioned in the previous comment, that is, I set GPU=0, GPU_FAST=0, OPENCV=0 and RPI=1. After running make I got the following error, and the same happens with GPU, GPU_FAST and OPENCV set to 1:
ibdarknet.a -o darknet -lm -lpthread libdarknet.a
make: warning: Clock skew detected. Your build may be incomplete.
2) Besides that, I ran the command mentioned above in the terminal from your darknet folder:
./darknet yolo test cfg/yolov1.cfg ../weights/yolov1.weights data/dog.jpg
After running it, I again got an error:
pi@raspberrypi:~/Downloads/darknet-master $ ./darknet yolo test cfg/yolov1.cfg ../weights/yolov1.weights data/dog.jpg
layer filters size input output
0 conv 64 7 x 7 / 2 448 x 448 x 3 -> 224 x 224 x 64 0.944 BFLOPs
1 max 2 x 2 / 2 224 x 224 x 64 -> 112 x 112 x 64
2 conv 192 3 x 3 / 1 112 x 112 x 64 -> 112 x 112 x 192 2.775 BFLOPs
3 max 2 x 2 / 2 112 x 112 x 192 -> 56 x 56 x 192
4 conv 128 1 x 1 / 1 56 x 56 x 192 -> 56 x 56 x 128 0.154 BFLOPs
5 conv 256 3 x 3 / 1 56 x 56 x 128 -> 56 x 56 x 256 1.850 BFLOPs
6 conv 256 1 x 1 / 1 56 x 56 x 256 -> 56 x 56 x 256 0.411 BFLOPs
7 conv 512 3 x 3 / 1 56 x 56 x 256 -> 56 x 56 x 512 7.399 BFLOPs
8 max 2 x 2 / 2 56 x 56 x 512 -> 28 x 28 x 512
9 conv 256 1 x 1 / 1 28 x 28 x 512 -> 28 x 28 x 256 0.206 BFLOPs
10 conv 512 3 x 3 / 1 28 x 28 x 256 -> 28 x 28 x 512 1.850 BFLOPs
11 conv 256 1 x 1 / 1 28 x 28 x 512 -> 28 x 28 x 256 0.206 BFLOPs
12 conv 512 3 x 3 / 1 28 x 28 x 256 -> 28 x 28 x 512 1.850 BFLOPs
13 conv 256 1 x 1 / 1 28 x 28 x 512 -> 28 x 28 x 256 0.206 BFLOPs
14 conv 512 3 x 3 / 1 28 x 28 x 256 -> 28 x 28 x 512 1.850 BFLOPs
15 conv 256 1 x 1 / 1 28 x 28 x 512 -> 28 x 28 x 256 0.206 BFLOPs
16 conv 512 3 x 3 / 1 28 x 28 x 256 -> 28 x 28 x 512 1.850 BFLOPs
17 conv 512 1 x 1 / 1 28 x 28 x 512 -> 28 x 28 x 512 0.411 BFLOPs
18 conv 1024 3 x 3 / 1 28 x 28 x 512 -> 28 x 28 x1024 7.399 BFLOPs
19 max 2 x 2 / 2 28 x 28 x1024 -> 14 x 14 x1024
20 conv 512 1 x 1 / 1 14 x 14 x1024 -> 14 x 14 x 512 0.206 BFLOPs
21 conv 1024 3 x 3 / 1 14 x 14 x 512 -> 14 x 14 x1024 1.850 BFLOPs
22 conv 512 1 x 1 / 1 14 x 14 x1024 -> 14 x 14 x 512 0.206 BFLOPs
23 conv 1024 3 x 3 / 1 14 x 14 x 512 -> 14 x 14 x1024 1.850 BFLOPs
24 conv 1024 3 x 3 / 1 14 x 14 x1024 -> 14 x 14 x1024 3.699 BFLOPs
25 conv 1024 3 x 3 / 2 14 x 14 x1024 -> 7 x 7 x1024 0.925 BFLOPs
26 conv 1024 3 x 3 / 1 7 x 7 x1024 -> 7 x 7 x1024 0.925 BFLOPs
27 conv 1024 3 x 3 / 1 7 x 7 x1024 -> 7 x 7 x1024 0.925 BFLOPs
28 Segmentation fault
Please help me solve this.
Thanks in advance
Regards
Hashir
@piotr.sowa, I did all the steps mentioned above on a Raspberry Pi 3 with an Intel Movidius neural stick.
@Hashir, please try GPU=1 GPU_FAST=1 RPI=1 and set all the rest to 0. Then please open “src/opencl.c”, find the line with CL_DEVICE_TYPE_GPU and try changing it to CL_DEVICE_TYPE_ACCELERATOR. Then use Yolo2-Tiny, not Yolo1, and send the output of the detection test, OK? It looks like I forgot that there is no GPU there, only an ACCELERATOR. Please let me know how it goes; I am very interested in the result of your work :). Thanks!
My test on the CPU is as follows. But we need the GPU; the CPU is too slow, I think. However, my VC4CL installation failed.
root@raspberrypi:~/darknet# ./darknet detect cfg/yolov2.cfg ../weights/yolov2.weights data/dog.jpg
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BFLOPs
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BFLOPs
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BFLOPs
5 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BFLOPs
6 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BFLOPs
7 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
8 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
9 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BFLOPs
10 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BFLOPs
11 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
12 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
13 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
14 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
15 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BFLOPs
16 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BFLOPs
17 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
18 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
19 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
20 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
21 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BFLOPs
22 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
23 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BFLOPs
24 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BFLOPs
25 route 16
26 conv 64 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 64 0.044 BFLOPs
27 reorg / 2 26 x 26 x 64 -> 13 x 13 x 256
28 route 27 24
29 conv 1024 3 x 3 / 1 13 x 13 x1280 -> 13 x 13 x1024 3.987 BFLOPs
30 conv 425 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 425 0.147 BFLOPs
31 detection
mask_scale: Using default ‘1.000000’
Loading weights from ../weights/yolov2.weights…Done!
data/dog.jpg: Predicted in 157.359320 seconds.
dog: 81%
truck: 74%
bicycle: 83%
p ;).
@piotr.sowa, I set GPU=1 GPU_FAST=1 RPI=1 with all the rest at 0, and I also changed CL_DEVICE_TYPE_GPU to CL_DEVICE_TYPE_ACCELERATOR in opencl.c. After this I ran make again from the darknet-master directory, but unfortunately I got an error:
pi@raspberrypi:~/Downloads/darknet-master $ make
make: Warning: File ‘gemm.c’ has modification time 357508 s in the future
gcc -Iinclude/ -Isrc/ -DGPU -DOPENCL -DRPI -DGPU_FAST -Wall -Wno-unknown-pragmas -Wno-unused-variable -Wfatal-errors -fPIC -O2 -DGPU -DOPENCL -DRPI -I/usr/include/ -I/usr/local/include/ -DGPU_FAST -c ./src/gemm.c -o obj/gemm.o
./src/gemm.c:170:20: fatal error: clBLAS.h: No such file or directory
#include "clBLAS.h"
^
compilation terminated.
Makefile:113: recipe for target ‘obj/gemm.o’ failed
make: *** [obj/gemm.o] Error 1
Thanks in advance
Regards
Hashir
@piotr.sowa, after running the yolov2-tiny version I got output, but the prediction was entirely different from the original image. I would also like to know whether YOLO versions other than tiny-yolo can run on the Pi.
My output is given below:
pi@raspberrypi:~/Downloads/darknet-master $ ./darknet detect cfg/yolov2-tiny-voc.cfg yolov2-tiny-voc.weights data/horses.jpg
layer filters size input output
0 conv 16 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 16 0.150 BFLOPs
1 max 2 x 2 / 2 416 x 416 x 16 -> 208 x 208 x 16
2 conv 32 3 x 3 / 1 208 x 208 x 16 -> 208 x 208 x 32 0.399 BFLOPs
3 max 2 x 2 / 2 208 x 208 x 32 -> 104 x 104 x 32
4 conv 64 3 x 3 / 1 104 x 104 x 32 -> 104 x 104 x 64 0.399 BFLOPs
5 max 2 x 2 / 2 104 x 104 x 64 -> 52 x 52 x 64
6 conv 128 3 x 3 / 1 52 x 52 x 64 -> 52 x 52 x 128 0.399 BFLOPs
7 max 2 x 2 / 2 52 x 52 x 128 -> 26 x 26 x 128
8 conv 256 3 x 3 / 1 26 x 26 x 128 -> 26 x 26 x 256 0.399 BFLOPs
9 max 2 x 2 / 2 26 x 26 x 256 -> 13 x 13 x 256
10 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BFLOPs
11 max 2 x 2 / 1 13 x 13 x 512 -> 13 x 13 x 512
12 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
13 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BFLOPs
14 conv 125 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 125 0.043 BFLOPs
15 detection
mask_scale: Using default ‘1.000000’
Loading weights from yolov2-tiny-voc.weights…Done!
data/horses.jpg: Predicted in 38.437360 seconds.
traffic light: 75%
@Hashir, please see my output below, but before you do the same, please clone my repo once more and set RPI=1 in the Makefile. It failed in my test, but it did use OpenCL. I think even Yolo2-Tiny is too big for the RPi, but you may try to train CIFAR-10 and test it… One more thing: after detecting the GPU, it takes a few minutes to build all the OpenCL kernels, so be patient. If you need to test without the GPU, use the “-nogpu” parameter. Thanks!
root@raspberrypi:~/darknet# ./darknet detect cfg/yolov2-tiny.cfg ../weights/yolov2-tiny.weights data/dog.jpg -thersh .1
Device ID: 0
Device name: VideoCore IV GPU
Device vendor: Broadcom
Device opencl availability: OpenCL 1.2 VC4CL 0.4
Device opencl used: 0.4
Device double precision: NO
Device max group size: 12
Device address bits: 32
layer filters size input output
0 could not push array to device. error: CL_OUT_OF_RESOURCES
could not push array to device. error: CL_INVALID_MEM_OBJECT
could not push array to device. error: CL_OUT_OF_RESOURCES
could not push array to device. error: CL_INVALID_MEM_OBJECT
could not push array to device. error: CL_OUT_OF_RESOURCES
could not push array to device. error: CL_INVALID_MEM_OBJECT
could not push array to device. error: CL_OUT_OF_RESOURCES
could not push array to device. error: CL_INVALID_MEM_OBJECT
conv 16 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 16 0.150 BFLOPs
1 could not push array to device. error: CL_OUT_OF_RESOURCES
could not push array to device. error: CL_INVALID_MEM_OBJECT
could not push array to device. error: CL_OUT_OF_RESOURCES
could not push array to device. error: CL_INVALID_MEM_OBJECT
could not push array to device. error: CL_OUT_OF_RESOURCES
could not push array to device. error: CL_INVALID_MEM_OBJECT
p ;).
With the last commit, you should be able to run this. Unfortunately, it is slow and the detection results are currently wrong, but maybe VC4CL will improve soon. Fingers crossed.
root@raspberrypi:~/cifar# ../darknet/darknet classifier predict cfg/cifar.data cfg/cifar_small_test.cfg ../weights/cifar_small.weights data/cifar/test/4882_frog.png
Device ID: 0
Device name: VideoCore IV GPU
Device vendor: Broadcom
Device opencl availability: OpenCL 1.2 VC4CL 0.4
Device opencl used: 0.4
Device double precision: NO
Device max group size: 12
Device address bits: 32
layer filters size input output
0 conv 32 3 x 3 / 1 28 x 28 x 3 -> 28 x 28 x 32 0.001 BFLOPs
1 max 2 x 2 / 2 28 x 28 x 32 -> 14 x 14 x 32
2 conv 16 1 x 1 / 1 14 x 14 x 32 -> 14 x 14 x 16 0.000 BFLOPs
3 conv 64 3 x 3 / 1 14 x 14 x 16 -> 14 x 14 x 64 0.004 BFLOPs
4 max 2 x 2 / 2 14 x 14 x 64 -> 7 x 7 x 64
5 conv 32 1 x 1 / 1 7 x 7 x 64 -> 7 x 7 x 32 0.000 BFLOPs
6 conv 128 3 x 3 / 1 7 x 7 x 32 -> 7 x 7 x 128 0.004 BFLOPs
7 conv 64 1 x 1 / 1 7 x 7 x 128 -> 7 x 7 x 64 0.001 BFLOPs
8 conv 10 1 x 1 / 1 7 x 7 x 64 -> 7 x 7 x 10 0.000 BFLOPs
9 avg 7 x 7 x 10 -> 10
10 softmax 10
Loading weights from ../weights/cifar_small.weights…Done!
data/cifar/test/4882_frog.png: Predicted in 1.647542 seconds.
16.96%: dog
16.44%: deer
p ;).
I would like to try it with a Raspberry Pi, but I’m confused as to how to do it. Are there instructions as to what to get and build?
@PeterQuinn, please look into the Makefile; there are instructions at the top: install VC4CL, set OPENCV=0 and RPI=1, save the file, run make and have fun :-).
Thanks. Lots of steps (and recursive instructions), but reasonably straightforward to install VC4CL.
I have darknet compiled but it appears to hang.
pi@raspi3:~/darknet $ sudo ./darknet detect cfg/yolov3-tiny.cfg yolov3-tiny.weights data/dog.jpg
Device ID: 0
Device name: VideoCore IV GPU
Device vendor: Broadcom
Device opencl availability: OpenCL 1.2 VC4CL 0.4
Device opencl used: 0.4
Device double precision: NO
Device max group size: 12
Device address bits: 32
Any ideas?
Oh. After a long wait (20 minutes?) it continues:
layer filters size input output
10 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BFLOPs
11 max 2 x 2 / 1 13 x 13 x 512 -> 13 x 13 x 512
12 could not push array to device. error: CL_OUT_OF_RESOURCES
could not push array to device. error: CL_INVALID_MEM_OBJECT
could not push array to device. error: CL_OUT_OF_RESOURCES
One more update: I increased the memory available to the GPU to 768 MB and now it finishes loading the network. It still fails later, though.
terminate called after throwing an instance of ‘std::out_of_range’
what(): vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
Aborted
Any ideas?
@PeterQuinn, for now only the “-nogpu” switch works for me. I checked with the VC4CL author: it is still under development and does not work well yet, even with extended memory, because, for example, the log, sqrt and pow functions are not implemented yet, and they are critical for this workload. I also updated the source code, so please pull the latest version; it may help. The error you posted happens from time to time and is not deterministic.
Here you go… ;-).
p ;).
Nice work. Two questions…
Do you think this would also work on single-board computers other than the RPi, for example on the ASUS Tinker Board (which has a Mali GPU)?
I believe you’re using clBLAS. Have you considered using CLBlast (https://github.com/CNugteren/CLBlast) instead?
Yes, I tested this scenario in https://iblog.isowa.io/2019/02/02/darknet-in-opencl-on-asus-thinker-board-s/ and it works great :D. Sorry for misspelling the name of this single-board computer… but anyway, feel free to use it with the “Thinker” board. Thanks!
Thank you. I do not have an ASUS Tinker Board, but if it has OpenCL it should work, maybe with a small Makefile change. And yes, I considered CLBlast; however, it failed on the Intel Iris GPU, and I do not want that.
That improvement was done and described at https://iblog.isowa.io/2019/02/02/darknet-in-opencl-on-asus-thinker-board-s/ Thanks!
@piotr.sowa Hey, would I be able to run your OpenCL implementation of YOLOv3 on an FPGA? Do I need to make any changes?
Sorry, I do not even know what an FPGA is…
@piotr.sowa Would I be able to run the yolov3 OpenCL version without a GPU?
Yes, without recompilation; you only need to pass the “-nogpu” parameter when invoking it. Thanks!
Hello piotr.sowa. Your project is really cool. I want to run your code on my computer, but after I build it with OpenCV=OFF GPU=ON GPU_FAST=ON CPU=OFF (other options at their defaults) and run darknet, it shows me the output in the image below.
http://i2.tiimg.com/692688/62e5261f35719984.png (sorry, I don’t know how to upload images into this text box)
My platform is Ubuntu 18 desktop x64 with an AMD RX480 GPU and an Intel E5 2670 CPU. I installed the Boost libraries with apt-get and built clBLAS with BUILD_TEST=OFF; other options are at their defaults.
Could you give me some information about this issue? Thank you very much.
Sorry, I cannot access this screenshot. But for an AMD-based platform you have to change some lines in the Makefile; to do so, look for the “cuda” statements. Thanks!
Piotr
Thank you for the quick reply.
Let me check it. If I have a further question, I will ping you.
Thanks in advance.
Jin
Hello Piotr Sowa.
I’m trying to make your port work properly with an Intel UHD Graphics 630 GPU on Linux. With the dog photo and YoloV3-tiny, the prediction takes 8 seconds, while on an Intel i5 CPU it takes 1 second. Checking the GPU usage, I noticed a very small workload; it seems to be using only one core of the GPU.
Any ideas about this issue? Thanks in advance, and thanks for this port!
Hi @piotr.sowa
I’m trying to build your model on a DE1-SoC board. Is there any guide for me to do it?
Thank you in advance!