Deep Vision - Machine Learning Computer Vision for Processing (Library)

Hello everyone

I’m happy to present a project I’ve been working on for over a year now. With OpenCV for Processing we have a good base library for traditional image processing. However, these algorithms are increasingly being pushed into the background by machine learning.

That is why I started, a year ago, to develop a library that (mainly) uses the Deep Neural Network (DNN) module of OpenCV to bring machine learning, especially CNNs, into the Processing world.

The library is not about training these networks, but about running inference (executing a prediction) with them.

:zap: The API of the library can and will still change, because it is just a pre-release.


Example: SSD MobileNet & Lightweight OpenPose

Installation

The library can be downloaded as usual from the Processing contribution manager.

Networks

At the moment, more than 25 different networks are implemented, from YOLO to Lightweight OpenPose to MiDaS. All of them come with pre-trained weights. These networks are divided into different categories:

  • Object Detection :sparkles:
  • Object Segmentation
  • Object Recognition :blue_car:
  • Keypoint Detection :woman_playing_handball:
  • Classification :cat2:
  • Depth Estimation :dark_sunglasses:
  • Image Processing

To keep the library small, I developed a repository system that automatically downloads the files you need for the chosen network.

Performance

The implementation is optimized for the CPU, and many networks run at 20-30 FPS on a good CPU. It is also possible to use the CUDA backend of OpenCV. For that you have to download a special version of the library, which only works on Linux x86/x64 and Windows and is quite big (2-5 GB).

Next Steps

At the moment I’m working on improving the speed on the CPU by optimizing the pre- and post-processing. I am also experimenting with using ONNX directly, to have a second inference engine.

Additionally, I’m still working on the documentation and more examples, and I would be very happy if someone wants to contribute to them.

Thank you very much for testing and for your feedback.


This is amazing, really. I’ve been looking for something like this! :tada:


This is amazing! Thank you so much. I will be using this to teach my students. Fantastic work.


Hi, can I ask why I am getting a red line at network.setup? It says:

OpenCV(4.5.1) modules\dnn\src\torch\THDiskFile.cpp:286: error: (-2:Unspecified error) read error: read 659639 blocks instead of 780300 in function ‘TH::THDiskFile_readFloat’

I did go through your GitHub, but I am not sure why I am getting this error. Am I missing something?

I need more information about how you are using the library (your code) and what you are trying to do. Otherwise I cannot help you.

But usually, if there is a problem reading the weights, they were not downloaded correctly. I have already implemented a behaviour to prevent this, but it’s not released yet.

To delete broken packages, use the following line of code before setting up the network:

deepVision.clearRepository();

Hello, I want to use the library to play a video when a face is detected by the webcam, and to switch to another video when there is no face in front of the webcam.

I tried to understand the syntax, but I am unable to express the case where no face is detected by the webcam.

For instance, I want to say: if (no face is detected) { play video1 } else { play video2 }

How should I go about it?

Check out the face-detection example. If you want to know whether a face is currently present, just check how many faces have been detected:

if(detections.size() > 0) {
    // face is present
} else {
    // no face
}
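
To avoid restarting a video on every frame, it can help to react only when the result of that check changes. Here is a minimal sketch of this idea in plain Java. The class VideoSwitcher and the file names are hypothetical and not part of Deep Vision; the only assumption is that you pass in detections.size() once per frame.

```java
// Hypothetical helper, not part of the Deep Vision API.
// It remembers which video should be playing and reports changes.
class VideoSwitcher {
    private final String faceVideo;  // video to play while a face is present
    private final String idleVideo;  // video to play while no face is present
    private String current = null;   // nothing selected yet

    VideoSwitcher(String faceVideo, String idleVideo) {
        this.faceVideo = faceVideo;
        this.idleVideo = idleVideo;
    }

    // Call once per frame with the number of detected faces.
    // Returns true only when the selected video has changed.
    boolean update(int faceCount) {
        String wanted = faceCount > 0 ? faceVideo : idleVideo;
        if (wanted.equals(current)) {
            return false; // same state as last frame, nothing to do
        }
        current = wanted;
        return true;
    }

    // The video that should be playing right now (null before the first update).
    String current() {
        return current;
    }
}
```

In draw() you would call update(detections.size()) and, only when it returns true, stop the old movie and start the one named by current().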

Hi! I’m exploring the Deep Vision library. I made a model using Teachable Machine, and it kind of worked with Deep Vision using the SSDMobileNetwork network. The thing is, it only picks up one of the labels. Any ideas? What do you reckon? Thank you in advance! This is AMAZING work!

Hi Florian, my name is Carlos Vaz. I am a professor at a Federal University in Brazil. We are developing research with the goal of using aerial drone images to count people and cars from above in open spaces. I found your library on GitHub and I want to know if it is possible to train it to do this task.

The library is an inferencing library and does not contain any support for training. But as described in the readme, it is of course possible to train a network elsewhere and use it later in the library. For aerial object detection I would use a specific network anyway (for example), because the detection anchors and grids in the default networks are usually too big for small objects.

I would recommend you first try out the default networks (YOLO & MaskRCNN) and see if they work. It really depends on your images (how far away, which angle and so on).

Would it be possible to share your trained model with me so I could test it?