I want to create an application that counts how many football juggles (foot, knee, head, etc.) have been performed, using a live webcam feed. I have implemented pose estimation and object tracking, but they are too slow to accurately track the ball's movement and its contact with the feet. I also tried keeping a buffer that records the ball's trajectory to detect a bounce, but that is not very elegant either.
I need help to prepare a better strategy for this task and if possible faster tracking and pose estimation. Even minute improvements in accuracy would be very encouraging for me.
Here’s a version which uses color tracking (no counting) -
What defines a juggle? Is it a contact that produces a local minimum in the height of the ball? If the ball is on its way up and you head it so that it continues up, but in a different direction, does that count as a juggle?
@joymkj, can you provide us with a quick sketch so we can better grasp the concept you’re talking about?
Yes, any contact counts as a juggle, even one that pushes the ball on in the same direction, as you said.
I tried several strategies. One of them is to measure the distance between the lower edge of the ball's bounding box and the head/knee/foot (from pose estimation), as in the sketch in my post. When that distance is positive and below a certain threshold, it counts as a juggle. It doesn't work very well. I was thinking about doing some time series analysis, but I am not sure how to do it in real time.
This might have worked better if the object tracking weren't so poor. Is there any library or tool that makes object tracking easier? Ideally, image segmentation would be the best solution, since we could detect contact more accurately.
This might be difficult.
Instead, as jeremy suggested, just monitor the ball. A juggle occurs when the ball's velocity along y was positive and is now negative.
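A minimal sketch of that sign-change idea, in case it helps. It assumes you already have the ball centre's y pixel for each frame from whatever tracker you use, and that y follows image coordinates (it grows downward, so positive velocity means the ball is falling). The name count_juggles and the min_speed noise threshold are made up for illustration and would need tuning.

```python
def count_juggles(ys, min_speed=2.0):
    """Count falling-to-rising reversals in a sequence of ball-centre y pixels.

    Assumes image coordinates: y grows downward, so vy > 0 means falling.
    min_speed (pixels per frame) is a hypothetical threshold that ignores jitter.
    """
    count = 0
    falling = False
    for prev, cur in zip(ys, ys[1:]):
        vy = cur - prev  # per-frame vertical velocity
        if vy > min_speed:
            falling = True
        elif vy < -min_speed and falling:
            # The ball was falling and is now clearly rising: count one juggle.
            count += 1
            falling = False
    return count
```

Note that this only sees contacts that reverse the vertical direction, so a header that keeps the ball going up would be missed.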
How many cameras do you have, and where are they? Using one camera to estimate whether an upper body contact happened when the ball is between the camera and the player can be a very hard problem. It would be marginally easier with a depth camera like a Kinect. You could also try to guess at depth information with things like ball blob radius and inter-eye distance on face recognition. However, changes in the direction / speed / acceleration of the ball are probably your best indicator that meaningful contact happened – having a skeleton nearby is good supplementary evidence, but frame-by-frame skeletons are often garbage, even if you have multiple camera angles and depth isn’t a total question mark. Honestly I’d skip the pose thing initially except perhaps for trying to disambiguate if it touched a toe or the ground. But this is just guessing, not experience.
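One way to watch for those changes in direction / speed / acceleration, as a rough sketch rather than anything tested on real footage: in free flight the ball's frame-to-frame acceleration should be roughly constant (gravity only), so a frame where it deviates sharply from that baseline is a candidate contact. The function name, the threshold value, and the use of the median as the free-flight baseline are all my assumptions.

```python
from statistics import median

def contact_frames(points, threshold=6.0):
    """Return frame indices where the ball's acceleration departs from free flight.

    points: list of (x, y) ball centres in pixels, one per frame.
    threshold: hypothetical deviation limit in pixels per frame squared.
    """
    # Second differences approximate acceleration at the middle frame.
    acc = [
        (x2 - 2 * x1 + x0, y2 - 2 * y1 + y0)
        for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:])
    ]
    if not acc:
        return []
    # Assumption: most frames are free flight, so the median is the gravity baseline.
    gx = median(a[0] for a in acc)
    gy = median(a[1] for a in acc)
    hits = []
    for i, (ax, ay) in enumerate(acc):
        deviation = ((ax - gx) ** 2 + (ay - gy) ** 2) ** 0.5
        if deviation > threshold:
            hits.append(i + 1)  # acceleration i is centred on frame i + 1
    return hits
```

A contact that falls between two frames can flag two neighbouring frames, so consecutive hits should probably be merged into one juggle. Unlike a pure sign-change test, this can also pick up a touch that keeps the ball moving upward.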
For the ground, keep in mind that you can project an estimated time that a juggle will fail – e.g. when the ball will touch the ground. If that doesn’t happen, a juggle (must have) happened. That doesn’t count three quick knees, but it does give you a clearer way of watching for when a streak ends.
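A sketch of that projection, assuming simple free fall with no drag, image coordinates with y growing downward, and a gravity value g in pixels per frame squared that you would have to estimate from your own footage. The function name and parameters are hypothetical.

```python
import math

def frames_until_ground(y, vy, ground_y, g):
    """Estimate how many frames until a free-falling ball reaches ground_y.

    y, ground_y: pixel rows (image coordinates, y grows downward).
    vy: current vertical velocity in pixels per frame (negative = rising).
    g: assumed gravity in pixels per frame squared, must be > 0.
    """
    drop = ground_y - y
    if drop <= 0:
        return 0.0  # already at or below the ground line
    # Positive root of: y + vy*t + 0.5*g*t^2 = ground_y
    return (-vy + math.sqrt(vy * vy + 2.0 * g * drop)) / g
```

If the ball is still clearly above the ground line some margin after the predicted frame, a contact must have happened in between; if it reaches the ground line, the streak is over.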