Why does this AI confuse this bald referee with a ball?

This football club in Scotland didn’t want to pay the cameramen anymore. So they got an AI that controls the cameras to follow the ball.

But last weekend, it messed up. It confused the head of a bald referee with the ball and as a result, the cameras were following the referee instead of the ball (link to video).

Do you have an explanation of why this happened? And also, how would you fix it?

Looking forward to your answers.


Haha, the video is great. I’d say this happened because the confidence threshold is too low, so the NN accepts the head as a ball even when its confidence score is low. Just increase the confidence threshold.
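To make the idea concrete, here is a minimal sketch of thresholding detections. The `Detection` class, the scores, and the 0.8 threshold are made up for illustration; nothing is known about the actual system the club uses.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    confidence: float  # detector score in [0, 1]
    x: float           # pixel coordinates of the detection centre
    y: float


def filter_by_confidence(detections, threshold):
    """Keep only detections whose confidence is at least `threshold`."""
    return [d for d in detections if d.confidence >= threshold]


# Hypothetical detector output for one frame.
detections = [
    Detection("ball", 0.55, 120.0, 80.0),   # e.g. the referee's head
    Detection("ball", 0.91, 400.0, 300.0),  # the real ball
]
kept = filter_by_confidence(detections, threshold=0.8)
```

This only helps when the false detection scores lower than the real ball. If both score above the threshold, no threshold value separates them.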


@tonio your solution could work, but if you look at the video the head of the referee looks quite similar to the ball on the field. So the NN might make a wrong prediction with high confidence.

I think using Kalman filters could also be a solution. The intuition is to model the movement of the ball taking into account the “probability” of where the ball is in a video frame. So using simple equations of motion, we keep track of where we expect the ball to be in the next frame, and if the observation is too different (the ball jumped to the referee’s head), we discard this prediction and use the next best prediction from the neural network.

This is just speaking on a high level, but it’d be interesting to see the approach implemented. Maybe it can be improved by accounting for relative ball size in the frame as well (so that close ups of bald people in the crowd are not picked up as a ball).
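A rough sketch of the gating idea, assuming a constant-velocity model with x and y filtered independently. All noise values (`pos_var`, `vel_var`, `q`, `r`) are placeholders that would need tuning on real footage, adding `q` to both diagonal entries is a simplified process-noise model, and the relative-size check is left out.

```python
class AxisKalman:
    """1-D constant-velocity Kalman filter tracking position and velocity."""

    def __init__(self, pos, pos_var=25.0, vel_var=100.0, q=1.0, r=4.0):
        self.p, self.v = pos, 0.0                  # state estimate
        self.P = [[pos_var, 0.0], [0.0, vel_var]]  # state covariance
        self.q, self.r = q, r                      # process / measurement noise

    def predict(self, dt=1.0):
        (a, b), (c, d) = self.P
        self.p += self.v * dt
        # P = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = diag(q, q)
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]

    def gate(self, z):
        """Squared innovation divided by its predicted variance."""
        return (z - self.p) ** 2 / (self.P[0][0] + self.r)

    def update(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r
        k0, k1 = a / s, c / s                      # Kalman gain
        y = z - self.p                             # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


class BallTracker:
    GATE = 9.21  # chi-square 99% bound for 2 degrees of freedom

    def __init__(self, x, y):
        self.kx, self.ky = AxisKalman(x), AxisKalman(y)

    def step(self, candidates):
        """candidates: list of (confidence, x, y) tuples from the detector."""
        self.kx.predict()
        self.ky.predict()
        # Try candidates from most to least confident; use the first plausible one.
        for conf, x, y in sorted(candidates, reverse=True):
            if self.kx.gate(x) + self.ky.gate(y) <= self.GATE:
                self.kx.update(x)
                self.ky.update(y)
                break
        # If nothing passed the gate, the prediction alone is returned.
        return self.kx.p, self.ky.p


tracker = BallTracker(100.0, 100.0)
tracker.step([(0.9, 105.0, 100.0)])
tracker.step([(0.9, 110.0, 100.0)])
# The high-confidence "head" at (400, 50) fails the gate, so the
# lower-confidence detection near the predicted position is used instead.
position = tracker.step([(0.95, 400.0, 50.0), (0.7, 115.0, 100.0)])
```

When no candidate is plausible, the tracker coasts on its prediction for that frame instead of jumping to the outlier.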


Hey everyone!

I think Kalman filters could work, but are painful af to implement.

An easier option is negative mining (often called hard negative mining). Basically, you train your model with positive (is a ball) and negative (is not a ball) samples, and negative mining makes sure the model sees enough of the difficult negatives. The best way to do so is to look at the false positives your model produces, label those as negative, and retrain.
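A toy sketch of the mining step, assuming you already have a model and some labelled crops. `toy_predict` and the "roundness" feature are stand-ins for a real detector and are purely hypothetical.

```python
def mine_hard_negatives(predict, samples, threshold=0.5):
    """Return the non-ball samples that the model still scores as a ball.

    predict: callable mapping a sample to a ball probability in [0, 1]
    samples: iterable of (sample, is_ball) pairs with ground-truth labels
    """
    return [sample for sample, is_ball in samples
            if not is_ball and predict(sample) >= threshold]


# Stand-in for a trained detector: scores a crop by a made-up feature.
def toy_predict(crop):
    return crop["roundness"]


labelled = [
    ({"name": "ball", "roundness": 0.95}, True),
    ({"name": "bald_head", "roundness": 0.90}, False),
    ({"name": "corner_flag", "roundness": 0.10}, False),
]

# False positives: negatives the model wrongly accepts as a ball.
hard_negatives = mine_hard_negatives(toy_predict, labelled)
```

The mined samples are then added to the negative training set and the model is retrained, usually over a few rounds until the false positives die down.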


Thanks for all the great answers! Adjusting the confidence threshold really sounds like the easiest fix, but also not a very effective one. Do you guys have experience with Kalman filters vs. negative mining? Which one performs better? Which one is easier to implement?

The video is hilarious :rofl:

Regarding your last question @charlotte, we had a discussion about this internally at Hasty as well. I think that negative mining works great for images and is not too hard to do, but Kalman filters should perform better for videos, as they can also take temporal context such as the ball’s movement across frames into consideration. Generally speaking, as @jmith pointed out, negative mining is much easier to implement, though.

But I’d also be interested to learn whether there are other approaches out there that we didn’t consider.


Hi all,

I think the problem is more complex than it looks. The camera can also pick up a spare ball and focus on it instead of the in-game ball. So basically you need to track the ball across frames in order to solve this problem accurately.