Week 4 (Jun 27 - Jul 1)
This week we continued system development and decided to improve our architecture’s player detection. The existing architecture used two separate sets of logic for player detection. The bottom player was selected by detecting all people in the bottom half of the court and simply choosing the largest detection. The top player was selected by detecting all people in the top half of the court (including ball kids, the referee, etc.) and tracking their movements over the course of the video segment; the person who moved the most was selected as the player.

The benefit of this approach is that it requires no tennis-specific data annotation, since many existing models already perform well at general person detection. The disadvantage is that the top player is frequently not the person who moves the most in the frame. On many occasions the top player stays in roughly the same location, and a ball-boy or ball-girl is the person who moves the most. This throws off the heuristic, and the top player ends up being mis-detected. The bottom player faced fewer mis-detection issues.
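For concreteness, here is a minimal sketch of the two heuristics in Python. The function names, the `court_mid_y` split, and the detection/track formats are our illustration here, not the system’s actual code:

```python
import numpy as np

def pick_bottom_player(boxes, court_mid_y):
    """boxes: list of (x1, y1, x2, y2) person detections in one frame."""
    # Image y grows downward, so the bottom half has larger center-y values.
    bottom = [b for b in boxes if (b[1] + b[3]) / 2 > court_mid_y]
    # The near-court player appears largest on camera, so pick the biggest box.
    return max(bottom, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]), default=None)

def pick_top_player(tracks, court_mid_y):
    """tracks: dict of track_id -> list of (x, y) box centers over the segment."""
    def path_length(centers):
        pts = np.asarray(centers, dtype=float)
        return np.linalg.norm(np.diff(pts, axis=0), axis=1).sum()
    top = {tid: c for tid, c in tracks.items() if c[0][1] < court_mid_y}
    # The most-traveled track wins; a busy ball kid can out-move a stationary
    # player, which is exactly the failure mode described above.
    return max(top, key=lambda tid: path_length(top[tid]), default=None)
```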
We decided to retrain the final few layers of an object-detection deep learning model using data that we annotated manually (many thanks to our colleagues in the lab who helped!). The data was labeled with bounding boxes for “top player” and “bottom player”, and after training we found the model detected both players directly and reliably, without needing any of the extra selection logic.
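We won’t reproduce the full training pipeline here, but as a rough sketch, retraining only the head of a pre-trained detector looks like this (using torchvision’s Faster R-CNN as a stand-in; the specific model, torchvision version >= 0.13, and hyperparameters are assumptions, not necessarily what we used):

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Start from a detector pre-trained on COCO (stand-in for our actual model).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Freeze the backbone so only the final layers are retrained.
for param in model.backbone.parameters():
    param.requires_grad = False

# Replace the box-predictor head with one for our three classes:
# background, "top player", and "bottom player".
num_classes = 3
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Only the un-frozen parameters go to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.005, momentum=0.9)
```

Once trained this way, inference returns labeled boxes for the two player classes directly, which replaces both of the heuristics sketched above.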