Week 3 (Jun 20 - Jun 24)
This week, we found that the player detection architecture was not robust enough to consistently detect players. Our strategy for the week was to dive deep into the code and see whether any simple optimizations could improve performance. From this deep dive, we found that the architecture employed a distinct heuristic for each player. The first step was simply to detect all persons in the top half and the bottom half of the court respectively. The top player was then identified by tracking each detected person and selecting the one who moved the most, while the bottom player was selected by choosing the largest bounding box among all persons detected in the bottom half of the frame.
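The two selection heuristics described above could be sketched roughly as follows. This is an illustrative reconstruction, not the project's actual code: the `(x, y, w, h)` box format, the `tracks` dictionary (person id mapped to a list of per-frame boxes), and the function names are all assumptions made for the example.

```python
def box_center(box):
    """Center point of an (x, y, w, h) bounding box."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def select_top_player(tracks):
    """Top half: pick the tracked person whose box center moved the most.

    `tracks` maps a person id to that person's per-frame boxes.
    """
    def total_motion(boxes):
        centers = [box_center(b) for b in boxes]
        # Sum of frame-to-frame center displacements (Manhattan distance).
        return sum(
            abs(cx2 - cx1) + abs(cy2 - cy1)
            for (cx1, cy1), (cx2, cy2) in zip(centers, centers[1:])
        )
    return max(tracks, key=lambda tid: total_motion(tracks[tid]))

def select_bottom_player(detections):
    """Bottom half: pick the detection with the largest box area."""
    return max(detections, key=lambda box: box[2] * box[3])
```

Under this sketch, `select_top_player` returns the id of the most-moved track, and `select_bottom_player` returns the largest box directly; either could just as easily return an index or a full track.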
We attempted to change the crop parameters so that the top half of the court excluded some of the audience and most of the net (and thus the ball boys and the referee); the bottom half of the court received a similar cropping treatment. This yielded slight improvements in performance, but the system was still not robust enough, particularly for the top player. The motion-based selection for the top player was often picking a ball boy, who typically moves a large amount in the last few seconds of every point. To combat this, we excluded the final frames of each point from the motion tracking, which also improved performance marginally.

We then decided to download new tennis videos to test these improvements on, both to ensure that we were not overfitting to a few specific sample videos and to check whether alternative gameplay styles would perform well. I selected some videos of historical and recreational gameplay as examples of alternative applications, along with a few new videos of professional broadcasts. We found that player detection still did not perform robustly enough, so our goal for next week is to implement a deep-neural-network-based solution.
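The cropping and frame-exclusion tweaks from this week could be sketched as below. Again this is a hypothetical illustration: the frame is treated as a simple list of pixel rows, and the specific fractions (`top_frac`, `net_frac`, `bottom_frac`) and the 60-frame exclusion window are placeholder values, not the parameters we actually tuned.

```python
def crop_court_halves(frame, top_frac=0.15, net_frac=0.45, bottom_frac=0.95):
    """Crop the two court halves, trimming the audience above the court
    and the net region (ball boys, referee) between the halves.

    `frame` is indexed by row, so plain slicing implements the crop.
    """
    h = len(frame)
    top_half = frame[int(h * top_frac):int(h * net_frac)]
    bottom_half = frame[int(h * (net_frac + 0.05)):int(h * bottom_frac)]
    return top_half, bottom_half

def motion_frames(frame_indices, point_end, exclude_last=60):
    """Drop the final frames of a point, where ball-boy motion dominates,
    so they are not fed into the top-player motion tracking."""
    cutoff = point_end - exclude_last
    return [i for i in frame_indices if i < cutoff]
```

The same idea applies whether frames are raw pixel arrays or lists of per-frame detections; only the rows (or boxes) inside the cropped region, over the retained frame indices, would reach the selection heuristics.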