Week 10 (Aug 15 - Aug 19)

This week was our user study week, in which we conducted remote interviews and demonstrations with BLV people across the globe. The schedule was built to accommodate our participants, so we had user studies distributed throughout the day and night. With such a full schedule, all members of the team supported the effort to ensure the studies went smoothly. Each user study had two critical members on our team’s side: the user-study leader and the note-taker. The leader asks the questions, interprets the answers, and makes sure the technical setup works correctly given the participant’s hardware. The note-taker listens intently for poignant answers and quotes, and records their timestamps so the team can review them later. This was a huge learning experience for me, as I had never conducted user studies before. It was eye-opening to see our participants’ reactions and to learn that our system truly added value and allowed BLV people to engage with tennis games! It was also fantastic to learn about potential improvement areas from our participants’ perspectives. Our user studies went smoothly, and we even managed to schedule a few more for next week!


Week 9 (Aug 8 - Aug 12)

This week we ran our code on all datasets (many samples from tennis videos) to surface any lingering issues. We selected video samples that showed our system performing across a range of tennis match scenarios (male and female players, historical matches, grass courts, synthetic courts, clay courts, etc.). After sifting through all the output videos, we had our sample videos ready for the user studies with BLV people. We then prepared our schedules for conducting the user studies. I decided to play a supporting role during the first two days of user studies, and then lead the sessions on the third, fourth, and fifth days. I’m excited to hear the perspectives of BLV people and understand whether our system is truly adding value for tennis game awareness and understanding!


Week 8 (Aug 1 - Aug 5)

This week we continued the implementation of action recognition. My task was to take the trained model and integrate it into our architecture so that it correctly predicts the action type for the players. Our secondary goal was to investigate whether action recognition would be robust enough to help us catch misdetections on frames incorrectly assumed to be “hit-frames” (frames in which a player is striking the ball). We implemented it as follows: the frame is fed into the player detection model, which produces a bounding box around both the top player and the bottom player. Since we know that this particular frame shows the top player striking the ball, we crop the relevant player using the bounding box and feed the cropped image to the action recognition model. The model then predicts the action type and saves it to a JSON file, to be used later as part of the spatialized audio.
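
Below is a minimal sketch of that integration step, assuming hypothetical `player_detector` and `action_model` wrappers around our models (the names and interfaces are illustrative, not our actual code):

```python
import json
import cv2

def classify_hit_frame(frame_path, frame_idx, player_detector, action_model,
                       out_path="actions.json"):
    """Crop the striking player from a hit-frame and predict the action type.

    `player_detector` and `action_model` are hypothetical wrappers around
    the detection and action-recognition models; names are illustrative.
    """
    frame = cv2.imread(frame_path)

    # Player detection yields bounding boxes for the top and bottom players.
    top_box, bottom_box = player_detector.detect(frame)

    # This hit-frame is known to show the top player striking the ball,
    # so crop that player's bounding box out of the frame.
    x1, y1, x2, y2 = top_box
    player_crop = frame[y1:y2, x1:x2]

    # Predict the action type (e.g. forehand, backhand, serve).
    action = action_model.predict(player_crop)

    # Save the prediction for the spatialized-audio stage to consume later.
    with open(out_path, "a") as f:
        f.write(json.dumps({"frame": frame_idx, "player": "top",
                            "action": action}) + "\n")
    return action
```

On a frame wrongly flagged as a hit-frame, a low-confidence or implausible action prediction could then serve as a signal to re-check the detection.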


Week 7 (Jul 25 - Jul 29)

This week we decided to implement a new feature: action recognition. The goal of this feature is to provide richer spatialized audio for blind and low vision users. Instead of hearing a default sound to indicate a strike, a specialized sound could be dedicated to each category of tennis player action. To decide the scope of this feature, we conducted a literature review and investigated the existing solutions to this problem. I was tasked with this literature review and found many papers that utilize richer inputs to recognize action types. For example, many models use HD video streams from four corners of the tennis court. Since our goal is to handle tennis broadcasts retroactively, which do not include HD video from four simultaneous cameras and sometimes are not HD quality at all, this type of solution was not feasible.


Week 6 (Jul 18 - Jul 22)

This week our primary focus was finalizing our submission to the ACM Symposium on User Interface Software and Technology (UIST). We paused system development and focused on generating the demonstration clips that would constitute our video submission. This entailed multiple steps. First, I sifted through our existing output videos, from both the baseline outputs (modern tennis broadcasts) and the application outputs (historical broadcasts, recreational videos uploaded to youtube.com, and gameplay from tennis video games).


Week 5 (Jul 11 - Jul 15)

This week we continued system development, with a special focus on one particular problem. Within our system architecture, we have implemented a deep learning model that detects and tracks small, fast-moving objects like the tennis ball.
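
As a rough illustration of the detection half, suppose the model emits a per-frame confidence heatmap for the ball (an assumption on my part; heatmap-style detectors such as TrackNet work this way). The ball position is then the heatmap peak, if one is confident enough:

```python
import numpy as np

def locate_ball(heatmap, conf_threshold=0.5):
    """Extract a ball position from a per-frame confidence heatmap.

    Assumes a hypothetical heatmap-style ball detector; `heatmap` is an
    (H, W) array of per-pixel confidences in [0, 1].
    """
    if heatmap.max() < conf_threshold:
        # No confident detection: the ball may be occluded, off-screen,
        # or too motion-blurred in this frame.
        return None
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(x), int(y)
```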


Week 4 (Jun 27 - Jul 1)

This week we continued system development, and decided to improve our architecture’s player detection. The existing architecture had two separate sets of logic for player detection. The bottom player was selected by detecting all people in the bottom half of the court and simply choosing the largest. The top player was selected by detecting all people in the top half of the court (including ball-boys, the referee, etc.), tracking their movements over the course of the video segment, and choosing the person who moved the most. The benefit of this approach is that it does not require any tennis-specific data annotation, since many existing models already perform well at general person detection. The disadvantage is that the top player is frequently not the person who moves the most in the frame. On many occasions, the top player stays in roughly the same location, and a ball-boy or ball-girl is the person who moves the most. This throws off the heuristic, and the top player ends up mis-detected. The bottom player faced fewer mis-detection issues.
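
In rough pseudocode, the two heuristics look like the sketch below (the track data structure and function names are illustrative, not our actual implementation):

```python
import numpy as np

def box_area(box):
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)

def box_center(box):
    x1, y1, x2, y2 = box
    return np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])

def select_bottom_player(bottom_boxes):
    """Bottom player: the largest detected person in the bottom half."""
    return max(bottom_boxes, key=box_area)

def select_top_player(top_tracks):
    """Top player: the track id of the person who moved the most.

    `top_tracks` maps a track id to that person's boxes across frames
    (illustrative structure). This is the heuristic that mis-fires when a
    ball-boy or ball-girl moves more than a stationary top player.
    """
    def displacement(boxes):
        centers = [box_center(b) for b in boxes]
        return sum(np.linalg.norm(b - a) for a, b in zip(centers, centers[1:]))

    return max(top_tracks, key=lambda tid: displacement(top_tracks[tid]))
```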


Week 3 (Jun 20 - Jun 24)

This week, we found that the player detection architecture was not robust enough to consistently detect the players. Our strategy for the week was to deep dive into the code and see whether any simple optimizations could be made to improve performance. From our deep dive, we found that the architecture employed separate logic for detecting each player. The first step was simply to detect all persons in the top half and bottom half of the court, respectively. The top player was identified by tracking each person and selecting the one who moved the most. The bottom player was selected by choosing the largest bounding box among all persons detected in the bottom half of the frame.


Week 2 (Jun 13 - Jun 17)

After analyzing and categorizing the existing issues in the vision system and highlighting the need to improve court detection, our goal this week was to deep dive into the court detection architecture and find optimizations that could enhance its performance. I spent a few days analyzing the code and generating intermediate results in an effort to isolate each function from the next and methodically seek out optimizations. I found that the existing system uses an off-the-shelf fixed pixel-intensity threshold to divide the image into black and white, but the white lines at the far end of the court were being classified as black instead of white.
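
For illustration, here is roughly what that failure mode and one candidate remedy look like in OpenCV; the file name and parameter values are placeholders, and the adaptive threshold is just one possible fix, not necessarily the one we adopted:

```python
import cv2

# Grayscale the frame, as the existing pipeline does before thresholding.
gray = cv2.cvtColor(cv2.imread("court_frame.png"), cv2.COLOR_BGR2GRAY)

# Fixed global threshold (the existing approach): far-court lines that are
# dimmer due to perspective and lighting fall below the cutoff and end up
# black, causing false negatives on the court lines.
_, fixed = cv2.threshold(gray, 180, 255, cv2.THRESH_BINARY)

# One possible remedy (illustrative): an adaptive threshold computes the
# cutoff per local neighborhood, so a dim but locally bright far-court
# line can still be classified as white.
adaptive = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 25, -5)
```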


Week 1 (Jun 6 - Jun 10)

This was my first week after onboarding and becoming familiar with the project. My task was to evaluate the existing deep-learning implementations of court detection, ball detection, and player detection. I started by analyzing the existing results and highlighting all false positives and false negatives. After concluding that court detection was the most critical model to perfect (as it impacts all downstream detection models), I conducted a deep-dive investigation of the court detection architecture and code. The court detection model uses both traditional computer vision and deep learning methods to detect all the lines on the court. I found an issue in one of the traditional computer vision steps (the black/white thresholding after converting the image to grayscale), which causes false negatives. After meeting with the team and discussing these findings, we selected this as the first section of code to improve.
