Abstract
The presence of standard video editing practices in broadcast sports videos, like football, effectively means that such videos have stronger contextual priors than most generic videos. In this paper, we show that such information can be harnessed for automatic analysis of sports videos. Specifically, given an input video, we output per-frame information about camera angles and the events (goal, foul, etc.). Our main insight is that in the presence of temporal context (camera angles) for a video, the problem of event tagging (fouls, corners, goals, etc.) can be cast as per frame multi-class classification problem. We show that even with simple classifiers like linear SVM, we get significant improvement in the event tagging task when contextual information is included. We present extensive results for 10 matches from the recently concluded Football World Cup, to demonstrate the effectiveness of our approach.