Abstract
In this paper, we address the problem of dance style classification, for Indian dance in particular and for any dance in general. We propose a three-step deep learning pipeline. First, we extract 14 essential joint locations of the dancer from each video frame; these allow us to derive the location of any body region within the frame, which we use in the second step, the main part of our pipeline. Here, we divide the dancer into regions of important motion in each video frame and extract patches centered at these regions; the main discriminative motion is captured in these patches. We stack the features from all such patches of a frame into a single vector to form our hierarchical dance pose descriptor. Finally, in the third step, we build a high-level representation of the dance video from the hierarchical descriptors and train a Recurrent Neural Network (RNN) on it for classification. Our novelty also lies in the way we use multiple representations for a single video. This helps us to: (1) overcome the limitation of RNNs, which learn short sequences more effectively than long sequences such as dance; (2) extract more data from the available dataset for effective deep learning by training on multiple representations. Our contributions in this paper are threefold: (1) we provide a deep learning pipeline for the classification of any form of dance; (2) we show that a segmented representation of a dance video works well with sequence learning techniques for recognition purposes; (3) we extend and refine the ICD dataset and provide a new dataset for the evaluation of dance. Our model performs comparably to, and in some cases better than, the state of the art on action recognition benchmarks.

Keywords: human activity recognition, dance style recognition, joint localization, pose descriptor, CNN, RNN, ICD.
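To make the pipeline structure concrete, the following is a minimal PyTorch sketch of the third step: a recurrent classifier over per-frame hierarchical descriptors. All names and sizes here (NUM_REGIONS, PATCH_FEAT, the single-layer RNN, the class count) are illustrative assumptions for exposition, not the authors' actual implementation or hyperparameters.

```python
# Hypothetical sketch of step 3 of the pipeline: an RNN over per-frame
# hierarchical pose descriptors. Sizes below are assumed for illustration.
import torch
import torch.nn as nn

NUM_REGIONS = 5     # assumed number of motion regions per frame (step 2)
PATCH_FEAT = 128    # assumed feature size per patch
NUM_CLASSES = 8     # example number of dance styles

class DanceStyleClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # One frame descriptor = patch features of all regions stacked
        # into a single vector, as described in the abstract.
        self.rnn = nn.RNN(input_size=NUM_REGIONS * PATCH_FEAT,
                          hidden_size=256, batch_first=True)
        self.head = nn.Linear(256, NUM_CLASSES)

    def forward(self, descriptors):
        # descriptors: (batch, time, NUM_REGIONS * PATCH_FEAT)
        _, h_n = self.rnn(descriptors)
        return self.head(h_n[-1])   # classify from the final hidden state

# Usage on random stand-in data: 4 clips, 30 frames each.
clips = torch.randn(4, 30, NUM_REGIONS * PATCH_FEAT)
logits = DanceStyleClassifier()(clips)   # shape: (4, NUM_CLASSES)
```

Splitting a long video into multiple shorter representations, as the abstract proposes, would correspond here to feeding several such clips per video and aggregating their predictions.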