Multi-Task Learning Organize (MTLN) of Skeleton Sequences Based 3D Action Recognition

Authors: T. Seshagiri, S. Varadarajan

Skeleton sequences provide 3D trajectories of human skeleton joints. The spatial temporal information in a sequence is highly significant for action recognition. Considering that deep convolutional neural networks (CNNs) are very effective for feature learning in images, in this paper we transform a skeleton sequence into an image-based representation for spatial temporal information learning with a CNN. Specifically, for each channel of the 3D coordinates, we transform the sequence into a clip of several gray images, which represent multiple spatial structural information of the joints. Those images are fed to a deep CNN to learn high-level features. The CNN features of all three clips at the same time-step are concatenated into a feature vector. Each feature vector represents the temporal information of the entire skeleton sequence and one particular spatial relationship of the joints. We then propose a Multi-Task Learning Network (MTLN) to jointly process the feature vectors of all time-steps in parallel for action recognition. Experimental results clearly show the effectiveness of the proposed representation and feature learning method for 3D action recognition.
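
The transformation from a skeleton sequence to gray-image clips can be illustrated with a minimal sketch. It is not the authors' implementation: the use of reference joints to encode spatial relationships, the joint-by-frame image layout, the min-max scaling to 0..255, and the names `channel_clip` and `skeleton_to_clips` are all assumptions made for illustration. The CNN and the MTLN stages are not shown.

```python
from typing import List, Sequence

Frame = List[List[float]]  # one frame: joints x 3 coordinates (x, y, z)


def channel_clip(frames: Sequence[Frame], channel: int,
                 reference_joints: Sequence[int]) -> List[List[List[int]]]:
    """Build one clip (a list of gray images) for one coordinate channel.

    Each image has one row per joint and one column per frame. A pixel is
    the joint's offset from a reference joint, min-max scaled to 0..255.
    The reference-joint scheme and the scaling are illustrative assumptions.
    """
    num_joints = len(frames[0])
    clip = []
    for ref in reference_joints:
        # Offsets of every joint from the reference joint, frame by frame.
        offsets = [[frame[j][channel] - frame[ref][channel] for frame in frames]
                   for j in range(num_joints)]
        lo = min(min(row) for row in offsets)
        hi = max(max(row) for row in offsets)
        scale = 255.0 / (hi - lo) if hi > lo else 0.0
        clip.append([[round((v - lo) * scale) for v in row] for row in offsets])
    return clip


def skeleton_to_clips(frames: Sequence[Frame],
                      reference_joints: Sequence[int]) -> List[List[List[List[int]]]]:
    """Return three clips, one per coordinate channel (x, y, z)."""
    return [channel_clip(frames, c, reference_joints) for c in range(3)]


# Toy sequence: 4 frames, 3 joints, synthetic coordinates.
frames = [[[float(t + j), float(2 * j), float(t * j)] for j in range(3)]
          for t in range(4)]
clips = skeleton_to_clips(frames, reference_joints=[0, 1])
print(len(clips), len(clips[0]), len(clips[0][0]), len(clips[0][0][0]))
```

In this sketch, the images at the same position in the three clips share a reference joint. In the pipeline described above, the CNN features of those three images would be concatenated into one feature vector.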

Authors and Affiliations

T. Seshagiri
Scholar, Rayalaseema University, Kurnool; Associate Professor, Shree Institute of Technical Education, Tirupati, Andhra Pradesh, India
S. Varadarajan
Professor, Department of Electronics & Communication Engineering, SVU Engineering College, Tirupati, Andhra Pradesh, India

Keywords: Temporal pooling of CNN, MTLN, LSTM, HMM, CRF

Publication Details

Published in : Volume 2 | Issue 5 | September-October 2017
Date of Publication : 2017-10-31
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 965-972
Manuscript Number : CSEIT1725225
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

T. Seshagiri, S. Varadarajan, "Multi-Task Learning Organize (MTLN) of Skeleton Sequences Based 3D Action Recognition", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 2, Issue 5, pp. 965-972, September-October 2017.