Open access
Research article
First published online July 11, 2013

Learning human activities and object affordances from RGB-D videos

Abstract

Understanding human activities and object affordances are two very important skills, especially for personal robots that operate in human environments. In this work, we consider the problem of extracting a descriptive labeling of the sequence of sub-activities being performed by a human and, more importantly, of the human's interactions with the objects in the form of associated affordances. Given an RGB-D video, we jointly model the human activities and object affordances as a Markov random field, where the nodes represent objects and sub-activities, and the edges represent the relationships between object affordances, their relations with sub-activities, and their evolution over time. We formulate the learning problem using a structural support vector machine (SSVM) approach, where labelings over various alternate temporal segmentations are considered as latent variables. We tested our method on a challenging dataset comprising 120 activity videos collected from four subjects, and obtained accuracies of 79.4% for affordance, 63.4% for sub-activity and 75.0% for high-level activity labeling. We then demonstrate the use of such descriptive labeling in performing assistive tasks with a PR2 robot.
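
To make the formulation described in the abstract concrete, the following is a minimal LaTeX sketch of a latent structural SVM scoring function over a spatio-temporal Markov random field. The index sets, weight vectors and feature maps used here (the object set, the segment set induced by a segmentation h, and the phi terms) are illustrative assumptions introduced for this sketch, not the authors' exact notation.

% Sketch (assumed notation): score of affordance labels a_i and sub-activity
% labels s_k for one candidate temporal segmentation h of an RGB-D video x.
\begin{align}
  E_{w}(x, y, h) ={}& \sum_{i \in \mathcal{O}} w_{o}^{\top} \phi_{o}(x, a_i)
    + \sum_{k \in \mathcal{S}(h)} w_{s}^{\top} \phi_{s}(x, s_k) \nonumber\\
  & + \sum_{(i,j) \in \mathcal{E}_{oo}} w_{oo}^{\top} \phi_{oo}(x, a_i, a_j)
    + \sum_{(i,k) \in \mathcal{E}_{os}} w_{os}^{\top} \phi_{os}(x, a_i, s_k)
    + \sum_{t} w_{\tau}^{\top} \phi_{\tau}(x, y_t, y_{t+1}), \\
  \hat{y} ={}& \arg\max_{y} \; \max_{h \in \mathcal{H}(x)} E_{w}(x, y, h).
\end{align}
% Node terms score objects and sub-activity segments; edge terms score
% object-object, object-sub-activity, and temporal (segment-to-segment) relations.

During learning, w would be fit with a latent SSVM (for example, in a cutting-plane style), alternating between selecting the best-scoring temporal segmentation h and updating w; at test time, inference maximizes jointly over the labeling y and the segmentation h. This is only a schematic of the kind of objective the abstract describes.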


Information, rights and permissions

Information

Published In

The International Journal of Robotics Research
Article first published online: July 11, 2013
Issue published: July 2013

Keywords

  1. 3D perception
  2. human activity detection
  3. object affordance
  4. supervised learning
  5. spatio-temporal context
  6. personal robots

Rights and permissions

© The Author(s) 2013.
Creative Commons License (CC BY-NC 3.0)
This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 3.0 License (http://www.creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (http://www.uk.sagepub.com/aboutus/openaccess.htm).

Authors

Affiliations

Hema Swetha Koppula
Department of Computer Science, Cornell University, USA
Rudhir Gupta
Department of Computer Science, Cornell University, USA
Ashutosh Saxena
Department of Computer Science, Cornell University, USA

Notes

Hema Swetha Koppula, Department of Computer Science, Cornell University, Upson Hall, Ithaca, NY 14853, USA. Email: [email protected]

    Go to citation Crossref Google Scholar
  398. On the Use of Multi-Depth-Camera Based Motion Tracking Systems in Prod...
    Go to citation Crossref Google Scholar
  399. Anticipating Human Activities Using Object Affordances for Reactive Ro...
    Go to citation Crossref Google Scholar
  400. Large Displacement 3D Scene Flow with Occlusion Reasoning
    Go to citation Crossref Google Scholar
  401. Affordance matching from the shared information in multi-robot
    Go to citation Crossref Google Scholar
  402. Human action recognition using key poses and atomic motions
    Go to citation Crossref Google Scholar
  403. Human observation-based calibration of multiple RGB-D cameras for inte...
    Go to citation Crossref Google Scholar
  404. Latent Hierarchical Model for Activity Recognition
    Go to citation Crossref Google Scholar
  405. Multimodal Human Activity Recognition for Industrial Manufacturing Pro...
    Go to citation Crossref Google Scholar
  406. Activity-centric scene synthesis for functional 3D scene modeling
    Go to citation Crossref Google Scholar
  407. Recognizing complex instrumental activities of daily living using scen...
    Go to citation Crossref Google Scholar
  408. Towards efficient support relation extraction from RGBD images
    Go to citation Crossref Google Scholar
  409. Human activity modeling and prediction for assisting appliance operati...
    Go to citation Crossref Google Scholar
  410. Human Activity Recognition Process Using 3-D Posture Data
    Go to citation Crossref Google Scholar
  411. Fuzzy Temporal Segmentation and Probabilistic Recognition of Continuou...
    Go to citation Crossref Google Scholar
  412. Model-free incremental learning of the semantics of manipulation actio...
    Go to citation Crossref Google Scholar
  413. Fine manipulative action recognition through sensor fusion
    Go to citation Crossref Google Scholar
  414. A framework for unsupervised online human reaching motion recognition ...
    Go to citation Crossref Google Scholar
  415. A hierarchical representation for human activity recognition with nois...
    Go to citation Crossref Google Scholar
  416. Semantic parsing of human manipulation activities using on-line learne...
    Go to citation Crossref Google Scholar
  417. Interactive affordance map building for a robotic task
    Go to citation Crossref Google Scholar
  418. Affordance Learning Based on Subtask's Optimal Strategy
    Go to citation Crossref Google Scholar
  419. A Survey of Applications and Human Motion Recognition with Microsoft K...
    Go to citation Crossref Google Scholar
  420. Human Activity-Understanding: A Multilayer Approach Combining Body Mov...
    Go to citation Crossref Google Scholar
  421. Human robot interaction can boost robot's affordance learning: A proof...
    Go to citation Crossref Google Scholar
  422. Cognitive Learning, Monitoring and Assistance of Industrial Workflows ...
    Go to citation Crossref Google Scholar
  423. Self-organizing neural integration of pose-motion features for human a...
    Go to citation Crossref Google Scholar
  424. Discriminative key-component models for interaction detection and reco...
    Go to citation Crossref Google Scholar
  425. SUN RGB-D: A RGB-D scene understanding benchmark suite
    Go to citation Crossref Google Scholar
  426. Mining semantic affordances of visual object categories
    Go to citation Crossref Google Scholar
  427. Jointly learning heterogeneous features for RGB-D activity recognition
    Go to citation Crossref Google Scholar
  428. Staged Development of Robot Skills: Behavior Formation, Affordance Lea...
    Go to citation Crossref Google Scholar
  429. Fusing Multiple Features for Depth-Based Action Recognition
    Go to citation Crossref Google Scholar
  430. PlanIt: A crowdsourcing approach for learning to plan paths from large...
    Go to citation Crossref Google Scholar
  431. Decision making under uncertain segmentations
    Go to citation Crossref Google Scholar
  432. Affordance detection of tool parts from geometric features
    Go to citation Crossref Google Scholar
  433. 3D Reasoning from Blocks to Stability
    Go to citation Crossref Google Scholar
  434. Concept and Functional Structure of a Service Robot
    Go to citation Crossref Google Scholar
  435. Characterizing Predicate Arity and Spatial Structure for Inductive Lea...
    Go to citation Crossref Google Scholar
  436. Qualitative and Quantitative Spatio-temporal Relations in Daily Living...
    Go to citation Crossref Google Scholar
  437. Discriminative Dictionary Learning for Skeletal Action Recognition
    Go to citation Crossref Google Scholar
  438. Physics Simulation Games
    Go to citation Crossref Google Scholar
  439. Learning Dictionaries of Sparse Codes of 3D Movements of Body Joints f...
    Go to citation Crossref Google Scholar
  440. A new benchmark for pose estimation with ground truth from virtual rea...
    Go to citation Crossref Google Scholar
  441. Human activity recognition in the context of industrial human-robot in...
    Go to citation Crossref Google Scholar
  442. SceneGrok
    Go to citation Crossref Google Scholar
  443. Human Activity Recognition in Images using SVMs and Geodesics on Smoot...
    Go to citation Crossref Google Scholar
  444. 3D Human Activity Recognition with Reconfigurable Convolutional Neural...
    Go to citation Crossref Google Scholar
  445. An Efficient Local Feature-Based Facial Expression Recognition System
    Go to citation Crossref Google Scholar
  446. Mining Mid-Level Features for Action Recognition Based on Effective Sk...
    Go to citation Crossref Google Scholar
  447. Improving reinforcement learning with interactive feedback and afforda...
    Go to citation Crossref Google Scholar
  448. 3D content fingerprinting
    Go to citation Crossref Google Scholar
  449. Handling Real-World Context Awareness, Uncertainty and Vagueness in Re...
    Go to citation Crossref Google Scholar
  450. 3D human action segmentation and recognition using pose kinetic energy
    Go to citation Crossref Google Scholar
  451. Evaluating spatiotemporal interest point features for depth-based acti...
    Go to citation Crossref Google Scholar
  452. A two-layered approach to recognize high-level human activities
    Go to citation Crossref Google Scholar
  453. Human activities segmentation and location of key frames based on 3D s...
    Go to citation Crossref Google Scholar
  454. Discriminative Hierarchical Modeling of Spatio-temporally Composable H...
    Go to citation Crossref Google Scholar
  455. Learning latent structure for activity recognition
    Go to citation Crossref Google Scholar
  456. Learning Actionlet Ensemble for 3D Human Action Recognition
    Go to citation Crossref Google Scholar
  457. “Important stuff, everywhere!” Activity recognition with...
    Go to citation Crossref Google Scholar
  458. Learning Actionlet Ensemble for 3D Human Action Recognition
    Go to citation Crossref Google Scholar
  459. Physically Grounded Spatio-temporal Object Affordances
    Go to citation Crossref Google Scholar
  460. Pipelining Localized Semantic Features for Fine-Grained Action Recogni...
    Go to citation Crossref Google Scholar
  461. Detecting Social Actions of Fruit Flies
    Go to citation Crossref Google Scholar
  462. Skeleton Tracking Based Complex Human Activity Recognition Using Kinec...
    Go to citation Crossref Google Scholar
  463. Modeling 4D Human-Object Interactions for Event and Object Recognition
    Go to citation Crossref Google Scholar
  464. Infinite Latent Conditional Random Fields
    Go to citation Crossref Google Scholar
  465. Combining color and depth data for edge detection
    Go to citation Crossref Google Scholar
  466. Tangled: Learning to untangle ropes with RGB-D perception
    Go to citation Crossref Google Scholar
  467. Anticipating human activities for reactive robotic response
    Go to citation Crossref Google Scholar
  468. Affordance graph: A framework to encode perspective taking and effort ...
    Go to citation Crossref Google Scholar
  469. [The document that should appear here is unavailable]
    Go to citation Crossref Google Scholar
  470. Multilevel Depth and Image Fusion for Human Activity Detection
    Go to citation Crossref Google Scholar
  471. Representing Videos Using Mid-level Discriminative Patches
    Go to citation Crossref Google Scholar
  472. Hallucinated Humans as the Hidden Context for Labeling 3D Scenes
    Go to citation Crossref Google Scholar
  473. 3D-Based Reasoning with Blocks, Support, and Stability
    Go to citation Crossref Google Scholar
  474. Predicting Functional Regions on Objects
    Go to citation Crossref Google Scholar
