Open access
Research article
First published online July 11, 2013

Learning human activities and object affordances from RGB-D videos

Abstract

Understanding human activities and object affordances are two very important skills, especially for personal robots that operate in human environments. In this work, we consider the problem of extracting a descriptive labeling of the sequence of sub-activities being performed by a human and, more importantly, of the human's interactions with the objects in the form of associated affordances. Given an RGB-D video, we jointly model the human activities and object affordances as a Markov random field, where the nodes represent objects and sub-activities, and the edges represent the relationships between object affordances, their relations with sub-activities, and their evolution over time. We formulate the learning problem using a structural support vector machine (SSVM) approach, where labelings over various alternate temporal segmentations are considered as latent variables. We tested our method on a challenging dataset comprising 120 activity videos collected from four subjects, and obtained accuracies of 79.4% for affordance, 63.4% for sub-activity and 75.0% for high-level activity labeling. We then demonstrate the use of such descriptive labeling in performing assistive tasks with a PR2 robot.
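
To make the formulation described in the abstract concrete, the following is a minimal LaTeX sketch of a latent structural SVM scoring function over a spatio-temporal Markov random field. The index sets, weight vectors and feature maps used here (the object set, the segment set induced by a segmentation h, and the phi terms) are illustrative assumptions introduced for this sketch, not the authors' exact notation.

% Sketch (assumed notation): score of affordance labels a_i and sub-activity
% labels s_k for one candidate temporal segmentation h of an RGB-D video x.
\begin{align}
  E_{w}(x, y, h) ={}& \sum_{i \in \mathcal{O}} w_{o}^{\top} \phi_{o}(x, a_i)
    + \sum_{k \in \mathcal{S}(h)} w_{s}^{\top} \phi_{s}(x, s_k) \nonumber\\
  & + \sum_{(i,j) \in \mathcal{E}_{oo}} w_{oo}^{\top} \phi_{oo}(x, a_i, a_j)
    + \sum_{(i,k) \in \mathcal{E}_{os}} w_{os}^{\top} \phi_{os}(x, a_i, s_k)
    + \sum_{t} w_{\tau}^{\top} \phi_{\tau}(x, y_t, y_{t+1}), \\
  \hat{y} ={}& \arg\max_{y} \; \max_{h \in \mathcal{H}(x)} E_{w}(x, y, h).
\end{align}
% Node terms score objects and sub-activity segments; edge terms score
% object-object, object-sub-activity, and temporal (segment-to-segment) relations.

During learning, w would be fit with a latent SSVM (for example, in a cutting-plane style), alternating between selecting the best-scoring temporal segmentation h and updating w; at test time, inference maximizes jointly over the labeling y and the segmentation h. This is only a schematic of the kind of objective the abstract describes.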


Information, rights and permissions

Information

Published In

The International Journal of Robotics Research
Article first published online: July 11, 2013
Issue published: July 2013

Keywords

  1. 3D perception
  2. human activity detection
  3. object affordance
  4. supervised learning
  5. spatio-temporal context
  6. personal robots

Rights and permissions

© The Author(s) 2013.
Creative Commons License (CC BY-NC 3.0)
This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 3.0 License (http://www.creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (http://www.uk.sagepub.com/aboutus/openaccess.htm).

Authors

Affiliations

Hema Swetha Koppula
Department of Computer Science, Cornell University, USA
Rudhir Gupta
Department of Computer Science, Cornell University, USA
Ashutosh Saxena
Department of Computer Science, Cornell University, USA

Notes

Hema Swetha Koppula, Department of Computer Science, Cornell University, Upson Hall, Ithaca, NY 14853, USA. Email: [email protected]

    Go to citation Crossref Google Scholar
  398. On the Use of Multi-Depth-Camera Based Motion Tracking Systems in Prod...
    Go to citation Crossref Google Scholar
  399. Anticipating Human Activities Using Object Affordances for Reactive Ro...
    Go to citation Crossref Google Scholar
  400. Large Displacement 3D Scene Flow with Occlusion Reasoning
    Go to citation Crossref Google Scholar
  401. Affordance matching from the shared information in multi-robot
    Go to citation Crossref Google Scholar
  402. Human action recognition using key poses and atomic motions
    Go to citation Crossref Google Scholar
  403. Human observation-based calibration of multiple RGB-D cameras for inte...
    Go to citation Crossref Google Scholar
  404. Latent Hierarchical Model for Activity Recognition
    Go to citation Crossref Google Scholar
  405. Multimodal Human Activity Recognition for Industrial Manufacturing Pro...
    Go to citation Crossref Google Scholar
  406. Activity-centric scene synthesis for functional 3D scene modeling
    Go to citation Crossref Google Scholar
  407. Recognizing complex instrumental activities of daily living using scen...
    Go to citation Crossref Google Scholar
  408. Towards efficient support relation extraction from RGBD images
    Go to citation Crossref Google Scholar
  409. Human activity modeling and prediction for assisting appliance operati...
    Go to citation Crossref Google Scholar
  410. Human Activity Recognition Process Using 3-D Posture Data
    Go to citation Crossref Google Scholar
  411. Fuzzy Temporal Segmentation and Probabilistic Recognition of Continuou...
    Go to citation Crossref Google Scholar
  412. Model-free incremental learning of the semantics of manipulation actio...
    Go to citation Crossref Google Scholar
  413. Fine manipulative action recognition through sensor fusion
    Go to citation Crossref Google Scholar
  414. A framework for unsupervised online human reaching motion recognition ...
    Go to citation Crossref Google Scholar
  415. A hierarchical representation for human activity recognition with nois...
    Go to citation Crossref Google Scholar
  416. Semantic parsing of human manipulation activities using on-line learne...
    Go to citation Crossref Google Scholar
  417. Interactive affordance map building for a robotic task
    Go to citation Crossref Google Scholar
  418. Affordance Learning Based on Subtask's Optimal Strategy
    Go to citation Crossref Google Scholar
  419. A Survey of Applications and Human Motion Recognition with Microsoft K...
    Go to citation Crossref Google Scholar
  420. Human Activity-Understanding: A Multilayer Approach Combining Body Mov...
    Go to citation Crossref Google Scholar
  421. Human robot interaction can boost robot's affordance learning: A proof...
    Go to citation Crossref Google Scholar
  422. Cognitive Learning, Monitoring and Assistance of Industrial Workflows ...
    Go to citation Crossref Google Scholar
  423. Self-organizing neural integration of pose-motion features for human a...
    Go to citation Crossref Google Scholar
  424. Discriminative key-component models for interaction detection and reco...
    Go to citation Crossref Google Scholar
  425. SUN RGB-D: A RGB-D scene understanding benchmark suite
    Go to citation Crossref Google Scholar
  426. Mining semantic affordances of visual object categories
    Go to citation Crossref Google Scholar
  427. Jointly learning heterogeneous features for RGB-D activity recognition
    Go to citation Crossref Google Scholar
  428. Staged Development of Robot Skills: Behavior Formation, Affordance Lea...
    Go to citation Crossref Google Scholar
  429. Fusing Multiple Features for Depth-Based Action Recognition
    Go to citation Crossref Google Scholar
  430. PlanIt: A crowdsourcing approach for learning to plan paths from large...
    Go to citation Crossref Google Scholar
  431. Decision making under uncertain segmentations
    Go to citation Crossref Google Scholar
  432. Affordance detection of tool parts from geometric features
    Go to citation Crossref Google Scholar
  433. 3D Reasoning from Blocks to Stability
    Go to citation Crossref Google Scholar
  434. Concept and Functional Structure of a Service Robot
    Go to citation Crossref Google Scholar
  435. Characterizing Predicate Arity and Spatial Structure for Inductive Lea...
    Go to citation Crossref Google Scholar
  436. Qualitative and Quantitative Spatio-temporal Relations in Daily Living...
    Go to citation Crossref Google Scholar
  437. Discriminative Dictionary Learning for Skeletal Action Recognition
    Go to citation Crossref Google Scholar
  438. Physics Simulation Games
    Go to citation Crossref Google Scholar
  439. Learning Dictionaries of Sparse Codes of 3D Movements of Body Joints f...
    Go to citation Crossref Google Scholar
  440. A new benchmark for pose estimation with ground truth from virtual rea...
    Go to citation Crossref Google Scholar
  441. Human activity recognition in the context of industrial human-robot in...
    Go to citation Crossref Google Scholar
  442. SceneGrok
    Go to citation Crossref Google Scholar
  443. Human Activity Recognition in Images using SVMs and Geodesics on Smoot...
    Go to citation Crossref Google Scholar
  444. 3D Human Activity Recognition with Reconfigurable Convolutional Neural...
    Go to citation Crossref Google Scholar
  445. An Efficient Local Feature-Based Facial Expression Recognition System
    Go to citation Crossref Google Scholar
  446. Mining Mid-Level Features for Action Recognition Based on Effective Sk...
    Go to citation Crossref Google Scholar
  447. Improving reinforcement learning with interactive feedback and afforda...
    Go to citation Crossref Google Scholar
  448. 3D content fingerprinting
    Go to citation Crossref Google Scholar
  449. Handling Real-World Context Awareness, Uncertainty and Vagueness in Re...
    Go to citation Crossref Google Scholar
  450. 3D human action segmentation and recognition using pose kinetic energy
    Go to citation Crossref Google Scholar
  451. Evaluating spatiotemporal interest point features for depth-based acti...
    Go to citation Crossref Google Scholar
  452. A two-layered approach to recognize high-level human activities
    Go to citation Crossref Google Scholar
  453. Human activities segmentation and location of key frames based on 3D s...
    Go to citation Crossref Google Scholar
  454. Discriminative Hierarchical Modeling of Spatio-temporally Composable H...
    Go to citation Crossref Google Scholar
  455. Learning latent structure for activity recognition
    Go to citation Crossref Google Scholar
  456. Learning Actionlet Ensemble for 3D Human Action Recognition
    Go to citation Crossref Google Scholar
  457. “Important stuff, everywhere!” Activity recognition with...
    Go to citation Crossref Google Scholar
  458. Learning Actionlet Ensemble for 3D Human Action Recognition
    Go to citation Crossref Google Scholar
  459. Physically Grounded Spatio-temporal Object Affordances
    Go to citation Crossref Google Scholar
  460. Pipelining Localized Semantic Features for Fine-Grained Action Recogni...
    Go to citation Crossref Google Scholar
  461. Detecting Social Actions of Fruit Flies
    Go to citation Crossref Google Scholar
  462. Skeleton Tracking Based Complex Human Activity Recognition Using Kinec...
    Go to citation Crossref Google Scholar
  463. Modeling 4D Human-Object Interactions for Event and Object Recognition
    Go to citation Crossref Google Scholar
  464. Infinite Latent Conditional Random Fields
    Go to citation Crossref Google Scholar
  465. Combining color and depth data for edge detection
    Go to citation Crossref Google Scholar
  466. Tangled: Learning to untangle ropes with RGB-D perception
    Go to citation Crossref Google Scholar
  467. Anticipating human activities for reactive robotic response
    Go to citation Crossref Google Scholar
  468. Affordance graph: A framework to encode perspective taking and effort ...
    Go to citation Crossref Google Scholar
  469. [The document that should appear here is unavailable]
    Go to citation Crossref Google Scholar
  470. Multilevel Depth and Image Fusion for Human Activity Detection
    Go to citation Crossref Google Scholar
  471. Representing Videos Using Mid-level Discriminative Patches
    Go to citation Crossref Google Scholar
  472. Hallucinated Humans as the Hidden Context for Labeling 3D Scenes
    Go to citation Crossref Google Scholar
  473. 3D-Based Reasoning with Blocks, Support, and Stability
    Go to citation Crossref Google Scholar
  474. Predicting Functional Regions on Objects
    Go to citation Crossref Google Scholar
