
    Shaheena Noor

    Smart homes and offices are becoming more and more common with advances in computer vision research and technology. Identifying human activities and scenarios is a basic component of such systems. This is important not only for the ecosystem to work independently, but also to allow robots to assist humans. It is especially true in more complicated medical setups, e.g. dentistry, where we need subtle cues such as eye motion to identify scenarios. In this paper, we present a hierarchical model for robustly recognizing scenarios and procedures in a dental setup using the objects seen along eye gaze trajectories, such as the material and equipment used by the dentist and the symptoms of the patient. We exploit the fact that the scenario recognition problem can be solved hierarchically: by identifying the objects viewed during an activity and linking them over time, more complicated scenarios can be composed. We performed experiments on a dental dataset and showed that combining multiple parameters yields better precision and accuracy than any of them individually. Our experiments show that accuracy increased from 45.18% to 94.42% when we used a combination of parameters rather than a single one.
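    A minimal sketch of the hierarchical idea described above, with hypothetical object labels and hand-written activity/scenario rules standing in for the paper's actual model and vocabulary:

        # Sketch: hierarchical scenario recognition from a gaze-object sequence.
        # The object labels, activity rules and scenario rules below are
        # hypothetical stand-ins, not the paper's actual vocabulary.

        # Level 1: objects seen along the gaze trajectory, in temporal order.
        gaze_objects = ["mirror", "probe", "caries", "drill", "composite"]

        # Level 2: map short object patterns to activities (illustrative rules).
        ACTIVITY_RULES = {
            ("mirror", "probe"): "examination",
            ("drill",): "cavity_preparation",
            ("composite",): "filling",
        }

        def objects_to_activities(objects):
            activities, i = [], 0
            while i < len(objects):
                for pattern, activity in ACTIVITY_RULES.items():
                    if tuple(objects[i:i + len(pattern)]) == pattern:
                        activities.append(activity)
                        i += len(pattern)
                        break
                else:
                    i += 1  # unrecognized object, skip it
            return activities

        # Level 3: link activities over time into a scenario (illustrative rule).
        def activities_to_scenario(activities):
            if {"examination", "cavity_preparation", "filling"} <= set(activities):
                return "restorative_treatment"
            return "unknown"

        print(activities_to_scenario(objects_to_activities(gaze_objects)))
        # -> restorative_treatment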
    In this paper, we generated an activity recognition model using an ANN and trained it using backpropagation learning. We considered a sandwich-making scenario and identified the hand-motion-based activities of reaching, sprinkling, spreading and cutting. The contribution of this paper is twofold. First, given that many image processing steps like feature identification are computation intensive and execution time increases sharply as more images are added, we have shown that it is not always useful to add more data. We trained our system using (i) a single (front) camera only and (ii) multiple (left, front, right) cameras, and have shown that adding extra cameras decreased the recognition precision from 89.22% to 79.99%. Hence, a properly positioned camera results in higher precision than multiple, inappropriately positioned cameras. Second, in the ANN training part, we have shown that adding extra hidden layers/neurons leads to unnecessary complexity, which in turn results in longer computational time and lower precision. In our experiments, using a single hidden layer resulted in a precision of 90.77% and the training was completed in fewer than 1200 cycles. On the other hand, adding or deleting hidden layers not only decreased the precision, but also increased the training time manyfold.
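    A minimal sketch of a single-hidden-layer classifier trained by backpropagation, as described above. The feature dimensionality, class labels and synthetic data are assumptions for illustration only:

        # Sketch: activity classifier with one hidden layer, trained by
        # backpropagation (scikit-learn's MLPClassifier). Synthetic data
        # stands in for the paper's hand-motion features.
        import numpy as np
        from sklearn.neural_network import MLPClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 32))        # stand-in hand-motion features
        y = rng.integers(0, 4, size=200)      # 0..3: reach/sprinkle/spread/cut

        clf = MLPClassifier(hidden_layer_sizes=(16,),  # a single hidden layer
                            max_iter=1200,             # cap on training cycles
                            solver="sgd", learning_rate_init=0.01,
                            random_state=0)
        clf.fit(X, y)
        print(clf.score(X, y))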
    Being aware of the context is one of the important requirements of Cyber-Physical Systems (CPS). Context-aware systems have the capability to sense what is happening or changing in their environment and take appropriate actions to adapt to the changes. In this chapter, we present a technique for identifying the focus of attention in a context-aware cyber-physical system. We propose to use first-person vision, obtained through a wearable gaze-directed camera that captures the scene from the wearer's point of view. We use the fact that human cognition is linked to gaze, and typically the object or person of interest holds our gaze. We argue that our technique is robust and works well in the presence of noise and other distracting signals, where the conventional techniques of IR sensors and tagging fail. Moreover, the technique is unobtrusive and does not pollute the environment with unnecessary signals. Our approach is general in that it may be applied to generic CPS such as healthcare, office and industrial scenarios, as well as intelligent homes.
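    A small sketch of how a gaze point might be resolved to a focus of attention, assuming a detector has already produced labelled boxes; the labels and coordinates are hypothetical:

        # Sketch: resolve the focus of attention in a first-person frame,
        # given a gaze point and detected object boxes (both hypothetical).
        def focus_of_attention(gaze_xy, detections):
            """detections: list of (label, (x1, y1, x2, y2)) in image coords."""
            gx, gy = gaze_xy
            def dist(box):
                x1, y1, x2, y2 = box
                cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
                return (gx - cx) ** 2 + (gy - cy) ** 2
            # Prefer a box containing the gaze point; else the nearest centre.
            inside = [(l, b) for l, b in detections
                      if b[0] <= gx <= b[2] and b[1] <= gy <= b[3]]
            pool = inside or detections
            return min(pool, key=lambda lb: dist(lb[1]))[0]

        print(focus_of_attention((320, 240),
                                 [("monitor", (100, 80, 400, 300)),
                                  ("keyboard", (120, 350, 500, 470))]))
        # -> monitor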
    This paper presents a MapReduce-based implementation that applies high-dimensional image streams from inside-out and outside-in views to a simplistic SIFT-based feature extraction method, providing a fast and more accurate object recognition algorithm. We have combined multiple camera streams and shown that using inside-out vision significantly improves recognition precision: our combined approach achieved an accuracy of 81.25% with SIFT, against 31.25% for the standard isolated streams. SIFT has a high computation cost, and adding more data streams increases it further. Hence, we used MapReduce to parallelize the computation and achieved a speedup of 80. This paper has two major contributions. First, we used inside-out vision as an additional perception source to increase object recognition precision. Second, we used MapReduce to increase computational speed, achieving a level of object recognition precision that would not otherwise have been practically possible.
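    A sketch of the map step, parallelizing SIFT extraction across frames. A local process pool stands in for a real MapReduce cluster, and the file list is hypothetical; requires opencv-python (cv2.SIFT_create needs OpenCV >= 4.4):

        # Sketch: "map" = per-image SIFT extraction; "reduce" = collecting
        # descriptors. multiprocessing stands in for a MapReduce cluster.
        import cv2
        from multiprocessing import Pool

        def extract_sift(path):
            img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
            sift = cv2.SIFT_create()
            keypoints, descriptors = sift.detectAndCompute(img, None)
            return path, descriptors

        if __name__ == "__main__":
            paths = ["frame_000.png", "frame_001.png"]  # inside-out + outside-in frames
            with Pool() as pool:
                features = dict(pool.map(extract_sift, paths))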
    Purpose - Watermarking is one of the significant methods in which a carrier signal hides digital information in the form of a watermark to protect the authenticity of the stakeholders, by manipulating different coefficients in the time and frequency domains while sustaining a trade-off between performance parameters. One challenging component among others is to maintain robustness while limiting the perceptibility of the embedded information. The transform domain is more popular for achieving the required results in color image watermarking. Variants of the complex Hadamard transform (CHT) have been applied to gray image watermarking and shown to perform better than other orthogonal transforms. This paper analyzes the performance of the spatio-chromatic complex Hadamard transform (Sp-CHT), proposed as an application of color image watermarking in the sequency domain (SD). Design/methodology/approach - In this paper, a color image watermarking technique is designed and implemented in the SD using the spatio-chromatic conjugate-symmetric sequency-ordered CHT. The color of a pixel is represented as the complex number a*+jb*, where a* and b* are the chromatic components of the International Commission on Illumination (CIE) La*b* color space. The embedded watermark is almost transparent to the human eye, yet robust against common signal processing attacks. Findings - Bit error rate (BER) and peak signal-to-noise ratio are measured and discussed, comparing the CIE La*b* and hue, saturation and value color models with the spatio-chromatic discrete Fourier transform (Sp-DFT), and results are also analyzed against other discrete orthogonal transforms. The BER shows that Sp-CHT performs 8%-12% better than Sp-DFT. The structural similarity index was measured at different watermark strengths, and the presented transform performs better than the other transforms. Originality/value - This work presents the details and a comparative analysis of two orthogonal transforms in a color image watermarking application using MATLAB software. A finding from this study demonstrates that the complex Hadamard transform is a competent candidate that can replace the DFT in many signal processing applications.
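    A sketch of the embedding idea only: the chroma channels treated as a complex signal a*+jb*, transformed, watermark bits added to mid-band coefficients, then inverse-transformed. A real Walsh-Hadamard matrix stands in for the conjugate-symmetric sequency-ordered CHT, so this illustrates the scheme, not the paper's actual transform:

        # Sketch: transform-domain watermark embedding on one 8x8 block.
        # hadamard() gives a Sylvester matrix, which is symmetric and, once
        # normalized, orthogonal (its own inverse) -- a stand-in for the CHT.
        import numpy as np
        from scipy.linalg import hadamard

        N = 8
        H = hadamard(N) / np.sqrt(N)

        block = np.random.default_rng(1).normal(size=(N, N)) \
                + 1j * np.random.default_rng(2).normal(size=(N, N))  # a* + j b*

        coeffs = H @ block @ H                 # forward transform
        bits = np.array([1, 0, 1, 1])
        alpha = 0.05                           # watermark strength
        coeffs[3, 2:6] += alpha * (2 * bits - 1)   # embed in a mid-band row
        watermarked = H @ coeffs @ H           # inverse (H is its own inverse)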
    This research project aims to design and develop a 3D interior designing application that provides a virtual experience in which users can visualize a standard home (sample space) and do interior designing. Users can interact with four main interior design modules, i.e., Furniture, Tiles, Paints and Customization (Mix and Match), and can experience the application on two different platforms: a desktop version and a VR version. Its primary purpose is to display interior design products in complete context, unlike stores where only small samples are displayed. This will help customers make better buying decisions when designing and decorating their homes.
    This manuscript presents a full-duplex communication system for the Deaf and Mute (D-M) based on Machine Learning (ML). These individuals, who generally communicate through sign language, are an integral part of our society, and their contribution is vital. They face communication difficulties mainly because others, who generally do not know sign language, are unable to communicate with them. The work presents a solution to this problem through a system enabling the non-deaf and mute (ND-M) to communicate with D-M individuals without the need to learn sign language. The system is low-cost, reliable, easy to use, and based on a commercial off-the-shelf (COTS) Leap Motion Device (LMD). The hand gesture data of D-M individuals is acquired using the LMD and processed using a Convolutional Neural Network (CNN) algorithm. A supervised ML algorithm completes the processing and converts the hand gesture data into speech. A new dataset for the ML-based algorithm is created and pres...
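    A minimal sketch of a small CNN gesture classifier in the spirit of the pipeline above. The input shape (gesture data rendered as a 64x64 map), class count and architecture are assumptions; the paper's exact network is not reproduced:

        # Sketch: compact CNN for gesture classification (TensorFlow/Keras).
        import tensorflow as tf

        num_classes = 26                      # assumption: one class per sign
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(64, 64, 1)),
            tf.keras.layers.Conv2D(16, 3, activation="relu"),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Conv2D(32, 3, activation="relu"),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(num_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])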
    Deaf and mute people are an integral part of society, and it is particularly important to provide them with a platform to communicate without the need for any training or learning. These people rely on sign language, but effective communication requires that others understand sign language too. Learning sign language is a challenge for those with no impairment, and a further challenge is to build a system that supports the hand gestures of different languages. In this manuscript, a system is presented that provides communication between deaf and mute (DnM) and non-deaf and mute (NDnM) people. The hand gestures of DnM people are acquired and processed using deep learning, and multiple-language support is achieved using supervised machine learning. NDnM people are provided with an audio interface where the hand gestures are converted into speech and generated through the sound card interface of the computer. Speech from NDnM people is acquired using microphone input a...
    6G is one of the key cornerstone elements of the futuristic smart system setup, the others being cloud computing, big data, wearable devices and Artificial Intelligence. Smart offices and homes have also become even more popular than before because of advances in computer vision and Machine Learning (ML) technologies. Recognition of human actions and situations is a fundamental component of such systems, especially in complex environments like healthcare, for example the dentist's clinic, where we need cues such as eye movement to distinguish the procedures being undertaken. In this work, we compare models based on hierarchical modelling and machine learning to identify the dental procedure. We used the objects seen while following the eye trajectories, focussing on elements including the material used for treatment, the equipment involved and the teeth conditions, i.e. symptoms. Our experiments showed that using an Artificial Neural Network (ANN) increased the accuracy of prediction c...
    Virtual Reality (VR) has been gaining the interest of gamers as well as game developers day by day because of its uniqueness and its simulated environment in which users can interact with different 3D objects. To date, a vast number of games incorporating VR technology have been developed. To share the gaming experience, the internet has been widely adopted because it allows users to interact with each other. Building on this, we developed a carrom game that uses VR and allows users to interact with different 3D objects. In this game a user can play against an Artificial Intelligence (AI) opponent. To make the game more engaging, we introduced multiplayer gaming that allows users to play with each other without any need for a physical carrom board. In this paper we also conducted multiple surveys on Local Area Network (LAN) gaming with several participants to determine the latency, threshold, frame rate, etc. that affect the LAN gaming experience.
    Virtual Reality (VR) technology has emerged as one of the essential parts of education, entertainment, healthcare, architecture and much more. It empowers a user to learn, discover, explore, design and interact with 3D objects in real time. Recent advancements in computer hardware and the constant decline in the cost of mobile devices have helped virtual reality reach a broader audience. The combination of high computational power, mobile computing, and virtual reality can help simulate the world dynamically. This dynamic behavior enables users to communicate with virtual environments in different forms such as body movement, voice commands, gesture control, etc. These means of interaction create real-time communication between people and VR. Due to this interactive nature, VR can be used to facilitate any physical activity, e.g., sports, exercise, military training and more. This paper presents experiments and surveys of a VR application developed for physical training, i.e., walking, running and jogging, with an accuracy of 82.46%. The paper urges the use of emerging technologies in competitive physical training to encourage users to train physically using a virtual environment.
    In this growing world of technology, many things are happening at the same time, and humans are progressing in many fields at once for the betterment of the human race. A technology that holds its place among these is Augmented Reality (AR). AR refers to displaying virtual objects in the real world. It has great potential and can change the way we live, but not many people are aware of it; they do not know what it is capable of and that it can have a huge impact on our day-to-day lives. Soon it will be so common in everyday life that it will help us do many things. To give an idea of what AR is, my team and I have created a game based on augmented reality to show the world what it feels like and what it can do to help our lives in the near future.
    In recent technological developments, robot-assisted surgery has become popular due to its tremendous prospects for enhancing the capabilities of surgeons performing open surgery, yet very little effort has been made to make these tools available to dental surgeons. This paper addresses the problem of real-time object recognition of dental instruments by utilizing deep learning techniques. The Single Shot MultiBox Detector (SSD) network was taken as the meta-structure and combined with the base Convolutional Neural Network (CNN) MobileNet to form SSD-MobileNet. Object recognition is performed for dental instruments such as the spatula, elevator and mouth mirror, in order to support a robotic arm that works with voice commands using speech recognition and assists the dentist in surgery. Our method recognizes instruments more precisely and quickly compared with other lightweight strategies and conventional machine learning techniques. We achieved a precision of 87.3% and an accuracy of 98.8%.
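    A sketch of the inference pattern for a trained SSD-MobileNet detector, following the TensorFlow Object Detection API conventions. The model path is hypothetical and the snippet shows the pattern, not the authors' trained network:

        # Sketch: run an SSD-MobileNet SavedModel on one frame and keep
        # detections above a confidence threshold.
        import numpy as np
        import tensorflow as tf

        detector = tf.saved_model.load("ssd_mobilenet_dental/saved_model")  # hypothetical path
        frame = np.zeros((1, 300, 300, 3), dtype=np.uint8)  # one 300x300 RGB frame
        output = detector(tf.constant(frame))
        boxes = output["detection_boxes"][0].numpy()
        scores = output["detection_scores"][0].numpy()
        classes = output["detection_classes"][0].numpy()
        keep = scores > 0.5                    # confidence threshold
        print(classes[keep], boxes[keep])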
    Recognizing objects of interest in an environment is one of the most important aspects of security applications. Many techniques exist that focus on object categorization; however, most of them consider just a single viewpoint. This leads to increased false alarms, as multiple objects can look alike from one viewpoint and totally different from another. Hence, it is important to consider multiple views of the target simultaneously while categorizing. This paper presents a strategy for video-based multi-view object categorization. The temporal and spatial information of videos is utilized to effectively categorize objects from multiple views. Given a set of images of an object category, independent graphical models are generated for each object using the underlying geometry and pruned using morphing. Next, the model is evolved by combining the independent graphs, where each node represents different instances from the same viewpoint and links exist between adjacent viewpoints. ...
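    A tiny sketch of the viewpoint-graph structure described above: nodes group instances from the same viewpoint and edges connect adjacent viewpoints. The viewpoint names and instances are hypothetical:

        # Sketch: viewpoint graph with links between adjacent viewpoints.
        views = ["front", "left", "back", "right"]     # circular order
        graph = {v: {"instances": [], "neighbours": []} for v in views}
        for i, v in enumerate(views):
            graph[v]["neighbours"] = [views[(i - 1) % len(views)],
                                      views[(i + 1) % len(views)]]
        graph["front"]["instances"].append("car_front_001.png")  # hypothetical instance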
    Smart clinics have gained much popularity due to technological advancements in areas like computer vision. The recognition of objects and activities, and overall perception of the environment, lies at the core of such systems. This is essential not just for eco-independent systems, but also for Human-Machine Interaction, especially in scenarios with small work areas like dental treatment. In this paper, we compare a number of machine learning models (including Multinomial Logistic Regression, Lazy Instance-based Learning (IBk), Sequential Minimal Optimization (SMO), Hoeffding Tree and Random Tree) for robustly identifying dental treatments. We take the objects focussed on as input, covering parameters like the material, the symptoms of the patient's teeth and the tools used by the dentist. We take advantage of the fact that the issue of identifying a particular treatment can be solved by recognizing the objects seen during an activity. We collected a dental dataset in-the-wild and ran our tests ...
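    A sketch of the comparison loop, using scikit-learn analogues of the listed models where they exist (the listed names are Weka classifiers: IBk is k-NN, SMO is an SVM trainer; Hoeffding Tree has no direct scikit-learn counterpart and is omitted). The data here is synthetic:

        # Sketch: cross-validated comparison of several classifiers on
        # stand-in object/symptom features and treatment labels.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.svm import SVC
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(300, 20))       # stand-in object/symptom features
        y = rng.integers(0, 5, size=300)     # stand-in treatment labels

        models = {
            "Multinomial Logistic Regression": LogisticRegression(max_iter=1000),
            "IBk (k-NN)": KNeighborsClassifier(),
            "SMO (linear SVM)": SVC(kernel="linear"),
            "Random Tree": DecisionTreeClassifier(splitter="random"),
        }
        for name, model in models.items():
            print(name, cross_val_score(model, X, y, cv=5).mean())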
    The ability to respond to the events that occur in our surroundings is one of the priceless gifts we have. Most human beings have the ability to see, listen, talk and express themselves, but there are some people who are unable to communicate and cannot express themselves. Moreover, it is difficult for them to understand what others are saying, and equally difficult for others to communicate with them. Therefore, to bridge this gap, we have created a full-duplex smart communication system that lessens the communication gap between people who can and cannot communicate verbally. Our system focuses on Deaf & Dumb (D&D) people. Using our approach, we achieved an accuracy of 95% and a precision of 85.5%. This paper has two major contributions. First, it uses the Leap Motion Device for hand gesture recognition; the tracked data is used for training on Pakistani Sign Language (PSL). In the second part, when normal...
    In Wireless Sensor Networks (WSN), clustering is considered an efficient network topology which maximizes the data received at the sink by minimizing direct transmission of data from the sensor nodes. Limiting direct communication between sensor nodes and the sink is achieved by confining each sensor node's transmission within a certain region known as a cluster. Once data have been collected from all the sensors in the cluster, they are sent to the sink by a node within the cluster designated to communicate with the sink. This technique not only reduces network congestion, but also increases data reception and conserves network energy. To achieve an increase in data received at the sink, it is necessary that the correct number of clusters is created within a sensing field. In this paper, a new heuristic approach is presented to find the optimal number of clusters in mobility-supported terrestrial and underwater sensor networks. To maintain a strong association between sensor nodes and the designated node known as the cluster-head (CH), the sensor nodes' mobility should also be considered during the cluster setup operation. This approach not only reduces direct transmission between the sensor nodes and the sink, but also increases each sensor node's connectivity with its CH for the transmission of sensed data, which results in the creation of a stable network structure. The proposed analytical estimate considers the sensor nodes' transmission range and the sensing field dimensions to find the correct number of clusters in a sensing field. With this approach, better network coverage and connectivity during the exchange of data can be achieved, which in turn increases network performance.
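    One simple way to estimate a cluster count from the sensing-field dimensions and node transmission range, in the spirit of the analytical estimate described above; the disc-coverage assumption is an illustration, not the paper's exact formula:

        # Sketch: estimate cluster count by tiling the field area with
        # discs of radius equal to the transmission range (one per CH).
        import math

        def estimate_cluster_count(field_w, field_h, tx_range):
            coverage_per_ch = math.pi * tx_range ** 2
            return max(1, math.ceil((field_w * field_h) / coverage_per_ch))

        print(estimate_cluster_count(200, 200, 30))  # e.g. 200x200 m field, 30 m range
        # -> 15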
    Identifying activities of daily living is an important area of research with applications in smart homes and healthcare for elderly people. It is challenging for reasons like human self-occlusion, complex natural environments and human behavior when performing a complicated task. From psychological studies, we know that human gaze is closely linked with the thought process and that we tend to "look" at objects before acting on them. Hence, we have used the object information present in gaze images as the context and the basis for activity prediction. Our system is based on HMM (Hidden Markov Models) and trained using an ANN (Artificial Neural Network). We begin by extracting motion information from TPV (Third Person Vision) streams and object information from FPV (First Person Vision) cameras. The advantage of having FPV is that the object information forms the context of the scene. When context is included as input to the HMM for activity recognition, the precision incre...
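    A sketch of Viterbi decoding over gaze-object observations, illustrating how object context can drive an HMM. The states, objects and probabilities are hypothetical, and in the described system the emission model is learned with an ANN rather than fixed by hand:

        # Sketch: Viterbi over object observations for a two-state HMM.
        import numpy as np

        states = ["make_tea", "wash_dishes"]
        objects = ["kettle", "cup", "sponge"]
        start = np.array([0.5, 0.5])
        trans = np.array([[0.8, 0.2],        # P(next state | state)
                          [0.2, 0.8]])
        emit = np.array([[0.5, 0.4, 0.1],    # P(object | make_tea)
                         [0.1, 0.2, 0.7]])   # P(object | wash_dishes)

        obs = [0, 1, 1]                      # kettle, cup, cup
        v = start * emit[:, obs[0]]
        back = []
        for o in obs[1:]:
            scores = v[:, None] * trans * emit[None, :, o]
            back.append(scores.argmax(axis=0))
            v = scores.max(axis=0)

        # Trace back the most likely activity sequence.
        path = [int(v.argmax())]
        for b in reversed(back):
            path.append(int(b[path[-1]]))
        print([states[s] for s in reversed(path)])
        # -> ['make_tea', 'make_tea', 'make_tea']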
    Smart homes are becoming a growing need to prepare a comfortable lifestyle for the elderly and make things easy for the caretakers of the future. One important component of these systems is identifying human activities and scenarios. As wireless technologies advance, they are being used to provide low-cost, non-intrusive and privacy-conscious solutions to activity recognition. However, in more complicated environments, we need to identify scenarios with subtle cues, e.g. eye gaze. These situations call for a complementary vision-based solution, and we present a robust scenario recognition system that follows the objects seen in eye gaze trajectories. In this paper, we present a probabilistic hierarchical model for scenario recognition using environmental elements like the objects in the scene. We utilize the fact that any scenario can be divided recursively into constituent tasks and activities, down to the level of atomic actions and objects. Working bottom-up, the scenario recognition problem can be solved hierarchically by identifying the objects seen and combining them to form coarse-grained, higher-level activities. Recognizing complete scenarios only on the basis of objects seen is a novel contribution. We performed experiments on the standard Georgia Tech Egocentric Activities (GTEA-Gaze) dataset and unconstrained videos collected "in the Wild", and trained an Artificial Neural Network to get a precision of 73.84% and an accuracy of 92.27%.
    The authors propose a method to improve activity recognition by including contextual information from first-person vision (FPV). Adding the context, i.e. the objects seen while performing an activity, increases activity recognition precision. This is because, in goal-oriented tasks, human gaze precedes the action and tends to focus on relevant objects. The authors extract object information from FPV images and combine it with activity information from external or FPV videos to train an Artificial Neural Network (ANN). They used four configurations combining gaze/eye-tracker, head-mounted and externally mounted cameras on three standard cooking datasets: Georgia Tech Egocentric Activities Gaze, the Technische Universität München kitchen dataset and the CMU multi-modal activity database. Adding object information when training the ANN increased the average precision (and accuracy) of activity recognition from 58.02% (and 89.78%) to 74.03% (and 93.42%). Experiments also showed that when objects are not considered, an external camera is necessary; however, when objects are considered, the combination of internal and external cameras is optimal because of their complementary advantages in observing hands and objects. Adding object information also decreased ANN training cycles from 513.25 to 139, showing that it provides critical information that speeds up training.
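    A small sketch of how object context can be appended to the activity feature vector before ANN training, as described above; the dimensions and object vocabulary are hypothetical:

        # Sketch: concatenate motion features with a multi-hot object vector.
        import numpy as np

        OBJECTS = ["knife", "bread", "plate", "pan"]   # hypothetical vocabulary

        def with_context(motion_features, seen_objects):
            context = np.zeros(len(OBJECTS))
            for obj in seen_objects:
                context[OBJECTS.index(obj)] = 1.0
            return np.concatenate([motion_features, context])

        x = with_context(np.random.default_rng(0).normal(size=16),
                         ["knife", "bread"])
        print(x.shape)   # (20,) -> fed to the ANN instead of the bare 16-dim vector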