A big portion of our common surroundings was created by humans, for humans. Over the centuries, we have shaped the environments around us according to our own conceptions and convenience. With the growing need for robots that can perform tasks in these large-scale dynamic environments, it is paramount that robots understand the world in the same fashion humans do. Being able to reason and perform high-level tasks, with human-like learning and cognitive skills that enhance task planning and fast adaptation to highly dynamic surroundings, while also storing and utilizing past experiences, is a crucial capability for the next generation of robots. However, current tools still mostly focus on machine-centric environment modeling, which reiterates the need for a new human-like environment and knowledge model.
Our proposal - Triplet Ontological Semantic Model (TOSM)
In this work, we present a real-time autonomous robot perception and navigation framework for human-like robot interaction and task performance in large-scale dynamic environments. Our main research focus is to introduce a novel semantic modeling and mapping structure that enables robots to better understand and act on their surroundings, much as humans do. The Triplet Ontological Semantic Model (TOSM) was created based on the understanding of human visual sensory information processing from cognitive science and the brain's GPS model from neuroscience research and physiology.
Our model consists of a set of explicit models, implicit models, and symbols, which together can represent various geometric characteristics as well as material and relationship information. More specifically, the explicit model defines all the geometrical and physical information that can be retrieved by sensors, while the implicit model describes the intrinsic relations between objects, spaces, and their occupants, along with semantic knowledge that cannot be obtained using sensors alone. The symbolic model defines every element in a language-oriented way.
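As an illustration, the three TOSM components for a single environment element could be grouped as below. This is only a minimal sketch: the field and class names are our own assumptions, not the model's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class ExplicitModel:
    """Sensor-measurable properties of an element."""
    position: tuple        # (x, y, z) in the map frame
    dimensions: tuple      # (width, depth, height)
    material: str

@dataclass
class ImplicitModel:
    """Knowledge that sensors alone cannot provide."""
    relations: dict = field(default_factory=dict)   # e.g. {"isInsideOf": "Lobby"}
    semantics: dict = field(default_factory=dict)   # e.g. {"opensAutomatically": True}

@dataclass
class TOSMElement:
    symbol: str            # language-oriented name, e.g. "Door3"
    explicit: ExplicitModel
    implicit: ImplicitModel

door = TOSMElement(
    symbol="Door3",
    explicit=ExplicitModel(position=(4.0, 1.5, 0.0),
                           dimensions=(1.2, 0.1, 2.1),
                           material="glass"),
    implicit=ImplicitModel(relations={"isInsideOf": "Lobby"},
                           semantics={"opensAutomatically": True}),
)
```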
Based on findings from cognitive science, we propose a hierarchical mapping system in which maps are generated on demand according to the specifications of the robot and the given task. The TOSM approach eliminates the need to store several different maps, since each map can be generated only when needed, after a task is assigned to the robot.
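The on-demand idea can be sketched as a filter over the stored environment elements, parameterized by robot specification and task. The field names below are hypothetical; the actual database query is far richer than this.

```python
def generate_map(elements, robot_spec, task):
    """Build a task-specific map on demand instead of storing many maps.

    elements:   stored environment elements, each a dict with 'floor'
                and 'min_clearance' keys (illustrative schema)
    robot_spec: dict with the robot's physical constraints
    task:       dict naming the floor the task takes place on
    """
    selected = []
    for e in elements:
        if e["floor"] != task["floor"]:
            continue  # outside the task area: not loaded into the map
        if e["min_clearance"] < robot_spec["width"]:
            continue  # passage too narrow for this robot to traverse
        selected.append(e)
    return selected

elements = [
    {"name": "corridor1", "floor": 1, "min_clearance": 1.5},
    {"name": "gap2",      "floor": 1, "min_clearance": 0.3},
    {"name": "hall3",     "floor": 2, "min_clearance": 3.0},
]
task_map = generate_map(elements, robot_spec={"width": 0.6}, task={"floor": 1})
```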
Software Platform Framework
Our proposed framework minimizes the dependency on a cloud database for the robot's independent, real-time performance. The block diagram illustrates our proposed framework. This framework, consisting of robot, network, and cloud, is designed with a robot-centric architecture. At its core is the robot's memory system, separated into Long-Term Memory (LTM) and Short-Term Memory (STM), modeled on the human memory system. The LTM comprises the on-demand database mounted on the robot, which stores environmental information, behavior, knowledge, and map data. The STM, used as working memory, stores information obtained from sensors and the dynamic driving map for self-driving. These memories are organically linked to the Autonomous Navigation Module (ANM), Learning Module (LM), and Behavior Planner Module (BPM) to form the robot's behavior and knowledge system. The network and cloud databases are LTMs stored in the network and cloud, respectively. These databases complement the robot's limited storage capacity through an interface with the on-demand database.
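The LTM/STM split might be organized as below. This is a minimal sketch with names of our own choosing; the real system also synchronizes the on-demand database with the network and cloud LTMs.

```python
class ShortTermMemory:
    """Working memory: recent sensor readings and the dynamic driving map."""
    def __init__(self, capacity=100):
        self.capacity = capacity
        self.buffer = []

    def store(self, observation):
        self.buffer.append(observation)
        if len(self.buffer) > self.capacity:
            self.buffer.pop(0)  # oldest observations are discarded first

class LongTermMemory:
    """On-demand database: environment, behavior, knowledge, map data."""
    def __init__(self):
        self.records = {}

    def consolidate(self, key, value):
        """Persist information promoted from the STM."""
        self.records[key] = value

    def recall(self, key):
        return self.records.get(key)

stm = ShortTermMemory(capacity=3)
ltm = LongTermMemory()
for reading in ["scan1", "scan2", "scan3", "scan4"]:
    stm.store(reading)            # STM keeps only the most recent readings
ltm.consolidate("Door3.state", "open")
```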
TOSM-based on-demand database
Our ontology was represented using the Protégé tool and can be divided into three parts: classes, object properties, and data properties. A class is a general set of individuals, while object properties represent relations between different individuals. Data properties represent data related to individuals, such as integers and strings. This division is not equivalent to the three main parts of TOSM; rather, it can be seen as an ontological representation of our framework. In our ontological model, the following classes were defined: Map, MathematicalStructure, Time, Behavior, and EnvironmentElement. The ontology is described by creating individuals in the corresponding classes, attaching data properties, and then defining relations using object properties. For example, in the sentence "Room1 'hasBoundary' boundary1", room1 is an individual of the EnvironmentElement class, boundary1 is an individual of the MathematicalStructure class, and 'hasBoundary' is the instance of the object property connecting them. Object properties can be divided into describedInMap, mathemeticalProperty, spatialRelationKnowledge, and temporalKnowledge. Lastly, symbol and explicitModel are defined in the data properties section, together with objectSemanticKnowledge, placeSemanticKnowledge, and temporalSemanticKnowledge.
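The "Room1 'hasBoundary' boundary1" example can be mirrored with a tiny in-memory triple store. This is purely illustrative; the actual ontology is an OWL model authored in Protégé, not this structure.

```python
class TripleStore:
    """Minimal subject-predicate-object store for ontology individuals."""
    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return triples matching the given pattern (None = wildcard)."""
        return [
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        ]

store = TripleStore()
store.add("room1", "rdf:type", "EnvironmentElement")
store.add("boundary1", "rdf:type", "MathematicalStructure")
store.add("room1", "hasBoundary", "boundary1")   # the example from the text

boundaries = store.query(subject="room1", predicate="hasBoundary")
```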
AI planning for long-term autonomy
The efficiency of a task planning module depends heavily on the data storage capacity and on reduced computational time. Robots should be able to figure out the current environment state (by comparing it with stored prior information) and make the best decision by analyzing the given task, the environmental information, and their own state. By introducing implicit information into our framework, we allow robots to make high-level decisions based on information that goes beyond what they can perceive using their sensors alone. As an example, consider an automatic door. Even though the door is closed most of the time, we, as humans, know that it automatically opens as soon as we stand in front of it. By adding this kind of information into our framework, we allow the robot to reason with such high-level implicit information and plan accordingly.
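The automatic-door example can be sketched as a traversability check that consults implicit knowledge before trusting the sensed state alone. The predicate names here are hypothetical stand-ins for the ontology's actual properties.

```python
def is_traversable(door, implicit_knowledge):
    """Decide whether a planner may route through a door.

    door: dict with the current sensed state, e.g. {"name": ..., "sensed_state": "closed"}
    implicit_knowledge: per-element semantic facts from the on-demand database
    """
    facts = implicit_knowledge.get(door["name"], {})
    if facts.get("opensAutomatically"):
        # Sensors see a closed door, but implicit knowledge says it will
        # open when the robot approaches, so the plan may route through it.
        return True
    return door["sensed_state"] == "open"

implicit_knowledge = {"door7": {"opensAutomatically": True}}
plan_ok = is_traversable({"name": "door7", "sensed_state": "closed"},
                         implicit_knowledge)
```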
Semantic descriptor based Learning and Recognition
When executing a task, the robot may encounter discrepancies between the information obtained by its sensors and the data stored in the on-demand database. Using this newly obtained information, the robot should not only update the explicit information (i.e., sensory data), but also add the implicit data about any novel object or place. To infer the implicit data of an object or place, the robot must first determine its class using machine learning algorithms and then attach the implicit data by looking at the information of similarly classified objects or places and reasoning about whether this new entity should inherit it.
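This class-based attachment of implicit data could be sketched as follows. The majority-vote rule is our own simplification: deciding which facts carry over to the new entity is where the actual ontological reasoning happens.

```python
def infer_implicit_data(new_class, known_elements, min_support=0.5):
    """Propose implicit facts for a newly detected element.

    A fact is proposed when at least `min_support` of the known elements
    of the same class share it -- a toy stand-in for richer reasoning.
    """
    peers = [e for e in known_elements if e["class"] == new_class]
    if not peers:
        return {}
    proposed = {}
    # Count how often each implicit fact appears among same-class peers.
    for key in {k for p in peers for k in p["implicit"]}:
        values = [p["implicit"][key] for p in peers if key in p["implicit"]]
        top = max(set(values), key=values.count)
        if values.count(top) / len(peers) >= min_support:
            proposed[key] = top
    return proposed

known = [
    {"class": "Door", "implicit": {"opensAutomatically": True}},
    {"class": "Door", "implicit": {"opensAutomatically": True}},
    {"class": "Wall", "implicit": {"traversable": False}},
]
facts = infer_implicit_data("Door", known)
```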
Our proposed approach in the field of autonomous mobile robot perception is aimed at developing real-time and near-perfect object detection and place recognition using semantic descriptor-based learning and recognition. The overview of our semantic descriptor-based recognition model is illustrated in the following figure.
Recognition model: Training and Testing
The recognition model consists of two stages: a training stage and a testing stage.
1. Training Stage:
This stage trains the model for object detection and place recognition using the sensory input database and the knowledge database. It involves semantic analysis, the semantic descriptor, and training the recognition model.
I. Semantic Analysis
The purpose of semantic analysis is to obtain information about image contents and their characteristics. It includes two major operations: visual data preprocessing and feature extraction.
a) Preprocessing of Video Input Data:
In this step, we use different image processing techniques for noise removal, illumination equalization, and contrast enhancement. We apply various filters to remove noise while preserving object details, and contrast stretching to handle contrast enhancement and illumination issues.
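As a minimal sketch of this preprocessing step (pure NumPy; the actual pipeline uses a wider range of filters), noise can be reduced with a small mean filter and contrast handled by linear stretching:

```python
import numpy as np

def mean_filter3(img):
    """3x3 mean filter for noise removal (borders handled by edge padding)."""
    padded = np.pad(img.astype(float), 1, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + img.shape[0],
                          1 + dx : 1 + dx + img.shape[1]]
    return out / 9.0

def contrast_stretch(img, low=0, high=255):
    """Linearly map the image's intensity range onto [low, high]."""
    img = img.astype(float)
    mn, mx = img.min(), img.max()
    if mx == mn:                      # flat image: nothing to stretch
        return np.full(img.shape, float(low))
    return (img - mn) / (mx - mn) * (high - low) + low

frame = np.array([[10, 12, 11], [13, 200, 12], [11, 12, 10]])  # one noisy pixel
smoothed = mean_filter3(frame)       # the outlier is averaged away
stretched = contrast_stretch(frame)  # full dynamic range restored
```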
b) Extracting Features:
We extract useful object features from the processed visual data using computer vision methods. These features include both global and local features. We obtain the overall properties of the different objects in an image or video frame by extracting global features (colors, edges, corners), while we capture salient regions or patches within the image by extracting local features.
Computational simplicity and low storage requirements are the two main factors that motivated us to pass the extracted feature vectors, instead of the whole image, to the algorithm for further processing.
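A minimal sketch of such a compact global feature vector, combining an intensity histogram with a gradient-based edge statistic (the actual system uses a richer feature set, including local features):

```python
import numpy as np

def global_features(img, bins=8):
    """Compact global descriptor: intensity histogram + mean edge strength.

    Passing this small vector downstream, instead of the full image,
    keeps storage and compute requirements low.
    """
    img = img.astype(float)
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    hist = hist / hist.sum()               # normalized intensity histogram
    gy, gx = np.gradient(img)              # simple edge responses
    edge_strength = np.mean(np.hypot(gx, gy))
    return np.concatenate([hist, [edge_strength]])

frame = np.arange(64, dtype=float).reshape(8, 8) * 4   # synthetic ramp image
vec = global_features(frame)   # 9 numbers instead of 64 pixels
```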
II. Semantic Descriptor
The result of image analysis at the semantic level is the extraction of semantic descriptions that match human perception. We thus reduce the semantic gap by combining the visual features extracted at the low level with information at the high level.
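One way to form such a descriptor is to concatenate the low-level feature vector with a high-level encoding drawn from the knowledge database. The one-hot context label below is our own illustrative choice of high-level component.

```python
import numpy as np

def semantic_descriptor(low_level, context_label, context_vocab):
    """Fuse low-level visual features with high-level knowledge.

    low_level:     feature vector from the semantic-analysis step
    context_label: a high-level fact, e.g. the expected place category
    context_vocab: ordered list of known labels (one-hot encoded here)
    """
    one_hot = np.zeros(len(context_vocab))
    one_hot[context_vocab.index(context_label)] = 1.0
    return np.concatenate([low_level, one_hot])

vocab = ["corridor", "lobby", "office"]
desc = semantic_descriptor(np.array([0.2, 0.5, 0.3]), "lobby", vocab)
```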
III. Training Recognition Model
We build a CNN based on the semantic descriptor and train it for object detection and place recognition. We pass feature vectors, instead of the whole image, to train the model, making it more efficient with minimal memory and storage requirements.
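As a toy illustration of a network that operates on descriptor vectors rather than raw images (a two-layer NumPy forward pass, not the paper's CNN architecture):

```python
import numpy as np

def forward(descriptor, w1, b1, w2, b2):
    """Two-layer network over a semantic descriptor: ReLU + softmax."""
    h = np.maximum(0.0, descriptor @ w1 + b1)   # hidden layer
    logits = h @ w2 + b2
    exp = np.exp(logits - logits.max())          # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 6, 4, 3        # toy sizes
w1 = rng.normal(size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

# The input is a small descriptor vector, not a full image tensor.
probs = forward(rng.normal(size=n_features), w1, b1, w2, b2)
```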
2. Testing Stage:
In this stage, we also perform semantic analysis. After that, we run our trained semantic descriptor-based CNN model in the real world, using the visual input data acquired directly from the visual sensors (cameras) together with the knowledge database. Finally, we use this data to build a semantic map based on object detection and place recognition for robust robot navigation.
Our real-time autonomous robot perception and navigation framework overcomes these major challenges, processing the visual data to provide the robot with human-like navigation and understanding of the environment around it.
Simulator for Semantic Navigation
To demonstrate the usability of our proposed framework, we designed a simulation environment of a convention center, which we believe to be a good example of a highly dynamic environment that requires long-term autonomy. Using the Gazebo simulator together with ROS (Robot Operating System), we modeled an environment that contains several static objects and people, moving actors, and a four-wheeled robot. The simulated robot carries both an RGB-D camera (3D sensor) and a laser range finder (2D sensor) and can be easily controlled through ROS. The environment information is stored in the on-demand database following our proposed structure, in order to prove the feasibility of our framework.
S. H. Joo, S. Manzoor, Y. G. Rocha, H. U. Lee and T. Y. Kuc. A Realtime Autonomous Robot Navigation Framework for Human like High-level Interaction and Task Planning in Global Dynamic Environment. In IEEE 18th International Conference on Electronics, Information, and Communication (ICEIC), 2019.