Visuospatial skill learning for robots
The so-called “visuospatial skills” allow people to visually perceive objects and the spatial relationships among them. This video demonstrates a novel machine learning approach that allows a robot to learn simple visuospatial skills for performing object reconfiguration tasks. The main advantage of this approach is that the robot can learn from a single demonstration, and can generalize the skill to new initial configurations. The results from this research work were presented at the International Conference on Intelligent Robots and Systems (IROS 2013) in Tokyo, Japan in November 2013.
We present a novel robot learning approach based on visual perception that allows a robot to acquire new skills by observing a demonstration from a tutor. Most existing learning-from-demonstration approaches focus on trajectories. Our approach instead focuses on achieving a desired goal configuration of objects relative to one another. It relies on visual perception to capture the object’s context for each demonstrated action. This context is the basis of the visuospatial representation and implicitly encodes the position of the object relative to multiple other objects simultaneously. The proposed approach can learn and generalize multi-operation skills from a single demonstration while requiring minimal a priori knowledge about the environment. The learned skills comprise a sequence of operations that aim to achieve the desired goal configuration using the given objects. We illustrate the capabilities of our approach on three object reconfiguration tasks with a Barrett WAM robot.
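A deliberately simplified sketch can illustrate the reproduction step: locating where a demonstrated object context reappears in a new scene. The sketch assumes grayscale images stored as NumPy arrays and scores candidate placements by their sum of squared differences. The text above does not describe the paper's actual matching method, so both the metric and the helper `find_best_match` are hypothetical.

```python
import numpy as np

def find_best_match(scene: np.ndarray, context: np.ndarray) -> tuple[int, int]:
    """Return (row, col) of the top-left corner where `context` best matches `scene`.

    Hypothetical helper (not the paper's method): scores every valid placement by the
    sum of squared differences (SSD) between the context patch and the scene; lower is better.
    """
    scene_h, scene_w = scene.shape
    patch_h, patch_w = context.shape
    best_score, best_pos = np.inf, (0, 0)
    for r in range(scene_h - patch_h + 1):
        for c in range(scene_w - patch_w + 1):
            window = scene[r:r + patch_h, c:c + patch_w]
            score = float(np.sum((window - context) ** 2))
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos


# Usage: a context patch recorded during the demonstration reappears elsewhere.
rng = np.random.default_rng(0)
context = rng.random((3, 3))
scene = np.zeros((10, 12))
scene[4:7, 2:5] = context
print(find_best_match(scene, context))  # (4, 2)
```

In this sketch, the placement found in the new scene would determine where the robot performs the next pick or place operation. Exhaustive SSD search keeps the idea visible, but it is neither fast nor robust to lighting or scale changes.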
Link to publication:
S. Ahmadzadeh, P. Kormushev, D. Caldwell, “Visuospatial Skill Learning for Object Reconfiguration Tasks,” in Proceedings IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2013), Tokyo, Japan, 3-8 Nov 2013.
Robot walking with dynamically generated gait
The video shows the humanoid robot WABIAN-2R walking with dynamically generated gait. Two scenarios are demonstrated: (1) sudden stopping and reversing, and (2) sudden step change to avoid an obstacle. The walking gait is dynamically generated using a hybrid gait pattern generator capable of rapid and dynamically consistent pattern regeneration.
We propose a two-stage gait pattern generation scheme for full-scale humanoid robots that considers the dynamics of the system throughout the process. The first stage generates the preliminary motion reference, such as step positions, timing, and the trajectory of the Center of Mass (CoM). The second stage serves as a dynamics filter that modifies the initial references to make the pattern stable on the full-scale, multi-degree-of-freedom humanoid robot. This allows simple, easy-to-use models to be employed for motion generation, while the dynamics filtering ensures that the pattern is safe to execute on the real humanoid robot. The paper describes the approaches used in the first and second stages and presents experimental results demonstrating the effectiveness of the method. The fast calculation time and the use of the system’s dynamic state as initial conditions for pattern generation make the method a good candidate for real-time gait pattern generation.
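The first stage relies on "easy to use models" to produce a preliminary CoM reference. The sketch below illustrates what such a model can look like. It assumes the linear inverted pendulum model (LIPM), a common choice, but the text above does not say that the paper uses it. The sketch starts from the current CoM state, mirroring the use of the dynamic state as initial conditions. It does not attempt the second-stage dynamics filter, and the function names and numbers are hypothetical.

```python
import math

def lipm_com(x0: float, v0: float, p: float, z_c: float, t: float, g: float = 9.81):
    """CoM position [m] and velocity [m/s] after t seconds under the linear inverted pendulum model.

    Assumed model (not confirmed by the source): x'' = (g / z_c) * (x - p).
    x0, v0: initial CoM state along one horizontal axis (e.g. the measured state)
    p:      ZMP (support point) held constant during the step [m]
    z_c:    constant CoM height [m]
    """
    tc = math.sqrt(z_c / g)  # time constant of the pendulum
    x = p + (x0 - p) * math.cosh(t / tc) + tc * v0 * math.sinh(t / tc)
    v = (x0 - p) / tc * math.sinh(t / tc) + v0 * math.cosh(t / tc)
    return x, v


def com_reference(x0: float, v0: float, p: float, z_c: float,
                  duration: float, dt: float = 0.005) -> list[float]:
    """Sample a preliminary CoM position reference every dt seconds for one step."""
    n = int(round(duration / dt))
    return [lipm_com(x0, v0, p, z_c, k * dt)[0] for k in range(n + 1)]


# Usage: regenerate the reference from the current (measured) CoM state,
# e.g. after a sudden stop or step change.
ref = com_reference(x0=-0.05, v0=0.3, p=0.02, z_c=0.8, duration=0.8)
```

The closed-form solution makes regeneration cheap. This illustrates why a simple model suits the first stage, which leaves the burden of full-body dynamic consistency to the second-stage filter.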
Przemyslaw Kryczka, Petar Kormushev, Kenji Hashimoto, Hun-ok Lim, Nikolaos Tsagarakis, Darwin G. Caldwell and Atsuo Takanishi. Hybrid gait pattern generator capable of rapid and dynamically consistent pattern regeneration. Proc. URAI 2013.
Autonomous Robotic Valve Turning
The video shows a KUKA robot that learns how to grasp and turn a valve autonomously. The robot learns not only how to achieve the goal of the task, but also how to react to different disturbances during task execution. For example, the robot learns a reactive behavior that allows it to pause and resume the task in response to changes in the uncertainty of the valve position. This helps the robot avoid collisions with the valve and improves the reliability and robustness of task execution.
The experimental setup comprises a KUKA LWR (Light-Weight Robot) arm, an OptiTrack motion capture system, and a T-bar valve with adjustable friction.
The initial task demonstration and reproduction phases are performed with kinesthetic teaching. The reactive behavior is implemented using a Reactive Fuzzy Decision Maker (RFDM).
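The text above does not give the rules of the Reactive Fuzzy Decision Maker. The following is therefore a hypothetical sketch of how a fuzzy decision maker could produce the pause/resume behavior. It uses one input, the valve-position uncertainty in millimetres (assumed to be estimated, for example, from motion-capture jitter). It defines three membership functions with made-up breakpoints and uses weighted-average (zero-order Sugeno) defuzzification to produce a speed-scaling factor, where 0 means pause and 1 means full speed. The names `tri` and `reactivity` are illustrative.

```python
def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function: 0 at a and c, 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


def reactivity(uncertainty_mm: float) -> float:
    """Map valve-position uncertainty [mm] to a speed-scaling factor in [0, 1].

    Hypothetical rule base (zero-order Sugeno inference):
      LOW uncertainty    -> proceed at full speed (1.0)
      MEDIUM uncertainty -> slow down (0.5)
      HIGH uncertainty   -> pause (0.0)
    Breakpoints (2, 5, 8 mm) are illustrative, not taken from the paper.
    """
    u = uncertainty_mm
    mu_low = 1.0 if u <= 2.0 else max(0.0, (5.0 - u) / 3.0)
    mu_med = tri(u, 2.0, 5.0, 8.0)
    mu_high = 0.0 if u <= 5.0 else min(1.0, (u - 5.0) / 3.0)
    weights = (mu_low, mu_med, mu_high)
    outputs = (1.0, 0.5, 0.0)
    # Weighted-average defuzzification; at least one membership is always > 0.
    return sum(w * o for w, o in zip(weights, outputs)) / sum(weights)


# Usage: advance the phase of the learned motion more slowly (or not at all)
# while the valve position is uncertain, and resume when it settles.
phase, dt, period = 0.0, 0.01, 5.0
for uncertainty in (1.0, 1.0, 6.5, 9.0, 9.0, 1.5):
    phase += reactivity(uncertainty) * dt / period
```

Scaling the phase rather than switching the controller on and off gives smooth slow-downs between the "proceed" and "pause" extremes.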
The valve-turning task is challenging, especially if the valve is moving. A similar valve-turning task is included in the DARPA Robotics Challenge (DRC). However, the valves in that challenge are fixed, whereas here the valve moves, which makes the task even more difficult.
Seyed Reza Ahmadzadeh, Petar Kormushev and Darwin G. Caldwell. Autonomous Robotic Valve Turning: A Hierarchical Learning Approach. IEEE Intl. Conf. on Robotics and Automation (ICRA 2013), Karlsruhe, Germany, 2013.
Learning to walk efficiently with passive compliance
We present a learning-based approach for minimizing the electric energy consumption during walking of a passively compliant bipedal robot. The energy consumption is reduced by learning a varying-height center-of-mass trajectory that efficiently exploits the robot’s passive compliance. To do this, we propose a reinforcement learning method that evolves the policy parameterization dynamically during the learning process and thus finds better policies faster than a fixed parameterization does. The method is first tested on a function approximation task and then applied to the humanoid robot COMAN, where it achieves a significant reduction in energy consumption.
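The key ingredient of an evolving parameterization is that the policy can gain resolution mid-learning without losing what it has already learned. A minimal sketch illustrates this. It represents the CoM height offset over one gait cycle by values at equally spaced knots and uses piecewise-linear interpolation. This is a simplifying assumption, since the exact representation used in the paper is not described here. When more parameters are needed, the new knots are initialized by evaluating the current trajectory, so the represented trajectory does not change. The reinforcement learning update itself is not shown, and the helper names are hypothetical.

```python
import numpy as np

def evaluate(knots: np.ndarray, phase: np.ndarray) -> np.ndarray:
    """CoM height offset at gait phases in [0, 1], linearly interpolated between equally spaced knots."""
    grid = np.linspace(0.0, 1.0, len(knots))
    return np.interp(phase, grid, knots)


def refine(knots: np.ndarray, new_count: int) -> np.ndarray:
    """Re-express the same trajectory with more knots so learning can continue at finer resolution.

    With linear interpolation the trajectory is preserved exactly when the old knot
    positions are part of the new grid, i.e. (new_count - 1) is a multiple of (len(knots) - 1).
    """
    if (new_count - 1) % (len(knots) - 1) != 0:
        raise ValueError("new grid must contain the old knot positions")
    return evaluate(knots, np.linspace(0.0, 1.0, new_count))


# Usage: start coarse (4 parameters), then double the resolution mid-learning.
coarse = np.array([0.0, 0.01, -0.02, 0.0])   # hypothetical CoM height offsets [m]
fine = refine(coarse, 7)                       # 7 parameters, same trajectory
phases = np.linspace(0.0, 1.0, 101)
```

Starting with few parameters keeps early exploration low-dimensional. Later refinements add the detail needed to exploit the compliance more precisely.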
Link to publication:
Kormushev, P., Ugurlu, B., Calinon, S., Tsagarakis, N., and Caldwell, D.G., “Bipedal Walking Energy Minimization by Reinforcement Learning with Evolving Policy Parameterization”, IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS-2011), San Francisco, 2011.
Humanoid robot learns to clean a whiteboard
A Japanese humanoid robot (Fujitsu HOAP-2) learns to clean a whiteboard by upper-body kinesthetic teaching during full-body balance control. The research is from an Italian-Japanese collaboration between the Italian Institute of Technology and Tokyo City University.
We present an integrated approach that allows a free-standing humanoid robot to acquire new motor skills through kinesthetic teaching. The proposed full-body control method simultaneously controls the upper and lower body of the robot with different control strategies. Imitation learning is used to train the upper body via kinesthetic teaching, while the Reaction Null Space method keeps the robot balanced at the same time. During demonstration, a force/torque sensor records the exerted forces. During reproduction, a hybrid position/force controller applies the learned trajectories, in terms of both positions and forces, to the end effector. The proposed method is tested on a 25-DOF Fujitsu HOAP-2 humanoid robot with a surface cleaning task.
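The hybrid position/force controller can be illustrated with the classic selection-matrix formulation. A diagonal 0/1 matrix chooses, axis by axis, whether the end effector tracks a position or a force. This is a textbook sketch, not the controller used in the paper. The gains, the axis assignment (board plane vs. board normal) and the function name `hybrid_command` are assumptions.

```python
import numpy as np

def hybrid_command(x, x_d, f, f_d, S, kp=100.0, kf=0.5):
    """Cartesian force command from a textbook hybrid position/force law.

    x, x_d: measured and desired end-effector position [m]
    f, f_d: measured and desired contact force [N]
    S:      diagonal 0/1 selection matrix (1 = position-controlled axis, 0 = force-controlled)
    kp, kf: hypothetical position stiffness [N/m] and force feedback gain
    """
    x, x_d, f, f_d = (np.asarray(a, float) for a in (x, x_d, f, f_d))
    I = np.eye(len(x))
    position_part = S @ (kp * (x_d - x))
    force_part = (I - S) @ (f_d + kf * (f_d - f))  # feedforward + proportional force correction
    return position_part + force_part


# Usage: track the learned wiping path in the board plane (x, y) while
# pressing along the board normal (z) with the learned force profile.
S = np.diag([1.0, 1.0, 0.0])
cmd = hybrid_command(x=[0.0, 0.0, 0.0], x_d=[0.1, 0.0, 0.0],
                     f=[0.0, 0.0, -2.0], f_d=[0.0, 0.0, -5.0], S=S)
```

Splitting the axes this way prevents the position loop from fighting the contact force along the board normal. This is why both recorded positions and recorded forces can be replayed together.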
This research was presented at the International Conference on Robotics and Automation (ICRA) in May 2011 in Shanghai, China.
IIT – Italian Institute of Technology, Advanced Robotics Dept.
TCU – Tokyo City University, Mechanical Systems Engineering Dept.
Link to publication:
Kormushev, P., Nenchev, D.N., Calinon, S., and Caldwell, D.G., “Upper-body Kinesthetic Teaching of a Free-standing Humanoid Robot”, IEEE Intl. Conf. on Robotics and Automation (ICRA 2011), 2011.
Robot Archer iCub
Humanoid robot iCub learns the skill of archery. After being instructed how to hold the bow and release the arrow, the robot learns by itself to aim and shoot arrows at the target. It learns to hit the center of the target in only 8 trials.
The learning algorithm, called ARCHER (Augmented Reward Chained Regression), was developed and optimized specifically for problems like archery training, which have a smooth solution space and prior knowledge about the goal to be achieved. In archery, we know that hitting the center corresponds to the maximum achievable reward. Using this prior information about the task, we can treat the position of the arrow’s tip as an augmented reward. ARCHER uses a chained local regression process that iteratively estimates new policy parameters that are more likely to achieve the goal of the task, based on the experience so far. An advantage of ARCHER over other learning algorithms is that it exploits richer feedback about the outcome of each rollout: the two-dimensional hit position rather than a single scalar reward.
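The regression idea can be sketched in a simplified form. The sketch assumes that each rollout is described by a small parameter vector and that image processing returns the 2D hit position. It keeps the k rollouts that landed closest to the center and expresses the remaining offset to the center as a least-squares combination of their outcome differences. It then applies the same combination to their parameters. This is an illustrative reading of the description above rather than the published algorithm. `archer_step` and the toy linear "archery" model are hypothetical.

```python
import numpy as np

def archer_step(thetas, hits, target, k=3):
    """Propose new policy parameters from past rollouts (simplified, hypothetical reading).

    thetas: (N, d) parameters of past rollouts
    hits:   (N, 2) where each arrow tip landed (the augmented reward)
    target: (2,) centre of the target
    """
    thetas, hits = np.asarray(thetas, float), np.asarray(hits, float)
    target = np.asarray(target, float)
    best = np.argsort(np.linalg.norm(hits - target, axis=1))[:k]  # k closest hits
    th, r = thetas[best], hits[best]
    d_th = th[1:] - th[0]   # parameter differences relative to the best rollout
    d_r = r[1:] - r[0]      # corresponding outcome differences
    # Coefficients u with r[0] + u @ d_r ≈ target (least squares) ...
    u, *_ = np.linalg.lstsq(d_r.T, target - r[0], rcond=None)
    # ... applied to the parameters, assuming locally linear behaviour.
    return th[0] + u @ d_th


# Usage with a made-up linear "archery" model: hit = M @ theta + c.
M = np.array([[2.0, 0.5], [-0.3, 1.5]])
c = np.array([0.1, -0.2])
thetas = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.3, 0.3]])
hits = thetas @ M.T + c
theta_new = archer_step(thetas, hits, target=np.zeros(2))
print(M @ theta_new + c)  # close to [0, 0] for this linear model
```

Because the hit position is a vector, every rollout tells the learner in which direction it missed, not just by how much. This richer feedback is what allows convergence in few trials.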
For the archery training, the ARCHER algorithm modulates and coordinates the motion of the two hands, while an inverse kinematics controller handles the motion of the arms. After every rollout, the image processing module automatically recognizes where the arrow hit the target, and this position is sent as feedback to the ARCHER algorithm. The image recognition uses Gaussian Mixture Models for color-based detection of the target and the arrow’s tip.
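Color-based detection with Gaussian Mixture Models can be sketched as follows. Each class (here, target and arrow tip) has its own GMM over pixel colors, and each pixel is assigned to the class with the highest log-likelihood. The colors, the number of components and all parameter values below are hypothetical. In practice, such models would be fitted offline, for example with EM, from labelled pixel samples.

```python
import numpy as np

def gmm_log_likelihood(pixels, weights, means, covs):
    """Per-pixel log-likelihood under a Gaussian mixture with full covariance matrices.

    pixels: (N, D) colour values, e.g. RGB; weights: (K,); means: K arrays of shape (D,);
    covs: K arrays of shape (D, D).
    """
    pixels = np.atleast_2d(np.asarray(pixels, float))
    dim = pixels.shape[1]
    per_component = []
    for w, mu, cov in zip(weights, means, covs):
        diff = pixels - np.asarray(mu, float)
        mahal = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
        _, logdet = np.linalg.slogdet(cov)
        per_component.append(np.log(w) - 0.5 * (mahal + logdet + dim * np.log(2 * np.pi)))
    # log(sum_k exp(.)) computed stably across components
    return np.logaddexp.reduce(np.stack(per_component), axis=0)


def classify(pixels, models):
    """Index of the most likely colour model for each pixel; models = [(weights, means, covs), ...]."""
    scores = np.stack([gmm_log_likelihood(pixels, *m) for m in models])
    return np.argmax(scores, axis=0)


# Usage with hypothetical colour models (red-ish target, blue-ish arrow tip).
target_model = ([1.0], [[200.0, 30.0, 30.0]], [400.0 * np.eye(3)])
tip_model = ([0.6, 0.4], [[30.0, 30.0, 200.0], [60.0, 60.0, 180.0]], [400.0 * np.eye(3)] * 2)
labels = classify([[210, 25, 35], [40, 35, 190]], [target_model, tip_model])
```

Averaging the image coordinates of the pixels labelled as arrow tip would then give the hit position that is fed back to the learner.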
The experiments are performed on a 53-DOF humanoid robot iCub. The distance between the robot and the target is 3.5 m, and the height of the robot is 104 cm.
This research was presented at the Humanoids 2010 conference in December 2010 in the USA.
High-resolution photos of Robot Archer iCub: http://bit.ly/boCmVi
Link to publication:
Kormushev, P., Calinon, S., Saegusa, R. and Metta, G., “Learning the skill of archery by a humanoid robot iCub”, Proc. IEEE Intl Conf. on Humanoid Robots (Humanoids-2010), pp. 417-423, 2010.
Robot learns to flip pancakes
Teaching a Barrett WAM robot to flip pancakes:
The video shows a 7-DOF Barrett WAM manipulator learning to flip pancakes by reinforcement learning.
The motion is encoded in a mixture of basis force fields through an extension of Dynamic Movement Primitives (DMP) that represents the synergies across the different variables through stiffness matrices. An Inverse Dynamics controller with variable stiffness is used for reproduction.
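A mixture of basis force fields can be sketched as a time-varying blend of spring-damper attractors. Each attractor has its own stiffness matrix, and the off-diagonal terms couple the variables. The sketch makes several simplifications that are assumptions, not the paper's formulation. The activations depend on time rather than on a DMP phase variable, a single damping gain is used, and the attractors and stiffness matrices are hand-picked rather than learned. `simulate` is a hypothetical name.

```python
import numpy as np

def simulate(mus, stiffness, centers, width, kv=20.0, x0=None, duration=1.0, dt=0.001):
    """Integrate x'' = sum_i h_i(t) K_i (mu_i - x) - kv x' with semi-implicit Euler.

    mus:       (M, D) attractor points, one per basis force field
    stiffness: (M, D, D) stiffness matrices; off-diagonal terms couple the variables
    centers, width: Gaussian activation centres [s] and width [s] (h_i normalized to sum to 1)
    """
    mus = np.asarray(mus, float)
    K = np.asarray(stiffness, float)
    centers = np.asarray(centers, float)
    x = np.zeros(mus.shape[1]) if x0 is None else np.asarray(x0, float)
    v = np.zeros_like(x)
    traj = [x.copy()]
    for step in range(int(round(duration / dt))):
        t = step * dt
        h = np.exp(-0.5 * ((t - centers) / width) ** 2)
        h /= h.sum()
        acc = sum(h_i * (K_i @ (mu_i - x)) for h_i, K_i, mu_i in zip(h, K, mus)) - kv * v
        v = v + acc * dt   # update velocity first ...
        x = x + v * dt     # ... then position with the new velocity
        traj.append(x.copy())
    return np.array(traj)


# Usage: a stiff field early in the motion, a softer, coupled field later on.
traj = simulate(mus=[[0.3, 0.2], [0.3, 0.2]],
                stiffness=[400.0 * np.eye(2), [[30.0, 10.0], [10.0, 30.0]]],
                centers=[0.2, 0.8], width=0.15)
```

Because each field carries its own stiffness matrix, the same representation encodes where to go and how stiff to be along the way. This is what lets a learner trade stiffness against compliance in different phases of the task.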
The skill is first demonstrated via kinesthetic teaching and then refined by the Policy learning by Weighting Exploration with the Returns (PoWER) algorithm. After 50 trials, the robot learns that the first part of the task requires stiff behavior to throw the pancake into the air, while the second part requires the hand to be compliant in order to catch the pancake without it bouncing off the pan.
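The core PoWER update can be sketched in a simplified form. The sketch assumes Gaussian exploration with a constant variance, in which case the update reduces to a return-weighted average of the exploration offsets of the best rollouts so far (importance sampling). The toy return function, the number of trials and all numbers are hypothetical. They are not the pancake reward, and `power_update` is an illustrative name.

```python
import numpy as np

def power_update(theta, rollout_params, returns, n_best=5):
    """One simplified PoWER step: return-weighted average of exploration offsets.

    theta:          (d,) current policy parameters
    rollout_params: (N, d) parameters actually executed (theta + exploration)
    returns:        (N,) non-negative returns; only the n_best highest are used
    """
    P = np.asarray(rollout_params, float)
    R = np.asarray(returns, float)
    best = np.argsort(R)[::-1][:n_best]      # importance sampling: keep the best rollouts
    eps = P[best] - theta                    # their exploration offsets
    w = R[best]
    if w.sum() <= 0.0:
        return np.array(theta, float)
    return theta + (w[:, None] * eps).sum(axis=0) / w.sum()


# Usage on a toy problem: find parameters close to a hypothetical optimum.
rng = np.random.default_rng(1)
target = np.array([0.3, -0.7])
theta = np.zeros(2)
history_p, history_r = [], []
for trial in range(50):
    p = theta + rng.normal(0.0, 0.2, size=2)       # Gaussian exploration
    history_p.append(p)
    history_r.append(np.exp(-np.sum((p - target) ** 2) / 0.1))
    theta = power_update(theta, history_p, history_r)
```

Because the update is a weighted average rather than a gradient step, there is no learning rate to tune. This is one reason EM-based methods like PoWER suit learning on a real robot.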
Link to publication:
Kormushev, P., Calinon, S., and Caldwell, D.G., “Robot Motor Skill Coordination with EM-based Reinforcement Learning”, Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS-2010), 2010.