A Fast Learning Variable Lambda TD Model: Used to Realize Home Aware Robot Navigation

Abdulrahman Altahhan

Research output: Contribution to conference (Paper)

Abstract

This work describes a fast-learning, goal-aware robot navigation model that employs both gradient and conjugate gradient Temporal Difference methods (TD and TD-conj). It builds on the result that TD-conj is equivalent, under certain conditions, to a gradient TD method with a variable lambda. Using a straightforward feature extraction process combined with goal-aware capabilities provided by a whole-image measure, the model solves what we call the u-turn-homing benchmark problem without using landmarks. Only one goal snapshot was used, taken with the agent facing the goal directly. A novel threshold stopping formula, which is less sensitive to the agent-goal orientation problem, was therefore used to recognize the goal. Unlike other models, this model refrains from artificially manipulating the environment or assuming a priori knowledge about it, two constraints that widely restrict the applicability of existing models in realistic scenarios. An on-line control method was used to train a set of neural networks. With the aid of variable and fixed eligibility traces, these networks approximate the agent’s action-value function, allowing it to take close-to-optimal actions to reach its home. The effectiveness of the model was experimentally verified on an agent.
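
As a rough illustration of the eligibility-trace idea mentioned in the abstract, the sketch below runs one episode of linear TD(lambda) with accumulating traces, where lambda may change at every step. It is not the paper's model: the record gives neither the TD-conj equivalence conditions nor the variable-lambda formula, so `lambda_schedule` is a hypothetical placeholder, the function name and parameters are assumptions, and the sketch predicts state values with a linear approximator rather than training neural networks for action values.

```python
from typing import Callable, List, Sequence


def td_lambda_episode(
    features: Sequence[Sequence[float]],
    rewards: Sequence[float],
    weights: List[float],
    lambda_schedule: Callable[[int], float],  # hypothetical: step index -> lambda
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> List[float]:
    """Run one episode of linear TD(lambda) with accumulating traces.

    features[t] is the feature vector of the state visited at step t.
    rewards[t] is the reward for the transition out of that state.
    The state reached after the last reward is treated as terminal (value 0).
    """
    w = list(weights)
    trace = [0.0] * len(w)
    last = len(rewards) - 1
    for t, reward in enumerate(rewards):
        phi = features[t]
        value = sum(wi * xi for wi, xi in zip(w, phi))
        if t == last:
            next_value = 0.0  # terminal state has zero value
        else:
            next_value = sum(wi * xi for wi, xi in zip(w, features[t + 1]))
        # One-step TD error.
        delta = reward + gamma * next_value - value
        # Decay the old trace with this step's lambda, then add the gradient,
        # which for a linear approximator is the feature vector itself.
        lam = lambda_schedule(t)
        trace = [gamma * lam * e + x for e, x in zip(trace, phi)]
        # Move every weight in proportion to its eligibility.
        w = [wi + alpha * delta * e for wi, e in zip(w, trace)]
    return w
```

Passing a constant schedule such as `lambda t: 0.0` or `lambda t: 1.0` recovers the fixed-lambda cases, so the same routine can be used to compare fixed and variable traces on a small chain of states.
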
Original language: English
Pages: 1534-1541
DOIs: https://doi.org/10.1109/IJCNN.2014.6889845
Publication status: Published - Jul 2014
Event: Neural Networks, 2014 International Joint Conference - Beijing, China
Duration: 6 Jul 2014 to 11 Jul 2014

Conference

Conference: Neural Networks, 2014 International Joint Conference
Abbreviated title: IJCNN
Country: China
City: Beijing
Period: 6/07/14 to 11/07/14

Fingerprint

Navigation
Robots
Robot learning
Gradient methods
Feature extraction
Neural networks

Bibliographical note

© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Keywords

  • TD-conj
  • Home Aware
  • Variable λ TD
  • U-Turn-Homing
  • Orientation Insensitive Thresholding

Cite this

Altahhan, A. (2014). A Fast Learning Variable Lambda TD Model: Used to Realize Home Aware Robot Navigation. 1534-1541. Paper presented at Neural Networks, 2014 International Joint Conference, Beijing, China. https://doi.org/10.1109/IJCNN.2014.6889845

A Fast Learning Variable Lambda TD Model: Used to Realize Home Aware Robot Navigation. / Altahhan, Abdulrahman.

2014. 1534-1541 Paper presented at Neural Networks, 2014 International Joint Conference, Beijing, China.

Altahhan, A 2014, 'A Fast Learning Variable Lambda TD Model: Used to Realize Home Aware Robot Navigation' Paper presented at Neural Networks, 2014 International Joint Conference, Beijing, China, 6/07/14 - 11/07/14, pp. 1534-1541. https://doi.org/10.1109/IJCNN.2014.6889845
Altahhan A. A Fast Learning Variable Lambda TD Model: Used to Realize Home Aware Robot Navigation. 2014. Paper presented at Neural Networks, 2014 International Joint Conference, Beijing, China. https://doi.org/10.1109/IJCNN.2014.6889845
Altahhan, Abdulrahman. / A Fast Learning Variable Lambda TD Model: Used to Realize Home Aware Robot Navigation. Paper presented at Neural Networks, 2014 International Joint Conference, Beijing, China.
@conference{b3200875421648dab2d40c2e7ad159d8,
title = "A Fast Learning Variable Lambda TD Model: Used to Realize Home Aware Robot Navigation",
abstract = "This work describes a fast-learning, goal-aware robot navigation model that employs both gradient and conjugate gradient Temporal Difference methods (TD and TD-conj). It builds on the result that TD-conj is equivalent, under certain conditions, to a gradient TD method with a variable lambda. Using a straightforward feature extraction process combined with goal-aware capabilities provided by a whole-image measure, the model solves what we call the u-turn-homing benchmark problem without using landmarks. Only one goal snapshot was used, taken with the agent facing the goal directly. A novel threshold stopping formula, which is less sensitive to the agent-goal orientation problem, was therefore used to recognize the goal. Unlike other models, this model refrains from artificially manipulating the environment or assuming a priori knowledge about it, two constraints that widely restrict the applicability of existing models in realistic scenarios. An on-line control method was used to train a set of neural networks. With the aid of variable and fixed eligibility traces, these networks approximate the agent’s action-value function, allowing it to take close-to-optimal actions to reach its home. The effectiveness of the model was experimentally verified on an agent.",
keywords = "TD-conj, Home Aware, Variable λ TD, U-Turn-Homing, Orientation Insensitive Thresholding",
author = "Abdulrahman Altahhan",
note = "{\copyright} 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.; Neural Networks, 2014 International Joint Conference, IJCNN; Conference date: 06-07-2014 Through 11-07-2014",
year = "2014",
month = "7",
doi = "10.1109/IJCNN.2014.6889845",
language = "English",
pages = "1534--1541",

}

TY - CONF

T1 - A Fast Learning Variable Lambda TD Model: Used to Realize Home Aware Robot Navigation

AU - Altahhan, Abdulrahman

N1 - © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

PY - 2014/7

Y1 - 2014/7

AB - This work describes a fast-learning, goal-aware robot navigation model that employs both gradient and conjugate gradient Temporal Difference methods (TD and TD-conj). It builds on the result that TD-conj is equivalent, under certain conditions, to a gradient TD method with a variable lambda. Using a straightforward feature extraction process combined with goal-aware capabilities provided by a whole-image measure, the model solves what we call the u-turn-homing benchmark problem without using landmarks. Only one goal snapshot was used, taken with the agent facing the goal directly. A novel threshold stopping formula, which is less sensitive to the agent-goal orientation problem, was therefore used to recognize the goal. Unlike other models, this model refrains from artificially manipulating the environment or assuming a priori knowledge about it, two constraints that widely restrict the applicability of existing models in realistic scenarios. An on-line control method was used to train a set of neural networks. With the aid of variable and fixed eligibility traces, these networks approximate the agent’s action-value function, allowing it to take close-to-optimal actions to reach its home. The effectiveness of the model was experimentally verified on an agent.

KW - TD-conj

KW - Home Aware

KW - Variable λ TD

KW - U-Turn-Homing

KW - Orientation Insensitive Thresholding

U2 - 10.1109/IJCNN.2014.6889845

DO - 10.1109/IJCNN.2014.6889845

M3 - Paper

SP - 1534

EP - 1541

ER -