This paper describes a reinforcement learning architecture that is capable of incorporating deeply learned feature representation of a robot's unknown working environment. An autoencoder is used along with convolutional and pooling layers to deduce the reduced feature representation based on a set of images taken by the agent. This representation is used to discover and learn the best route to navigate to a goal. The features are fed to an actor layer that can learn from a value function calculated by a second output layer. The policy is ɛ-greedy and the effect is similar to actor-critic architecture where temporal difference error is back propagated from the critic to the actor. This compact architecture helps in reducing the overhead of setting up a desired fully fledged actor-critic architecture that typically needs extra processing time. Hence, the model is ideal for dealing with lots of data coming from visual sensor that needs speedy processing. The processing is accomplished off board due to the limitation of the used robot but latency was compensated by the speedy processing. Adaptability for the different data sizes, critical to big data processing, is realized by the ability to shrink or expand the whole architecture to fit different deeply learned feature dimensions. This added flexibility is crucial for setting up such model since the space dimensionality is not known prior to operating in the environment. Initial experimental results on real robot show that the agent accomplished good level of accuracy and efficacy in reaching the goal.
Bibliographical noteUnder a Creative Commons License.
- reinforcement learning
- big data
- visual sensors
- deep learning