Occupants adapt to thermal discomfort through three types of thermal behaviour: physiological responses (e.g., sweating or shivering), changing the environment (e.g., adjusting heating settings or opening a window), or changing personal elements (e.g., clothing, body position, seating location). Compared to the built environment, the range of thermal behaviours available in a vehicle is limited. Modern vehicle climate systems aim to maximise thermal comfort semi-automatically through control modes that allow occupants to provide feedback via the interface. A novel approach to automatic climate control is the use of Reinforcement Learning (RL); however, past work has ignored feedback from the user. The main aim of this thesis is to integrate user feedback into an RL-based vehicle climate control and assess whether the system can learn the user's preferences within a reasonable time.

Developing an integrated system that includes the user's interaction with the climate control interface requires: a) a set of literature-based rules describing the extent of the thermal behaviour; b) a human agent that mimics the feedback process; c) a method of integrating the simulated feedback in the context of RL. To model the interaction with the climate system, three main rules were identified in the thermal comfort literature, relating to how likely occupants are to make changes when they are uncomfortable, which setting (temperature, blower or vent) they are likely to select, and which value they are likely to prefer. The activation likelihood for each rule is found using data from an in-field experiment with 49 trials, monitoring occupant thermal comfort, climate control actions, and the thermal environment parameters. The resulting hybrid model, the User-Based Module (UBM), is validated against a hold-out set of data from the experimental trials.
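A minimal sketch of how such a rule-based human agent could be structured, following the three rules above (act or not, which setting, which value). All probabilities, weights, and setpoints here are illustrative assumptions, not the fitted values derived from the experimental trials:

```python
import random

def simulate_feedback(thermal_sensation, rng):
    """Return a (setting, value) adjustment, or None if the occupant stays put.

    thermal_sensation: a thermal sensation vote, e.g. -3 (cold) .. +3 (hot).
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    # Rule 1: probability of acting grows with the magnitude of discomfort
    # (the 0.3 slope is an illustrative assumption).
    p_act = min(1.0, 0.3 * abs(thermal_sensation))
    if rng.random() > p_act:
        return None

    # Rule 2: choose which setting to adjust (illustrative weights).
    setting = rng.choices(["temperature", "blower", "vent"],
                          weights=[0.6, 0.3, 0.1])[0]

    # Rule 3: choose a value that opposes the direction of discomfort.
    if setting == "temperature":
        value = 22.0 - 1.5 * thermal_sensation        # degC setpoint
    elif setting == "blower":
        value = max(1, min(5, 3 + (1 if thermal_sensation > 0 else -1)))
    else:
        value = "face" if thermal_sensation > 0 else "feet"
    return setting, value
```

In use, the agent is queried once per control step with the current simulated sensation, e.g. `simulate_feedback(2, random.Random(0))`, and the returned adjustment (if any) is fed back to the controller as if the occupant had touched the interface.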
The User-Based Reinforcement Learning (UBRL) climate controller combines the simulated feedback from the UBM with feedback from the thermal environment by means of reward shaping. Three reward shaping methods were statistically compared: state shaping, look-back advice, and look-forward advice. Several State-Action-Reward-State-Action (SARSA)-based RL algorithms were used to train the system, and their performance was evaluated on a set of test scenarios. The UBM outperforms simpler models, such as neural-network and fuzzy-logic models, achieving the highest accuracy in estimating setting adjustments. The simulated user feedback from the UBM improves the learning speed of the UBRL controller, reducing the learning time to 2.9 years of simulated use. The controller using look-back advice achieves a statistically higher average reward per trial than the alternative methods, and requires fewer steps to reach the occupant's desired equivalent temperature. The UBRL controller using the Double SARSA algorithm achieves the occupant's desired comfort in 5.6 minutes on average, maintains it for 86% of the journey duration, and consumes an average power of 1.07 kW. The Double SARSA UBRL climate control can therefore significantly improve occupant comfort by learning and maintaining the occupants' setting preferences within less than half the lifetime of their vehicle. Potential avenues for improvement include a variable exploration rate, further development of the human agent, multi-zone climate control, and extension to a variety of other user modelling and control areas.
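The two core mechanisms named above can be sketched compactly: a look-back-advice shaping bonus (in the Wiewiora et al. form, which compares the potential of the current state-action pair against the discounted potential of the previous one) and one Double SARSA update, which maintains two Q-tables and updates one using the other's estimate. The potential function, learning rate, and tabular representation are illustrative assumptions, not the thesis's implementation:

```python
import random
from collections import defaultdict

GAMMA, ALPHA = 0.9, 0.1   # illustrative discount factor and learning rate

def look_back_bonus(phi, s, a, prev_s, prev_a):
    """Look-back advice shaping term: F = phi(s, a) - (1/gamma) * phi(prev_s, prev_a).

    phi is an advice potential over state-action pairs; at the first step
    (no previous pair) no shaping is applied.
    """
    if prev_s is None:
        return 0.0
    return phi(s, a) - phi(prev_s, prev_a) / GAMMA

def double_sarsa_update(qa, qb, s, a, r, s2, a2, rng):
    """One Double SARSA step on tabular Q-functions qa and qb.

    A coin flip selects which table to update; the other table supplies
    the bootstrapped estimate of the next state-action value.
    """
    if rng.random() < 0.5:
        qa[(s, a)] += ALPHA * (r + GAMMA * qb[(s2, a2)] - qa[(s, a)])
    else:
        qb[(s, a)] += ALPHA * (r + GAMMA * qa[(s2, a2)] - qb[(s, a)])
```

In a shaped run, the reward passed to `double_sarsa_update` would be `r + look_back_bonus(...)`, i.e. the environment reward augmented by the user-derived advice.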
Date of Award: Apr 2019
Supervisor: Elena Gaura