Abstract
Scientific predictions are a key component of Environmental Impact Assessments (EIAs) and can indicate the level of change within an environmental sphere (e.g., soil). Within the EIA process, decisions on mitigating complex environmental problems, such as maintaining soil quality, can be challenging, especially in data-sparse locations. Artificial Intelligence (AI) can ameliorate this challenge, but the literature suggests that the deployment of Machine Learning (ML) techniques in soil research is concentrated mostly in developed countries. The potential of ML in managing soil pollution from complex mixtures of heavy metals, petroleum hydrocarbons, and physicochemical factors is rarely explored. To address this research gap, we built robust models that increase the accuracy of impact prediction based on new experimental soil data from a data-sparse region of Africa (i.e., Nigeria). The algorithms applied are artificial neural networks (ANN), support vector regression (SVR), regression tree (RT), and random forest (RF). The study also implemented a multivariate linear regression (MLR) model as a baseline. Key findings include (a) the MLR model performed worse than the machine learning models, largely due to the nonlinearity of the data; (b) log-normalization improved the predictive capability of all models as the effects of statistical variability were removed; (c) the RF model had the best performance in terms of correlation coefficient, mean absolute error, and root mean square error; and (d) the machine learning models showed improved performance, with increased correlation and lower error between the actual and predicted soil electrical conductivity values. Our results imply that data sparsity may no longer be an excuse for the non-use of quantitative impact prediction in EIA processes. This could change how EIAs are conducted and enhance sustainability in natural resource exploitation globally. Future work will apply algorithms for automated feature selection to obtain an optimal subset of soil quality measurements that will further improve the accuracy of the models.
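The three evaluation measures named in the abstract (correlation coefficient, mean absolute error, and root mean square error) can be illustrated with a minimal sketch. It is not the study's code: the function names and the electrical conductivity values are hypothetical, and it assumes that log-normalization means a natural-log transform of strictly positive measurements, which the abstract does not specify.

```python
import math


def log_transform(values):
    """Natural-log transform (assumed form of log-normalization); needs values > 0."""
    return [math.log(v) for v in values]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def correlation_coefficient(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical soil electrical conductivity values, not taken from the study.
actual_ec = [120.0, 340.0, 95.0, 410.0, 220.0]
predicted_ec = [130.0, 310.0, 100.0, 450.0, 200.0]

# Score the predictions on the log scale, as one possible reading of the abstract.
log_actual = log_transform(actual_ec)
log_predicted = log_transform(predicted_ec)

scores = {
    "r": correlation_coefficient(log_actual, log_predicted),
    "mae": mean_absolute_error(log_actual, log_predicted),
    "rmse": root_mean_square_error(log_actual, log_predicted),
}
print(scores)
```

A higher correlation together with lower MAE and RMSE indicates closer agreement between actual and predicted values, which is the sense in which the abstract ranks the models.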
| Original language | English |
|---|---|
| Article number | 100554 |
| Number of pages | 18 |
| Journal | Journal of Environmental Advances |
| Volume | 17 |
| Early online date | 4 Jun 2024 |
| DOIs | |
| Publication status | Published - Oct 2024 |
Bibliographical note
Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Keywords
- Machine learning
- Artificial intelligence
- Multivariate Linear Regression
- Sustainability
- Environmental Impact Assessment (EIA)
- Soil Quality
- Environmental Data Science
ASJC Scopus subject areas
- General Environmental Science
- General Computer Science