Modelling of Soil Heavy Metal Contamination Using Machine Learning Techniques and Spectroscopic Data

Document Type : Original Research

Authors
1 Department of Remote Sensing and GIS, Faculty of Humanities, Tarbiat Modares University, Tehran, Iran
2 MSc., Department of Remote Sensing and GIS, Islamic Azad University, Science and Research Branch, Tehran, Iran
Abstract
Mines and their related industries can affect the surrounding environment not only while they operate but also after they are abandoned. Their harmful effects include groundwater and surface water contamination and soil contamination. Managing these environmental effects requires reliable methods for modelling heavy metal concentrations in soil. This study presents a framework for modelling heavy metal soil contamination based on spectroscopy and statistical models. For this purpose, the spectral curves of 53 soil samples, taken from an abandoned mine and its surrounding areas in New South Wales, Australia, were measured with a spectroradiometer across the visible to short-wave infrared (SWIR) wavelengths. After the second derivative of the collected spectra was calculated, the random forest feature selection (RFFS) method was used to identify the most important spectral data for modelling the concentrations of lead, silver, cadmium and mercury. Multiple linear regression, random forest regression and support vector regression (SVR) were then applied to the selected spectral data. The results indicated that SWIR wavelengths are the most important spectral data for modelling heavy metal concentrations. Moreover, the non-linear machine learning methods outperformed multiple linear regression, especially random forest, which achieved an RMSE of 0.8 ppm and R2 of 0.51 for lead and an RMSE of 9.4 ppm and R2 of 0.46 for cadmium.
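
The second-derivative preprocessing and the two reported accuracy measures (RMSE and R2) can be illustrated with a minimal sketch. The abstract does not say how the derivative was computed or which definition of R2 was used, so the central finite difference and the coefficient-of-determination form below are assumptions, and the function names are hypothetical rather than taken from the study.

```python
from math import sqrt


def second_derivative(reflectance, step=1.0):
    """Second derivative of an evenly sampled spectrum.

    Assumption: a plain central finite difference; the study may have used
    a smoothed derivative instead. `step` is the band spacing.
    """
    if len(reflectance) < 3:
        raise ValueError("need at least three bands")
    return [
        (reflectance[i - 1] - 2.0 * reflectance[i] + reflectance[i + 1]) / step ** 2
        for i in range(1, len(reflectance) - 1)
    ]


def rmse(observed, predicted):
    """Root mean squared error, in the units of the observations (e.g. ppm)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

The derivative output is two bands shorter than the input because the first and last bands have no neighbour on one side. In a full workflow, the derivative spectra would feed the feature selection step and the regression models, and the two metrics would be computed on held-out samples.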
