Integrated machine learning and geochemical modeling reveal hydrogeochemical controls on fluoride and arsenic co-contamination in groundwater


Ullah Z., Ali W., Ullah I., Ahmad T., ARSLAN Ş., Rauf S., ...

Environmental geochemistry and health, vol.48, no.6, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 48 Issue: 6
  • Publication Date: 2026
  • DOI Number: 10.1007/s10653-026-03161-4
  • Journal Name: Environmental geochemistry and health
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, Chemical Abstracts Core, Compendex, Environment Index, Geobase, INSPEC, MEDLINE
  • Keywords: Arsenic, Dadu Canal Command, Fluoride, Geochemical models, Groundwater, Machine learning
  • Ankara University Affiliated: Yes

Abstract

Groundwater contamination by fluoride (F⁻) and arsenic (As) is a serious environmental issue that poses significant human health risks in many developing countries, including Pakistan, and particularly in Sindh Province. To assess the actual extent of groundwater contamination, 170 groundwater samples were collected and analyzed for F⁻ and As along with other physicochemical parameters. F⁻ and As concentrations in the groundwater samples ranged from 0.5 to 6.35 mg/L and from 0.5 to 22 µg/L, with mean values of 1.82 mg/L and 5.78 µg/L, respectively. Hydrochemical facies analysis shows that groundwater in the study area is predominantly of the mixed CaNaHCO3 type, followed by the CaHCO3 type, while a few samples fall into the NaCl type. Gibbs diagrams indicate that rock-water interaction controls groundwater hydrochemistry, and saturation indices show that calcite, dolomite, fluorite, and goethite are saturated in the study area. Machine learning (ML) models, including Random Forest (RF), Artificial Neural Network (ANN), and Logistic Regression (LR), were applied with F⁻ as the target variable because of its higher concentration in the groundwater samples compared with As. In the ML models, permutation feature importance and the mean decrease in impurity (MDI) were used to identify the variables affecting F⁻ in the study region. Among the models, RF achieved the highest accuracy (0.94) and sensitivity (0.97), along with a relatively low error rate (0.06). ANN showed strong performance with an accuracy of 0.92 and a sensitivity of 0.88, while LR demonstrated comparatively lower sensitivity (0.81) despite achieving an accuracy of 0.90. The study shows that groundwater in the Dadu Canal Command area is affected by dual contamination, and that integrating hydrogeochemical analysis with machine learning helps identify sources and spatial risk, supporting groundwater management in Sindh.
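
The abstract reports accuracy, sensitivity, and error rate, and names permutation-based importance as one way the influential variables were identified. The following is a minimal sketch of those ideas in plain Python, not the authors' pipeline. The data are synthetic, the feature names (`ph`, `hco3`, `noise`) are hypothetical, and the fixed threshold rule in `predict` is only a stand-in for a trained classifier such as RF. Note that the reported RF error rate (0.06) is consistent with 1 minus the reported accuracy (0.94).

```python
import random

# Synthetic, illustrative rows: [ph, hco3 (mg/L), noise]. Not data from the study.
FEATURES = ["ph", "hco3", "noise"]
X = [
    [8.2, 420.0, 0.3], [8.0, 390.0, 0.9], [8.4, 450.0, 0.1], [7.9, 350.0, 0.5],
    [7.2, 210.0, 0.7], [7.4, 250.0, 0.2], [7.1, 180.0, 0.8], [7.5, 280.0, 0.4],
]


def predict(row):
    """Hypothetical stand-in model: flags a sample as high-F (1) or not (0)."""
    return 1 if row[0] > 7.8 and row[1] > 300.0 else 0


# Labels are generated from the stand-in rule, so the baseline accuracy is 1.0.
y = [predict(row) for row in X]


def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred):
    """True positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def permutation_importance(predict_fn, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict_fn(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = accuracy(y, [predict_fn(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


baseline, importances = permutation_importance(predict, X, y)
error_rate = 1.0 - baseline  # error rate is the complement of accuracy

for name, score in zip(FEATURES, importances):
    print(f"{name}: mean accuracy drop = {score:.3f}")
```

In this sketch the `noise` column never enters the stand-in rule, so shuffling it leaves accuracy unchanged and its importance is zero, while shuffling `ph` or `hco3` degrades the predictions. MDI, by contrast, is computed from the splits inside tree-based models and is not reproduced here.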