Abstract
Rivers are a significant, low-cost source of water for irrigation, industrial, and residential use across the world. Due to urbanization, population growth, and climate change, many rivers now face an irreparable water crisis. According to a 2018 study by the Central Pollution Control Board (CPCB) of India, there are 351 polluted river stretches in India. Assessing the pollution level of these rivers is a challenge, as it requires measuring several water quality parameters, including turbidity, coliform levels, and total phosphate. These parameters are then used to calculate the water quality index (WQI), which gives a fair estimate of the level of pollution in the water. However, measuring all of these parameters is difficult and requires complex measuring instruments. In this paper, we address this problem by building machine learning models that predict the WQI using only a subset of the parameters required for its calculation. The models take turbidity, pH, Dissolved Oxygen (DO), Biological Oxygen Demand (BOD), total coliform, hardness as CaCO3, fluoride, ammonia, Chemical Oxygen Demand (COD), and total suspended solids as input and predict the WQI using regression techniques. The models are trained specifically on data from the Bhadra river, but the technique can be extended to other river stretches and water bodies. We evaluate the models using mean squared error (MSE), mean absolute error (MAE), and the R² score. An artificial neural network with seven dense layers outperforms the other regressors in terms of these evaluation metrics. With the availability of better datasets, using regressors to predict water quality could enable better and faster assessment of water quality, which can in turn lead to quicker remedial measures.
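For reference, the three evaluation metrics named in this abstract (MSE, MAE, and the R² score) can be computed as in the minimal Python sketch that follows. It uses the standard textbook definitions and is not taken from the paper's code. The WQI values in the usage example are hypothetical and are not drawn from the Bhadra river dataset.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted WQI."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between observed and predicted WQI."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted WQI values, for illustration only.
observed = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 71.0, 79.0]
print(mean_squared_error(observed, predicted))   # 2.5
print(mean_absolute_error(observed, predicted))  # 1.5
print(r2_score(observed, predicted))             # approximately 0.98
```

Lower MSE and MAE and an R² closer to 1 indicate a better fit. MSE penalizes large errors more heavily than MAE, which is one reason to report both.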