Factors Affecting Water Consumption in a Medium Size City

Water is the most precious resource on earth; all living organisms depend on it, and it covers about two-thirds of our planet’s surface. Despite its importance, fresh water is in short supply in most of the world’s cities. Hence, conserving water is a strategic choice for almost everyone.

To put a water conservation plan in place, we must know how much water each sector consumes (industry, agriculture, domestic, …). In this study, we analyzed a dataset for a sample city that is available on the Kaggle website. The city is Sonora, Mexico, which is a medium-size city.

In this study, we would like to answer the following questions

I explored the data, downloaded the English description of the features (the original dataset is in Spanish), and examined the dataset's properties using Python and its scientific libraries. I found many missing data points; some were dropped, and others were filled using the method described in this post.
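As a rough illustration of the drop-or-fill step, here is a minimal pandas sketch. The column names (`user_type`, `consumption`), the toy values, and the choice of a per-group median fill are all assumptions made for this example; the method actually used is the one described in the linked post.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the dataset; column names and values are hypothetical.
df = pd.DataFrame({
    "user_type": ["domestic", "domestic", "industrial", "industrial", None],
    "consumption": [12.0, np.nan, 30.0, np.nan, 15.0],
})

# Drop rows whose grouping key is missing, since they cannot be filled by group.
df = df.dropna(subset=["user_type"]).copy()

# Fill the remaining gaps with the median of the same user type
# (an assumed strategy, not necessarily the one used in the study).
group_median = df.groupby("user_type")["consumption"].transform("median")
df["consumption"] = df["consumption"].fillna(group_median)

print(df)
```

Filling by group rather than with a single global value keeps a missing industrial reading from being replaced by a typical household value.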

The analysis took four stages, which are explained in this GitHub repository.

The dataset contains two main variables: the average monthly water consumption from 2009 to 2015 and a reference water consumption for 2016.

The effect of the temporal factors

The year effect is illustrated as follows:

The effect of the year on the average monthly and reference water consumption

The month effect:

Now, let’s see the month and year combined.
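A combined month-and-year view can be built with a pivot table. The sketch below assumes a long-format table with hypothetical columns `year`, `month`, and `consumption`; the numbers are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical long-format records: one row per reading.
records = pd.DataFrame({
    "year": [2009, 2009, 2009, 2010, 2010, 2010],
    "month": [1, 1, 7, 1, 7, 7],
    "consumption": [10.0, 14.0, 20.0, 16.0, 22.0, 26.0],
})

# Mean consumption for every (year, month) pair:
# years become rows, months become columns.
by_year_month = records.pivot_table(
    index="year", columns="month", values="consumption", aggfunc="mean"
)

print(by_year_month)
```

A table in this shape can be passed directly to a heatmap plot, which makes seasonal patterns and year-to-year drift visible at the same time.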

The effect of the classification factors

1- The effect of industry

The effect of the existence of industrial consumers on water consumption

It appears that the existence of this category increases both the reference consumption (RC) and the average monthly consumption (AMC); however, the increase is not very large (from 17 to 25 units of AMC, and from 1100 to 1750 units of RC). Notice that most of the cases are non-industrial (10.4M records vs. 1.3M industrial records); this suggests that industry is not a big player in water consumption.
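The comparison rests on two numbers per group: how many records it has and how much each record consumes on average. A minimal sketch of that summary follows, assuming a hypothetical binary flag column `has_industry` and a hypothetical `amc` column; the toy values are chosen only so that the group means match the 17 and 25 units quoted above.

```python
import pandas as pd

# Toy sample; column names and values are hypothetical.
sample = pd.DataFrame({
    "has_industry": [0, 0, 0, 1, 1],
    "amc": [15.0, 17.0, 19.0, 24.0, 26.0],
})

# Record count and mean consumption for each group.
summary = sample.groupby("has_industry")["amc"].agg(["count", "mean"])

# A group's share of total consumption depends on both count and mean.
summary["share_of_total"] = (
    summary["count"] * summary["mean"] / sample["amc"].sum()
)

print(summary)
```

Looking at the share of the total, rather than the mean alone, is what lets a category with a high per-record consumption still turn out to be a minor contributor overall.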

2- The effect of agriculture

The effect of the existence of agricultural consumers on water consumption

Although agricultural usage consumes 800% more than other categories, it is not a significant consumer overall because it has so few records (0.07% of the records). This shows that Sonora is not an agricultural city, as agricultural usage is limited to parks and gardens. However, the high water consumption of agricultural consumers shows how important water-conserving irrigation practices are in cities where agriculture plays a significant role.

3- The effect of housing consumers

The effect of the existence of home consumers and their divisions on water consumption

Unlike the industry category, housing records far outnumber non-housing records: the housing category accounts for most of the records in our dataset (11.6M vs. 32K, or 99.73%). Non-housing records include, for example, industrial, commercial, and agricultural records. Although housing consumes less water (about 7.7%), it is important to encourage water conservation policies for this category because of its dominance: a 1% reduction in housing consumption is equivalent to a 30.4% reduction in the consumption of the other categories.
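The equivalence can be checked with a few lines of arithmetic. The sketch below rests on one assumption that the text does not state explicitly: that "about 7.7%" means an average housing record consumes 7.7% of what an average non-housing record consumes. The function name and parameters are hypothetical.

```python
def equivalent_reduction(n_housing, n_other, per_record_ratio, housing_cut):
    """Fractional cut in the other categories' total that saves the same
    volume as `housing_cut` applied to the housing total.

    per_record_ratio is the average housing consumption per record divided
    by the average non-housing consumption per record (assumed reading of
    the 7.7% figure).
    """
    # Totals expressed in units of one average non-housing record.
    housing_total = n_housing * per_record_ratio
    other_total = n_other * 1.0
    return housing_cut * housing_total / other_total


# Rounded figures quoted in the text: 11.6M housing vs. 32K other records.
result = equivalent_reduction(11.6e6, 32e3, 0.077, 0.01)
print(f"{result:.1%}")
```

With these rounded inputs the result comes out at roughly 28%, the same order as the 30.4% reported; the gap may come from rounding in the quoted figures or from a different definition of the 7.7%.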

The housing category includes domestic properties (only one family per property), social properties (clubs, etc.), and residential properties (care homes, nursing homes, blocks of flats, homes of multiple occupancy, …). The dataset provides the housing category as a whole and also the domestic, residential, and social categories separately; the data show that domestic properties are 12x the social properties and 9x the residential properties. The significant difference in consumption appears in the domestic and residential properties, but not in the social properties.

The most important consumer to take care of

As highlighted above, housing is the most important category to focus on. Although each housing consumer uses little water, the category accounts for most of the water consumption, so if we succeed in reducing household consumption by 1%, that will achieve more than a 30% reduction in industrial consumption.

Note: this result is specific to the current data; other cities may differ depending on how total household consumption compares with agricultural or industrial consumption.

Can we use machine learning to predict water consumption for this city?

As detailed in the GitHub repository, I tried three different models with two sets of features. The only reliable result came from applying Linear Regression to the full set of features, which gave 71.6% correct predictions of the average monthly consumption. The reference consumption, however, could not be predicted reliably from the given set of features.
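For readers who want to see the shape of such an experiment, here is a minimal sketch of a linear regression fitted by ordinary least squares and scored on held-out rows. It uses NumPy only and synthetic data; the three features, their coefficients, and the 80/20 split are assumptions for illustration and do not reproduce the repository's code. It reports the coefficient of determination (R²), which may or may not be the metric behind the 71.6% figure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in features (hypothetical): month, housing flag, industry flag.
n = 500
X = np.column_stack([
    rng.integers(1, 13, n),   # month 1..12
    rng.integers(0, 2, n),    # housing flag
    rng.integers(0, 2, n),    # industry flag
]).astype(float)

# Synthetic target with known coefficients plus noise.
y = 10 + 0.5 * X[:, 0] - 3 * X[:, 1] + 8 * X[:, 2] + rng.normal(0, 2, n)

# Hold out the last 20% of rows for evaluation.
split = int(0.8 * n)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Ordinary least squares with an explicit intercept column.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# R^2 on the held-out rows.
pred = np.column_stack([np.ones(len(X_test)), X_test]) @ coef
ss_res = np.sum((y_test - pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"held-out R^2: {r2:.3f}")
```

Scoring on rows the model has not seen is what makes a figure like this meaningful; a score computed on the training rows alone would overstate how well consumption can be predicted.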


Water conservation is vital: people will not survive if present water consumption continues. It is highly recommended to perform similar studies for every city in the world so that plans to reduce water consumption can be put in place; if we succeed, the limited fresh water on our planet will remain available for us and for our children.

For more information:

Researcher and Data Analyst
