A simple explanation of Bayes’ probability theorem for data science learners

Photo by Riho Kroll on Unsplash

In real life, we often know how frequently an event occurs relative to other events. For example, let's throw a fair die. Each face has an equal chance of appearing, so the theoretical probability of getting any given face is the inverse of the number of faces: 1/6, since a regular die has six faces.
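As a rough illustration of the 1/6 figure, the sketch below compares the theoretical probability with the relative frequencies from simulated rolls. The function name `roll_frequencies`, the number of rolls, and the seed are illustrative assumptions and do not come from the article.

```python
import random
from collections import Counter
from fractions import Fraction

FACES = 6  # a regular die has six faces

# Theoretical probability of any single face on a fair die.
theoretical = Fraction(1, FACES)


def roll_frequencies(n_rolls, seed=0):
    """Simulate n_rolls of a fair die and return each face's relative frequency."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    counts = Counter(rng.randint(1, FACES) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, FACES + 1)}


frequencies = roll_frequencies(60_000)
for face, freq in frequencies.items():
    print(f"face {face}: observed {freq:.4f}, theoretical {float(theoretical):.4f}")
```

With enough rolls, each observed frequency should settle close to 1/6, though it will rarely match it exactly.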

Using Data Science (PySpark) to detect customer churn.


Nowadays, many companies have switched their business model from a one-time fee to a monthly or annual subscription. Customers have the right to cancel their subscription at any time or, in some cases, downgrade to the free subscription model. Companies, on the other hand, want to keep their customers at the paid level.

Usually, customers who leave show some signs that they are about to do so. These signs differ from one service to another; for example, at a telephone company, departing customers usually call support more frequently, submit complaints, or rarely use the service…

Water is the most precious resource on Earth: all living organisms depend on it to live, and it covers about two-thirds of our planet’s surface. Despite its importance, there is a shortage of fresh water in most of the world’s cities. Hence, conserving water is a strategic choice for almost all humans.

To put a water conservation plan in place, we must know the amount of water consumed in each sector (industry, agriculture, domestic, …). In this study, we analyzed a dataset for a sample city, found on the Kaggle website. The city is Sonora, Mexico, which is a medium-size city.

Time series are an important form of indexed data found in stock data, climate datasets, and many other time-dependent data forms. Because they depend on time, time series are prone to missing points caused by problems in reading or recording the data.

To apply machine learning models effectively, the time series has to be continuous, as most ML models are not designed to deal with missing values. Hence, rows with missing data should be either dropped or filled with appropriate values.

In time-independent data (non-time-series), a common practice is to fill the gaps with the mean or…
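The two options mentioned for rows with missing data, dropping them or filling them, can be sketched in plain Python. The sample readings and the helper names `drop_missing` and `forward_fill` are hypothetical, and forward fill is only one simple filling strategy chosen for illustration, not necessarily the one the article recommends.

```python
# Hypothetical daily readings; None marks a missing observation.
series = [
    ("2021-01-01", 10.0),
    ("2021-01-02", None),
    ("2021-01-03", 12.0),
    ("2021-01-04", None),
    ("2021-01-05", 15.0),
]


def drop_missing(rows):
    """Option 1: keep only the rows that have a recorded value."""
    return [(stamp, value) for stamp, value in rows if value is not None]


def forward_fill(rows):
    """Option 2: replace each gap with the most recent recorded value.

    A gap at the very start has no earlier value and stays None.
    """
    filled, last_seen = [], None
    for stamp, value in rows:
        if value is not None:
            last_seen = value
        filled.append((stamp, last_seen))
    return filled


print(drop_missing(series))
print(forward_fill(series))
```

Dropping shortens the series and leaves uneven time steps, while filling keeps every timestamp at the cost of introducing values that were never measured.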

Dr Mohammad El-Nesr

Researcher and Data Analyst
