This lecture defines the concept of probability and introduces its main properties.
The aim is to provide a rigorous introduction to the mathematics of probability, although in a gradual manner, with plenty of explanations and examples.
It would be nice to start a course on probability theory by giving a concise, simple and intuitive, yet mathematically rigorous, definition of probability. Unfortunately, this is not possible.
On the one hand, a rigorous definition of probability requires a sophisticated mathematical apparatus and is pretty unintuitive.
On the other hand, simple definitions are often misleading or, at best, tautological.
For example, we could say that probability is a number that quantifies the likelihood of a given event when it is not yet known whether the event will happen or not. This definition is circular because it uses the concept of likelihood, which is a synonym for probability. Nonetheless, we can use it as a starting point. It highlights two important facts:
probability refers to an event;
probability is a number.
By elaborating on these two facts, we will give an (almost entirely) rigorous definition of probability. To this end, we will introduce the concept of event in the next section. Then, we will define probability as a function that attaches numbers to events and satisfies certain "intuitive" properties.
Throughout this lecture, we will assume that you are familiar with the basics of set theory. If you are not, you may want to revise them before proceeding.
Consider a list of things that can happen. From a mathematical viewpoint, the things in this list form a set, which we denote by $\Omega$.
We require $\Omega$ to satisfy the following two properties:
Mutually exclusive outcomes. Only one of the things in $\Omega$ can happen. That is, if $\omega \in \Omega$ happens, then none of the things in the set $\{\omega' \in \Omega : \omega' \neq \omega\}$ can happen.
Exhaustive outcomes. At least one of the things in $\Omega$ will happen.
If $\Omega$ satisfies these two properties, it is called a sample space, or space of all possible outcomes. Furthermore,
an element $\omega \in \Omega$ is called a sample point, or a possible outcome;
a subset $E \subseteq \Omega$ is called an event (we will briefly explain below that not every subset of the sample space is, strictly speaking, an event; however, on a first reading you can be happy with this definition).
Here is an example of a sample space.
Example Suppose that we toss a die. Six numbers, from 1 to 6, can appear face up, but we do not yet know which one of them will appear. The sample space is
$$\Omega = \{1, 2, 3, 4, 5, 6\}$$
Each of the six numbers is a sample point. The outcomes are mutually exclusive because only one number at a time can appear face up. The outcomes are also exhaustive because at least one of the six numbers will appear face up after we toss the die. Define
$$E = \{1, 3, 5\}$$
$E$ is an event (a subset of $\Omega$). It can be described as "an odd number appears face up". Now define
$$F = \{6\}$$
$F$ is also an event, and it can be described as "the number 6 appears face up".
Note that the sample space $\Omega$ itself is an event because every set is a subset of itself. It is called the sure event.
The empty set $\emptyset$ is also an event because the empty set is a subset of every set, and hence of $\Omega$. It is called the impossible event.
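The die example can be mirrored in code by representing the sample space and the events as sets. The following is only a minimal sketch: the names `omega`, `odd`, `six`, `is_event` and `occurs` are illustrative choices, not notation used in the lecture.

```python
# Sample space for one toss of a six-sided die.
omega = frozenset({1, 2, 3, 4, 5, 6})

# Events are subsets of the sample space.
odd = frozenset({1, 3, 5})      # "an odd number appears face up"
six = frozenset({6})            # "the number 6 appears face up"
sure = omega                    # the sure event
impossible = frozenset()        # the impossible event


def is_event(candidate, sample_space):
    """Return True if candidate is a subset of the sample space."""
    return candidate <= sample_space


def occurs(event, outcome):
    """An event occurs when the realized outcome belongs to it."""
    return outcome in event


# Suppose the die shows 3: which events have occurred?
for name, ev in [("odd", odd), ("six", six), ("sure", sure), ("impossible", impossible)]:
    print(name, is_event(ev, omega), occurs(ev, 3))
```

Whatever the outcome, the sure event always occurs and the impossible event never does, which is what their names suggest.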
Now that we have defined the concept of an event, we can think of the probability of an event $E$ as a number attached to $E$ that tells us how likely it is that $E$ will happen.
Unfortunately, this is not yet a real definition because "likely" is a synonym of "probable". We are being circular again! But we are closer to a definition than before. To get even closer, we need to introduce another mathematical concept, that of space of events.
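A space of events, which we denote by $\mathcal{F}$, is a set of subsets of $\Omega$. In other words, each element of $\mathcal{F}$ is an event.
Example Consider the same sample space introduced in the previous example (toss of a die):
$$\Omega = \{1, 2, 3, 4, 5, 6\}$$
Define the events
$$E = \{1, 3, 5\}, \quad F = \{6\}$$
The set
$$\mathcal{F} = \{E, F\}$$
is a space of events (remember that $E$ and $F$ are events).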
In rigorous probability theory, the space of events is required to satisfy certain properties (it is required to be a sigma-algebra). For the moment, we are not discussing these properties, but we will briefly speak about them below, after having defined probability.
We are now ready to define probability.
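Definition Denote by $P$ a function from the space of events $\mathcal{F}$ to the set of real numbers, that is, a function that assigns a number $P(E)$ to each event $E \in \mathcal{F}$. The function $P$ is a probability measure if and only if it satisfies the following three properties:
Range: $0 \leq P(E) \leq 1$ for any event $E$.
Sure thing: $P(\Omega) = 1$.
Sigma-additivity (or countable additivity): $$P\left(\bigcup_{n=1}^{\infty} E_n\right) = \sum_{n=1}^{\infty} P(E_n)$$ for any sequence $E_1, E_2, \ldots$ of mutually exclusive events (i.e., such that $E_i \cap E_j = \emptyset$ if $i \neq j$).
When a probability measure $P$ assigns a number $P(E)$ to an event $E$, then $P(E)$ is called the probability of $E$.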
We are finally done! We have defined probability! We now need to make sure that we fully understand the definition.
Let us recap the main steps made so far:
we have defined the concept of event;
we have created a collection of events called the space of events;
we have created a function on the space of events that assigns a number to each event;
we have said that if such a function satisfies certain properties, then it is a probability measure.
The last point needs to be explained. But before trying to understand why the three mathematical properties above are used to define probability, let us analyze them in more detail.
The Range property is self-explanatory. It just means that the probability of an event is a real number between 0 and 1. This can be thought of as a convention: we decide that probability needs to be a non-negative number and that events can have at most probability 1.
The Sure thing property says that the highest possible probability needs to be assigned to the sure event (remember that the sample space needs to be exhaustive, so surely one of the things in $\Omega$ will happen).
The Sigma-additivity property is a bit more cumbersome. It can be proved (see below) that if sigma-additivity holds, then the following also holds:
$$P(E \cup F) = P(E) + P(F) \quad \text{if } E \cap F = \emptyset$$
The latter property, called finite additivity, while very similar to sigma-additivity, is easier to interpret. It says that if two events are disjoint, then the probability that either one or the other happens is equal to the sum of their individual probabilities.
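For concreteness, we now give a simple example that illustrates the properties of probability.
Example Suppose that we flip a coin. The possible outcomes are either tail ($T$) or head ($H$), that is,
$$\Omega = \{T, H\}$$
Consider the space of events
$$\mathcal{F} = \{\emptyset, \{T\}, \{H\}, \Omega\}$$
The following assignment of probabilities satisfies the properties enumerated above:
$$P(\emptyset) = 0, \quad P(\{T\}) = \frac{1}{2}, \quad P(\{H\}) = \frac{1}{2}, \quad P(\Omega) = 1$$
All these probabilities are between 0 and 1, so the range property is satisfied. $P(\Omega) = 1$, so the sure thing property is satisfied. Sigma-additivity is also satisfied because
$$P(\{T\} \cup \{H\}) = P(\Omega) = 1 = \frac{1}{2} + \frac{1}{2} = P(\{T\}) + P(\{H\})$$
$$P(\{T\} \cup \emptyset) = P(\{T\}) = \frac{1}{2} = \frac{1}{2} + 0 = P(\{T\}) + P(\emptyset)$$
$$P(\{H\} \cup \emptyset) = P(\{H\}) = \frac{1}{2} = \frac{1}{2} + 0 = P(\{H\}) + P(\emptyset)$$
$$P(\Omega \cup \emptyset) = P(\Omega) = 1 = 1 + 0 = P(\Omega) + P(\emptyset)$$
and the four couples $(\{T\}, \{H\})$, $(\{T\}, \emptyset)$, $(\{H\}, \emptyset)$, $(\Omega, \emptyset)$ are the only four possible couples of disjoint sets.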
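The three properties can also be checked mechanically when the space of events is finite. The sketch below assumes a fair coin (probability 1/2 for each face) and uses a hypothetical helper, `is_probability_measure`, that tests additivity on pairs of disjoint events as a finite stand-in for sigma-additivity.

```python
from fractions import Fraction
from itertools import combinations


def is_probability_measure(P, omega):
    """Check range, sure thing and additivity on a finite space of events.

    P maps events (frozensets) to numbers. Additivity is tested on every
    pair of distinct disjoint events whose union also belongs to the
    space of events.
    """
    events = list(P)
    if any(not (0 <= P[e] <= 1) for e in events):
        return False                       # range property fails
    if P.get(omega) != 1:
        return False                       # sure thing property fails
    for a, b in combinations(events, 2):
        if a.isdisjoint(b) and (a | b) in P:
            if P[a | b] != P[a] + P[b]:
                return False               # additivity fails
    return True


T, H = frozenset({"T"}), frozenset({"H"})
omega = T | H

# Assumed assignment for a fair coin.
fair = {frozenset(): Fraction(0), T: Fraction(1, 2), H: Fraction(1, 2), omega: Fraction(1)}
# A deliberately inconsistent assignment, for comparison.
bad = {frozenset(): Fraction(0), T: Fraction(1, 2), H: Fraction(1, 3), omega: Fraction(1)}

print(is_probability_measure(fair, omega))  # True
print(is_probability_measure(bad, omega))   # False
```

Exact fractions are used instead of floating-point numbers so that the equality tests in the additivity check are not affected by rounding.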
Now that we are familiar with the three properties of probability, a fundamental question remains to be answered: why have these properties been chosen to define probability?
Basically, it is for historical reasons. Before Andrey Kolmogorov, an eminent Russian mathematician, came up with this definition, statisticians had proposed other definitions (see the next section). These definitions had flaws, but they could all be used to prove that probability needs to satisfy the three properties above.
Kolmogorov proposed to abandon the previous definitions and instead use the three properties as a definition. This approach had already proved successful in a branch of mathematics called measure theory. Actually, Kolmogorov realized that probability was a special measure (bounded by 1) and adapted the definition of measure, which is very similar to the definition of probability given above.
In the next section we report some of the old definitions of probability. Despite their flaws, they can help to improve our understanding of the concept of probability.
This section briefly discusses some old definitions of probability. Although none of them is entirely rigorous and coherent, nor sufficient per se to clarify the meaning of probability, they all touch upon important aspects.
According to the classical definition, when all the possible outcomes of an experiment are equally likely, the probability of an event is the ratio between the number of outcomes that are favorable to the event and the total number of possible outcomes. While intuitive, this definition has two main drawbacks:
it is circular, because it uses the concept of probability to define probability: it is based on the assumption of 'equally likely' outcomes, where equally likely means "having the same probability";
it is limited in scope because it does not allow us to define probability when the possible outcomes are not all equally likely.
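As an illustration of the classical definition, the ratio of favorable to possible outcomes can be computed by counting. This is a minimal sketch; the function name `classical_probability` is an illustrative choice, and the result is meaningful only under the assumption that all outcomes are equally likely.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """Ratio of favorable outcomes to possible outcomes.

    Only meaningful under the assumption that all outcomes in the
    finite sample_space are equally likely.
    """
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


die = frozenset(range(1, 7))
print(classical_probability(frozenset({1, 3, 5}), die))  # 1/2
print(classical_probability(frozenset({6}), die))        # 1/6
```

The second drawback listed above is visible in the code: nothing in the function can represent a loaded die, because the only inputs are counts.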
According to the frequentist definition, the probability of an event is the relative frequency of the event itself, observed over a large number of repetitions of the same experiment. In other words, it is the limit to which the ratio
$$\frac{\text{number of occurrences of the event}}{\text{total number of repetitions of the experiment}}$$
converges when the number of repetitions of the experiment tends to infinity. Despite its intuitive appeal, this definition also has some important drawbacks:
it assumes that all probabilistic experiments can be repeated many times, which is false;
it is also somewhat circular, because it implicitly relies on a Law of Large Numbers, which can be derived only after having defined probability.
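The frequentist idea can be illustrated with a small simulation. The sketch below assumes a fair die simulated with Python's pseudo-random generator; the function `relative_frequency` and the fixed seed are illustrative choices, and the simulation itself presupposes a probability model, which echoes the circularity noted above.

```python
import random


def relative_frequency(event, sample_space, repetitions, seed=0):
    """Fraction of simulated repetitions in which the event occurs."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    outcomes = sorted(sample_space)
    hits = sum(rng.choice(outcomes) in event for _ in range(repetitions))
    return hits / repetitions


die = {1, 2, 3, 4, 5, 6}
odd = {1, 3, 5}

# The relative frequency should settle near 1/2 as repetitions grow.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(odd, die, n))
```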
According to the subjectivist definition, the probability of an event is related to the willingness of an individual to accept bets on that event. Suppose a lottery ticket pays off 1 dollar in case the event occurs and 0 in case the event does not occur. An individual is asked to set a price for this lottery ticket, at which she must be indifferent between being a buyer or a seller of the ticket. The subjective probability of the event is defined to be equal to the price thus set by the individual. Also this definition has some drawbacks:
different individuals can set different prices, therefore preventing an objective assessment of probabilities;
the price an individual is willing to pay to participate in a lottery can be influenced by other factors that have nothing to do with probability; for example, an individual's betting behavior can be influenced by her preferences.
The following subsections discuss other mathematical properties enjoyed by probability.
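Here we prove that $P(\emptyset) = 0$.
Define a sequence of events as follows:
$$E_1 = \Omega, \quad E_n = \emptyset \text{ for } n \geq 2$$
The sequence is a sequence of disjoint events, because the empty set is disjoint from any other set. Then,
$$1 = P(\Omega) = P\left(\bigcup_{n=1}^{\infty} E_n\right) = \sum_{n=1}^{\infty} P(E_n) = P(\Omega) + \sum_{n=2}^{\infty} P(\emptyset) = 1 + \sum_{n=2}^{\infty} P(\emptyset)$$
which implies
$$\sum_{n=2}^{\infty} P(\emptyset) = 0$$
and $P(\emptyset) = 0$.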
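A sigma-additive function is also additive: if $E$ and $F$ are two disjoint events, then
$$P(E \cup F) = P(E) + P(F)$$
Define a sequence of events as follows:
$$E_1 = E, \quad E_2 = F, \quad E_n = \emptyset \text{ for } n \geq 3$$
The sequence is a sequence of disjoint events, because the empty set is disjoint from any other set. Then,
$$P(E \cup F) = P\left(\bigcup_{n=1}^{\infty} E_n\right) = \sum_{n=1}^{\infty} P(E_n) = P(E) + P(F) + \sum_{n=3}^{\infty} P(\emptyset) = P(E) + P(F)$$
where the last equality holds because $P(\emptyset) = 0$.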
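Let $E$ be an event and $E^c$ its complement (i.e., the set of all elements of $\Omega$ that do not belong to $E$). Then,
$$P(E^c) = 1 - P(E)$$
Note that
$$\Omega = E \cup E^c$$
and that $E$ and $E^c$ are disjoint sets. Then, by using the sure thing property and finite additivity, we obtain
$$1 = P(\Omega) = P(E \cup E^c) = P(E) + P(E^c)$$
which implies
$$P(E^c) = 1 - P(E)$$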
In other words, the probability that an event does not occur is equal to one minus the probability that it occurs.
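We have already seen how to compute $P(E \cup F)$ in the special case in which $E$ and $F$ are two disjoint events. In the more general case, in which they are not necessarily disjoint, the formula is
$$P(E \cup F) = P(E) + P(F) - P(E \cap F)$$
This is proved as follows. First note that
$$P(E) = P\left((E \cap F) \cup (E \cap F^c)\right) = P(E \cap F) + P(E \cap F^c)$$
$$P(F) = P\left((F \cap E) \cup (F \cap E^c)\right) = P(E \cap F) + P(E^c \cap F)$$
so that
$$P(E \cap F^c) = P(E) - P(E \cap F)$$
$$P(E^c \cap F) = P(F) - P(E \cap F)$$
Furthermore, the event $E \cup F$ can be written as follows:
$$E \cup F = (E \cap F) \cup (E \cap F^c) \cup (E^c \cap F)$$
and the three events on the right-hand side are disjoint. Thus,
$$P(E \cup F) = P(E \cap F) + P(E \cap F^c) + P(E^c \cap F) = P(E \cap F) + P(E) - P(E \cap F) + P(F) - P(E \cap F) = P(E) + P(F) - P(E \cap F)$$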
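The union formula and the complement formula can be verified numerically on a small example. The sketch below uses a hypothetical loaded die (the outcome probabilities in `weights` are assumptions made up for illustration) and defines the probability of an event as the sum of the probabilities of its outcomes.

```python
from fractions import Fraction

# Hypothetical probabilities for the six faces of a loaded die.
weights = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
           4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2)}
omega = frozenset(weights)


def P(event):
    """Probability of an event as the sum of its outcome weights."""
    return sum((weights[w] for w in event), Fraction(0))


E = frozenset({1, 3, 5})   # an odd number appears
F = frozenset({5, 6})      # a number greater than 4 appears

lhs = P(E | F)                      # probability of the union
rhs = P(E) + P(F) - P(E & F)        # right-hand side of the union formula
print(lhs, rhs)                     # 4/5 4/5
print(P(omega - E), 1 - P(E))       # 7/10 7/10
```

Here $E$ and $F$ share the outcome 5, so simply adding $P(E)$ and $P(F)$ would count its probability twice; subtracting $P(E \cap F)$ removes the double count.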
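If two events $E$ and $F$ are such that $E \subseteq F$, then
$$P(E) \leq P(F)$$
This is easily proved by using additivity:
$$P(F) = P\left((F \cap E) \cup (F \cap E^c)\right) = P(F \cap E) + P(F \cap E^c) = P(E) + P(F \cap E^c) \geq P(E)$$
where the latter inequality is a consequence of the fact that $P(F \cap E^c) \geq 0$ (by the range property of probability).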
In other words, if $E$ occurs no more often than $F$, because the latter contemplates more occurrences, then the probability of $E$ cannot be greater than the probability of $F$.
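Monotonicity can be checked exhaustively on a small sample space by listing all pairs of events in which one is contained in the other. The sketch below is only an illustration: the three-point sample space and the outcome probabilities in `weights` are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical outcome probabilities on a three-point sample space.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}


def P(event):
    """Probability of an event as the sum of its outcome weights."""
    return sum((weights[w] for w in event), Fraction(0))


def power_set(points):
    """All subsets of a finite collection of points, as frozensets."""
    points = list(points)
    subsets = chain.from_iterable(
        combinations(points, r) for r in range(len(points) + 1)
    )
    return [frozenset(s) for s in subsets]


events = power_set(weights)

# Collect every pair with E contained in F but P(E) > P(F).
violations = [(E, F) for E in events for F in events if E <= F and P(E) > P(F)]
print(len(events), violations)  # 8 []
```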
In this section we provide entirely rigorous definitions of event and probability.
The definition of event given above is not entirely rigorous.
Often, statisticians work with probability models where some subsets of the sample space are not considered events.
This happens mainly for the following two reasons:
sometimes, the sample space is a really complicated set; to make things simpler, attention is restricted to only some subsets of the sample space;
sometimes, it is possible to coherently assign probabilities to only some subsets of the sample space; in these cases, only the subsets to which probabilities can be assigned are considered events.
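Definition $\mathcal{F}$ is a sigma-algebra on $\Omega$ if it is a set of subsets of $\Omega$ satisfying the following three properties:
Whole set. $\Omega \in \mathcal{F}$.
Closure under complementation. If $E \in \mathcal{F}$ then also $E^c \in \mathcal{F}$ (the complement $E^c$ is the set of all elements of $\Omega$ that do not belong to $E$).
Closure under countable unions. If $E_1, E_2, \ldots$ is a sequence of subsets of $\Omega$ belonging to $\mathcal{F}$, then $$\bigcup_{n=1}^{\infty} E_n \in \mathcal{F}$$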
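When the sample space is finite, the three properties can be tested directly. The following sketch uses a hypothetical helper, `is_sigma_algebra`; it relies on the fact that, for a finite family of sets, closure under pairwise unions is enough to guarantee closure under countable unions.

```python
def is_sigma_algebra(family, omega):
    """Check the three sigma-algebra properties for a finite family.

    With finitely many sets, closure under pairwise unions implies
    closure under countable unions, so only pairs are tested.
    """
    family = set(family)
    if omega not in family:
        return False                                   # whole set missing
    if any(omega - E not in family for E in family):
        return False                                   # not closed under complements
    return all(E | F in family for E in family for F in family)


omega = frozenset({1, 2, 3, 4, 5, 6})
E, F = frozenset({1, 3, 5}), frozenset({6})

print(is_sigma_algebra({E, F}, omega))                              # False
print(is_sigma_algebra({frozenset(), E, omega - E, omega}, omega))  # True
print(is_sigma_algebra({frozenset(), omega}, omega))                # True
```

The first call shows that a set containing only the two die events "odd number" and "number 6" is a collection of events but not a sigma-algebra, since it lacks the whole sample space and the complements.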
Why is a space of events required to satisfy these properties?
Besides a number of mathematical reasons, it seems pretty intuitive that they must be satisfied (and indeed the best way to approach sigma-algebras for the first time is to memorize their properties and convince yourself that they are reasonable). Let us see why.
Property 1) means that the space of events must include the event "something will happen", quite a trivial requirement!
Property 2) means that if "one of the things in the set $E$ will happen" is considered an event, then also "none of the things in the set $E$ will happen" is considered an event. This is quite natural: if you are considering the possibility that an event will happen, then, by necessity, you must also be simultaneously considering the possibility that the same event will not happen.
Property 3) is a bit more complex. However, the following property, implied by 3), is probably easier to interpret:
$$\text{if } E \in \mathcal{F} \text{ and } F \in \mathcal{F}, \text{ then } E \cup F \in \mathcal{F}$$
It means that if "one of the things in $E$ will happen" and "one of the things in $F$ will happen" are considered two events, then also "one of the things in $E$ or one of the things in $F$ will happen" must be considered an event. This simply means that if you are able to separately assess the possibility of two events happening, then you must be able to assess the possibility of at least one of them happening. Property 3) extends this intuitive property to countable collections of events: the extension is needed for mathematical reasons, to derive certain continuity properties of probability measures.
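To see how the closure properties enlarge a collection of events, one can start from a few events on a finite sample space and keep adding complements and unions until nothing new appears. The sketch below is an illustration under that finiteness assumption; `generate_sigma_algebra` is a hypothetical helper name.

```python
def generate_sigma_algebra(generators, omega):
    """Smallest family containing the generators and omega that is
    closed under complements and pairwise unions (finite omega only)."""
    family = {frozenset(omega)} | {frozenset(g) for g in generators}
    while True:
        # Everything obtainable in one step from the current family.
        new = {omega - E for E in family} | {E | F for E in family for F in family}
        if new <= family:
            return family          # nothing new: the family is closed
        family |= new


omega = frozenset({1, 2, 3, 4, 5, 6})
E, F = frozenset({1, 3, 5}), frozenset({6})

sigma = generate_sigma_algebra([E, F], omega)
print(len(sigma))  # 8
for event in sorted(sigma, key=lambda s: (len(s), sorted(s))):
    print(sorted(event))
```

Starting from "an odd number appears" and "the number 6 appears", the procedure ends with eight events, namely all unions of the three building blocks {1, 3, 5}, {6} and {2, 4}.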
Let us make a final remark on terminology:
a subset of $\Omega$ that belongs to the sigma-algebra $\mathcal{F}$ is called measurable;
a subset of $\Omega$ that does not belong to the sigma-algebra $\mathcal{F}$ is said to be non-measurable.
The definition of probability given above was not entirely rigorous. Now that we have defined sigma-algebras and spaces of events, we can make it completely rigorous.
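Definition Let $\mathcal{F}$ be a sigma-algebra on the sample space $\Omega$. A function $P : \mathcal{F} \to [0, 1]$ is a probability measure if and only if it satisfies the following two properties:
Sure thing. $P(\Omega) = 1$.
Sigma-additivity. Let $E_1, E_2, \ldots$ be any sequence of elements of $\mathcal{F}$ such that $i \neq j$ implies $E_i \cap E_j = \emptyset$. Then, $$P\left(\bigcup_{n=1}^{\infty} E_n\right) = \sum_{n=1}^{\infty} P(E_n)$$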
Nothing new has been added to the definition of probability given in the previous sections. This more rigorous definition just clarifies that a probability measure is a function defined on a sigma-algebra of events. Hence, it is not possible to properly speak of probability for subsets of the sample space that do not belong to the sigma-algebra (i.e., for non-measurable subsets).
Below you can find some exercises with explained solutions.
A ball is drawn at random from an urn containing colored balls. The balls can be either red or blue (no other colors are possible). The probability of drawing a blue ball is . What is the probability of drawing a red ball?
The sample space $\Omega$ can be represented as the union of two disjoint events $R$ and $B$:
$$\Omega = R \cup B$$
where the event $R$ can be described as "a red ball is drawn" and the event $B$ can be described as "a blue ball is drawn". Note that $R$ is the complement of $B$:
$$R = B^c$$
We know $P(B)$, the probability of drawing a blue ball, from the statement of the exercise.
We need to find $P(R)$, the probability of drawing a red ball. Using the formula for the probability of a complement:
$$P(R) = P(B^c) = 1 - P(B)$$
Consider a sample space $\Omega$ comprising three possible outcomes:
$$\Omega = \{\omega_1, \omega_2, \omega_3\}$$
Suppose the probabilities assigned to the three possible outcomes are
Can you find an event whose probability is ?
There are two events whose probability is .
The first one is
By using the formula for the probability of a union of disjoint events, we get
The second one is
By using again the formula for the probability of a union of disjoint events, we obtain
Consider a sample space $\Omega$ comprising four possible outcomes:
$$\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$$
Consider the three events , and defined as follows:
Suppose their probabilities are
Now, consider a fourth event defined as follows:
First note that, by additivity:
Therefore, in order to compute , we need to compute and .
is found by using additivity on :so that
is found using the fact that one minus the probability of an event is equal to the probability of its complement and the fact that :
As a consequence,
Please cite as:
Taboga, Marco (2021). "The mathematics of probability", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-probability/probability.