The Poisson Probability Distribution

The Poisson distribution is popular for modeling the number of times an event occurs in an interval of time or space. A discrete random variable X is said to have a Poisson distribution with parameter mu > 0 if, for k = 0, 1, 2, …, the probability mass function of X is given by:

P(X = k) = exp(-mu) * mu**k / k!

The Poisson distribution is an appropriate model if the following assumptions are true:

1. k is the number of times an event occurs in an interval, and k can take the values 0, 1, 2, ….
2. The occurrence of one event does not affect the probability that a second event will occur; that is, events occur independently.
3. The average rate at which events occur is constant.
4. Two events cannot occur at exactly the same instant; instead, in each very small sub-interval, exactly one event either occurs or does not occur. (Equivalently, the actual probability distribution is binomial and the number of trials is much larger than the number of successes one is asking about.)

If these conditions are true, then k is a Poisson random variable, and the distribution of k is a Poisson distribution.

Now let us answer the question that was asked at the beginning of the notebook. Mylie has been averaging 3 hits for every 10 times at bat. What is the probability that she will get exactly 2 hits in her next 5 times at bat? Her hit rate is 3/10 = 0.3 per at-bat, so the expected number of hits in 5 at-bats is mu = 5 * 0.3 = 1.5, and the formula gives P(X = 2) = exp(-1.5) * 1.5**2 / 2! ≈ 0.251.
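Before reaching for SciPy, the by-hand calculation can be sketched in plain Python (poisson_pmf is just an illustrative helper name, not a library function):

```python
import math

# Mylie's hit rate is 3/10 = 0.3 per at-bat, so over her next 5 at-bats
# the expected number of hits is mu = 5 * 0.3 = 1.5.
mu = 5 * 0.3

def poisson_pmf(k, mu):
    # P(X = k) = exp(-mu) * mu**k / k!
    return math.exp(-mu) * mu ** k / math.factorial(k)

print(poisson_pmf(2, mu))  # ≈ 0.251
```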

Let us use the scipy.stats.poisson.pmf function to further drive home the concept.

In [17]:
from scipy.stats import poisson
import matplotlib.pyplot as plt
The probability mass function for poisson is:
poisson.pmf(k) = exp(-mu) * mu**k / k! for k >= 0.
poisson takes mu as a shape parameter (mu is the mean/expected value/variance).
The probability mass function above is defined in the "standardized" form. To shift the distribution, use the loc parameter. Specifically, poisson.pmf(k, mu, loc) is identically equivalent to poisson.pmf(k - loc, mu).
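As a quick check of that loc equivalence, a minimal sketch:

```python
from scipy.stats import poisson

# poisson.pmf(k, mu, loc) shifts the support by loc, so evaluating at
# k = 5 with loc = 2 is the same as evaluating at k - loc = 3 with loc = 0.
print(poisson.pmf(5, 1.5, loc=2))
print(poisson.pmf(3, 1.5))  # same value
```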

In [18]:
# Calculate a few first moments:
mu = 1.5
mean, var, skew, kurt = poisson.stats(mu, moments='mvsk')
print('Mean=%.3f,Variance=%.3f' % (mean, var))
Mean=1.500,Variance=1.500

In [19]:
# pmf(k, mu, loc=0): probability mass function.
# Use the probability mass function to calculate P(X = 2):
p = poisson.pmf(2, 1.5)
p
Out[19]:
0.25102143016698353
We got the same answer as when we did it by hand above. Let us display the probability mass function (pmf) for 0 <= k < 5:

In [20]:
import numpy as np

fig, ax = plt.subplots(1, 1)
x = np.arange(0, 5)
mu = 1.5
ax.plot(x, poisson.pmf(x, mu), 'bo', ms=8, label='poisson pmf')
ax.vlines(x, 0, poisson.pmf(x, mu), colors='b', lw=5, alpha=0.5)

# Freeze the distribution and display the frozen pmf on the same axes
# (a single plt.show() at the end so both sets of lines appear together):
rv = poisson(mu)
ax.vlines(x, 0, rv.pmf(x), colors='k', linestyles='-', lw=1, label='frozen pmf')
ax.legend(loc='best', frameon=False)
plt.show()

In [21]:
x
Out[21]:
array([0, 1, 2, 3, 4])

In [22]:
# Check accuracy of cdf and ppf:
prob = poisson.cdf(x, mu)
np.allclose(x, poisson.ppf(prob, mu))
Out[22]:
True

In [23]:
# Generate random numbers:
import seaborn as sb
r = poisson.rvs(mu, size=1000)
# Note: distplot is deprecated in newer seaborn releases;
# histplot/displot are the replacements.
ax = sb.distplot(r, kde=True, color='green', hist_kws={"linewidth": 25, 'alpha': 1})
ax.set(xlabel='X=No of Outcomes', ylabel='Probability')

Out[23]:
[Text(0, 0.5, 'Probability'), Text(0.5, 0, 'X=No of Outcomes')]


The Multinomial Distribution


A multinomial distribution is the probability distribution of the outcomes from a multinomial experiment.

Multinomial Experiment

A multinomial experiment is a statistical experiment that has the following properties:
1. The experiment consists of n repeated trials.
2. Each trial has a discrete number of possible outcomes.
3. On any given trial, the probability that a particular outcome will occur is constant.
4. The trials are independent; that is, the outcome of one trial does not affect the outcome of other trials.

Consider the following statistical experiment. You toss two dice three times and record the outcome on each toss. This is a multinomial experiment because:

The experiment consists of repeated trials. We toss the dice three times.
Each trial can result in a discrete number of outcomes – 2 through 12.
The probability of any outcome is constant; it does not change from one toss to the next.
The trials are independent; that is, getting a particular outcome on one trial does not affect the outcome of other trials.

Another example is that you are given a bag of marbles. Inside the bag are 5 red marbles, 4 white marbles, and 3 blue marbles. Calculate the probability that with 6 trials, you choose 3 marbles that are red, 1 marble that is white, and 2 marbles that are blue, replacing each marble after it is chosen.

Notice that this is not a binomial experiment, since there are more than 2 possible outcomes. For binomial experiments, k = 2 (2 outcomes). Therefore, we use the binomial experiment formula for problems involving heads or tails, yes or no, or success or failure (the keyword here is a binary outcome for each independent trial). In this problem, there are 3 possible outcomes: red, white, or blue.
Note: A binomial experiment is a special case of a multinomial experiment. Here is the main difference. With a binomial experiment, each trial can result in two – and only two – possible outcomes. With a multinomial experiment, each trial can have two or more possible outcomes.
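To see that special-case relationship concretely, here is a small sketch comparing scipy.stats.binom with scipy.stats.multinomial:

```python
from scipy.stats import binom, multinomial

# A binomial experiment is the k = 2 special case of a multinomial
# experiment: 3 heads in 6 fair coin tosses, counted both ways.
print(binom.pmf(3, 6, 0.5))
print(multinomial.pmf([3, 3], 6, [0.5, 0.5]))  # same value
```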

Multinomial Formula.

Suppose a multinomial experiment consists of n trials, and each trial can result in any of k possible outcomes: E1, E2, . . . , Ek. Suppose, further, that each possible outcome can occur with probabilities p1, p2, . . . , pk. Then, the probability (P) that E1 occurs n1 times, E2 occurs n2 times, . . . , and Ek occurs nk times is:

P = ( n! / ( n1! * n2! * … * nk! ) ) * ( p1^n1 * p2^n2 * … * pk^nk )

where n = n1 + n2 + . . . + nk.

The example below illustrates how to use the multinomial formula to compute the probability of an outcome from a multinomial experiment. [1]

Suppose we have a bowl with 10 marbles – 2 red marbles, 3 green marbles, and 5 blue marbles. We randomly select 4 marbles from the bowl, with replacement. What is the probability of selecting 2 green marbles and 2 blue marbles?

Solution: To solve this problem, we apply the multinomial formula. We know the following:

The experiment consists of 4 trials, so n = 4.
The 4 trials produce 0 red marbles, 2 green marbles, and 2 blue marbles; so nred = 0, ngreen = 2, and nblue = 2.
On any particular trial, the probability of drawing a red, green, or blue marble is 0.2, 0.3, and 0.5, respectively. Thus, pred = 0.2, pgreen = 0.3, and pblue = 0.5
We plug these inputs into the multinomial formula, as shown below:

P = ( n! / ( n1! * n2! * … * nk! ) ) * ( p1^n1 * p2^n2 * … * pk^nk )

P = ( 4! / ( 0! * 2! * 2! ) ) * ( (0.2)^0 * (0.3)^2 * (0.5)^2 )

P = 0.135

Thus, if we draw 4 marbles with replacement from the bowl, the probability of drawing 0 red marbles, 2 green marbles, and 2 blue marbles is 0.135.
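The marble calculation can be checked with a short helper function (a minimal sketch; multinomial_pmf is an illustrative name, not a library function):

```python
import math

def multinomial_pmf(counts, probs):
    # P = ( n! / (n1! * ... * nk!) ) * ( p1**n1 * ... * pk**nk )
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)
    prob = 1.0
    for c, p in zip(counts, probs):
        prob *= p ** c
    return coef * prob

# 0 red, 2 green, 2 blue marbles, with p = (0.2, 0.3, 0.5)
print(multinomial_pmf([0, 2, 2], [0.2, 0.3, 0.5]))  # ≈ 0.135
```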

In Austria, 30% of the population has a blood type of O+, 33% has A+, 12% has B+, 6% has AB+, 7% has O-, 8% has A-, 3% has B-, and 1% has AB-. If 15 Austrian citizens are chosen at random, what is the probability that 3 have a blood type of O+, 2 have A+, 3 have B+, 2 have AB+, 1 has O-, 2 have A-, 1 has B-, and 1 has AB-? [2]

n=15 (15 trials)
p1= (probability of O+)=0.30 
p2=(probability of A+)=0.33 
p3=(probability of B+)=0.12 
p4=(probability of AB+)=0.06 
p5=(probability of O-)=0.07
p6=(probability of A-)=0.08 
p7=(probability of B-)=0.03 
p8=(probability of AB-)=0.01
n1=3 (3 O+) 
n2=2 (2 A+)
n3=3 (3 B+)
n4=2 (2 AB+)
n5=1 (1 O-)
n6=2 (2 A-)
n7=1 (1 B-)
n8=1 (1 AB-)
k=8 (8 possibilities)

P = ( n! / ( n1! * n2! * … * nk! ) ) * ( p1^n1 * p2^n2 * … * pk^nk )

P = ( 15! / ( 3! * 2! * 3! * 2! * 1! * 2! * 1! * 1! ) ) * ( 0.30^3 * 0.33^2 * 0.12^3 * 0.06^2 * 0.07^1 * 0.08^2 * 0.03^1 * 0.01^1 )

P=0.000011

Therefore, if 15 Austrian citizens are chosen at random, the probability that 3 have a blood type of O+, 2 have A+, 3 have B+, 2 have AB+, 1 has O-, 2 have A-, 1 has B-, and 1 has AB- is 0.0011%.

scipy.stats.multinomial
scipy.stats.multinomial(n, p, seed=None)
A multinomial random variable.

Parameters
x: array_like
Quantiles, with the last axis of x denoting the components (x is the argument passed to methods such as pmf, not to the constructor).

n: int
Number of trials

p: array_like
Probability of a trial falling into each category; should sum to 1

random_state: None or int or np.random.RandomState instance, optional
If int or RandomState, use it for drawing the random variates. If None (or np.random), the global np.random state is used. Default is None.

Notes

n should be a positive integer.
Each element of p should be in the interval [0, 1] and the elements should sum to 1 (they are the probabilities of the events). If they do not sum to 1, the last element of the p array is not used and is replaced with the remaining probability left over from the earlier elements.
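A small sketch of that last point, assuming the replace-the-last-element behaviour described in the SciPy documentation:

```python
from scipy.stats import multinomial

# p = [0.3, 0.3] does not sum to 1, so the last element is replaced
# with the leftover probability 0.7, i.e. p becomes [0.3, 0.7].
print(multinomial.pmf([1, 1], 2, [0.3, 0.3]))
print(multinomial.pmf([1, 1], 2, [0.3, 0.7]))  # same value: 2 * 0.3 * 0.7
```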

Alternatively, the object may be called (as a function) to fix the n and p parameters, returning a “frozen” multinomial random variable:
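For example, freezing the distribution from the marble bowl example above:

```python
from scipy.stats import multinomial

# Freeze n and p for the marble bowl example (n = 4 draws;
# p gives the red, green, and blue probabilities):
rv = multinomial(4, [0.2, 0.3, 0.5])

# The frozen object exposes the same methods without repeating n and p:
print(rv.pmf([0, 2, 2]))  # ≈ 0.135
```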

The probability mass function for multinomial is

f(x) = ( n! / ( x1! * … * xk! ) ) * ( p1^x1 * … * pk^xk )

supported on x = (x1, …, xk), where each xi is a nonnegative integer and their sum is n.

I will use the Austrian blood type example from above, with the same n, p, and counts:

from scipy.stats import multinomial

# multinomial.pmf(x, n, p)
rv = multinomial.pmf([3, 2, 3, 2, 1, 2, 1, 1], 15,
                     [0.3, 0.33, 0.12, 0.06, 0.07, 0.08, 0.03, 0.01])
rv


1.1162058001298526e-05

This is the same answer we calculated by hand above.

REFERENCES

1. The Stat Trek blog
2. The CK-12 Foundation
3. SciPy stats documentation

A BEGINNER’S JOURNEY


Hello Readers,

I have been delaying this post for aeons (eye rolls here), but you get my drift. I have so many topics to write about that I actually did not write anything (I know, classic, right?).
For now, I have decided to write about my forays into the world of software development.

It started last year with me taking an MITx introductory course in Python taught by Professor Eric Grimson. At first it was hard, but the prize of a certificate, and the fact that I was solving problems in Python and writing my own code, kept me fascinated. Three months rolled by and I finished the course.
I was a bit tired, so I started playing with HTML and some CSS, but I was not serious about proper software development until May this year, as I was focused on getting a medical job in another country, which fell through.
I was demoralized at first; then I stumbled upon the WorldQuant website, which piqued my interest in data science. I applied quickly, in spite of my daughter screaming into my ear for attention. I then forgot about it and decided to explore one of the edX courses in data science (UC San Diego), which helped smooth the learning curve.
I found that Prof. Grimson had given me a solid background in Python, sharpened my thinking skills, and re-ignited my love for mathematics (why I became a medical doctor is another story).
By mid-May, I had been accepted and the WorldQuant module in data science kicked off, and oh boy, I loved every minute of the course, thanks to Stack Overflow, Visualize Python, and of course the pandas documentation. It was in this course that I learned to read documentation.
Since then, I use the documentation of a language or a library as an accelerated learning tool.
The course was done and dusted in two months, but I knew I needed to practise, so I headed to Kaggle to do just that.

Right about mid-July, I applied for the Andela/Google/Pluralsight 4.0 program. I chose the Mobile Web track. At first, Pluralsight rolled out videos on advanced concepts in HTML, and with my mouth agape began my introduction to JavaScript.
I hated being gobsmacked like that, so after some digging I headed to freeCodeCamp and started an avalanche through the HTML, CSS, and JavaScript sessions. However, I felt I had not really gained real-world working knowledge of these languages, so I started a project of building a website, which helped a lot, and by August we had a challenge from which I learnt a lot. (By then Pluralsight 'had received some sense' and rolled out beginner-friendly, concise videos on the three languages, which made you beg for more, only for you to be redirected back with the message 'This course is not part of your subscription'. Business strategy, right?)
I had to defer WorldQuant module two in data science to the end of September, as I wanted to devote more time to web development.

It was in August that I discovered Scrimba, a site that helped me to understand responsive web design. I had watched some Udacity and Pluralsight videos, but they could not match the simplicity, clarity, and conciseness of Scrimba's videos.
I had also begun my foray into Node.js, but the Pluralsight videos available threw me off, as they were using a Mac, so I searched around a bit and found Pedro Mercado's Node.js videos on Noobcoder.com, and that made me want to learn all about the backend.
I had to prioritize my time, as I had started Andrew Ng's class on machine learning. I wanted to quit when he pointed out that one had to use Octave/MATLAB instead of Python. I groaned inwardly, 'Oh no, not another language', but I was quickly sold on Octave after my linear algebra review (a big fat thank you to Prof. Ng) and was introduced to vectorization (a mother of two partially homeschooled kids has no time to write clunky code; I am looking at you, Python (just joking)). I am in week 3, and I am determined to finish the course at my own pace.

You might be wondering why I am combining data science with web development. Websites are repositories of data that can be munged by data science tools; you need databases stored on backend servers, and you need a backend language or framework such as Node.js, PHP, or Django.
I chose Node.js because it is just JavaScript, it has great packages via npm, and it works well with NoSQL databases, so it is a win-win. I understand I have a lot to learn, but with persistence and committing to code every day, I know that by next year I will be confident enough to apply for jobs.
In the next post, I will be writing about the Google Developer platform and how it can help you cement your skill and knowledge base of web-based languages. Keep coding and building projects!
