Hackerrank 1: 10 days of statistics
Posted on 03/04/2019, in Python, Mathematics.
This note is only used for learning on HackerRank 10 Days of Statistics.
Go to this challenge.
Day 0 : Mean, Median, Mode, Weighted mean
 mean = the average value, $\frac{1}{n}\sum_i x_i$
 median = the number at the center of the sorted data: if the number of elements is odd, it’s the middle element; if even, it’s the mean of the two middle elements.
 mode = the value(s) that appear most often.
 Given a set of numbers, X, and a corresponding set of weights, W, the weighted mean is calculated as follows: $\bar{x}_w = \dfrac{\sum_i x_i w_i}{\sum_i w_i}$
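To make the weighted mean concrete, here is a minimal sketch in plain Python. The function name `weighted_mean` and the sample data are only illustrative and do not come from the challenge itself.

```python
def weighted_mean(values, weights):
    """Return sum(x_i * w_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Sample data chosen only for illustration.
print(round(weighted_mean([10, 40, 30, 50, 20], [1, 2, 3, 4, 5]), 1))  # 32.0
```

With all weights equal, this reduces to the ordinary mean.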

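 Find the median without numpy:

```python
def find_median(lst):
    lst = sorted(lst)  # the median is defined on ordered data
    len_lst = len(lst)
    if len_lst % 2 == 1:
        # odd length: the single middle element
        return lst[len_lst // 2]
    else:
        # even length: mean of the two middle elements
        return (lst[len_lst // 2 - 1] + lst[len_lst // 2]) / 2
```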
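 With numpy:

```python
import numpy as np
from scipy import stats

numbers = [1, 2, 2, 3, 4]  # sample data; replace with your own list

print(np.mean(numbers))
print(np.median(numbers))
print(int(stats.mode(numbers)[0]))  # smallest of the most frequent values
```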
Day 1 : Quartiles, Interquartile Range, Standard Deviation
 The quartiles of an ordered data set are the 3 points that split the data set into 4 groups.
 $Q_1$: the middle number between the smallest number in a data set and its median
 $Q_2$: the median ($50^{th}$ percentile) of the data set
 $Q_3$: the middle number between a data set’s median and its largest number
 Algorithm:
 If the number of elements is odd, don’t include the median in either half when seeking $Q_1, Q_3$.
 If the number of elements is even, just divide the data into 2 halves.
 $Q_1$ is the median of the first half, $Q_3$ is the median of the second half.
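The splitting rule can be sketched as follows: sort the data, drop the median when the length is odd, then take the median of the lower half ($Q_1$) and of the upper half ($Q_3$). The helper names `median` and `quartiles` and the sample list are assumptions made for illustration; the sketch expects at least two elements.

```python
def median(sorted_values):
    """Median of an already sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3); the median is excluded from both halves if n is odd."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)


q1, q2, q3 = quartiles([3, 7, 8, 5, 12, 14, 21, 13, 18])
print(q1, q2, q3)  # 6.0 12 16.0
```

Note that other quartile conventions exist (for example the interpolation used by `numpy.percentile`), so results may differ from library output.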
 Get the input:

```python
size = int(input())
numbers = list(map(int, input().split()))
```
 The interquartile range of an array is the difference between its third ($Q_3$) and first ($Q_1$) quartiles, $Q_3 - Q_1$.
 Output with 1 decimal place (e.g. 0.9):

```python
print("{:.1f}".format(Q3 - Q1))
```
 Expected value $\mu$ = mean of discrete random variable X.

Variance $\sigma^2$: $\sigma^2 = \dfrac{\sum_i (x_i-\mu)^2}{n}$

Standard deviation $\sigma$: $\sigma = \sqrt{\dfrac{\sum_i (x_i-\mu)^2}{n}}$
 Use `sqrt` from the `math` module: `import math as m`, then `m.sqrt(...)`.
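As a rough sketch, the population standard deviation (dividing by $n$, not $n-1$) can be computed like this. The function name `population_std` and the sample data are illustrative assumptions.

```python
import math as m


def population_std(values):
    """Square root of the mean squared deviation from the mean."""
    n = len(values)
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n
    return m.sqrt(variance)


print("{:.1f}".format(population_std([10, 40, 30, 50, 20])))  # 14.1
```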
Day 2 : Basic probability
 $P(A) = \dfrac{\text{favorable outcomes}}{\text{total outcomes}}$
 $0\le P(A)\le 1$
 $P(A^C) = 1-P(A)$
 mutually exclusive or disjoint: $A\cap B=\emptyset, P(A\cap B)=0$
 A,B are disjoint (mutually exclusive): $P(A\cup B) = P(A) + P(B)$
 A,B are independent: $P(A \cap B) = P(A)\times P(B)$
 General: $\vert A\cup B\vert = \vert A\vert + \vert B\vert - \vert A\cap B\vert$
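For small sample spaces, the ratio of favorable to total outcomes can be checked by enumeration. The sketch below assumes two fair six-sided dice and asks for the probability that their sum is at most 9; the example is illustrative only.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
favorable = [roll for roll in outcomes if sum(roll) <= 9]

p_a = Fraction(len(favorable), len(outcomes))
print(p_a)      # 5/6
print(1 - p_a)  # complement: 1/6
```

Using `Fraction` keeps the result exact instead of a rounded float.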
Day 3 : Conditional probability
 A and B are considered to be independent if $P(B\vert A) = P(B)$
 $P(A\cap B) = P(B\vert A)\times P(A)$
 $P(B\vert A) = \dfrac{P(A\cap B)}{P(A)}$
 Bayes’ theorem: $P(A\vert B) = \dfrac{P(B\vert A)\times P(A)}{P(B)}$
 Permutations: ordered selections of r elements from n elements (order matters): $nPr = \dfrac{n!}{(n-r)!}$
 Combinations: unordered selections of r elements from n elements (order does not matter): $nCr = \dfrac{nPr}{r!} = \dfrac{n!}{r!(n-r)!}$
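A minimal sketch of both counts using only `math.factorial`. The function names `n_p_r` and `n_c_r` are made up for this note; on Python 3.8+ the standard library also provides `math.perm` and `math.comb`.

```python
from math import factorial


def n_p_r(n, r):
    """Number of ordered selections of r items out of n."""
    return factorial(n) // factorial(n - r)


def n_c_r(n, r):
    """Number of unordered selections of r items out of n."""
    return n_p_r(n, r) // factorial(r)


print(n_p_r(5, 2))  # 20
print(n_c_r(5, 2))  # 10
```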
Day 4 : Binomial distribution
 Tutorial link.
 Random variable X: a real-valued function $X: S\to \mathbb{R}$ in which there is an event for each interval $I\subseteq \mathbb{R}$
 A binomial experiment (or Bernoulli trial) is a statistical experiment that has the following properties:
 The experiment consists of repeated trials.
 The trials are independent.
 The outcome of each trial is either success ($s$) or failure ($f$).
 Bernoulli Random Variable and Distribution: The sample space of a binomial experiment only contains two points, s and f. We define a Bernoulli random variable to be the random variable defined by $X(s)=1$ and $X(f)=0$. If we consider the probability of success to be p and the probability of failure to be q (where $q=1-p$), then the probability mass function (PMF) of X is:
$p(x) = \begin{cases} q = 1-p & \text{if } x=0 \\ p & \text{if } x=1 \end{cases}$
or
$p(x) = p^x (1-p)^{1-x}$ for $x\in\{0,1\}$
 Binomial distribution: $b(x,n,p) = \binom{n}{x} p^x q^{n-x}$ is the binomial probability, meaning the probability of having exactly x successes out of n trials.
 The number of successes is x.
 The total number of trials is n.
 The probability of success of 1 trial is p.
 The probability of failure of 1 trial is q, where $q=1-p$.
 Cumulative Probability (CDF): $F_X(x) = P(X\le x)$ and $P(a<X\le b) = F_X(b)-F_X(a)$.
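A small sketch of the binomial PMF $\binom{n}{x}p^x q^{n-x}$ and its CDF. The function names and the example (4 trials with $p=0.5$) are illustrative assumptions; `math.comb` requires Python 3.8+.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """P(at most x successes in n independent trials)."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


print(round(binomial_pmf(2, 4, 0.5), 3))      # 0.375
print(round(1 - binomial_cdf(2, 4, 0.5), 4))  # P(X > 2) = 0.3125
```

"At least" questions are usually easiest through the complement, as in the second line.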
Day 4 : Geometric Distribution
 Negative Binomial Experiment: A negative binomial experiment is a statistical experiment that has the following properties:
 The experiment consists of n repeated trials.
 The trials are independent.
 The outcome of each trial is either success (s) or failure (f).
 P(s) is the same for every trial.
 The experiment continues until x successes are observed.
 If X is the number of trials until the x-th success occurs, then X is a discrete random variable called a negative binomial.
 The number of successes to be observed is x.
 The total number of trials is n.
 The probability of success of 1 trial is p.
 The probability of failure of 1 trial is q, where $q=1-p$.

$b^{\ast}(x,n,p) = \binom{n-1}{x-1} p^x q^{n-x}$ is the negative binomial probability, meaning the probability of having x-1 successes after n-1 trials and having x successes after n trials.
 Geometric Distribution: The geometric distribution is a special case of the negative binomial distribution that deals with the number of Bernoulli trials required to get the first success. Recall that X is the number of successes in n independent Bernoulli trials, so for each i (where $1\le i \le n$): $X_i = 1$ if the i-th trial is a success, and $X_i = 0$ otherwise.
 The geometric distribution is a negative binomial distribution where the number of successes is 1. We express this with the following formula: $g(n,p) = q^{n-1} p$
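The relationship between the two distributions can be sketched in code: the geometric probability $q^{n-1}p$ is the negative binomial probability $\binom{n-1}{x-1}p^x q^{n-x}$ with $x=1$. Function names and the value $p=1/3$ are illustrative assumptions.

```python
from math import comb


def negative_binomial_pmf(x, n, p):
    """P(the x-th success occurs exactly on trial n)."""
    return comb(n - 1, x - 1) * p ** x * (1 - p) ** (n - x)


def geometric_pmf(n, p):
    """P(the first success occurs exactly on trial n)."""
    return (1 - p) ** (n - 1) * p


p = 1 / 3
print(round(geometric_pmf(5, p), 3))                            # 0.066
print(round(sum(geometric_pmf(n, p) for n in range(1, 6)), 3))  # 0.868
```

The second line is the probability that the first success occurs within the first 5 trials.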
Day 5 : Poisson Distribution
 A Poisson experiment is a statistical experiment that has the following properties:
 The outcome of each trial is either success or failure.
 The average number of successes ($\lambda$) that occurs in a specified region is known.
 The probability that a success will occur is proportional to the size of the region.
 The probability that a success will occur in an extremely small region is virtually zero.
 A Poisson random variable is the number of successes that result from a Poisson experiment. The probability distribution of a Poisson random variable is called a Poisson distribution: $P(k,\lambda) = \dfrac{\lambda^k e^{-\lambda}}{k!}$
 $e\approx 2.71828$
 $\lambda$ is the average number of successes that occur in a specified region.
 k is the actual number of successes that occur in a specified region.
 $P(k,\lambda)$ is the Poisson probability, which is the probability of getting exactly k successes when the average number of successes is $\lambda$.
 Check examples & special case here.
 Special case: if X has a Poisson distribution, then $E[X] = \lambda = Var(X)$. Since $Var(X) = E[X^2] - (E[X])^2$, this gives $E[X^2] = \lambda + \lambda^2$.
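A short sketch of the Poisson probability $\lambda^k e^{-\lambda}/k!$ and of the second moment $E[X^2] = \lambda + \lambda^2$ that follows from $Var(X) = E[X] = \lambda$. The function names and $\lambda = 2.5$ are illustrative assumptions.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(exactly k successes) when the average number of successes is lam."""
    return lam ** k * exp(-lam) / factorial(k)


def poisson_second_moment(lam):
    """E[X^2] = Var(X) + (E[X])^2 = lam + lam^2."""
    return lam + lam ** 2


print(round(poisson_pmf(5, 2.5), 3))  # 0.067
print(poisson_second_moment(2.5))     # 8.75
```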
Day 5 : Normal Distribution
 The probability density of normal distribution is: $\mathcal{N}(\mu,\sigma^2) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
 $\mu$ is the mean (or expectation) of the distribution. It is also equal to median and mode of the distribution.
 $\sigma^2$ is the variance.
 $\sigma$ is the standard deviation.
 In Python, we can use `numpy.random.normal` to draw samples (ref).
 If $\mu=0, \sigma=1$, then the normal distribution is known as the standard normal distribution.
 Every normal distribution can be represented as the standard normal distribution: $Z = \dfrac{X-\mu}{\sigma}$
 Consider a real-valued random variable, X. The cumulative distribution function of X (or just the distribution function of X) evaluated at x is the probability that X will take a value less than or equal to x: $F_X(x) = P(X\le x)$
 The cumulative distribution function for a random variable with normal distribution is (erf = error function): $\Phi(x) = \dfrac{1}{2}\left(1+\operatorname{erf}\left(\dfrac{x-\mu}{\sigma\sqrt{2}}\right)\right)$
 In Python, we can use the `math.erf` function.
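A minimal sketch of the normal CDF built on `math.erf`, using $\frac{1}{2}\left(1+\operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right)$. The function name `normal_cdf` and the parameters $\mu=20, \sigma=2$ are illustrative assumptions.

```python
import math


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X normally distributed with mean mu and std dev sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


print(round(normal_cdf(19.5, 20, 2), 3))                       # 0.401
print(round(normal_cdf(22, 20, 2) - normal_cdf(20, 20, 2), 3))  # 0.341
```

The second line uses $P(a<X\le b) = F_X(b)-F_X(a)$ to get the probability of an interval.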