Anh-Thi DINH

Hackerrank 1: 10 days of statistics

Posted on 03/04/2019, in Python, Mathematics.

These are my study notes for the HackerRank challenge 10 Days of Statistics.

Go to this challenge.

Day 0 : Mean, Median, Mode, Weighted mean

  • mean = the average value, $\mu = \frac{1}{n}\sum_i x_i$
  • median = the middle value of the sorted data: if the number of elements is odd, it is the center element; if even, it is the mean of the two center elements.
  • mode = the value(s) that appear most often.
  • Given a set of numbers, X, and a corresponding set of weights, W, the weighted mean is calculated as $\mu_w = \dfrac{\sum_i x_i w_i}{\sum_i w_i}$
  • Find the median without numpy (the data must be sorted first):

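      def find_median(lst):
          # The median is defined on ordered data, so sort a copy first.
          ordered = sorted(lst)
          n = len(ordered)
          if n % 2 == 1:
              # Odd count: the single middle element.
              return ordered[n // 2]
          # Even count: the mean of the two middle elements.
          return (ordered[n // 2 - 1] + ordered[n // 2]) / 2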
    
  • With numpy and scipy:
      import numpy as np
      from scipy import stats 
    	
      print(np.mean(<list>))
      print(np.median(<list>))
      print(int(stats.mode(<list>)[0]))
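The weighted mean and the mode can also be computed with the standard library only. The following is a minimal sketch, not a reference solution: the function names are made up for illustration, and breaking ties in favor of the smallest value is an assumption about what the exercise expects.

```python
from collections import Counter

def weighted_mean(values, weights):
    """Return sum(x_i * w_i) / sum(w_i)."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

def smallest_mode(values):
    """Return the most frequent value; ties go to the smallest value (assumption)."""
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)

print(round(weighted_mean([10, 40, 30, 50, 20], [1, 2, 3, 4, 5]), 1))  # 32.0
print(smallest_mode([1, 2, 2, 3, 3]))  # 2
```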
    

Day 1 : Quartiles, Interquartile Range, Standard Deviation

  • The quartiles of an ordered data set are the 3 points that split the data set into 4 groups.
    • $Q_1$: the middle number between the smallest number in a data set and its median
    • $Q_2$: the median ($50^{th}$ percentile) of the data set
    • $Q_3$: the middle number between a data set’s median and its largest number
    • Algorithm:
      • If the number of elements is odd, don’t include the median in either half when seeking $Q_1, Q_3$.
      • If the number of elements is even, just divide the data into 2 halves.
      • $Q_1$ is the median of the first (lower) half, $Q_3$ is the median of the second (upper) half.
  • Get the input:
      size = int(input())
      numbers = list(map(int, input().split()))
    
  • The interquartile range of an array is the difference between its first (Q1) and third (Q3) quartiles
  • Output with one decimal place (e.g. 0.9): print("{:.1f}".format(Q3-Q1))
  • Expected value $\mu$ = mean of discrete random variable X.
  • Variance $\sigma^2$: $\sigma^2 = \dfrac{\sum_i (x_i-\mu)^2}{n}$

  • Standard deviation $\sigma$: $\sigma = \sqrt{\dfrac{\sum_i (x_i-\mu)^2}{n}}$

  • To use sqrt: import math as m, then call m.sqrt(x)
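The quartile algorithm and the standard deviation can be sketched together as follows. This is one possible implementation under the conventions in these notes (the median is excluded from both halves when the count is odd, and the variance is the mean of the squared deviations, i.e. the population variance); the function names are illustrative.

```python
import math

def median(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3); the median is left out of both halves when n is odd."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]          # elements below the median position
    upper = data[(n + 1) // 2 :]    # elements above the median position
    return median(lower), median(data), median(upper)

def std_dev(values):
    """Population standard deviation: sqrt of the mean squared deviation."""
    mu = sum(values) / len(values)
    variance = sum((x - mu) ** 2 for x in values) / len(values)
    return math.sqrt(variance)

q1, q2, q3 = quartiles([3, 7, 8, 5, 12, 14, 21, 13, 18])
print(q1, q2, q3)                                      # 6.0 12 16.0
print("{:.1f}".format(q3 - q1))                        # 10.0
print("{:.1f}".format(std_dev([10, 40, 30, 50, 20])))  # 14.1
```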

Day 2 : Basic probability

  • $P(A) = \dfrac{\text{favorable outcomes}}{\text{total outcomes}}$
  • $0\le P(A)\le 1$
  • $P(A^C) = 1-P(A)$
  • mutually exclusive or disjoint: $A\cap B=\emptyset, P(A\cap B)=0$
  • A,B are disjoint (mutually exclusive): $P(A\cup B) = P(A) + P(B)$
  • A,B are independent: $P(A \cap B) = P(A)\times P(B)$
  • General (inclusion-exclusion): $\vert A\cup B\vert = \vert A\vert + \vert B\vert -\vert A\cap B\vert$, hence $P(A\cup B) = P(A) + P(B) - P(A\cap B)$
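These rules can be checked by brute force on a small sample space. The sketch below enumerates the 36 outcomes of two fair dice (an example chosen here for illustration) and uses exact fractions; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of rolling two fair six-sided dice.
OUTCOMES = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = favorable outcomes / total outcomes, as an exact fraction."""
    favorable = sum(1 for outcome in OUTCOMES if event(outcome))
    return Fraction(favorable, len(OUTCOMES))

def first_is_six(o):
    return o[0] == 6

def second_is_six(o):
    return o[1] == 6

p_a = prob(first_is_six)
p_b = prob(second_is_six)
p_both = prob(lambda o: first_is_six(o) and second_is_six(o))
p_either = prob(lambda o: first_is_six(o) or second_is_six(o))

print(prob(lambda o: sum(o) <= 9))      # 5/6
print(p_both == p_a * p_b)              # True: the two dice are independent
print(p_either == p_a + p_b - p_both)   # True: inclusion-exclusion
```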

Day 3 : Conditional probability

  • Events A and B are independent if $P(B\vert A) = P(B)$
  • $P(A\cap B) = P(B\vert A)\times P(A)$
  • $P(B\vert A) = \dfrac{P(A\cap B)}{P(A)}$
  • Bayes’ theorem: $P(A\vert B) = \dfrac{P(B\vert A)\times P(A)}{P(B)}$
  • Permutations: take an ordered selection of r elements from n elements (order matters): $nPr = \dfrac{n!}{(n-r)!}$
  • Combinations: take r elements from n elements (order does not matter): $nCr = \dfrac{nPr}{r!} = \dfrac{n!}{r!(n-r)!}$
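A short sketch of Bayes' theorem and the counting formulas. The probabilities passed to `bayes` are invented purely for illustration, the function name is hypothetical, and `math.perm` / `math.comb` require Python 3.8 or newer.

```python
from fractions import Fraction
from math import comb, perm  # available in Python 3.8+

def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A|B), expanding P(B) with the law of total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

# Illustrative numbers only: P(B|A) = 9/10, P(A) = 1/100, P(B|not A) = 5/100.
posterior = bayes(Fraction(9, 10), Fraction(1, 100), Fraction(5, 100))
print(posterior)    # 2/13

print(perm(5, 2))   # 20 ordered selections of 2 out of 5
print(comb(5, 2))   # 10 unordered selections of 2 out of 5
```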

Day 4 : Binomial distribution

  • Tutorial link.
  • Random variable X: a real-valued function $X: S\to \mathbb{R}$ on the sample space S such that there is an event for each interval $I\subseteq \mathbb{R}$
  • A binomial experiment (or Bernoulli trial) is a statistical experiment that has the following properties:
    • The experiment consists of repeated trials.
    • The trials are independent.
    • The outcome of each trial is either success ($s$) or failure ($f$).
  • Bernoulli Random Variable and Distribution: The sample space of a binomial experiment only contains two points, s and f. We define a Bernoulli random variable to be the random variable defined by $X(s)=1$ and $X(f)=0$. If we consider the probability of success to be p and the probability of failure to be q (where $q=1-p$), then the probability mass function (PMF) of X is:

$p(x) = \begin{cases} q & \text{if } x = 0 \\ p & \text{if } x = 1 \\ 0 & \text{otherwise} \end{cases}$

or

$p(x) = p^x q^{1-x}$ for $x \in \{0, 1\}$, and 0 otherwise.

  • Binomial distribution: $b(x,n,p) = \dbinom{n}{x} p^x q^{n-x}$ is the binomial probability, meaning the probability of having exactly x successes out of n trials.
    • The number of successes is x.
    • The total number of trials is n.
    • The probability of success of 1 trial is p.
    • The probability of failure of 1 trial is q, where q=1-p.
  • Cumulative Probability (CDF): $F_X(x) = P(X\le x)$ and $P(a<X\le b) = F_X(b)-F_X(a)$.
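The binomial probability and its CDF can be sketched as below, assuming the standard formula $b(x,n,p) = \binom{n}{x} p^x (1-p)^{n-x}$. The function names are illustrative, and the fair-coin numbers are just an example.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cdf(x, n, p):
    """F(x) = P(X <= x), the sum of the PMF from 0 to x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# Example: 10 tosses of a fair coin.
print(round(binomial_pmf(5, 10, 0.5), 3))   # 0.246
print(round(binomial_cdf(5, 10, 0.5), 3))   # 0.623
# P(2 < X <= 5) = F(5) - F(2)
print(round(binomial_cdf(5, 10, 0.5) - binomial_cdf(2, 10, 0.5), 3))  # 0.568
```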

Day 4 : Geometric Distribution

  • Negative Binomial Experiment: A negative binomial experiment is a statistical experiment that has the following properties:
    • The experiment consists of n repeated trials.
    • The trials are independent.
    • The outcome of each trial is either success (s) or failure (f).
    • P(s) is the same for every trial.
    • The experiment continues until x successes are observed.
  • If X is the number of trials needed until the x-th success occurs, then X is a discrete random variable called a negative binomial.
  • The number of successes to be observed is x.
  • The total number of trials is n.
  • The probability of success of 1 trial is p.
  • The probability of failure of 1 trial is q, where q=1-p.
  • $b^{\ast}(x,n,p) = \dbinom{n-1}{x-1} p^x q^{n-x}$ is the negative binomial probability, meaning the probability of having x-1 successes after n-1 trials and having x successes after n trials.

  • Geometric Distribution: The geometric distribution is a special case of the negative binomial distribution that deals with the number of Bernoulli trials required to get a success (i.e., counting the number of failures before the first success). Recall that X is the number of successes in n independent Bernoulli trials, so for each i (where $1\le i \le n$): $X_i = 1$ if the i-th trial is a success, and $X_i = 0$ otherwise.
  • The geometric distribution is a negative binomial distribution where the number of successes is 1. We express this with the following formula: $g(n,p) = q^{n-1} p$
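A minimal sketch of both probabilities, assuming the standard formulas $b^{\ast}(x,n,p) = \binom{n-1}{x-1} p^x q^{n-x}$ and $g(n,p) = q^{n-1} p$. The function names and the value $p = 1/3$ are chosen here for illustration.

```python
from math import comb

def negative_binomial_pmf(x, n, p):
    """Probability that the x-th success occurs exactly on trial n."""
    return comb(n - 1, x - 1) * p**x * (1 - p)**(n - x)

def geometric_pmf(n, p):
    """Probability that the first success occurs exactly on trial n."""
    return (1 - p)**(n - 1) * p

p = 1 / 3
# First success on the 5th trial.
print(round(geometric_pmf(5, p), 3))                             # 0.066
# First success within the first 5 trials.
print(round(sum(geometric_pmf(n, p) for n in range(1, 6)), 3))   # 0.868
```

With `x = 1` the negative binomial probability reduces to the geometric one, which is an easy sanity check.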

Day 5 : Poisson Distribution

  • A Poisson experiment is a statistical experiment that has the following properties:
    • The outcome of each trial is either success or failure.
    • The average number of successes ($\lambda$) that occurs in a specified region is known.
    • The probability that a success will occur is proportional to the size of the region.
    • The probability that a success will occur in an extremely small region is virtually zero.
  • A Poisson random variable is the number of successes that result from a Poisson experiment. The probability distribution of a Poisson random variable is called a Poisson distribution: $P(k,\lambda) = \dfrac{\lambda^k e^{-\lambda}}{k!}$
  • $e\approx 2.71828$
  • $\lambda$ is the average number of successes that occur in a specified region.
  • k is the actual number of successes that occur in a specified region.
  • $P(k,\lambda)$ is the Poisson probability, which is the probability of getting exactly k successes when the average number of successes is $\lambda$.
  • Check examples & special case here.
  • Special case: if X has a Poisson distribution, then $E[X] = Var(X) = \lambda$. Since $Var(X) = E[X^2] - (E[X])^2$, it follows that $E[X^2] = \lambda + \lambda^2$.
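The Poisson probability can be sketched as follows, assuming the standard formula $P(k,\lambda) = \lambda^k e^{-\lambda} / k!$. The function name and $\lambda = 2.5$ are illustrative; the last lines check numerically that $E[X^2] = \lambda + \lambda^2$ by truncating the infinite sum, which is only an approximation.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k successes when the average is lam."""
    return lam**k * exp(-lam) / factorial(k)

lam = 2.5
print(round(poisson_pmf(5, lam), 3))  # 0.067

# Numerical check of E[X^2] = lam + lam^2 (the sum is truncated at k = 59).
second_moment = sum(k**2 * poisson_pmf(k, lam) for k in range(60))
print(round(second_moment, 3))        # 8.75
```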

Day 5 : Normal Distribution

  • The probability density of the normal distribution is: $\mathcal{N}(\mu,\sigma^2) = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
  • $\mu$ is the mean (or expectation) of the distribution. It is also equal to median and mode of the distribution.
  • $\sigma^2$ is the variance.
  • $\sigma$ is the standard deviation.
  • In Python, we can draw samples from it with numpy.random.normal.
  • If $\mu=0, \sigma=1$ then the normal distribution is known as the standard normal distribution.
  • Every normal distribution can be represented as a standard normal distribution: $Z = \dfrac{X-\mu}{\sigma}$
  • Consider a real-valued random variable, X. The cumulative distribution function of X (or just the distribution function of X) evaluated at x is the probability that X will take a value less than or equal to x: $F_X(x) = P(X\le x)$
  • The cumulative distribution function for a random variable with normal distribution is (erf = error function): $\Phi(x) = \dfrac{1}{2}\left(1 + \text{erf}\left(\dfrac{x-\mu}{\sigma\sqrt{2}}\right)\right)$
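The normal CDF can be evaluated with `math.erf` from the standard library. This is a minimal sketch assuming the usual closed form $\Phi(x) = \frac{1}{2}\left(1 + \text{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right)$; the function name and the parameters $\mu = 20$, $\sigma = 2$ are picked for illustration.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X normally distributed with mean mu and std dev sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 20, 2
# P(X <= 19.5)
print(round(normal_cdf(19.5, mu, sigma), 3))                              # 0.401
# P(20 < X <= 22) = F(22) - F(20)
print(round(normal_cdf(22, mu, sigma) - normal_cdf(20, mu, sigma), 3))    # 0.341
```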