T-test in Python

Yeju Ham
Mar 11, 2021
1. np.random.binomial
np.random.binomial(n = 1, p = 0.5, size = 10)
# result (random, so it changes on every run): array([1, 0, 0, 1, 0, 0, 1, 1, 0, 1])
# Flip one coin (n = 1) whose chance of heads is 50% (p = 0.5), and repeat it 10 times (size = 10).
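A quick way to see what np.random.binomial does is to flip many coins and check that the share of heads lands close to p. A minimal sketch, assuming an arbitrary seed (0) and 10,000 flips chosen only for illustration:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the run is reproducible

# n = 1: each draw is a single coin flip, so every value is 0 (tails) or 1 (heads)
flips = np.random.binomial(n=1, p=0.5, size=10_000)

# The proportion of 1s (heads) should be close to p = 0.5
print(flips.mean())

# n = 10: each draw counts the heads in 10 flips, so values range from 0 to 10
heads_per_10 = np.random.binomial(n=10, p=0.5, size=5)
print(heads_per_10)
```

With n = 1 each draw is a single Bernoulli trial; raising n turns each draw into a count of successes.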

2. P-value

There are two equivalent rules for rejecting the null hypothesis:

  • |T-value| > critical t-value (for the chosen alpha)
  • P-value < alpha (commonly 0.05)
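The two criteria above always lead to the same decision, because the p-value falls below alpha exactly when |t| exceeds the critical value. A minimal sketch with a small hypothetical sample (the numbers and popmean = 5.0 are made up for illustration), using stats.t.ppf to get the two-sided critical value:

```python
import numpy as np
from scipy import stats

alpha = 0.05
# Hypothetical sample; any 1-D numeric data works here
sample = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.5])
popmean = 5.0  # H0: the population mean is 5.0

result = stats.ttest_1samp(sample, popmean)

# Critical value for a two-sided test: leaves alpha / 2 in each tail
dof = len(sample) - 1
t_crit = stats.t.ppf(1 - alpha / 2, dof)

reject_by_t = abs(result.statistic) > t_crit  # rule 1: |t| > critical value
reject_by_p = result.pvalue < alpha           # rule 2: p-value < alpha

print(result.statistic, t_crit, result.pvalue)
print(reject_by_t, reject_by_p)  # the two rules agree
```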

3. One sample t-test : stats.ttest_1samp(data, expected mean)

from scipy import stats

# Parameters of ttest_1samp: 1) the sample data, 2) the value to compare against (the expected value, i.e. the mean)

stats.ttest_1samp(df, 0.5)

scipy.stats.ttest_1samp(a, popmean, axis=0, nan_policy='propagate', alternative='two-sided')

Calculate the T-test for the mean of ONE group of scores.

This is a two-sided test for the null hypothesis that the expected value (mean) of a sample of independent observations a is equal to the given population mean, popmean.

The null hypothesis is that the mean of df is equal to the mean you are comparing it against.
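The statistic that ttest_1samp reports is t = (sample mean - popmean) / (s / sqrt(n)), where s is the sample standard deviation with ddof = 1. A minimal sketch that recomputes it by hand, using hypothetical coin-flip data (1 = heads) tested against 0.5, in the spirit of the ttest_1samp call above:

```python
import numpy as np
from scipy import stats

# Hypothetical data: 10 coin flips (1 = heads)
flips = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 0])
popmean = 0.5  # H0: the true mean (chance of heads) is 0.5

result = stats.ttest_1samp(flips, popmean)

# Recompute the t statistic and the two-sided p-value by hand
n = len(flips)
se = flips.std(ddof=1) / np.sqrt(n)               # standard error of the mean
t_manual = (flips.mean() - popmean) / se
p_manual = 2 * stats.t.sf(abs(t_manual), n - 1)   # both tails, n - 1 degrees of freedom

print(result.statistic, result.pvalue)
print(t_manual, p_manual)
```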

reference :

4. Two sample t-test : stats.ttest_ind(data1, data2)

1. two-tailed & two sample t-test
#H0 : df1.mean() = df2.mean()
#H1 : df1.mean() != df2.mean()
print(np.mean(df1))
print(np.mean(df2))

stats.ttest_ind(df1, df2)
#Ttest_indResult(statistic=0.4629256014492562, pvalue=0.6455096880085703)
#conclusion
The p-value (about 0.65) is well above alpha (0.05), so you cannot reject the null hypothesis. In other words, there is not enough evidence to conclude that df1.mean() differs from df2.mean(). (Failing to reject H0 does not prove that the two means are equal.)
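With the default equal_var=True, ttest_ind runs Student's t-test with a pooled variance. A minimal sketch that checks this by hand; df1 and df2 here are hypothetical normal samples standing in for the post's data, which is not shown:

```python
import numpy as np
from scipy import stats

# Hypothetical samples standing in for df1 and df2
rng = np.random.default_rng(42)
df1 = rng.normal(loc=10.0, scale=2.0, size=30)
df2 = rng.normal(loc=10.5, scale=2.0, size=30)

result = stats.ttest_ind(df1, df2)  # two-sided, equal_var=True by default

# Student's t with pooled variance
n1, n2 = len(df1), len(df2)
sp2 = ((n1 - 1) * df1.var(ddof=1) + (n2 - 1) * df2.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (df1.mean() - df2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
p_manual = 2 * stats.t.sf(abs(t_manual), n1 + n2 - 2)

alpha = 0.05
print(result)
print("reject H0" if result.pvalue < alpha else "fail to reject H0")
```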
2. one-tailed & two sample t-test
#H0 : df2.mean() = df3.mean()
#H1 : df2.mean() > df3.mean()
stats.ttest_ind(df2, df3, equal_var = False, alternative = "greater")
#Ttest_indResult(statistic=0.4629256014492563, pvalue=0.322803351843706)


#conclusion
There is not enough evidence to reject H0 (p ≈ 0.32 > 0.05). In other words, you cannot conclude that df2 has a greater mean than df3.
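A one-sided test uses the same t statistic as the two-sided one; only the p-value changes. The 'greater' and 'less' p-values add up to 1, and the smaller of the two is half the two-sided p-value. A minimal sketch with hypothetical samples standing in for df2 and df3 (the alternative argument needs SciPy 1.6 or later):

```python
import numpy as np
from scipy import stats

# Hypothetical samples standing in for df2 and df3
rng = np.random.default_rng(7)
df2 = rng.normal(loc=5.3, scale=1.0, size=40)
df3 = rng.normal(loc=5.0, scale=1.5, size=35)

two_sided = stats.ttest_ind(df2, df3, equal_var=False)
greater = stats.ttest_ind(df2, df3, equal_var=False, alternative="greater")  # H1: mean(df2) > mean(df3)
less = stats.ttest_ind(df2, df3, equal_var=False, alternative="less")        # H1: mean(df2) < mean(df3)

print(two_sided.pvalue, greater.pvalue, less.pvalue)
```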

scipy.stats.ttest_ind

Calculate the T-test for the means of two independent samples of scores.

This is a two-sided test for the null hypothesis that 2 independent samples have identical average (expected) values. This test assumes that the populations have identical variances by default.

  • equal_var : True (default) assumes that the two populations have equal variances.

So, if the variances differ from each other, set it to False; SciPy then performs Welch's t-test.

  • alternative : ‘two-sided’ (default), ‘less’ (one-sided), ‘greater’ (one-sided)

So, for a one-sided t-test, remember to set alternative to ‘less’ or ‘greater’.

  • reference

: scipy.stats.ttest_ind — SciPy v1.6.1 Reference Guide
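To see what equal_var=False changes, here is a minimal sketch that recomputes Welch's t-test by hand: each sample keeps its own variance, and the degrees of freedom come from the Welch-Satterthwaite formula. The samples a and b are hypothetical, chosen with clearly different spreads:

```python
import numpy as np
from scipy import stats

# Hypothetical samples with clearly different spreads
rng = np.random.default_rng(1)
a = rng.normal(loc=0.0, scale=1.0, size=25)
b = rng.normal(loc=0.5, scale=3.0, size=40)

welch = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test
student = stats.ttest_ind(a, b)                 # equal_var=True (default)

# Welch's statistic: each sample's own variance, no pooling
va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
t_welch = (a.mean() - b.mean()) / np.sqrt(va + vb)
# Welch-Satterthwaite degrees of freedom (usually not a whole number)
dof = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
p_welch = 2 * stats.t.sf(abs(t_welch), dof)

print(student.pvalue, welch.pvalue)
```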

5. np.random.choice

np_array = np.random.choice(df1, size = 10)
# RESULT: 10 values picked at random from df1 (with replacement by default, so repeats are possible)
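np.random.choice samples with replacement unless you pass replace=False. A minimal sketch, assuming a hypothetical pool of the numbers 0 to 99 in place of df1 and an arbitrary seed:

```python
import numpy as np

np.random.seed(0)      # arbitrary seed for reproducibility
pool = np.arange(100)  # hypothetical stand-in for df1

with_repl = np.random.choice(pool, size=10)                    # replace=True by default: repeats possible
without_repl = np.random.choice(pool, size=10, replace=False)  # every pick is distinct

print(with_repl)
print(without_repl)
```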

6. np.unique(df)

np.unique(df)  # returns the sorted unique values, with duplicates removed
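np.unique can also report how often each unique value occurs. A minimal sketch on a small hypothetical array:

```python
import numpy as np

values = np.array([3, 1, 2, 3, 1, 3])

uniq = np.unique(values)  # sorted distinct values
values_seen, counts = np.unique(values, return_counts=True)  # plus how often each occurs

print(uniq)    # [1 2 3]
print(counts)  # [2 1 3]
```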

<REVIEW>

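!pip install --upgrade scipy  # notebook command: run it before importing scipy (restart the kernel if scipy was already imported)
import pandas as pd
import numpy as np
from scipy import stats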

df= pd.read_csv('url', sep = '\t', skiprows = 1)
df= df.replace({'-' : 0})
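df = df.drop([0,26,27])  # drop the rows with index labels 0, 26 and 27
df1 = pd.to_numeric(df['jelly'].astype(str).str.replace(',',''))
# The 'jelly' column holds sales volumes as text with thousands separators (','). Remove the commas and convert the strings to numbers.
# astype(str) keeps the 0s created by replace() from turning into NaN, since .str methods return NaN for non-string entries.
# One-sample t-test. H0: the average jelly sales volume is 400.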
print(df1.mean())
#print(np.mean(df1))
print(stats.ttest_1samp(df1, 400))
stats.ttest_1samp(df1,400).pvalue
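Because the CSV URL above is only a placeholder, here is a self-contained sketch of the same cleaning and test steps on a tiny hypothetical 'jelly' column (the values are made up for illustration):

```python
import pandas as pd
from scipy import stats

# Hypothetical data mimicking the raw file: numbers stored as text with
# thousands separators, and '-' for missing values
df = pd.DataFrame({"jelly": ["1,200", "350", "-", "410", "2,050", "380"]})

df = df.replace({"-": 0})  # '-' becomes the integer 0
# astype(str) first, so the int 0 is not turned into NaN by .str.replace
jelly = pd.to_numeric(df["jelly"].astype(str).str.replace(",", ""))

print(jelly.mean())
result = stats.ttest_1samp(jelly, 400)  # H0: mean jelly sales = 400
print(result.statistic, result.pvalue)
```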

<learned>

df.drop([0,26,27]) returns the same rows as df.iloc[1:26] when df has 28 rows with the default index 0 to 27.
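The equivalence above can be checked on a hypothetical 28-row DataFrame with the default index (drop works on index labels, iloc on positions, and the iloc stop value is excluded):

```python
import pandas as pd

# Hypothetical 28-row frame with the default 0..27 index
df = pd.DataFrame({"x": range(28)})

dropped = df.drop([0, 26, 27])  # remove rows by index label
sliced = df.iloc[1:26]          # keep positions 1..25 (26 is excluded)

print(dropped.equals(sliced))   # True
```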

  • numpy reference

: https://rfriend.tistory.com/284
