This is very basic things that everyone begins the Pandas should know.
- open the excel data
import pandas as pd
url = 'https://ds-lecture-data.s3.ap-northeast-2.amazonaws.com/stocks/Travel.xlsx'
df1 = pd.read_excel(url, sheet_name = '008770 ')
df2 = pd.read_excel(url, sheet_name = '035250')
df1 = df1.T
new_header = df1.iloc[0]
df1 = df1[1:]
df1.columns = new_header
df2 = df2.T
header_name = df2.iloc[0]
df2 = df2[1:]
df2.columns = header_name
- open the excel file : df.read_excel(“ url ”, sheet_name= “ ”)
- transpose the row and column : df.T / df.transpose()
- when there is one more line as like (0 1 2 …) and you want to remove it : iloc[0]
2. check the data
df1.shape
df2.shape
df1.isnull().sum()
df2.isnull().sum()
df1 = df1.fillna(0)
df2 = df2.fillna(0)
!sudo apt-get install -y fonts-nanum
!sudo fc-cache -fv
!rm ~/.cache/matplotlib -rf
import matplotlib.pyplot as plt
plt.rc('font', family='NanumBarunGothic')
df1['FCF'].plot.bar()
df2['FCF'].plot.bar()
- check the dimension of the data : df.shape
- find where is N/A and replace it to 0
→ df.insull() : True/False, when it is N/A it returns True. No N/A = False
→ df.insull().sum() : summarize all True by each groups in the row.
- replace N/A to certain number : df.fillna(‘number’)
then N/A is replaced to 0. So there is no N/A or blank in this data no more. It is represented as 0
3. df[‘ — ‘].plot.bar()
df1['FCF'].plot.bar()
extract a column FCF and visualize it with the bar plot.
4. python export csv
df1_t.to_csv('df1.csv',index =False)
df2_t.to_csv('df2.csv',index= False)
df.to_csv(‘file_name’, index = False)