csv - How to create a correlation matrix only for specific column combinations using Python?
I have 3 columns: hour, factor (a factor that affects car parking) and parkingspaces. I am able to draw a correlation matrix that calculates the correlation among every combination of columns, but I want to display a single correlation matrix across 5 different files that shows the correlation among specific columns only.
import numpy as np
import pandas as pd
import seaborn as sns
import math
import matplotlib.pyplot as plt
%matplotlib inline
sns.set(style="darkgrid")

# raw strings (r"...") stop "\n" in "\new folder" being read as a newline escape
creche_holiday = pd.read_csv(r"d:\data analysis\practicum\dcu car parking data\new folder\creche_holiday.csv")
creche_reading = pd.read_csv(r"d:\data analysis\practicum\dcu car parking data\new folder\creche_reading.csv")
creche_study = pd.read_csv(r"d:\data analysis\practicum\dcu car parking data\new folder\creche_study.csv")
creche_working = pd.read_csv(r"d:\data analysis\practicum\dcu car parking data\new folder\creche_working.csv")
creche_exam = pd.read_csv(r"d:\data analysis\practicum\dcu car parking data\new folder\creche_exam.csv")

# place the five files side by side under a two-level column index (creche, parking)
catted = pd.concat([d.reset_index(drop=True) for d in [creche_working, creche_holiday, creche_reading, creche_study, creche_exam]],
                   axis=1, keys=['working', 'holiday', 'reading', 'study', 'exam'])
catted = catted.rename_axis(['creche', 'parking'], axis=1)
corrmat = catted.corr()

# generate a mask for the upper triangle (np.bool was removed from NumPy; use bool)
mask = np.zeros_like(corrmat, dtype=bool)
mask[np.triu_indices_from(mask)] = True

# set up the matplotlib figure
f, ax = plt.subplots(figsize=(12, 11))

# generate a custom diverging colormap
cmap = sns.diverging_palette(220, 10, as_cmap=True)
#sns.heatmap(corrmat, vmax=.3, center=0, square=True, linewidths=.5, cbar_kws={"shrink": .5})

# draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corrmat, fmt=".2g", annot=True, cmap=cmap, linewidths=1, cbar=True, vmin=0, vmax=1, center=0, mask=mask)
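If the goal is one matrix that compares the same column across the five files, a different technique from the drop approach in the answer below is to select a single level of the two-level column index with DataFrame.xs before calling corr(). The sketch below assumes the files contain columns named hour, factor and parkingspaces (taken from the question text) and replaces the CSV files with hypothetical random data so it runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the five CSV files; the column names
# 'hour', 'factor' and 'parkingspaces' are assumed from the question.
rng = np.random.default_rng(0)
names = ['working', 'holiday', 'reading', 'study', 'exam']
frames = [
    pd.DataFrame({
        'hour': np.arange(24),
        'factor': rng.random(24),
        'parkingspaces': rng.integers(0, 500, size=24),
    })
    for _ in names
]

# Same concat as in the question: a two-level column index (creche, parking)
catted = pd.concat([d.reset_index(drop=True) for d in frames], axis=1, keys=names)
catted = catted.rename_axis(['creche', 'parking'], axis=1)

# Keep only the 'parkingspaces' column from every file; the 'parking' level
# is dropped, so the remaining columns are the five file names.
spaces = catted.xs('parkingspaces', axis=1, level='parking')

# 5x5 matrix: each file's parkingspaces correlated with every other file's
spaces_corr = spaces.corr()
print(spaces_corr.round(2))
```

Passing the result to sns.heatmap then draws only these file-to-file correlations instead of every column combination.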
You can use the drop method of the DataFrame to drop both the rows and the columns you do not wish to plot in the heat map.
Consider the following DataFrame with 4 columns in total, of which only 2 need to be plotted.
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1,2,3,4,5],[5,4,3,2,1],[3,5,6,7,8],[1,2,3,4,5]]).T)
df.columns = ['value', 'column_to_drop', 'stuff', 'other_column_to_drop']
This results in the following DataFrame:
   value  column_to_drop  stuff  other_column_to_drop
0      1               5      3                     1
1      2               4      5                     2
2      3               3      6                     3
3      4               2      7                     4
4      5               1      8                     5
We want to remove column_to_drop and other_column_to_drop from the final heatmap.
To do this, run the following code. First create the correlation matrix as before. Then drop column_to_drop and other_column_to_drop from both the rows and the columns of the correlation matrix.
corr_df = df.corr()
# drop the labels from the rows first (default axis=0), then from the columns (axis=1)
heatmap_df = corr_df.drop(['column_to_drop', 'other_column_to_drop']).drop(['column_to_drop', 'other_column_to_drop'], axis=1)
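The chained call can also be written as a single drop with the index and columns keywords, or inverted into a keep-list selected with .loc. The sketch below rebuilds the toy DataFrame from above and shows that all three forms give the same matrix; the names to_drop and keep are only illustrative.

```python
import numpy as np
import pandas as pd

# Same toy DataFrame as in the answer
df = pd.DataFrame(np.array([[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [3, 5, 6, 7, 8], [1, 2, 3, 4, 5]]).T)
df.columns = ['value', 'column_to_drop', 'stuff', 'other_column_to_drop']
corr_df = df.corr()

to_drop = ['column_to_drop', 'other_column_to_drop']

# Chained form: drop from the rows (axis=0), then from the columns (axis=1)
chained = corr_df.drop(to_drop).drop(to_drop, axis=1)

# Single call: drop the same labels from the index and the columns at once
single = corr_df.drop(index=to_drop, columns=to_drop)

# Keep-list alternative: select only the wanted labels on both axes
keep = ['value', 'stuff']
selected = corr_df.loc[keep, keep]

print(single)
```

The keep-list form is handy when only a few of many columns should be plotted, since you list what stays rather than everything that goes.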
Then you can create the heatmap from the final DataFrame.
sns.heatmap(heatmap_df)
You can of course take additional steps on heatmap_df prior to plotting, such as creating a mask so that the same values are not plotted twice.
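A minimal sketch of combining the drop with an upper-triangle mask, mirroring the np.triu_indices_from mask in the question (diagonal included). It assumes a headless environment, hence the "Agg" backend; in a notebook you would leave that line out and the figure is shown inline.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame(np.array([[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [3, 5, 6, 7, 8], [1, 2, 3, 4, 5]]).T)
df.columns = ['value', 'column_to_drop', 'stuff', 'other_column_to_drop']

to_drop = ['column_to_drop', 'other_column_to_drop']
heatmap_df = df.corr().drop(index=to_drop, columns=to_drop)

# True marks cells to hide: the upper triangle including the diagonal,
# so each pair's correlation is drawn only once
mask = np.triu(np.ones(heatmap_df.shape, dtype=bool))

fig, ax = plt.subplots(figsize=(4, 3))
sns.heatmap(heatmap_df, mask=mask, annot=True, fmt=".2f",
            vmin=-1, vmax=1, center=0, cmap="coolwarm", ax=ax)
```

With only two columns left, a single cell (stuff against value) remains visible; on the larger matrix from the question the same mask hides the mirrored half.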