csv - How to create correlation matrix only for specific columns combination using python? -

April 15, 2011

I have 3 columns: hour, factor (which affects car parking), and parkingspaces. I am able to draw a correlation matrix that calculates the correlation among every combination, but I want to display one correlation matrix for the 5 different files that shows the correlation among the columns only.

import numpy as np
import pandas as pd
import seaborn as sns
import math
import matplotlib.pyplot as plt
%matplotlib inline
sns.set(style="darkgrid")

# Raw strings stop the backslashes in the Windows paths from being read as escape sequences
creche_holiday = pd.read_csv(r"d:\data analysis\practicum\dcu car parking data\new folder\creche_holiday.csv")
creche_reading = pd.read_csv(r"d:\data analysis\practicum\dcu car parking data\new folder\creche_reading.csv")
creche_study = pd.read_csv(r"d:\data analysis\practicum\dcu car parking data\new folder\creche_study.csv")
creche_working = pd.read_csv(r"d:\data analysis\practicum\dcu car parking data\new folder\creche_working.csv")
creche_exam = pd.read_csv(r"d:\data analysis\practicum\dcu car parking data\new folder\creche_exam.csv")

# Place the five files side by side, keyed by file name
catted = pd.concat([d.reset_index(drop=True) for d in [creche_working, creche_holiday, creche_reading, creche_study, creche_exam]],
                   axis=1, keys=['working', 'holiday', 'reading', 'study', 'exam'])

catted = catted.rename_axis(['creche', 'parking'], axis=1)

corrmat = catted.corr()
# Generate a mask for the upper triangle (np.bool was removed from NumPy, so use the built-in bool)
mask = np.zeros_like(corrmat, dtype=bool)
mask[np.triu_indices_from(mask)] = True
# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(12, 11))

# Generate a custom diverging colormap
cmap = sns.diverging_palette(220, 10, as_cmap=True)

#sns.heatmap(corrmat, vmax=.3, center=0, square=True, linewidths=.5, cbar_kws={"shrink": .5})
# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corrmat, fmt=".2g", annot=True, cmap=cmap, linewidths=1, cbar=True, vmin=0, vmax=1, center=0, mask=mask)

You can use the drop method of the DataFrame to drop both the rows and the columns that you do not wish to plot in the heat map.

Consider the following DataFrame with 4 columns in total, of which only 2 need to be plotted.

df = pd.DataFrame(np.array([[1,2,3,4,5],[5,4,3,2,1],[3,5,6,7,8],[1,2,3,4,5]]).T)
df.columns = ['value','column_to_drop','stuff','other_column_to_drop']

This results in the following DataFrame.

value   column_to_drop   stuff   other_column_to_drop
1       5                3       1
2       4                5       2
3       3                6       3
4       2                7       4
5       1                8       5

Quite simply, we want to remove column_to_drop and other_column_to_drop from the final heatmap.

To do that, you need to run the following code. First, create the correlation matrix as before. After creating the correlation matrix, drop column_to_drop and other_column_to_drop from both the rows and the columns of the correlation matrix.

corr_df = df.corr()
heatmap_df = corr_df.drop(['column_to_drop','other_column_to_drop']).drop(['column_to_drop','other_column_to_drop'], axis=1)
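As a quick check of the drop step, here is a minimal self-contained sketch that rebuilds the toy DataFrame and confirms that only the two wanted columns remain on both axes. The names unwanted, wanted and selected_df are made up for illustration, and the .loc selection is an assumed alternative to the drop calls, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same toy data as in the answer.
df = pd.DataFrame(np.array([[1, 2, 3, 4, 5],
                            [5, 4, 3, 2, 1],
                            [3, 5, 6, 7, 8],
                            [1, 2, 3, 4, 5]]).T)
df.columns = ['value', 'column_to_drop', 'stuff', 'other_column_to_drop']

unwanted = ['column_to_drop', 'other_column_to_drop']  # hypothetical helper name
corr_df = df.corr()

# Approach from the answer: drop the unwanted labels from the rows, then from the columns.
heatmap_df = corr_df.drop(unwanted).drop(unwanted, axis=1)

# Assumed alternative: name the columns to keep and select them on both axes.
wanted = ['value', 'stuff']  # hypothetical helper name
selected_df = corr_df.loc[wanted, wanted]

print(heatmap_df.shape)          # (2, 2)
print(list(heatmap_df.columns))  # ['value', 'stuff']
```

Selecting with .loc may be more convenient when the list of columns to keep is shorter than the list of columns to drop.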

Then you can create the heatmap from the final DataFrame.

sns.heatmap(heatmap_df)

This results in a heatmap that contains only the value and stuff columns.

You can of course choose to perform additional steps on heatmap_df prior to plotting, such as creating a mask so that the same values are not plotted twice.
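The mask idea could look roughly like the sketch below. It is only an illustration on the toy data: it builds a boolean array with the same shape as heatmap_df in which True marks the upper triangle, including the diagonal, in the same way as the mask in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the toy correlation matrix so the sketch runs on its own.
df = pd.DataFrame(np.array([[1, 2, 3, 4, 5],
                            [5, 4, 3, 2, 1],
                            [3, 5, 6, 7, 8],
                            [1, 2, 3, 4, 5]]).T)
df.columns = ['value', 'column_to_drop', 'stuff', 'other_column_to_drop']
unwanted = ['column_to_drop', 'other_column_to_drop']  # hypothetical helper name
heatmap_df = df.corr().drop(unwanted).drop(unwanted, axis=1)

# True marks the cells to hide: the upper triangle and the diagonal.
# Use np.triu(..., k=1) instead to keep the diagonal visible.
mask = np.triu(np.ones(heatmap_df.shape, dtype=bool))
print(mask)
```

The array can then be passed to the plot as sns.heatmap(heatmap_df, mask=mask); seaborn does not draw the cells where the mask is True.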
