pandas - How can I choose what to print out from a Python dictionary and write it out to another file?
I have a file that I read in called 'peaks_ee.xpk', and I have a dictionary with the atom name as the key and the chemical shift as the value.
This is a sample of the peaks_ee.xpk file:
    label dataset sw sf
    1h 1h_2
    noesy_f1ef2e.nv
    4807.69238281 4803.07373047
    600.402832031 600.402832031
    1h.l 1h.p 1h.w 1h.b 1h.e 1h.j 1h.u 1h_2.l 1h_2.p 1h_2.w 1h_2.b 1h_2.e 1h_2.j 1h_2.u vol int stat comment flag0 flag8 flag9
    0 {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    1 {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    2 {1.h8} 8.13712 0.05000 0.10000 ++ {0.0} {} {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    3 {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {1.h8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    4 {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} {2.h1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    5 {2.h1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    6 {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    7 {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.h8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    8 {1.h1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
    9 {1.h8} 8.13712 0.05000 0.10000 ++ {0.0} {} {2.h8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
This is my Python code:

    import pandas as pd

    result = {}
    text = 'fe'
    filename = 'fe_yellow.xpk'
    if text == 'ee':
        # The first 5 lines of the .xpk file are metadata; the 6th is the column header.
        df = pd.read_csv('peaks_ee.xpk', sep=" ", skiprows=5)
        shift1 = df["1h.p"]
        shift2 = df["1h_2.p"]
        # Select peaks whose two chemical shifts fall inside the chosen window.
        if filename == 'ee_pinkh1.xpk':
            mask = ((shift1 > 5.1) & (shift1 < 6)) & ((shift2 > 7) & (shift2 < 8.25))
        elif filename == 'ee_pinkh2.xpk':
            mask = ((shift1 > 3.25) & (shift1 < 5)) & ((shift2 > 7) & (shift2 < 8.5))
        result = df[mask]
        # Keep only the two atom-label columns.
        result = result[["1h.l", "1h_2.l"]]
        tclust_atom = open("tclust.txt", "a")
        tclust_atom.write(str(result))
The output is:

         1h.l    1h_2.l
    25   {5.h2'}   {5.h1'}
    26   {5.h2'}   {5.h1'}
    27   {5.h2'}   {6.h5}
    42   {7.h2'}   {7.h1'}
    43   {7.h2'}   {7.h1'}
    44   {7.h2'}   {8.h5}
    60   {9.h2'}   {9.h1'}
    61   {9.h2'}   {9.h1'}
    62   {9.h2'}   {10.h5}
    87   {12.h2'}  {12.h1'}
    88   {12.h2'}  {12.h1'}
    89   {12.h2'}  {13.h5}
    132  {18.h2'}  {18.h1'}
    133  {18.h2'}  {18.h1'}
    146  {20.h2'}  {20.h1'}
    147  {20.h2'}  {20.h1'}
    154  {21.h2'}  {21.h1'}
    155  {21.h2'}  {21.h1'}
    169  {23.h2'}  {23.h1'}
    170  {23.h2'}  {23.h1'}
    171  {23.h2'}  {24.h5}
Instead, I want the output to look like this:

    atom 1   5.h2'   5.h1'
    atom 2   5.h2'   5.h1'
    atom 3   5.h2'   6.h5
    atom 4   7.h2'   7.h1'
    atom 5   7.h2'   7.h1'
    atom 6   7.h2'   8.h5
    atom 7   9.h2'   9.h1'
    atom 8   9.h2'   9.h1'
    atom 9   9.h2'   10.h5
    atom 10  12.h2'  12.h1'
    atom 11  12.h2'  12.h1'
    atom 12  12.h2'  13.h5
    atom 13  18.h2'  18.h1'
    atom 14  18.h2'  18.h1'
    atom 15  20.h2'  20.h1'
    atom 16  20.h2'  20.h1'
    atom 17  21.h2'  21.h1'
    atom 18  21.h2'  21.h1'
    atom 19  23.h2'  23.h1'
    atom 20  23.h2'  23.h1'
    atom 21  23.h2'  24.h5
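So I want to get rid of the first line (the column header), get rid of the curly braces in the output I have, and add the word "atom" at the start of each line along with a number (starting at 1 and going to n).
Also, for example, atom 1 and atom 2 are the same. How can I print that pair once instead of twice?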
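Check if this helps you. Replace the last 2 lines in your code with this code:

    # Strip spaces and curly braces from every label column.
    for col in result.columns:
        result[col] = result[col].str.strip('{} ')
    # Keep only the first occurrence of each repeated row.
    result.drop_duplicates(keep='first', inplace=True)
    # Replace the integer index with 'atom 1', 'atom 2', ...
    result = result.set_index([['atom ' + str(i) for i in range(1, len(result) + 1)]])
    tclust_atom = open("tclust.txt", "a")
    # header=False leaves out the column-name line.
    result.to_string(tclust_atom, header=False)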
The for loop strips spaces and curly braces from each series in the df. drop_duplicates, as the name suggests, drops duplicate rows from the df. And set_index replaces the integer index with an index in which each entry has the form 'atom #'.
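To try these three steps without the .xpk file, here is a minimal sketch on a small in-memory frame. The helper name `format_atom_pairs` and the `sample` data are made up for illustration (the rows are copied from the output shown in the question). It also assigns to `.index` directly instead of calling set_index, which should be equivalent here, and works on a copy so the original frame is left alone.

```python
import pandas as pd


def format_atom_pairs(pairs: pd.DataFrame) -> str:
    """Return the label pairs as text lines of the form 'atom N  a  b'."""
    # Work on a copy so the caller's frame is not modified.
    cleaned = pairs.copy()
    # Strip spaces and curly braces from both ends of every label.
    for col in cleaned.columns:
        cleaned[col] = cleaned[col].str.strip("{} ")
    # Drop repeated rows, keeping the first occurrence of each pair.
    cleaned = cleaned.drop_duplicates(keep="first")
    # Replace the integer index with 'atom 1', 'atom 2', ...
    cleaned.index = [f"atom {i}" for i in range(1, len(cleaned) + 1)]
    # header=False leaves out the column-name line.
    return cleaned.to_string(header=False)


# Hypothetical sample: three rows taken from the question's output.
sample = pd.DataFrame(
    {
        "1h.l": ["{5.h2'}", "{5.h2'}", "{5.h2'}"],
        "1h_2.l": ["{5.h1'}", "{5.h1'}", "{6.h5}"],
    },
    index=[25, 26, 27],
)

text = format_atom_pairs(sample)
print(text)
```

Because the first two sample rows are identical, only two lines are printed. Note that the numbering happens after the duplicates are dropped, so the atom numbers stay consecutive.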