High performance lookup table in Fortran -
part of simulation code requires me find opacity given density , temperature. there no analytic relation this; standard method use 2d array opacity(i,j)
correspond opacity density(i)
, temperature(j)
, , run bilinear interpolation find exact opacity.
this bottleneck in group's code - each time step, interpolation routine called 100 million times different densities , temperatures, , accounts 20% of runtime. current code shown below - there tricks use improve it? use intel fortran 16, options -o3 -xavx -mcmodel=medium
function smoothopc(den, temp, ig, opc, rhot) implicit none real(kind=8), intent(in) :: den,temp integer,intent(in) :: ig real(kind=8), intent(in), dimension(1:50, 1:50, 1:52) :: opc real(kind=8), intent(in), dimension(1:50, 1:2) :: rhot real(kind=8) :: rho, te, smoothopc, r1, r2, t1, t2, & interpolation, denominator, a, b, c, d, opc11, & opc12, opc21, opc22, t, tp, r, rp integer :: rid,tid,i rho = den * 1d - 3 !g/cc te = temp / (1.6d - 19) !ev tid = -1 rid = -1 = 1, 49 r = rhot(i, 1) rp = rhot(i + 1, 1) t = rhot(i, 2) tp = rhot(i + 1, 2) if (rho .ge. r) rid = endif if (te .ge. t) tid = endif enddo r1 = rhot(rid, 1) r2 = rhot(rid + 1, 1) t1 = rhot(tid, 2) t2 = rhot(tid + 1, 2) opc11 = opc(rid, tid, ig + 4) opc12 = opc(rid, tid + 1, ig + 4) opc21 = opc(rid + 1, tid, ig + 4) opc22 = opc(rid + 1, tid + 1, ig + 4) denominator = (r2 - r1) * (t2 - t1) = r2 - rho b = rho - r1 c = t2 - te d = te - t1 interpolation = * (c * opc11 + d * opc12) + b * & (c * opc21 + d * opc22) smoothopc = interpolation / denominator return end function smoothopc
if understand correctly, doing loop through whole of rhot
find indices use afterwards find values in opc
.
if columns of rhot
sorted, may faster write binary search instead (there overhead, have test).
also, don't quite understand condition assign rid
, tid
(it seem logical assign rid if r <= rho < rp
). may missing how rhot
built.
a trick might try: convert rho*m
integer, m may instance power of 2 or 10 (multiply 1000 3 digits of precision). value rounded integer index array elements correct (or nearest) values of rid
. if not correct rid
, may have smaller range check. if scale not linear, may transform rho
first.
another possible trick: store previous rid
, tid
indices. if calls follow rather continuous evolution, new indices near previous ones. if code has paralellized @ point, it's not idea introduces sequential dependences between calls.
Comments
Post a Comment