High performance lookup table in Fortran -
part of simulation code requires me find opacity given density , temperature. there no analytic relation this; standard method use 2d array opacity(i,j) correspond opacity density(i) , temperature(j), , run bilinear interpolation find exact opacity.
this bottleneck in group's code - each time step, interpolation routine called 100 million times different densities , temperatures, , accounts 20% of runtime. current code shown below - there tricks use improve it? use intel fortran 16, options -o3 -xavx -mcmodel=medium
function smoothopc(den, temp, ig, opc, rhot) implicit none real(kind=8), intent(in) :: den,temp integer,intent(in) :: ig real(kind=8), intent(in), dimension(1:50, 1:50, 1:52) :: opc real(kind=8), intent(in), dimension(1:50, 1:2) :: rhot real(kind=8) :: rho, te, smoothopc, r1, r2, t1, t2, & interpolation, denominator, a, b, c, d, opc11, & opc12, opc21, opc22, t, tp, r, rp integer :: rid,tid,i rho = den * 1d - 3 !g/cc te = temp / (1.6d - 19) !ev tid = -1 rid = -1 = 1, 49 r = rhot(i, 1) rp = rhot(i + 1, 1) t = rhot(i, 2) tp = rhot(i + 1, 2) if (rho .ge. r) rid = endif if (te .ge. t) tid = endif enddo r1 = rhot(rid, 1) r2 = rhot(rid + 1, 1) t1 = rhot(tid, 2) t2 = rhot(tid + 1, 2) opc11 = opc(rid, tid, ig + 4) opc12 = opc(rid, tid + 1, ig + 4) opc21 = opc(rid + 1, tid, ig + 4) opc22 = opc(rid + 1, tid + 1, ig + 4) denominator = (r2 - r1) * (t2 - t1) = r2 - rho b = rho - r1 c = t2 - te d = te - t1 interpolation = * (c * opc11 + d * opc12) + b * & (c * opc21 + d * opc22) smoothopc = interpolation / denominator return end function smoothopc
if understand correctly, doing loop through whole of rhot find indices use afterwards find values in opc.
if columns of rhot sorted, may faster write binary search instead (there overhead, have test).
also, don't quite understand condition assign rid , tid (it seem logical assign rid if r <= rho < rp). may missing how rhot built.
a trick might try: convert rho*m integer, m may instance power of 2 or 10 (multiply 1000 3 digits of precision). value rounded integer index array elements correct (or nearest) values of rid. if not correct rid, may have smaller range check. if scale not linear, may transform rho first.
another possible trick: store previous rid , tid indices. if calls follow rather continuous evolution, new indices near previous ones. if code has paralellized @ point, it's not idea introduces sequential dependences between calls.
Comments
Post a Comment