large data volumes - Slowly Increasing Save Time When Saving As in Excel -


ok, stack overflow, far can tell, one's doozy.

i have created macro enabled excel file that, when run, performs following actions (high level):

  1. user selects template file (which macro enabled) via file dialog
  2. user selects data files (which not macro enabled) via file dialog
  3. macro steps through data files and, 1 one, opens them, formats data, migrates data new worksheet in intermediary workbook, , closes data file without saving it
  4. once files have been looped through, intermediary workbook saved, kept open
  5. once data files have been looped through, each sheet of intermediary workbook looped through, data in current worksheet transferred template file, , template file saved new, uniquely labeled file. 1 row of data in now-data-containing file copied summary sheet

(it's bit more complicated that, important aspects far can tell)

here's problem; number of data files being selected in thousands (so far biggest run we've attempted 4000 files). macro progresses, time takes these files save gets steadily longer. starts @ 5 seconds, end of files taking 5 minutes save.

the clue have there's iteration feature i've added that, once data files have been looped through, closes template file , opens new instance of different settings, , starts process on again. causes save time go normal , start growing again. summary file saved , closed during step, , new 1 opened new iteration.

i've considered closing , reopening template file every hundred data files or so, , i'll implement if have to, i'd rather proper solution problem band-aid approach. if open , close template file every time, avoid time problem, macro becomes wildly unstable, crash @ random points during run (but sometimes).

this on computer isolated internet or kind of network , saving solid state drive (we've tried control quite few variables).

anyhow, i'm pretty stumped, suggestions welcome!

option explicit  public sub example()     dim trial integer, trials integer, dataset integer     dim trialchecker boolean     dim starttime double, waittime double     dim startdate date     dim filesavename string     dim copieddatarange range     dim summaryruntimes worksheet, calcs worksheet, cutoffsshifts worksheet     dim sheetobjects() worksheet     dim intermediaryworkbook workbook, summary workbook, template workbook      application.screenupdating = false     application.calculation = xlcalculationmanual      'the 1 , trials set lbound , ubound funcitons, premise same     trial = 1 trials         workbooks.add         set summary = activeworkbook         'i use 1 sheet keep track of how long different parts of code take run         set summaryruntimes = summary.worksheets(1)         summaryruntimes.name = "run times"         summaryruntimes.cells(1, 1).value = "id"         summaryruntimes.cells(1, 2).value = "data copy time (s)"         summaryruntimes.cells(1, 3).value = "formula copy , calc time (s)"         summaryruntimes.cells(1, 4).value = "summary copy time (s)"         summaryruntimes.cells(1, 5).value = "save , cleanup time (s)"          'sheetnames defined elsewhere in code (it's global variable right now. intend change later).         'it's array of strings 6 elements.         counter = lbound(sheetnames) ubound(sheetnames)             summary.worksheets.add             summary.activesheet.name = sheetnames(counter)         next counter          'again, templatelocation defined elsewhere. it's string grabbed filedialog         workbooks.open (templatelocation)         set template = activeworkbook         set calcs = template.sheets("calcs")         set cutoffsshifts = template.sheets("log cutoffs & shifts")          'sheetobjects used convenient reference various sheets in template file. found         'it cleaned code bit. might it's unnecessary.         counter = lbound(sheetnames) ubound(sheetnames)             set sheetobjects(counter) = template.sheets(sheetnames(counter))         next counter          'this parameters given trial set in template file. trialchecker set elsewhere         '(it checks yes/no dropdown in original spreadsheet). parameteraddresses range that's grabbed         'table object in original spreadsheet , contains these parameters go in template file. these         'will not change depending on trial, column = 1. trialparameters in same table, ,         'parameters themselves. these depend on trial, , column equal trial number         if trialchecker = true             counter = lbound(parameteraddresses) ubound(parameteraddresses)                 cutoffsshifts.range(parameteraddresses(counter, 1)).value = trialparameters(counter, trial)             next counter         end if          dataset = 1 intermediaryworkbook.worksheets.count - 1             'this start timers             starttime = timer             startdate = date              'this data copied intermediary file template. it's 5             'columns wide, can number of rows. summaryruntimes statement merely grabbing unique             'identifier of given worksheet             intermediaryworkbook                 set copieddatarange = calcs.range("$a$3:$e$" & .worksheets(counter).usedrange.rows.count + 1)                 copieddatarange.value = intermediaryworkbook.worksheets(counter).range("$a$2:$e$" & .worksheets(counter).usedrange.rows.count).value                 summaryruntimes.cells(counter + 1, 1) = calcs.cells(3, 1).value             end              'first timestamp             summaryruntimes.cells(counter + 1, 2) = cstr(round(86400 * (date - startdate) + timer - starttime, 1))             starttime = timer             startdate = date              'this statement copies down formulas go data (which aobut 100 columsn worth of formuals).             'throughout process, calculation set manual, calculation manually triggered here (don't ask             'me why twice. if recall, it's because pivot tables weird)             set copiedformularange = calcs.range("$f$3:$kl$" & calcs.usedrange.rows.count)             copiedformularange.filldown             application.calculate             template.refreshall             application.calculate              'second timestamp             summaryruntimes.cells(counter + 1, 3) = cstr(round(86400 * (date - startdate) + timer - starttime, 1))             starttime = timer             startdate = date              'this separate function copies data template file summary sheet.             'i know can't see code, know copies 6 sets of 7 cells,             'as far can tell, it's not causing problem. timestamp supports idea,             'it's consistent , short             call summarypopulate(summary, sheetnames, sheetobjects, r)             r = r + 1              'third timestamp             summaryruntimes.cells(counter + 1, 4) = cstr(round(86400 * (date - startdate) + timer - starttime, 1))             starttime = timer             startdate = date              'these following few lines meant save template file new file. mentioned,             'things bogged down. filenamesuffix string set via inputbox. trialnames set via table object             'mentioned above, , array of strings.             application.displayalerts = false              if trialchecker = true                 filesavename = folderlocation & "\" & replace(calcs.cells(3, 1).value, "/", " ") & " ooip " & filenamesuffix & " - " & trialnames(1, trial) & ".xlsm"             else                 filesavename = folderlocation & "\" & replace(calcs.cells(3, 1).value, "/", " ") & " ooip " & filenamesuffix & ".xlsm"             end if              template.saveas filename:=filesavename, conflictresolution:=xllocalsessionchanges              application.displayalerts = true              'this part clears copied data , formulas. added 2 set nothing lines in hopes             'solve problem, doesn't seem             copieddatarange.clearcontents             copieddatarange.offset(1, 0).rows.delete             set copieddatarange = nothing             set copiedformularange = nothing              'fourth , final timestamp             summaryruntimes.cells(counter + 1, 5) = cstr(round(86400 * (date - startdate) + timer - starttime, 1))              'it seems run bit better if there's wait line here, i'm not sure why. waittime             'is grabbed original worksheet, , double             application.wait (timeserial(hour(now()), minute(now()), second(now()) + waittime))          next dataset          'this saves summary file , closes , template file. process starts anew.         'this seems key resetting reduces run times.         if trialchecker = true             summary.saveas filename:=folderlocation & "\" & "ooip summary " & filenamesuffix & " - " & trialnames(1, trial) & ".xlsx"         else             summary.saveas filename:=folderlocation & "\" & "ooip summary " & filenamesuffix & ".xlsx"         end if         template.close false         summary.close false     next trial      application.screenupdating = true     application.calculation = xlcalculationautomatic      intermediaryworkbook.close false end sub 

sorry post answer, isn't, need bit of space here. went through code, found intermediateworkbook isn't defined , decided defining wouldn't make difference. feel have done might think of doing , study of code not discover haven't discovered. therefore solution in first separating processes , joining them again in different way - or perhaps not. key "solution": if parts can't joined, let them run separately. therefore task set create separate parts.

part 1 described in points 2 4, i.e. creation of intermediate workbook. haven't stated why user must select template before workbook created if has bearing template can opened , closed. important part of suggestion end process when intermediate workbook saved. close it. close template. , project done - part 1.

part 2 open intermediate file , loop through data, creating new files. each of these files based on template. may have provide code enable selection of correct template if there several choose form , if data in intermediate workbook don't support automatic selection. in process have intermediate workbook open plus 1 new file @ time. each file closed before new 1 created. @ end of process intermediate file closed well. (btw occurs me handling of template might cause of problem. in process description template never opened. instead, new workbooks created based on it, , inventor's design.)

part 3 create or open summary file. open each of newly created workbooks , copy 1 row summary. close each workbook , open next. @ end of process close summary workbook.

joining parts: frankly, try integrate part 3 part 2 outset. don't believe having 1 workbook open make difference. if split tasks.

your 2 or 3 separate procedures should in add-in perhaps or workbook nothing hold code (adding 1 more open workbook 2 or 3 others - excel can handle easily). code in workbook add sub calls 2 or 3 procs in turn, 1 after other.

in program structure problem might resurface in part 2 when might take progressively more time save each new workbook. if happens nature of problem have shifted , should easier understand and, hopefully, easier solve.


Comments

Popular posts from this blog

php - Vagrant up error - Uncaught Reflection Exception: Class DOMDocument does not exist -

vue.js - Create hooks for automated testing -

Add new key value to json node in java -