r - Filtering multiple CSV files while importing into a data frame
I have a large number of CSV files that I want to read into R. The column headings in the CSVs are all the same. I want to import into a data frame only those rows from each file where a given variable is within a given range (above a minimum threshold and below a maximum threshold), e.g.
```
  v1 v2 v3
1  x  q  2
2  c  w  4
3  v  e  5
4  b  r  7
```
Filtering on v3 (v3 > 2 & v3 < 7) should result in:
```
  v1 v2 v3
1  c  w  4
2  v  e  5
```
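For reference, the desired filter on a single in-memory data frame in base R looks like this (a minimal sketch; `minval` and `maxval` are placeholder names introduced here):

```r
# Toy data matching the example above
df <- data.frame(v1 = c("x", "c", "v", "b"),
                 v2 = c("q", "w", "e", "r"),
                 v3 = c(2, 4, 5, 7))

minval <- 2  # lower threshold (exclusive)
maxval <- 7  # upper threshold (exclusive)

# Keep only rows where v3 lies strictly between the thresholds
filtered <- df[df$v3 > minval & df$v3 < maxval, ]
filtered
#   v1 v2 v3
# 2  c  w  4
# 3  v  e  5
```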
So far I import the data from all the CSVs into a single data frame and then do the filtering:
```r
# read the data files
filenames <- list.files(path = workdir)
mergedfiles <- do.call("rbind", sapply(filenames, read.csv, simplify = FALSE))
fileid <- row.names(mergedfiles)
fileid <- gsub(".csv.*", "", fileid)

# combine the data with the file ids
combfiles <- cbind(fileid, mergedfiles)

# filter the data according to the criteria
resultfile <- combfiles[combfiles$v3 > min & combfiles$v3 < max, ]
```
I would rather apply the filter while importing each single CSV file into the data frame. I assume a loop is the best way of doing it, but I am not sure how. I would appreciate any suggestion.
EDIT
After testing the suggestion from mnel, which worked, I ended up with a different solution:
```r
filenames <- list.files(path = workdir)
mzlist <- list()
for (i in 1:length(filenames)) {
  tempdata <- read.csv(filenames[i])
  mz.idx <- which(tempdata[, 1] > minmz & tempdata[, 1] < maxmz)
  mz1 <- tempdata[mz.idx, ]
  mzlist[[i]] <- data.frame(mz1, filename = rep(filenames[i], length(mz.idx)))
}
resultfile <- do.call("rbind", mzlist)
```
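The same filter-on-import idea can also be written without an explicit `for` loop, using `lapply` and combining once at the end. This is a sketch only: the column names (`mz`, `intensity`), the helper `read_filtered`, and the toy files written to a temporary directory are assumptions introduced here to keep the example self-contained; `workdir`, `minmz` and `maxmz` mirror the names used above.

```r
# Toy files so the sketch runs on its own (names are hypothetical)
workdir <- tempdir()
write.csv(data.frame(mz = c(1, 3, 5), intensity = c(10, 20, 30)),
          file.path(workdir, "a.csv"), row.names = FALSE)
write.csv(data.frame(mz = c(2, 4, 8), intensity = c(40, 50, 60)),
          file.path(workdir, "b.csv"), row.names = FALSE)
minmz <- 2
maxmz <- 6

# Read one CSV and keep only rows whose first column lies in (lo, hi)
read_filtered <- function(f, lo, hi) {
  tempdata <- read.csv(f)
  keep <- tempdata[, 1] > lo & tempdata[, 1] < hi
  data.frame(tempdata[keep, , drop = FALSE], filename = rep(f, sum(keep)))
}

filenames <- list.files(path = workdir, pattern = "\\.csv$", full.names = TRUE)
resultfile <- do.call("rbind", lapply(filenames, read_filtered,
                                      lo = minmz, hi = maxmz))
```

Each file is filtered as it is read, so only the matching rows are ever held in the combined data frame.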
Thanks for the suggestions!
Here is an approach using data.table, which allows you to use fread (which is faster than read.csv) and rbindlist, a superfast implementation of do.call(rbind, list(...)) — perfect for this situation. data.table also has the function between.
```r
library(data.table)

filenames <- list.files(path = workdir)
alldata <- rbindlist(lapply(filenames, function(x, min, max) {
  xx <- fread(x, sep = ",")
  xx[, fileid := gsub(".csv.*", "", x)]
  xx[between(v3, lower = min, upper = max, incbounds = FALSE)]
}, min = 2, max = 3))
```
If the individual files are large and v3 contains integer values, it might be worth setting v3 as the key and using a binary search — it may be quicker to import everything and then run the filtering that way.
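The keyed variant mentioned above can be sketched on toy data as follows; with v3 set as the key, subsetting via a join (`J(...)`) is a binary search on the sorted key rather than a full vector scan (the data values here are just the example from the question):

```r
library(data.table)

# Toy data matching the question's example
dt <- data.table(v1 = c("x", "c", "v", "b"),
                 v2 = c("q", "w", "e", "r"),
                 v3 = c(2L, 4L, 5L, 7L))

# Sort by v3 and mark it as the key, enabling binary search on it
setkey(dt, v3)

# Rows with v3 strictly between 2 and 7, i.e. key values 3..6;
# nomatch = 0L drops key values with no matching row
res <- dt[J(3:6), nomatch = 0L]
```

Note this strict-inequality-as-join trick only works cleanly for integer keys, which is why the suggestion is conditioned on v3 being integer-valued.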