python - Is it good to normalize/standardize data having a large number of features with zeros?

April 15, 2015

I have data with around 60 features, and most of the time they are zero: in the training data only 2-3 columns may have values (to be precise, it is perf log data). However, the test data will have values in some other columns.

I've done normalization/standardization (tried both separately) and fed the result to PCA/SVD (tried both separately). I used these features to fit my model, but it gives very inaccurate results.

Whereas, if I skip the normalization/standardization step and directly feed my data to PCA/SVD and then to the model, it gives accurate results (almost above 90% accuracy).

P.S.: I'm doing anomaly detection using the Isolation Forest algorithm.

Why do these results vary?

Normalization and standardization (depending on the source they are sometimes used equivalently, so I'm not sure what exactly you mean by each one in this case, but it's not important) are a general recommendation that usually works well in problems where the data is more or less homogeneously distributed. Anomaly detection is, by definition, not that kind of problem. If you have a data set where most of the examples belong to class A and only a few belong to class B, it is possible (if not necessary) that sparse features (features that are almost always zero) are very discriminative for your problem. Normalizing them can turn them to 0 or almost zero, making it hard for a classifier (or PCA/SVD) to grasp their importance. So it is not unreasonable that you get better accuracy if you skip normalization, and you shouldn't feel you are doing it "wrong" just because you are "supposed to do it".
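A small sketch of one way scaling can hide a sparse column, using only the Python standard library. The two columns are made-up data, and "share of total variance" is used here as a rough stand-in for what PCA/SVD pays attention to. It is an illustration under those assumptions, not a reproduction of the pipeline in the question.

```python
from statistics import fmean, pstdev, pvariance

# Made-up data: 100 rows, two features.
# "dense" changes a little on every row; "sparse" is 0 except on 5 rows.
dense = [50.0 + (i % 5) for i in range(100)]
sparse = [80.0 if i % 20 == 0 else 0.0 for i in range(100)]


def standardize(column):
    """Shift a column to mean 0 and scale it to unit variance."""
    mean, std = fmean(column), pstdev(column)
    return [(x - mean) / std for x in column]


def variance_share(columns):
    """Fraction of the total variance carried by each column."""
    variances = [pvariance(c) for c in columns]
    total = sum(variances)
    return [v / total for v in variances]


raw_share = variance_share([dense, sparse])
scaled_share = variance_share([standardize(dense), standardize(sparse)])

print("raw shares:   ", [round(s, 3) for s in raw_share])
print("scaled shares:", [round(s, 3) for s in scaled_share])
```

On this toy data the sparse column carries more than 99% of the raw variance but only about half after standardization, so a variance-driven projection no longer has a reason to favor it.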

I don't have experience with anomaly detection, but I do have some with unbalanced data sets. You could consider some form of "weighted normalization", where the computation of the mean and variance of each feature is weighted with a value inversely proportional to the number of examples in the class (e.g. examples_a ^ alpha / (examples_a ^ alpha + examples_b ^ alpha), with alpha being some small negative number). If your sparse features have very different scales (e.g. one is 0 in 90% of cases and 3 in 10% of cases, and another is 0 in 90% of cases and 80 in 10% of cases), you can just scale them to a common range (e.g. [0, 1]).
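Both suggestions can be sketched in plain Python. The function names, the default alpha = -0.5, and the choice to apply the class weight to every example of that class are assumptions made for illustration; the answer only gives the weight formula. The weighted variant also assumes class labels are available, which is often not the case in anomaly detection.

```python
from math import sqrt


def class_weights(n_a, n_b, alpha=-0.5):
    """Weights n^alpha / (n_a^alpha + n_b^alpha); both counts must be > 0."""
    w_a, w_b = n_a ** alpha, n_b ** alpha
    total = w_a + w_b
    return w_a / total, w_b / total


def weighted_standardize(values, labels, alpha=-0.5):
    """Standardize one feature with a class-weighted mean and variance.

    labels holds "a" or "b" per example (hypothetical encoding).
    """
    w_a, w_b = class_weights(labels.count("a"), labels.count("b"), alpha)
    weights = [w_a if label == "a" else w_b for label in labels]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, values)) / total
    if var == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(x - mean) / sqrt(var) for x in values]


def min_max_scale(values):
    """Map a feature to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature
    return [(x - lo) / (hi - lo) for x in values]


# The two sparse features from the example: 0 in 90% of cases, 3 or 80 otherwise.
feature_1 = [0.0] * 9 + [3.0]
feature_2 = [0.0] * 9 + [80.0]
labels = ["a"] * 9 + ["b"]

print(class_weights(95, 5))
print(weighted_standardize(feature_1, labels))
print(min_max_scale(feature_1) == min_max_scale(feature_2))
```

With a negative alpha the rarer class gets the larger weight, so its examples pull the mean and variance more than their raw count would. After min-max scaling, the two example features become identical even though their raw scales differ.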

In any case, as I said, do not apply techniques just because they are supposed to work. If something doesn't work for your problem or your particular dataset, you are right not to use it (and trying to understand why it doesn't work may yield some useful insights).
