classification - Can a machine learning model provide information about the mean and standard deviation of the data on which it was trained?
Consider a parametric binary classifier (such as logistic regression, SVM, etc.) trained on a dataset (say, one containing two features, e.g. blood pressure and cholesterol level). The dataset is then thrown away, and the trained model can only be used as a black box (no tweaks can be made, and no inside information can be gathered from the trained model). Only a set of data points can be provided and their labels predicted.
Is it possible to get information about the mean and/or standard deviation and/or range of the features of the dataset on which the model was trained? If yes, how so? And if no, why can't we?
Thank you for your response! :)
An SVM does not provide any information about the data statistics. It is a maximum margin classifier: it finds the best separating hyperplane between the two classes in feature space, expressed as a linear combination of "support vectors". If you use kernel functions, this combination lives in the kernel space, not in the original feature space. An SVM does not have a straightforward probabilistic interpretation whatsoever.
Logistic regression is a discriminative classifier and models the conditional probability p(y|x,w), where y is the label, x is the data and w is the weight vector. After maximum likelihood training you are left with w, which is again just a discriminator (a hyperplane) in feature space, so once more you do not get the feature statistics back.
The following can be considered instead: use a Gaussian classifier. Assume each class is produced with a prior class probability p(y), and that a class conditional density p(x|y,w) then produces the data. By Bayes' rule we have: p(y|x,w) = p(y) p(x|y,w) / p(x). If you define the class conditional density p(x|y,w) as a Gaussian, the parameter set w consists of the mean vector m and the covariance matrix C of x for each class y. Remember that these parameters describe x under the assumption that the current data vector belongs to a specific class. Conditioned only on w, a better option for the overall mean vector is E[x|w], the expectation of x with respect to p(x|w). This comes down to a weighted average of the mean vectors of class y=0 and class y=1, weighted by the prior class probabilities. Something similar should work for the covariance as well, but it needs to be derived properly; I am not 100% sure about it right now.
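The combination step above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the Gaussian classifier exposes its priors, class means and class covariances; the function name mixture_moments and all numbers are hypothetical, chosen just to match the two-feature example from the question. The mean uses the prior-weighted average described above. For the covariance, the sketch uses the law of total covariance, under which the prior-weighted average of the class covariances is only the within-class part, and a between-class term built from the spread of the class means has to be added.

```python
import math


def mixture_moments(priors, means, covs):
    """Overall mean and covariance of x under p(x|w) = sum_y p(y) p(x|y,w)."""
    d = len(means[0])
    # Law of total expectation: prior-weighted average of the class means.
    mean = [sum(p * m[i] for p, m in zip(priors, means)) for i in range(d)]
    # Law of total covariance: within-class part plus between-class part.
    cov = [[0.0] * d for _ in range(d)]
    for p, m, c in zip(priors, means, covs):
        for i in range(d):
            for j in range(d):
                within = c[i][j]
                between = (m[i] - mean[i]) * (m[j] - mean[j])
                cov[i][j] += p * (within + between)
    return mean, cov


# Hypothetical parameters for classes y=0 and y=1 over
# (blood pressure, cholesterol level); not taken from any real dataset.
priors = [0.6, 0.4]
means = [[120.0, 180.0], [150.0, 240.0]]
covs = [
    [[100.0, 20.0], [20.0, 400.0]],
    [[150.0, 30.0], [30.0, 500.0]],
]

overall_mean, overall_cov = mixture_moments(priors, means, covs)
# Per-feature standard deviations are the square roots of the diagonal.
overall_std = [math.sqrt(overall_cov[i][i]) for i in range(len(overall_mean))]
print(overall_mean, overall_std)
```

Note that this only works because a generative model stores m and C per class; it also requires access to those parameters, which a strict black box that returns labels only would not give you.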