python - Performing Logistic Regression with a large number of features?


I have a dataset of 330 samples with 27 features each, and a binary class problem for logistic regression.

according "rule if ten" need @ least 10 events each feature included. though, have imbalanced dataset, 20% o positive class , 80% of negative class.

That gives me about 70 events, allowing approximately 7 or 8 features to be included in the logistic model.

I'd like to evaluate all the features as predictors; I don't want to hand-pick any features.

So what would you suggest? Should I try all possible 7-feature combinations? Should I evaluate each feature's association with the outcome on its own and pick only the best ones for the final model?

I'm also curious about the handling of categorical and continuous features: can I mix them? If I have a categorical feature in [0-1] and a continuous one in [0-100], should I normalize?
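A minimal sketch of the preprocessing being asked about, assuming scikit-learn and a pandas DataFrame (the column names here are hypothetical): binary 0/1 columns can be passed through unchanged while continuous columns are standardized, which puts both on a comparable scale before a regularized model.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical frame: one binary categorical and one continuous feature
df = pd.DataFrame({
    "smoker": np.random.randint(0, 2, 330),   # categorical, already 0/1
    "age": np.random.uniform(0, 100, 330),    # continuous, range 0-100
})

# Scale only the continuous column; leave the 0/1 column as-is
pre = ColumnTransformer(
    [("scale", StandardScaler(), ["age"])],
    remainder="passthrough",
)
X = pre.fit_transform(df)
```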

Your best choice is to use L1-regularized logistic regression (also known as lasso regression). In case you're not familiar with it, this algorithm automatically selects features by penalizing those that do not lead to increased accuracy (in layman's terms), driving their coefficients to zero.
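A minimal sketch with scikit-learn, using synthetic data of the shape described in the question (330 samples, 27 features) purely for illustration. With penalty="l1", uninformative coefficients are driven exactly to zero, which is the automatic selection described above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(330, 27))            # stand-in for the real features
y = (X[:, 0] + X[:, 1] + rng.normal(size=330) > 1.5).astype(int)  # roughly 20% positives

X = StandardScaler().fit_transform(X)     # the L1 penalty assumes comparable feature scales

# liblinear and saga are the solvers that support the L1 penalty
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)

selected = np.flatnonzero(clf.coef_[0])   # features with nonzero coefficients
print(f"kept {selected.size} of 27 features:", selected)
```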

You can increase or decrease the regularization strength (it's a parameter) until the model achieves the highest accuracy (or another metric of your choice) on a test set or in a cross-validation procedure.
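One way to do that tuning, sketched with scikit-learn's built-in cross-validated variant (continuing from the X, y above): Cs controls the grid of regularization strengths tried, and given the 20/80 imbalance a metric such as ROC AUC may be more informative than plain accuracy.

```python
from sklearn.linear_model import LogisticRegressionCV

# Searches a grid of 20 regularization strengths with 5-fold CV
cv_clf = LogisticRegressionCV(
    Cs=20,
    cv=5,
    penalty="l1",
    solver="liblinear",
    scoring="roc_auc",        # more robust than accuracy on imbalanced data
)
cv_clf.fit(X, y)

print("best C:", cv_clf.C_[0])
print("nonzero coefficients:", (cv_clf.coef_[0] != 0).sum())
```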

