apache spark - PySpark: how to get weights from a CrossValidatorModel?
I have trained a logistic regression model with cross-validation, following the code at https://spark.apache.org/docs/2.1.0/ml-tuning.html

Now I want the weights and the intercept, but I get an error:

AttributeError: 'CrossValidatorModel' object has no attribute 'weights'

How can I get these attributes?

* The same problem occurs with trainingSummary = cvModel.summary
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import HashingTF, Tokenizer
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# Prepare training documents, which are labeled.
training = spark.createDataFrame([
    (0, "a b c d e spark", 1.0),
    (1, "b d", 0.0),
    (2, "spark f g h", 1.0),
    (3, "hadoop mapreduce", 0.0),
    (4, "b spark who", 1.0),
    (5, "g d y", 0.0),
    (6, "spark fly", 1.0),
    (7, "was mapreduce", 0.0),
    (8, "e spark program", 1.0),
    (9, "a e c l", 0.0),
    (10, "spark compile", 1.0),
    (11, "hadoop software", 0.0)
], ["id", "text", "label"])

# Configure an ML pipeline, which consists of three stages: tokenizer, hashingTF, and lr.
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
lr = LogisticRegression(maxIter=10)
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

# We now treat the Pipeline as an Estimator, wrapping it in a CrossValidator instance.
# This will allow us to jointly choose parameters for all Pipeline stages.
# A CrossValidator requires an Estimator, a set of Estimator ParamMaps, and an Evaluator.
# We use a ParamGridBuilder to construct a grid of parameters to search over.
# With 3 values for hashingTF.numFeatures and 2 values for lr.regParam,
# this grid will have 3 x 2 = 6 parameter settings for CrossValidator to choose from.
paramGrid = ParamGridBuilder() \
    .addGrid(hashingTF.numFeatures, [10, 100, 1000]) \
    .addGrid(lr.regParam, [0.1, 0.01]) \
    .build()

crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=BinaryClassificationEvaluator(),
                          numFolds=2)  # use 3+ folds in practice

# Run cross-validation, and choose the best set of parameters.
cvModel = crossval.fit(training)

# Prepare test documents, which are unlabeled.
test = spark.createDataFrame([
    (4, "spark j k"),
    (5, "l m n"),
    (6, "mapreduce spark"),
    (7, "apache hadoop")
], ["id", "text"])

# Make predictions on test documents. cvModel uses the best model found (lrModel).
prediction = cvModel.transform(test)
selected = prediction.select("id", "text", "probability", "prediction")
for row in selected.collect():
    print(row)
LogisticRegressionModel has coefficients, not weights. The others can be obtained as below:

(cvModel          # CrossValidatorModel
    .bestModel    # best PipelineModel found by the CrossValidator
    .stages[-1]   # last stage in the Pipeline: the fitted LogisticRegressionModel
    .coefficients)
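To make this pattern reusable, you can wrap the attribute chain in a small helper; best_final_stage below is a hypothetical function name, and it assumes (as in the code above) that the best model is a PipelineModel whose final stage is the fitted LogisticRegressionModel:

```python
def best_final_stage(cv_model):
    """Return the final stage of the best PipelineModel found by a CrossValidator.

    For the pipeline above this is the fitted LogisticRegressionModel,
    which exposes .coefficients, .intercept, and .summary.
    """
    return cv_model.bestModel.stages[-1]

# After fitting (sketch, not run here):
# lr_model = best_final_stage(cvModel)
# print(lr_model.coefficients)       # what the removed 'weights' attribute became
# print(lr_model.intercept)
# trainingSummary = lr_model.summary # training metrics for the best model
```

The same idea applies to cvModel.summary: the summary lives on the fitted LogisticRegressionModel inside the best pipeline, not on the CrossValidatorModel itself.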