apache spark - PySpark: how to get weights from a CrossValidatorModel?
I have trained a logistic regression model with cross-validation, following the code at https://spark.apache.org/docs/2.1.0/ml-tuning.html

Now I want the weights and the intercept, but I get an error:

AttributeError: 'CrossValidatorModel' object has no attribute 'weights'

How can I get these attributes?

* The same problem occurs with trainingSummary = cvModel.summary
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import HashingTF, Tokenizer
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

# Prepare training documents, which are labeled.
training = spark.createDataFrame([
    (0, "a b c d e spark", 1.0),
    (1, "b d", 0.0),
    (2, "spark f g h", 1.0),
    (3, "hadoop mapreduce", 0.0),
    (4, "b spark who", 1.0),
    (5, "g d y", 0.0),
    (6, "spark fly", 1.0),
    (7, "was mapreduce", 0.0),
    (8, "e spark program", 1.0),
    (9, "a e c l", 0.0),
    (10, "spark compile", 1.0),
    (11, "hadoop software", 0.0)
], ["id", "text", "label"])

# Configure an ML pipeline, which consists of three stages: tokenizer, hashingTF, and lr.
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
lr = LogisticRegression(maxIter=10)
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

# We now treat the Pipeline as an Estimator, wrapping it in a CrossValidator instance.
# This will allow us to jointly choose parameters for all Pipeline stages.
# A CrossValidator requires an Estimator, a set of Estimator ParamMaps, and an Evaluator.
# We use a ParamGridBuilder to construct a grid of parameters to search over.
# With 3 values for hashingTF.numFeatures and 2 values for lr.regParam,
# this grid will have 3 x 2 = 6 parameter settings for CrossValidator to choose from.
paramGrid = ParamGridBuilder() \
    .addGrid(hashingTF.numFeatures, [10, 100, 1000]) \
    .addGrid(lr.regParam, [0.1, 0.01]) \
    .build()

crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=BinaryClassificationEvaluator(),
                          numFolds=2)  # use 3+ folds in practice

# Run cross-validation, and choose the best set of parameters.
cvModel = crossval.fit(training)

# Prepare test documents, which are unlabeled.
test = spark.createDataFrame([
    (4, "spark j k"),
    (5, "l m n"),
    (6, "mapreduce spark"),
    (7, "apache hadoop")
], ["id", "text"])

# Make predictions on test documents. cvModel uses the best model found (lrModel).
prediction = cvModel.transform(test)
selected = prediction.select("id", "text", "probability", "prediction")
for row in selected.collect():
    print(row)
LogisticRegressionModel has coefficients, not weights. The others can be obtained as below:

(cvModel          # CrossValidatorModel
    .bestModel    # best PipelineModel found by the CrossValidator
    .stages[-1]   # last stage in the Pipeline: the fitted LogisticRegressionModel
    .coefficients)
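To make this pattern reusable, you can wrap the attribute chain in a small helper; best_final_stage below is a hypothetical function name, and it assumes (as in the code above) that the best model is a PipelineModel whose final stage is the fitted LogisticRegressionModel:

```python
def best_final_stage(cv_model):
    """Return the final stage of the best PipelineModel found by a CrossValidator.

    For the pipeline above this is the fitted LogisticRegressionModel,
    which exposes .coefficients, .intercept, and .summary.
    """
    return cv_model.bestModel.stages[-1]

# After fitting (sketch, not run here):
# lr_model = best_final_stage(cvModel)
# print(lr_model.coefficients)       # what the removed 'weights' attribute became
# print(lr_model.intercept)
# trainingSummary = lr_model.summary # training metrics for the best model
```

The same idea applies to cvModel.summary: the summary lives on the fitted LogisticRegressionModel inside the best pipeline, not on the CrossValidatorModel itself.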