python - Create Feature Using K-Nearest Neighbors -


I'm relatively new to Python and machine learning, and I've been working on building out a predictive model for mortgage prices. I'm struggling to use the k-nearest neighbors algorithm to create a feature.

Here's how I understand the mechanics of what I want to accomplish:

  1. I have 2 data files: mortgages sold and mortgages listed.
  2. Both data files have the same features (including lat/long).
  3. I want to create a column in mortgages listed that represents the median price of closely related homes in the immediate area.
  4. I'll use the methodology listed in 3 to create columns for 1-3 months, 4-6 months, and 7-12 months.
  5. Another column will be the trend of those 3 columns.

I've found material on KNN imputation, but that doesn't seem to be what I'm looking for.

How would I go about executing this idea? Are there resources I may have missed that would help?

Any guidance is appreciated. Thanks!

So, as I understand it, you want to fit a KNN model using the mortgages-sold data to predict prices for the mortgages-listed data. This is a classical KNN problem: you need to find the nearest feature vectors in the sold data for each feature vector in the listed data, and then take the median of the prices of those neighbours.

  • Consider that there are n rows in the sold data, with feature vectors x1, x2, ..., xn for each row and corresponding prices p1, p2, ..., pn:

    x_train = [x1, x2, ..., xn]

    y_train = [p1, p2, ..., pn]

  • Note that each xi here is a feature vector representing the ith row.

  • For now, consider that you want the 5 closest rows in the sold data for each row in the listed data. So, the KNN model parameter here (which might need to be optimised later) is:

    number_of_neighbours = 5

  • Now, the training code will look something like this (since the prices are continuous values rather than class labels, KNeighborsRegressor or the unsupervised NearestNeighbors would be a more natural choice, but the neighbour lookup below works the same way on all of them):

    from sklearn.neighbors import KNeighborsClassifier

    knn_model = KNeighborsClassifier(n_neighbors=number_of_neighbours)

    knn_model.fit(x_train, y_train)

  • For prediction, consider that there are m rows in the listed data, with feature vectors f1, f2, ..., fm for each row. The corresponding median prices z1, z2, ..., zm need to be determined:

    x_test = [f1, f2, ..., fm]

  • Note that the feature vectors in x_train and x_test should be vectorized using the same vectorizer/transformer. You can read more about vectorizers here.
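As a minimal sketch of the "same transformer" point, assuming purely numeric features (the feature values below are made up for illustration), a StandardScaler fitted on the sold data can transform both sets consistently:

```python
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric feature rows: e.g. lat, long, square footage.
sold_features = [[40.7, -74.0, 1500.0], [40.8, -73.9, 2100.0], [40.6, -74.1, 900.0]]
listed_features = [[40.75, -73.95, 1800.0]]

scaler = StandardScaler()
x_train = scaler.fit_transform(sold_features)  # fit ONLY on the sold data
x_test = scaler.transform(listed_features)     # reuse the same fitted scaler
```

Fitting the scaler only on the sold data keeps the two matrices on the same scale without leaking information from the listed data into the transformation.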

  • For the lookup step, note that knn_model.predict(x_test) would return only a single predicted label per row, which is not what we want here. Instead, use the kneighbors method, which returns the distances to, and the indices of, the nearest sold rows for each listed row:

    distances, indices = knn_model.kneighbors(x_test)

  • Indexing y_train with the jth row of indices gives (in this case) the 5 closest prices for the jth listed row. Conceptually:

    neighbour_prices = [(p11, p12, .., p15), (p21, p22, .., p25), .., (pm1, pm2, .., pm5)]

  • For each jth row, take the median of its 5 neighbour prices:

    import numpy as np

    zj = np.median(np.array([pj1, pj2, .., pj5]))

  • Hence, in this way, you can find the median price zj for each row of the listed data.
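Putting the steps above together, here is a minimal end-to-end sketch with made-up toy data, using the unsupervised NearestNeighbors class (which avoids treating continuous prices as class labels):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy sold data: feature vectors and their sale prices (made up for illustration).
x_train = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0],
                    [5.0, 5.0], [5.1, 4.9], [5.2, 5.1]])
y_train = np.array([100.0, 110.0, 120.0, 500.0, 510.0, 490.0])

number_of_neighbours = 3
nn_model = NearestNeighbors(n_neighbors=number_of_neighbours)
nn_model.fit(x_train)

# Toy listed data: one row near the cheap cluster, one near the expensive one.
x_test = np.array([[0.05, 0.05], [5.05, 5.0]])

# indices[j] holds the positions of the 3 nearest sold rows for listed row j.
distances, indices = nn_model.kneighbors(x_test)

# Median price of the nearest sold homes becomes the new feature.
median_prices = np.median(y_train[indices], axis=1)
print(median_prices)  # → [110. 500.]
```

The resulting median_prices array is exactly the new column for the listed data: each listed row gets the median sale price of its nearest sold neighbours.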

  • Now, coming to the parameter optimisation part. The hyper-parameter in the KNN model is number_of_neighbours. You can find the optimal value of this parameter by dividing x_train in an 80:20 ratio: train on the 80% part and cross-validate on the remaining 20% part. Once you are sure the accuracy numbers are good enough, use that value of number_of_neighbours for prediction on x_test.
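As a sketch of the tuning step, scikit-learn's GridSearchCV can replace the manual 80:20 split with k-fold cross-validation; the data below is randomly generated purely for illustration, and KNeighborsRegressor is used since the target is a continuous price:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Toy data purely for illustration: price driven by the first feature plus noise.
rng = np.random.RandomState(0)
x_train = rng.rand(100, 2)
y_train = x_train[:, 0] * 100 + rng.rand(100)

search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": [3, 5, 7, 9]},
    cv=5,  # 5-fold cross-validation instead of a single 80:20 split
)
search.fit(x_train, y_train)
print(search.best_params_)
```

Cross-validation averages the score over 5 held-out folds, which gives a more stable choice of n_neighbors than a single validation split.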

  • In the end, for the month-wise analysis, you will need to create month-wise models. For example, m1 = trained on 1-3 month sold data, m2 = trained on 4-6 month sold data, m3 = trained on 7-12 month sold data, etc.
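For the trend column mentioned in the question, one simple sketch (assuming the three median columns already exist; the column names and values here are hypothetical) is the slope of a least-squares line through the three window medians, ordered oldest to newest:

```python
import numpy as np
import pandas as pd

# Hypothetical listed-mortgages frame with the three median-price features.
listed = pd.DataFrame({
    "median_1_3m": [300.0, 410.0],
    "median_4_6m": [310.0, 400.0],
    "median_7_12m": [320.0, 390.0],
})

cols = ["median_7_12m", "median_4_6m", "median_1_3m"]  # oldest window first
x = np.arange(len(cols))
# Per-row slope of a degree-1 fit; positive = prices rising toward recent months.
listed["trend"] = [np.polyfit(x, row, 1)[0] for row in listed[cols].to_numpy()]
```

A positive slope means the local median has been rising toward the most recent window; a plain difference between the newest and oldest columns would work as a cruder alternative.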

Reference: http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.kneighborsclassifier.html

