python - Why does the index cause a TypeError in the pandas DataFrame groupby agg method?
I'm building an aggregate with the following code:

    import numpy
    import pandas

    orders = pandas.read_csv(
        "orders.csv",
        dtype={
            "order_id": numpy.int32,
            "user_id": numpy.int32,
            "eval_set": "category",
            "order_number": numpy.int8,
            "order_dow": numpy.int8,
            "order_hour_of_day": numpy.int8,
            "days_since_prior_order": numpy.float64
        }
    )
    orders.set_index('order_id', inplace=True, drop=False)

    prior_order_products = pandas.read_csv(
        "order_products__prior.csv",
        dtype={
            "order_id": numpy.int32,
            "product_id": numpy.int32,
            "add_to_cart_order": numpy.int16,
            "reordered": numpy.int8
        }
    )
    prior_order_products.set_index(['order_id', 'product_id'], inplace=True, drop=False)

    prior_order_products = prior_order_products.join(orders, how="inner", on='order_id', rsuffix='_')
    prior_order_products.drop('order_id_', inplace=True, axis=1)
    del orders

    prior_order_products['user_product_id'] =\
        100000 * prior_order_products["user_id"].astype(numpy.int64) + prior_order_products["product_id"]

    user_products = prior_order_products.\
        groupby('user_product_id', sort=False).\
        agg({'order_id': ['size', 'last'], 'add_to_cart_order': 'sum'})

It gives the following error:
    Traceback (most recent call last):
      File "c:/users/strategy/pycharmprojects/test/main.py", line 52, in <module>
        agg({'order_id': ['size', 'last'], 'add_to_cart_order': 'sum'})
      ...
    TypeError: '<' not supported between instances of 'numpy.ndarray' and 'str'

I can make the error go away if I comment out the line

    prior_order_products.set_index(['order_id', 'product_id'], inplace=True, drop=False)

Also, I can make the error go away if I limit the number of rows read into prior_order_products. The file is not malformed, and no data is missing or in the wrong format.
What does the error mean? How is it related to the index on prior_order_products? How is it related to the number of rows in prior_order_products?
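
For reference, here is a minimal sketch of the variant that runs without the error for me, i.e. the same pipeline with no set_index call at all. It is not a diagnosis of the TypeError. The two small in-memory frames are made-up stand-ins for orders.csv and order_products__prior.csv (the values are invented for illustration), and a plain column-on-column merge replaces the index-based join, which is my assumption about an equivalent way to combine the tables.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for orders.csv (values invented for illustration).
orders = pd.DataFrame({
    "order_id": np.array([1, 2, 3], dtype=np.int32),
    "user_id": np.array([10, 10, 11], dtype=np.int32),
})

# Hypothetical stand-in for order_products__prior.csv.
prior_order_products = pd.DataFrame({
    "order_id": np.array([1, 1, 2, 3], dtype=np.int32),
    "product_id": np.array([7, 8, 7, 7], dtype=np.int32),
    "add_to_cart_order": np.array([1, 2, 1, 1], dtype=np.int16),
})

# Merge on the order_id column; neither frame gets a custom index,
# so order_id and product_id exist only as ordinary columns.
merged = prior_order_products.merge(orders, on="order_id", how="inner")

# Same composite key as in the question.
merged["user_product_id"] = (
    100000 * merged["user_id"].astype(np.int64) + merged["product_id"]
)

# Same aggregation as in the question.
user_products = merged.groupby("user_product_id", sort=False).agg(
    {"order_id": ["size", "last"], "add_to_cart_order": "sum"}
)
print(user_products)
```

The result has one row per user_product_id and MultiIndex columns such as ('order_id', 'size'), so the aggregation itself appears fine when the grouped frame has a default index.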