python - Scrapy webscraping an Overwatch profile page

March 15, 2014

I'm new to Python, and to coding in general. I'm trying to make a web crawler that scrapes data from an Overwatch player page (e.g. https://playoverwatch.com/en-gb/career/pc/eu/taimou-2526). I tried using Portia, and it worked in the cloud, but I couldn't get it working when I exported it as Scrapy code. Here is a screenshot of my Portia spider.

Here is the code of my spider (exported from Portia as Scrapy code), owdata.py:

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    from __future__ import absolute_import

    from scrapy import Request
    from scrapy.linkextractors import LinkExtractor
    from scrapy.loader import ItemLoader
    from scrapy.loader.processors import Identity
    from scrapy.spiders import Rule

    from utils.spiders import BasePortiaSpider
    from utils.starturls import FeedGenerator, FragmentGenerator
    from utils.processors import (Item, Field, Text, Number, Price, Date, Url,
                                  Image, Regex)
    from items import PortiaItem


    class Owdata(BasePortiaSpider):

        name = 'owdata'
        allowed_domains = [u'playoverwatch.com']
        start_urls = \
            [u'https://playoverwatch.com/en-gb/career/pc/eu/taimou-2526']
        rules = [Rule(LinkExtractor(allow=(), deny='.*'),
                 callback='parse_item', follow=True)]
        items = [[]]

And here is the items.py code:

    from __future__ import absolute_import

    import scrapy
    from collections import defaultdict
    from scrapy.loader.processors import Join, MapCompose, Identity
    from w3lib.html import remove_tags
    from .utils.processors import Text, Number, Price, Date, Url, Image


    class PortiaItem(scrapy.Item):
        fields = defaultdict(
            lambda: scrapy.Field(
                input_processor=Identity(),
                output_processor=Identity()
            )
        )

        def __setitem__(self, key, value):
            self._values[key] = value

        def __repr__(self):
            data = str(self)
            if not data:
                return '%s' % self.__class__.__name__
            return '%s(%s)' % (self.__class__.__name__, data)

        def __str__(self):
            if not self._values:
                return ''
            string = super(PortiaItem, self).__repr__()
            return string


    class CareerOverviewOverwatch1Item(PortiaItem):
        field1 = scrapy.Field(
            input_processor=Text(),
            output_processor=Join(),
        )
        melee_final_blows = scrapy.Field(
            input_processor=Text(),
            output_processor=Join(),
        )
        table = scrapy.Field(
            input_processor=Text(),
            output_processor=Join(),
        )
        tr = scrapy.Field(
            input_processor=Text(),
            output_processor=Join(),
        )

When I run the spider using:

scrapy crawl owdata -o data.csv

I get an empty data.csv file. I'm guessing there's something wrong with the items? I think the XPath line should be //tbody, but again, I know nothing about Python, XPath or Scrapy...
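To make the //tbody idea concrete, here is a minimal sketch of what such an XPath would select. It uses only the Python standard library and a hand-written HTML fragment; the fragment, the "Solo Kills" row, the numbers and the function name `extract_rows` are all made up for illustration, and the real page's markup may be different.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment standing in for one stats table on the profile page.
# The structure and values are assumptions, not copied from the real site.
SAMPLE_HTML = """
<div>
  <table>
    <thead><tr><th>Combat</th></tr></thead>
    <tbody>
      <tr><td>Melee Final Blows</td><td>12</td></tr>
      <tr><td>Solo Kills</td><td>34</td></tr>
    </tbody>
  </table>
</div>
"""


def extract_rows(markup):
    """Return {stat name: value} for every two-cell row under a <tbody>."""
    root = ET.fromstring(markup.strip())
    stats = {}
    # ElementTree supports a subset of XPath; './/tbody/tr' matches the
    # same rows that '//tbody/tr' would match in a full XPath engine.
    for row in root.findall(".//tbody/tr"):
        cells = [(cell.text or "").strip() for cell in row.findall("td")]
        if len(cells) == 2:
            stats[cells[0]] = cells[1]
    return stats


if __name__ == "__main__":
    print(extract_rows(SAMPLE_HTML))
```

In a Scrapy callback the corresponding selection would be `response.xpath('//tbody/tr')`. Note also that the exported spider above has `items = [[]]`, an empty list of item definitions, which may be related to the empty CSV.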
