python - Scrapy: web scraping an Overwatch profile page
I'm new to Python, and to coding in general. I'm trying to make a web crawler that scrapes data from an Overwatch player page (e.g. https://playoverwatch.com/en-gb/career/pc/eu/taimou-2526). I tried using Portia, and it worked in the cloud, but I couldn't get it working when I exported it as Scrapy code. Here is a screenshot of my Portia spider.
Here is the code of my spider (exported from Portia to Scrapy), owdata.py:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from __future__ import absolute_import

from scrapy import Request
from scrapy.linkextractors import LinkExtractor
from scrapy.loader import ItemLoader
from scrapy.loader.processors import Identity
from scrapy.spiders import Rule

from utils.spiders import BasePortiaSpider
from utils.starturls import FeedGenerator, FragmentGenerator
from utils.processors import (Item, Field, Text, Number, Price, Date, Url,
                              Image, Regex)
from items import PortiaItem


class Owdata(BasePortiaSpider):
    name = 'owdata'
    allowed_domains = [u'playoverwatch.com']
    start_urls = [u'https://playoverwatch.com/en-gb/career/pc/eu/taimou-2526']
    # deny='.*' rejects every link, so only the start URL is visited
    rules = [
        Rule(LinkExtractor(allow=(), deny='.*'),
             callback='parse_item', follow=True)
    ]
    # No item definitions (selectors/fields) were exported here
    items = [[]]
And here is the items.py code:
from __future__ import absolute_import

import scrapy
from collections import defaultdict
from scrapy.loader.processors import Join, MapCompose, Identity
from w3lib.html import remove_tags

from .utils.processors import Text, Number, Price, Date, Url, Image


class PortiaItem(scrapy.Item):
    # Any field name is accepted; unknown fields get pass-through processors
    fields = defaultdict(
        lambda: scrapy.Field(
            input_processor=Identity(),
            output_processor=Identity()
        )
    )

    def __setitem__(self, key, value):
        self._values[key] = value

    def __repr__(self):
        data = str(self)
        if not data:
            return '%s' % self.__class__.__name__
        return '%s(%s)' % (self.__class__.__name__, data)

    def __str__(self):
        if not self._values:
            return ''
        string = super(PortiaItem, self).__repr__()
        return string


class CareerOverviewOverwatch1Item(PortiaItem):
    field1 = scrapy.Field(
        input_processor=Text(),
        output_processor=Join(),
    )
    melee_final_blows = scrapy.Field(
        input_processor=Text(),
        output_processor=Join(),
    )
    table = scrapy.Field(
        input_processor=Text(),
        output_processor=Join(),
    )
    tr = scrapy.Field(
        input_processor=Text(),
        output_processor=Join(),
    )
When I run the spider using:
scrapy crawl owdata -o data.csv
I get an empty data.csv file. I'm guessing there's something wrong with the items? I think the XPath line should be //tbody, but again, I know nothing about Python, XPath or Scrapy...
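To illustrate what an expression like //tbody would select, here is a minimal sketch that uses only Python's standard library so it runs without Scrapy. The markup in SAMPLE_HTML and the helper name extract_stats are assumptions made up for this example; the real structure of the career page is not shown above and may differ (in particular, the raw HTML a crawler downloads does not always contain the tbody elements a browser displays).

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the real career page structure is not
# shown in the question, so these rows are invented for illustration.
SAMPLE_HTML = """
<html><body>
<table>
  <thead><tr><th>Combat</th><th></th></tr></thead>
  <tbody>
    <tr><td>Melee Final Blows</td><td>12</td></tr>
    <tr><td>Solo Kills</td><td>345</td></tr>
  </tbody>
</table>
</body></html>
"""


def extract_stats(html):
    """Return {stat name: value} for every two-cell row under a <tbody>."""
    root = ET.fromstring(html)
    stats = {}
    # './/tbody/tr' is ElementTree's spelling of the XPath '//tbody/tr'.
    for row in root.findall('.//tbody/tr'):
        cells = [(td.text or '').strip() for td in row.findall('td')]
        if len(cells) == 2:
            stats[cells[0]] = cells[1]
    return stats


if __name__ == '__main__':
    print(extract_stats(SAMPLE_HTML))
```

Inside a Scrapy callback the analogous call would be response.xpath('//tbody/tr'), iterating over the returned selectors instead of ElementTree elements.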