Commit

FP7_URLs
Simon Hardy committed Feb 13, 2018
1 parent 6144384 commit 9370a1a
Showing 8 changed files with 4,048 additions and 1 deletion.
2 changes: 1 addition & 1 deletion spiders/cordis_spider.py
@@ -17,7 +17,7 @@ class CordisSpider(scrapy.Spider):
     def parse(self, response):
         # Misconfiguration to check - eu in response.xpath not needed
         #for eu in response.xpath('//*[@id="container-pack"]'):
-        if response.xpath('//*[@id="ica:content"][contains(.,"water") and contains(.,"drinking water")]'):
+        if response.xpath('//*[@id="ica:content"][contains(.,"water") or contains(.,"drinking") or contains(.,"nitrates")]'):
             item = CordisItem()
             item['Meta'] = response.xpath('/html/head/meta[23]').extract()
             item['Project_ACR'] = response.xpath('//*[@id="dynamiccontent"]/div[1]/h1/text()').extract()
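
The changed condition in parse() swaps an `and` predicate for a three-way `or`. A minimal sketch of what that does to the filter, written in plain Python rather than XPath, may help when reviewing it. The names `old_filter` and `new_filter` and the sample strings are hypothetical and not part of the spider; the sketch assumes only that XPath `contains()` is a case-sensitive substring test on the element's text, which Python's `in` operator mirrors.

```python
# Hypothetical stand-ins for the two XPath predicates in this commit.
# They are illustrative only and do not appear in cordis_spider.py.

def old_filter(text: str) -> bool:
    # contains(.,"water") and contains(.,"drinking water")
    # The first test is redundant: any text containing "drinking water"
    # also contains "water".
    return "water" in text and "drinking water" in text


def new_filter(text: str) -> bool:
    # contains(.,"water") or contains(.,"drinking") or contains(.,"nitrates")
    return any(keyword in text for keyword in ("water", "drinking", "nitrates"))


# Made-up page texts, used only to compare the two predicates.
samples = [
    "Safe drinking water for all",
    "Waste water treatment",
    "Reducing nitrates in soil",
    "Drinking Water quality",  # capitalised, so no lowercase keyword matches
]

for text in samples:
    print(f"{text!r}: old={old_filter(text)} new={new_filter(text)}")
```

Under these assumptions the new predicate accepts every page the old one did, plus any page that mentions "water", "drinking" or "nitrates" on its own. Both versions stay case-sensitive, so a page that only uses capitalised forms is still skipped.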
Binary file modified spiders/cordis_spider.pyc
Binary file not shown.
508 changes: 508 additions & 0 deletions spiders/dataset-textgenrnn.xml

Large diffs are not rendered by default.

File renamed without changes.
385 changes: 385 additions & 0 deletions wp_urls/FP7_ENERGY_URLs.csv

Large diffs are not rendered by default.

507 changes: 507 additions & 0 deletions wp_urls/FP7_ENVIRONMENT_URLs.csv

Large diffs are not rendered by default.

2,304 changes: 2,304 additions & 0 deletions wp_urls/FP7_ICT_URLs.csv

Large diffs are not rendered by default.

343 changes: 343 additions & 0 deletions wp_urls/FP7_INFRASTRUCTURES_URLs.csv

Large diffs are not rendered by default.
