diff --git a/__init__.py b/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/__init__.pyc b/__init__.pyc new file mode 100644 index 0000000..0182af6 Binary files /dev/null and b/__init__.pyc differ diff --git a/data-sample.csv b/data-sample.csv new file mode 100644 index 0000000..c8423b7 --- /dev/null +++ b/data-sample.csv @@ -0,0 +1,2 @@ +Title,url,Countries,spider,server,project,Teaser,date,Article,Subject +Trending Science: Bicarbonate of soda could spare women in developing countries the need and risk of a caesarean section,,United Kingdom,,,,"Lack of access to a caesarean section, or complications arising from one, accounts for many deaths in developing countries, but now a team of scientists has identified that a simple drink of bicarbonate of soda could make all the difference.",,"A simple kitchen stable that is cheap, accessible and easy to use has the potential to save lives, as a recent study has demonstrated. Labour fails when contractions are not strong enough and treatment with oxytocin is usually the next step. If that doesn’t work then a Caesarean section can be the solution.,But in rural environments in developing countries these may not be options and if a C-section can be carried out at all, there may be complications. The World Health Organisation explains that almost all maternal deaths (99 %) occur in developing countries and that the risk of maternal mortality is highest for adolescent girls under 15 years old. Complications in pregnancy and childbirth is a leading cause of death among adolescent girls in developing countries.,A simple sachet of sodium bicarbonate from the corner shop could help women give birth naturally,A study just conducted, involving 200 women, found that, when dissolved in water, bicarbonate of sodium enables between 17 and 20 % of women having slow or difficult labours to give birth naturally, without harming their babies. 
,Professor Susan Wray, from the University of Liverpool, and a team of researchers at the Karolinska Institute in Sweden, gave bicarbonate of soda to 100 women in labour experiencing difficulties, as well as oxytocin. Another 100 women were treated with just oxytocin. The results, published in the journal of Maternal-Fetal & Neonatal Medicine, found those who had bicarbonate of soda increased their chances of a vaginal delivery.,‘The study was conducted with clinical colleagues in Sweden, and there at the corner shop you can buy this as an antacid, it really is low rent,’ said Prof Wray.,Why sodium bicarb?, on BBC Radio 4’s Today Programme, Prof Wray explained that studies at the University of Liverpool had found that the levels of acidity in the blood surrounding the uterus of women suffering a failure to progress in labour was significantly higher than any other group. ,Prof Wray and her team hypothesised that if they could neutralise that acid in these women, that would help them to have a normal, spontaneous vaginal delivery and avoid the surgery. Without knowing which group they were in, one group received oxytocin alone, while the other had bicarbonate of sodium to in the hope of neutralising the acid in their uterus, then oxytocin one hour later.,Describing the outcome as ‘amazing’, Professor Wray added, ‘We were able to significantly increase the number of women having a spontaneous delivery, avoiding the emergency Caesarean section. Not by just a few percent, but by around 17-20 %.’,She stressed that the study was a small, randomised controlled study. 
‘But nevertheless we had 100 women in each of the two groups of our study and that was sufficient to rule out confounding factors like differences in BMI.’ ,A simple solution to an urgent problem could be on the way ,If the work they carried out with the cohort of 200 is replicable, the researchers could have proven a way to reduce maternal mortality and suffering using a very cheap, shop floor medication and kitchen cabinet staple. The team are really keen to replicate the results in more centres, but what Prof Wray is really looking forward to doing is getting one branch of the study up and running in sub-Saharan Africa. Liverpool has good links with hospitals in Uganda and Malawi, for example. ,‘In those low resource settings I’m sorry to say that women still die in large numbers in childbirth and this failure to progress is one of the reasons. So if those women could have this as a treatment, avoid surgery which, in any case may not be available to them or when it is, it’s not without significant risk, that would be just wonderful. Because you don’t need to keep this in the fridge, don’t need electricity… it’s so exciting.’", diff --git a/data-sample.jl b/data-sample.jl new file mode 100644 index 0000000..88ca1a8 --- /dev/null +++ b/data-sample.jl @@ -0,0 +1 @@ +{"Article": ["A simple kitchen stable that is cheap, accessible and easy to use has the potential to save lives, as a recent study has demonstrated. Labour fails when contractions are not strong enough and treatment with oxytocin is usually the next step. If that doesn\u2019t work then a Caesarean section can be the solution.", "But in rural environments in developing countries these may not be options and if a C-section can be carried out at all, there may be complications. The World Health Organisation explains that almost all maternal deaths (99 %) occur in developing countries and that the risk of maternal mortality is highest for adolescent girls under 15 years old. 
Complications in pregnancy and childbirth is a leading cause of death among adolescent girls in developing countries.", "A simple sachet of sodium bicarbonate from the corner shop could help women give birth naturally", "A study just conducted, involving 200 women, found that, when dissolved in water, bicarbonate of sodium enables between 17 and 20 % of women having slow or difficult labours to give birth naturally, without harming their babies. ", "Professor Susan Wray, from the University of Liverpool, and a team of researchers at the Karolinska Institute in Sweden, gave bicarbonate of soda to 100 women in labour experiencing difficulties, as well as oxytocin. Another 100 women were treated with just oxytocin. The results, published in the journal of Maternal-Fetal & Neonatal Medicine, found those who had bicarbonate of soda increased their chances of a vaginal delivery.", "\u2018The study was conducted with clinical colleagues in Sweden, and there at the corner shop you can buy this as an antacid, it really is low rent,\u2019 said Prof Wray.", "Why sodium bicarb?", " on BBC Radio 4\u2019s Today Programme, Prof Wray explained that studies at the University of Liverpool had found that the levels of acidity in the blood surrounding the uterus of women suffering a failure to progress in labour was significantly higher than any other group. ", "Prof Wray and her team hypothesised that if they could neutralise that acid in these women, that would help them to have a normal, spontaneous vaginal delivery and avoid the surgery. Without knowing which group they were in, one group received oxytocin alone, while the other had bicarbonate of sodium to in the hope of neutralising the acid in their uterus, then oxytocin one hour later.", "Describing the outcome as \u2018amazing\u2019, Professor Wray added, \u2018We were able to significantly increase the number of women having a spontaneous delivery, avoiding the emergency Caesarean section. 
Not by just a few percent, but by around 17-20 %.\u2019", "She stressed that the study was a small, randomised controlled study. \u2018But nevertheless we had 100 women in each of the two groups of our study and that was sufficient to rule out confounding factors like differences in BMI.\u2019 ", "A simple solution to an urgent problem could be on the way ", "If the work they carried out with the cohort of 200 is replicable, the researchers could have proven a way to reduce maternal mortality and suffering using a very cheap, shop floor medication and kitchen cabinet staple. The team are really keen to replicate the results in more centres, but what Prof Wray is really looking forward to doing is getting one branch of the study up and running in sub-Saharan Africa. Liverpool has good links with hospitals in Uganda and Malawi, for example. ", "\u2018In those low resource settings I\u2019m sorry to say that women still die in large numbers in childbirth and this failure to progress is one of the reasons. So if those women could have this as a treatment, avoid surgery which, in any case may not be available to them or when it is, it\u2019s not without significant risk, that would be just wonderful. 
Because you don\u2019t need to keep this in the fridge, don\u2019t need electricity\u2026 it\u2019s so exciting.\u2019"], "Teaser": ["Lack of access to a caesarean section, or complications arising from one, accounts for many deaths in developing countries, but now a team of scientists has identified that a simple drink of bicarbonate of soda could make all the difference."], "Countries": ["United Kingdom"], "Title": ["Trending Science: Bicarbonate of soda could spare women in developing countries the need and risk of a caesarean section"]} diff --git a/data-sample.json b/data-sample.json new file mode 100644 index 0000000..e0815e7 --- /dev/null +++ b/data-sample.json @@ -0,0 +1,6 @@ +[{ + "Article": ["A simple kitchen stable that is cheap, accessible and easy to use has the potential to save lives, as a recent study has demonstrated. Labour fails when contractions are not strong enough and treatment with oxytocin is usually the next step. If that doesn\u2019t work then a Caesarean section can be the solution.", "But in rural environments in developing countries these may not be options and if a C-section can be carried out at all, there may be complications. The World Health Organisation explains that almost all maternal deaths (99 %) occur in developing countries and that the risk of maternal mortality is highest for adolescent girls under 15 years old. Complications in pregnancy and childbirth is a leading cause of death among adolescent girls in developing countries.", "A simple sachet of sodium bicarbonate from the corner shop could help women give birth naturally", "A study just conducted, involving 200 women, found that, when dissolved in water, bicarbonate of sodium enables between 17 and 20 % of women having slow or difficult labours to give birth naturally, without harming their babies. 
", "Professor Susan Wray, from the University of Liverpool, and a team of researchers at the Karolinska Institute in Sweden, gave bicarbonate of soda to 100 women in labour experiencing difficulties, as well as oxytocin. Another 100 women were treated with just oxytocin. The results, published in the journal of Maternal-Fetal & Neonatal Medicine, found those who had bicarbonate of soda increased their chances of a vaginal delivery.", "\u2018The study was conducted with clinical colleagues in Sweden, and there at the corner shop you can buy this as an antacid, it really is low rent,\u2019 said Prof Wray.", "Why sodium bicarb?", " on BBC Radio 4\u2019s Today Programme, Prof Wray explained that studies at the University of Liverpool had found that the levels of acidity in the blood surrounding the uterus of women suffering a failure to progress in labour was significantly higher than any other group. ", "Prof Wray and her team hypothesised that if they could neutralise that acid in these women, that would help them to have a normal, spontaneous vaginal delivery and avoid the surgery. Without knowing which group they were in, one group received oxytocin alone, while the other had bicarbonate of sodium to in the hope of neutralising the acid in their uterus, then oxytocin one hour later.", "Describing the outcome as \u2018amazing\u2019, Professor Wray added, \u2018We were able to significantly increase the number of women having a spontaneous delivery, avoiding the emergency Caesarean section. Not by just a few percent, but by around 17-20 %.\u2019", "She stressed that the study was a small, randomised controlled study. 
\u2018But nevertheless we had 100 women in each of the two groups of our study and that was sufficient to rule out confounding factors like differences in BMI.\u2019 ", "A simple solution to an urgent problem could be on the way ", "If the work they carried out with the cohort of 200 is replicable, the researchers could have proven a way to reduce maternal mortality and suffering using a very cheap, shop floor medication and kitchen cabinet staple. The team are really keen to replicate the results in more centres, but what Prof Wray is really looking forward to doing is getting one branch of the study up and running in sub-Saharan Africa. Liverpool has good links with hospitals in Uganda and Malawi, for example. ", "\u2018In those low resource settings I\u2019m sorry to say that women still die in large numbers in childbirth and this failure to progress is one of the reasons. So if those women could have this as a treatment, avoid surgery which, in any case may not be available to them or when it is, it\u2019s not without significant risk, that would be just wonderful. Because you don\u2019t need to keep this in the fridge, don\u2019t need electricity\u2026 it\u2019s so exciting.\u2019"],
+ "Teaser": ["Lack of access to a caesarean section, or complications arising from one, accounts for many deaths in developing countries, but now a team of scientists has identified that a simple drink of bicarbonate of soda could make all the difference."],
+ "Countries": ["United Kingdom"],
+ "Title": ["Trending Science: Bicarbonate of soda could spare women in developing countries the need and risk of a caesarean section"]
+}]
diff --git a/data-sample.xml b/data-sample.xml
new file mode 100644
index 0000000..cf198c7
--- /dev/null
+++ b/data-sample.xml
@@ -0,0 +1,4 @@
+<?xml version="1.0" encoding="utf-8"?>
+<items>
+<item><Article><value>A simple kitchen stable that is cheap, accessible and easy to use has the potential to save lives, as a recent study has demonstrated. Labour fails when contractions are not strong enough and treatment with oxytocin is usually the next step. If that doesn’t work then a Caesarean section can be the solution.</value><value>But in rural environments in developing countries these may not be options and if a C-section can be carried out at all, there may be complications. The World Health Organisation explains that almost all maternal deaths (99 %) occur in developing countries and that the risk of maternal mortality is highest for adolescent girls under 15 years old. Complications in pregnancy and childbirth is a leading cause of death among adolescent girls in developing countries.</value><value>A simple sachet of sodium bicarbonate from the corner shop could help women give birth naturally</value><value>A study just conducted, involving 200 women, found that, when dissolved in water, bicarbonate of sodium enables between 17 and 20 % of women having slow or difficult labours to give birth naturally, without harming their babies. </value><value>Professor Susan Wray, from the University of Liverpool, and a team of researchers at the Karolinska Institute in Sweden, gave bicarbonate of soda to 100 women in labour experiencing difficulties, as well as oxytocin. Another 100 women were treated with just oxytocin. The results, published in the journal of Maternal-Fetal &amp; Neonatal Medicine, found those who had bicarbonate of soda increased their chances of a vaginal delivery.</value><value>‘The study was conducted with clinical colleagues in Sweden, and there at the corner shop you can buy this as an antacid, it really is low rent,’ said Prof Wray.</value><value>Why sodium bicarb?</value><value> on BBC Radio 4’s Today Programme, Prof Wray explained that studies at the University of Liverpool had found that the levels of acidity in the blood surrounding the uterus of women suffering a failure to progress in labour was significantly higher than any other group. </value><value>Prof Wray and her team hypothesised that if they could neutralise that acid in these women, that would help them to have a normal, spontaneous vaginal delivery and avoid the surgery. Without knowing which group they were in, one group received oxytocin alone, while the other had bicarbonate of sodium to in the hope of neutralising the acid in their uterus, then oxytocin one hour later.</value><value>Describing the outcome as ‘amazing’, Professor Wray added, ‘We were able to significantly increase the number of women having a spontaneous delivery, avoiding the emergency Caesarean section. Not by just a few percent, but by around 17-20 %.’</value><value>She stressed that the study was a small, randomised controlled study. ‘But nevertheless we had 100 women in each of the two groups of our study and that was sufficient to rule out confounding factors like differences in BMI.’ </value><value>A simple solution to an urgent problem could be on the way </value><value>If the work they carried out with the cohort of 200 is replicable, the researchers could have proven a way to reduce maternal mortality and suffering using a very cheap, shop floor medication and kitchen cabinet staple. The team are really keen to replicate the results in more centres, but what Prof Wray is really looking forward to doing is getting one branch of the study up and running in sub-Saharan Africa. Liverpool has good links with hospitals in Uganda and Malawi, for example. </value><value>‘In those low resource settings I’m sorry to say that women still die in large numbers in childbirth and this failure to progress is one of the reasons. So if those women could have this as a treatment, avoid surgery which, in any case may not be available to them or when it is, it’s not without significant risk, that would be just wonderful. Because you don’t need to keep this in the fridge, don’t need electricity… it’s so exciting.’</value></Article><Teaser><value>Lack of access to a caesarean section, or complications arising from one, accounts for many deaths in developing countries, but now a team of scientists has identified that a simple drink of bicarbonate of soda could make all the difference.</value></Teaser><Title><value>Trending Science: Bicarbonate of soda could spare women in developing countries the need and risk of a caesarean section</value></Title><Countries><value>United Kingdom</value></Countries></item>
+</items>
\ No newline at end of file
diff --git a/items.py b/items.py
new file mode 100644
index 0000000..2d39c68
--- /dev/null
+++ b/items.py
@@ -0,0 +1,28 @@
+# -*- coding: utf-8 -*-
+
+# Define here the models for your scraped items
+#
+# See documentation in:
+# http://doc.scrapy.org/en/latest/topics/items.html
+
+import scrapy
+from scrapy.item import Item, Field
+
+
+class CordisnewsItem(scrapy.Item):
+    # define the fields for your item here like:
+    # name = scrapy.Field()
+
+    # Cordis news
+    Title = Field()
+    Teaser = Field()
+    Article = Field()
+    Subject = Field()
+    Countries = Field()
+
+    # Housekeeping fields
+    url = Field()
+    project = Field()
+    spider = Field()
+    server = Field()
+    date = Field()
diff --git a/items.pyc b/items.pyc
new file mode 100644
index 0000000..f15dc96
Binary files /dev/null and b/items.pyc differ
diff --git a/middlewares.py b/middlewares.py
new file mode 100644
index 0000000..4816f53
--- /dev/null
+++ b/middlewares.py
@@ -0,0 +1,56 @@
+# -*- coding: utf-8 -*-
+
+# Define here the models for your spider middleware
+#
+# See documentation in:
+# http://doc.scrapy.org/en/latest/topics/spider-middleware.html
+
+from scrapy import signals
+
+
+class CordisnewsSpiderMiddleware(object):
+    # Not all methods need to be defined. If a method is not defined,
+    # scrapy acts as if the spider middleware does not modify the
+    # passed objects.
+
+    @classmethod
+    def from_crawler(cls, crawler):
+        # This method is used by Scrapy to create your spiders.
+        s = cls()
+        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
+        return s
+
+    def process_spider_input(self, response, spider):
+        # Called for each response that goes through the spider
+        # middleware and into the spider.
+
+        # Should return None or raise an exception.
+        return None
+
+    def process_spider_output(self, response, result, spider):
+        # Called with the results returned from the Spider, after
+        # it has processed the response.
+
+        # Must return an iterable of Request, dict or Item objects.
+        for i in result:
+            yield i
+
+    def process_spider_exception(self, response, exception, spider):
+        # Called when a spider or process_spider_input() method
+        # (from other spider middleware) raises an exception.
+
+        # Should return either None or an iterable of Response, dict
+        # or Item objects.
+        pass
+
+    def process_start_requests(self, start_requests, spider):
+        # Called with the start requests of the spider, and works
+        # similarly to the process_spider_output() method, except
+        # that it doesn’t have a response associated.
+
+        # Must return only requests (not items).
+        for r in start_requests:
+            yield r
+
+    def spider_opened(self, spider):
+        spider.logger.info('Spider opened: %s' % spider.name)
diff --git a/pipelines.py b/pipelines.py
new file mode 100644
index 0000000..b9c7c7e
--- /dev/null
+++ b/pipelines.py
@@ -0,0 +1,11 @@
+# -*- coding: utf-8 -*-
+
+# Define your item pipelines here
+#
+# Don't forget to add your pipeline to the ITEM_PIPELINES setting
+# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
+
+
+class CordisnewsPipeline(object):
+    def process_item(self, item, spider):
+        return item
diff --git a/settings.py b/settings.py
new file mode 100644
index 0000000..b532fa0
--- /dev/null
+++ b/settings.py
@@ -0,0 +1,90 @@
+# -*- coding: utf-8 -*-
+
+# Scrapy settings for cordisnews project
+#
+# For simplicity, this file contains only settings considered important or
+# commonly used. You can find more settings consulting the documentation:
+#
+#     http://doc.scrapy.org/en/latest/topics/settings.html
+#     http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
+#     http://scrapy.readthedocs.org/en/latest/topics/spider-middleware.html
+
+BOT_NAME = 'cordisnews'
+
+SPIDER_MODULES = ['cordisnews.spiders']
+NEWSPIDER_MODULE = 'cordisnews.spiders'
+
+
+# Crawl responsibly by identifying yourself (and your website) on the user-agent
+#USER_AGENT = 'cordisnews (+http://www.yourdomain.com)'
+
+# Obey robots.txt rules
+ROBOTSTXT_OBEY = True
+
+# Configure maximum concurrent requests performed by Scrapy (default: 16)
+#CONCURRENT_REQUESTS = 32
+
+# Configure a delay for requests for the same website (default: 0)
+# See http://scrapy.readthedocs.org/en/latest/topics/settings.html#download-delay
+# See also autothrottle settings and docs
+#DOWNLOAD_DELAY = 3
+# The download delay setting will honor only one of:
+#CONCURRENT_REQUESTS_PER_DOMAIN = 16
+#CONCURRENT_REQUESTS_PER_IP = 16
+
+# Disable cookies (enabled by default)
+#COOKIES_ENABLED = False
+
+# Disable Telnet Console (enabled by default)
+#TELNETCONSOLE_ENABLED = False
+
+# Override the default request headers:
+#DEFAULT_REQUEST_HEADERS = {
+#   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
+#   'Accept-Language': 'en',
+#}
+
+# Enable or disable spider middlewares
+# See http://scrapy.readthedocs.org/en/latest/topics/spider-middleware.html
+#SPIDER_MIDDLEWARES = {
+#    'cordisnews.middlewares.CordisnewsSpiderMiddleware': 543,
+#}
+
+# Enable or disable downloader middlewares
+# See http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html
+#DOWNLOADER_MIDDLEWARES = {
+#    'cordisnews.middlewares.MyCustomDownloaderMiddleware': 543,
+#}
+
+# Enable or disable extensions
+# See http://scrapy.readthedocs.org/en/latest/topics/extensions.html
+#EXTENSIONS = {
+#    'scrapy.extensions.telnet.TelnetConsole': None,
+#}
+
+# Configure item pipelines
+# See http://scrapy.readthedocs.org/en/latest/topics/item-pipeline.html
+#ITEM_PIPELINES = {
+#    'cordisnews.pipelines.CordisnewsPipeline': 300,
+#}
+
+# Enable and configure the AutoThrottle extension (disabled by default)
+# See http://doc.scrapy.org/en/latest/topics/autothrottle.html
+#AUTOTHROTTLE_ENABLED = True
+# The initial download delay
+#AUTOTHROTTLE_START_DELAY = 5
+# The maximum download delay to be set in case of high latencies
+#AUTOTHROTTLE_MAX_DELAY = 60
+# The average number of requests Scrapy should be sending in parallel to
+# each remote server
+#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
+# Enable showing throttling stats for every response received:
+#AUTOTHROTTLE_DEBUG = False
+
+# Enable and configure HTTP caching (disabled by default)
+# See http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
+#HTTPCACHE_ENABLED = True
+#HTTPCACHE_EXPIRATION_SECS = 0
+#HTTPCACHE_DIR = 'httpcache'
+#HTTPCACHE_IGNORE_HTTP_CODES = []
+#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
diff --git a/settings.pyc b/settings.pyc
new file mode 100644
index 0000000..91e4add
Binary files /dev/null and b/settings.pyc differ
diff --git a/spiders/__init__.py b/spiders/__init__.py
new file mode 100644
index 0000000..ebd689a
--- /dev/null
+++ b/spiders/__init__.py
@@ -0,0 +1,4 @@
+# This package will contain the spiders of your Scrapy project
+#
+# Please refer to the documentation for information on how to create and manage
+# your spiders.
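Review note on `pipelines.py` and the commented-out `ITEM_PIPELINES` entry: `CordisnewsPipeline.process_item` returns each item unchanged, and the sample exports show `Article` as a list of text fragments, several with leading or trailing spaces. Below is a minimal sketch of a pipeline that tidies those fragments, assuming items behave like dicts of lists as in `data-sample.json`. The class name `CleanFragmentsPipeline` and its `text_fields` tuple are hypothetical and not part of this commit.

```python
class CleanFragmentsPipeline(object):
    """Hypothetical pipeline: strip whitespace from list-valued text fields."""

    # Fields assumed to hold lists of text fragments (see data-sample.json).
    text_fields = ('Title', 'Teaser', 'Article')

    def process_item(self, item, spider):
        for field in self.text_fields:
            fragments = item.get(field)
            if not fragments:
                continue  # field missing or empty: leave the item untouched
            # Strip each fragment, then drop the ones that end up empty.
            cleaned = [fragment.strip() for fragment in fragments]
            item[field] = [fragment for fragment in cleaned if fragment]
        return item


if __name__ == '__main__':
    sample = {'Article': ['Why sodium bicarb?', ' on BBC Radio 4 ', '   ']}
    print(CleanFragmentsPipeline().process_item(sample, spider=None))
```

A pipeline like this only runs once it is listed in `ITEM_PIPELINES` in `settings.py`, which this commit leaves commented out.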
diff --git a/spiders/__init__.pyc b/spiders/__init__.pyc
new file mode 100644
index 0000000..395d802
Binary files /dev/null and b/spiders/__init__.pyc differ
diff --git a/spiders/basic.py b/spiders/basic.py
new file mode 100644
index 0000000..83e5362
--- /dev/null
+++ b/spiders/basic.py
@@ -0,0 +1,21 @@
+# -*- coding: utf-8 -*-
+import scrapy
+from scrapy.loader import ItemLoader
+from cordisnews.items import CordisnewsItem
+
+
+class BasicSpider(scrapy.Spider):
+    name = 'basic'
+    allowed_domains = ['cordis.europa.eu']
+    start_urls = ['http://cordis.europa.eu/news/rcn/%d_en.html' %(n) for n in range(128792, 128793)]
+
+    def parse(self, response):
+
+        l = ItemLoader(item=CordisnewsItem(), response=response)
+        l.add_xpath('Title', '//*[@id="dynamiccontent"]/div[1]/h1/text()')
+        l.add_xpath('Teaser', '//*[@id="dynamiccontent"]/div[2]/div[1]/div[1]/text()')
+        l.add_xpath('Article', '//*[@id="dynamiccontent"]/div[2]/div[1]/div[3]/text()')
+        l.add_xpath('Subject', '//*[@id="subjects"]/a/text()')
+        l.add_xpath('Countries', '//*[@id="Country"]/div[2]/ul/li/text()')
+
+        return l.load_item()
diff --git a/spiders/basic.pyc b/spiders/basic.pyc
new file mode 100644
index 0000000..7223304
Binary files /dev/null and b/spiders/basic.pyc differ
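Review note on `spiders/basic.py`: `range(128792, 128793)` excludes its upper bound, so `start_urls` holds a single URL. The housekeeping fields declared in `items.py` (`url`, `project`, `spider`, `server`, `date`) are never loaded, which is why those columns are empty in `data-sample.csv`. Below is a rough sketch of both points using only the standard library. The helper names `build_start_urls` and `housekeeping_values` are hypothetical, and the inclusive upper bound is an assumption about what a wider crawl would want.

```python
import datetime
import socket

URL_TEMPLATE = 'http://cordis.europa.eu/news/rcn/%d_en.html'


def build_start_urls(first_rcn, last_rcn):
    """Return one URL per record number, including last_rcn (hypothetical helper)."""
    return [URL_TEMPLATE % n for n in range(first_rcn, last_rcn + 1)]


def housekeeping_values(url, project, spider_name):
    """Collect values for the housekeeping fields declared in items.py."""
    return {
        'url': url,
        'project': project,
        'spider': spider_name,
        'server': socket.gethostname(),   # machine that ran the crawl
        'date': datetime.datetime.now(),  # local time of the scrape
    }


if __name__ == '__main__':
    urls = build_start_urls(128792, 128792)
    print(urls)
    print(housekeeping_values(urls[0], 'cordisnews', 'basic'))
```

Inside `parse`, each name and value pair could be passed to the loader with `l.add_value(name, value)` before `l.load_item()`, with `response.url` and `self.name` supplying the first and third arguments.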