Scrape with keywords
Simon Hardy committed Feb 2, 2018
1 parent 35af904 commit 31ea82a
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -26,10 +26,12 @@ Run ```scrapy crawl cordis -o "filename"."extension"```.
* If you want to download information about a specific project you will have to change the following ```start_urls = ['http://cordis.europa.eu/project/rcn/%d_en.html' %(n) for n in range(210216, 210217)]``` in ```spiders/cordis_spider.py```.

* You can also extract from specific urls (sample urls.txt H2020 EU1)
-* ```name = 'cordis'
+```
+name = 'cordis'
f = open("urls.txt")
start_urls = [url.strip() for url in f.readlines()]
-f.close()```
+f.close()
+```

* You can decide to scrape which information extract by modifying the keywords ```if response.xpath('//*[@id="ica:content"][contains(.,"water") and contains(.,"drinking water")]'):```
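
As a note on the two README snippets in this hunk: the URL-file loading and the keyword filter could be factored into small helpers, roughly as sketched below. This is only an illustration, not code from the repository. The function names `load_start_urls` and `keyword_xpath` are hypothetical, and the sketch assumes `urls.txt` holds one URL per line and that keywords contain no double quotes.

```python
def load_start_urls(path):
    """Return the stripped, non-empty lines of a URL list file.

    Assumption: one URL per line. The context manager closes the file
    even if reading fails, which the open()/close() pair does not.
    """
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def keyword_xpath(keywords, element_id="ica:content"):
    """Build an XPath that matches the element only if it contains
    every keyword (the conditions are joined with 'and').

    Assumption: keywords contain no double quotes, since they are
    placed inside double-quoted XPath string literals.
    """
    conditions = " and ".join('contains(.,"%s")' % kw for kw in keywords)
    return '//*[@id="%s"][%s]' % (element_id, conditions)


if __name__ == "__main__":
    # Builds the same expression as the README example.
    print(keyword_xpath(["water", "drinking water"]))
```

In the spider, `start_urls = load_start_urls("urls.txt")` would stand in for the open/readlines/close lines, and the string returned by `keyword_xpath` could be passed to `response.xpath(...)` in place of the hard-coded expression.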
