"""
scrape_food_user_input.py
=====
Get user input to interface with zagat or yelp. For example... using the short form of the zagat url:
http://www.zagat.com/search?text=thai&where[name]=Chicago
...we can substitute any value for the city. Use user input and string interpolation to construct a new URL. List the restaurants as you did in the scrape_food exercise.
1. Import the modules that you'll need (requests, bs4)
1. Loop forever
2. Ask for a city
3. Construct a url using that city
4. Use the requests module to get the url
5. Create a beautiful soup object using the text from the result of the request
6. Find all of the names of the restaurants and print them out
7. (OPTIONAL) change the search so that you can enter a different cuisine
8. (OPTIONAL) what other info can you scrape?... output that as well!
Expected Output:
What city plz?
>New York City
Chao Thai
Pure Thai Cookhouse
.
.
.
Beyond Thai Kitchen
What city plz?
>Chicago
Thai Grill & Noodle Bar
Butterfly Sushi Bar & Thai Cuisine
Butterfly Sushi Bar & Thai Cuisine
.
.
.
"""
import re

import bs4
import requests

BASE_URL = "http://www.yelp.com"


def print_strings(tags, indent="  "):
    """Print the text of each tag, indented; skip tags whose .string is None."""
    for tag in tags:
        if tag.string is not None:
            print(indent + tag.string.strip())


while True:
    area = input("What city plz? ")
    food = input("Cuisine? ")
    # Let requests URL-encode the query; cities like "New York City" contain spaces.
    resp = requests.get(BASE_URL + "/search",
                        params={"find_desc": "", "find_loc": area, "cflt": food})
    soup = bs4.BeautifulSoup(resp.text, "html.parser")
    restaurants = soup.find_all("a", id=re.compile("bizTitleLink.*"))
    print("")
    for link in restaurants:
        print(link.string)
        # Fetch each restaurant's page once, then pull hours, address, and phone.
        detail_page = requests.get(BASE_URL + link["href"])
        detail = bs4.BeautifulSoup(detail_page.text, "html.parser")
        print_strings(detail.find_all("p", "hours"))
        print_strings(detail.find_all("span", "street-address"))
        print_strings(detail.find_all("span", "tel"))
    print("")