Architecture
This project runs entirely on AWS, using four services, and is easy to set up. The goal was to write something unique (not necessarily fast) and fun to work on.
The pieces of this puzzle are the following AWS services: AWS Lambda, DynamoDB, CloudWatch and RDS MySQL. Now, how do they fit together?
The starting point of the process (where the Bootstrapper starts) is this map of cities. After the Bootstrapper issues an HTTP GET request for this page, it parses the URLs that lead to each city (Burnaby, North Vancouver, Chilliwack, etc.) and visits each one. At this step, the pages being fetched have this format. Once inside a city page, the Bootstrapper inserts the URL of each locality within that city into a DynamoDB table.
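The Bootstrapper step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `LinkCollector`, `extract_links` and `bootstrap` are hypothetical, the item shape `{"url": ...}` is an assumption, and the sketch follows every `<a>` link on a page, whereas a real scraper would filter for the city and locality links only. `fetch` and `put_item` are injected so the logic can run without network or AWS access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def bootstrap(start_url, fetch, put_item):
    """Walk map page -> city pages and store every locality URL found.

    fetch(url) returns the page HTML as a string.
    put_item(item) stores one item; in a deployment this could wrap a
    boto3 DynamoDB table's put_item(Item=item) call (an assumption here).
    Returns the number of locality URLs stored.
    """
    stored = 0
    for city_url in extract_links(fetch(start_url), start_url):
        for locality_url in extract_links(fetch(city_url), city_url):
            put_item({"url": locality_url})  # hypothetical item shape
            stored += 1
    return stored
```

Injecting `fetch` and `put_item` keeps the crawl logic separate from the HTTP client and the DynamoDB client, which makes it straightforward to exercise with canned pages.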
The second layer of Lambda functions is then triggered. Each function receives a batch of the locality URLs inserted into the first DynamoDB table and navigates through their result pages. Taking Brentwood Park as an example, it has (at the time of writing) 2 pages, so the process goes through each page, fetches the URL of every listing it finds, and adds it to a second DynamoDB table.
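A sketch of what a second-layer handler might look like, assuming the trigger is a DynamoDB Streams event source whose records carry the new item under `dynamodb.NewImage`. The attribute name `url`, the `?page=N` pagination scheme, the `max_pages` safety cap and the helper names are all assumptions made for illustration; the real site's pagination may work differently.

```python
def locality_urls_from_event(event):
    """Pull locality URLs out of a DynamoDB Streams batch.

    Only INSERT records are considered; the "url" attribute name is an
    assumption about the first table's schema.
    """
    urls = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue
        new_image = record["dynamodb"]["NewImage"]
        urls.append(new_image["url"]["S"])
    return urls


def crawl_locality(locality_url, fetch_listing_urls, put_item, max_pages=50):
    """Visit result pages 1, 2, ... until a page yields no listings.

    fetch_listing_urls(page_url) returns the listing URLs on that page.
    The "?page=N" query parameter is a hypothetical pagination scheme.
    Returns the number of listing URLs stored.
    """
    stored = 0
    for page in range(1, max_pages + 1):
        listing_urls = fetch_listing_urls(f"{locality_url}?page={page}")
        if not listing_urls:
            break
        for listing_url in listing_urls:
            put_item({"url": listing_url})  # goes to the second table
            stored += 1
    return stored


def make_handler(fetch_listing_urls, put_item):
    """Build a Lambda-style handler(event, context) around the helpers."""

    def handler(event, context):
        localities = locality_urls_from_event(event)
        listings = sum(
            crawl_locality(url, fetch_listing_urls, put_item)
            for url in localities
        )
        return {"localities": len(localities), "listings": listings}

    return handler
```

For a locality with 2 result pages, such as the Brentwood Park example, the loop would request pages 1 and 2, then stop when page 3 comes back empty.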
Finally, the last layer of Lambda functions fetches the page of every listing inserted into DynamoDB, scrapes all the useful information it finds on that page, and stores the results in an RDS MySQL database.
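The final scrape-and-store step could look something like the sketch below. To keep the example runnable without a database server, it uses Python's built-in `sqlite3` as a stand-in for RDS MySQL; with a MySQL driver the placeholders would typically be `%s` rather than `?`, and the upsert syntax would differ. The `listings` table, its columns, and the choice to extract only the page `<title>` are assumptions, since the fields the project actually scrapes are not listed here.

```python
import sqlite3
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Accumulates the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape_listing(url, html):
    """Extract a (hypothetical) minimal record from a listing page."""
    parser = TitleParser()
    parser.feed(html)
    return {"url": url, "title": parser.title.strip()}


def create_schema(conn):
    # Hypothetical table; the real schema is not described in this document.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings (url TEXT PRIMARY KEY, title TEXT)"
    )


def save_listing(conn, listing):
    """Insert or overwrite one listing, so retries do not create duplicates.

    INSERT OR REPLACE is SQLite syntax; MySQL offers REPLACE INTO or
    INSERT ... ON DUPLICATE KEY UPDATE for the same purpose.
    """
    conn.execute(
        "INSERT OR REPLACE INTO listings (url, title) VALUES (?, ?)",
        (listing["url"], listing["title"]),
    )
    conn.commit()
```

Keying the table on the listing URL makes the write idempotent, which matters because a Lambda invocation may be retried and process the same listing twice.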