A practical, interactive, beginner-friendly introduction to Python, web scraping, sentiment analysis, and data visualization. No previous experience required, but we will cover a ton of material quickly. The bulk of the material is in a Python Notebook created with Google Colab.
You can't learn EVERYTHING here, but you can learn enough to get excited and comfortable enough to keep working and learning on your own! By the end of the course, you'll have a tool you can start using today, and a place to start building your own tools.
You will start with no data, and make this Bar Graph Showing Vail, CO Headline Sentiment:
- Web browser to use Google Colab, or your own way to open a Python Notebook.
- Python Notebook. Open it in one of three ways:
  - You can open the Colab notebook directly from this link: https://colab.research.google.com/github/nicholasgriffen/intro-web-scraping/blob/master/NewsPageAnalysis.ipynb
  - Or open the copy from this repo in Colab by going to File > Open Notebook > GitHub and pasting this URL: https://github.com/nicholasgriffen/intro-web-scraping/blob/master/NewsPageAnalysis.ipynb
  - Or click the "Open in Colab" button here: https://github.com/nicholasgriffen/intro-web-scraping/blob/master/NewsPageAnalysis.ipynb
Make a copy by clicking `file` and `make copy` or `save to drive`.
HTML is one of the main building blocks of the web!
- `<html>` designates an HTML document
- `<head>` contains undisplayed information about the document
- `<title>` creates a title for the document
- `<body>` contains displayed information
- `<header>`, `<main>`, `<footer>` denote which part of the page elements belong to
- `<h1>` - `<h6>` create section headings (h1 biggest, h6 smallest)
- `<p>` creates paragraphs
- `<a href=""></a>` (anchor) creates a link in the page
- `<ul>`, `<ol>` create lists
  - `<li>` contains items in lists
- `<br>` inserts a single line break
Most HTML tags require an opening and a closing tag. There are a few, however, that do not:
- `<img src="">` creates an image in the page
- `<br>` creates a break in the content
- `<input type="">` creates an input field
- `<hr>` creates a line in the page
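To see the difference between paired tags and tags that stand alone, here is a minimal sketch using `html.parser` from Python's standard library. The sample page and the `TagLogger` class are made up for this illustration and are not taken from the Notebook.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record every opening and closing tag the parser encounters."""

    def __init__(self):
        super().__init__()
        self.opened = []
        self.closed = []

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)

    def handle_endtag(self, tag):
        self.closed.append(tag)


# A made-up page used only for this illustration.
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <h1>Headline</h1>
    <p>First line<br>second line</p>
    <img src="photo.png">
    <hr>
  </body>
</html>
"""

parser = TagLogger()
parser.feed(SAMPLE_HTML)
parser.close()

# Tags that were opened but never closed.
unclosed = [tag for tag in parser.opened if tag not in parser.closed]
print(unclosed)  # → ['br', 'img', 'hr']
```

Every other tag in the sample shows up in both lists, because it has an opening and a closing tag.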
IDs and classes are very similar. These are used to target specific elements (you'll see more examples in the CSS section).
- `<h1 id="profile-header"></h1>`
- `<h1 class="subject-header"></h1>`
- IDs should only be used once on a page. IDs can also be used to bring the user to a specific part of the page. `your-site/#profile-picture` will load the page near the profile picture.
- Classes can be used multiple times on a page.
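IDs and classes are also how a scraper finds elements. The sketch below assumes the `beautifulsoup4` package is installed and reuses the `profile-header` and `subject-header` names from the examples above; the markup itself is made up.

```python
from bs4 import BeautifulSoup

# Made-up markup reusing the id and class names from the examples above.
SAMPLE_HTML = """
<h1 id="profile-header">Profile</h1>
<h1 class="subject-header">Python</h1>
<h1 class="subject-header">HTML</h1>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# An id should be unique on the page, so find() returns the single match.
profile = soup.find(id="profile-header")

# A class can repeat, so find_all() returns a list of every match.
subjects = soup.find_all(class_="subject-header")

print(profile.get_text())
print([tag.get_text() for tag in subjects])
```

Note the trailing underscore in `class_`: `class` is a reserved word in Python, so Beautiful Soup uses `class_` as the keyword argument.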
See More tags here
Learn more HTML here
- Go to a web page
- Right click
- Select `inspect element`
We're going to use Python to do our data collection. Below are external modules used in the Notebook.
We will use the Requests module to visit a URL and get web elements.
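A minimal sketch of fetching a page with Requests might look like the following. The `fetch_html` helper and its `timeout` default are assumptions made for this example, not code from the Notebook.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as a string."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx and 5xx responses instead of returning an error page.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html("https://example.com")` would return that page's HTML as text, ready to hand to a parser.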
We will use Beautiful Soup to parse HTML and extract the information we need.
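As a rough illustration of what extraction looks like, the sketch below pulls headline text and links out of a small HTML string. The markup and the `headline` class name are hypothetical; a real news page will use its own tags and class names, which you can find with your browser's inspector.

```python
from bs4 import BeautifulSoup

# Made-up news markup; a real page will use different tags and class names.
SAMPLE_HTML = """
<main>
  <h2 class="headline"><a href="/news/snow">Record snowfall delights skiers</a></h2>
  <h2 class="headline"><a href="/news/road">Road closure frustrates commuters</a></h2>
  <p>Unrelated paragraph</p>
</main>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

headlines = []
# Visit every <h2 class="headline"> and read the link inside it.
for heading in soup.find_all("h2", class_="headline"):
    link = heading.find("a")
    headlines.append({"text": link.get_text(strip=True), "url": link["href"]})

print(headlines)
```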
We'll use pandas to do some analysis and visualizations on our data.
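Here is a small sketch of the kind of analysis pandas makes easy. The headlines, the scores, and the `label` helper are all invented for this example; they are not results from the Notebook.

```python
import pandas as pd

# Hypothetical headlines and sentiment scores, invented for this example.
df = pd.DataFrame(
    {
        "headline": [
            "Record snowfall delights skiers",
            "Road closure frustrates commuters",
            "Town council meets Tuesday",
            "Local bakery wins award",
        ],
        "score": [0.6, -0.4, 0.0, 0.3],
    }
)


def label(score):
    """Turn a numeric score into a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


df["sentiment"] = df["score"].apply(label)

# Count how many headlines fall under each label.
counts = df["sentiment"].value_counts()
print(counts)
```

With matplotlib installed, calling `counts.plot.bar()` draws these counts as a bar chart.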
We'll use NLTK (Natural Language Toolkit) to do some simple natural language processing on some text.
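To give a feel for NLTK, the sketch below splits a made-up headline into words and counts them. This is only a first step in text processing, not sentiment analysis, and the Notebook may use different parts of NLTK. The functions used here were chosen because they need no extra NLTK data downloads.

```python
from nltk import FreqDist
from nltk.tokenize import wordpunct_tokenize

# A made-up headline for this illustration.
headline = "Snow is good, more snow is better"

# wordpunct_tokenize splits text with a regular expression into words and punctuation.
tokens = [token.lower() for token in wordpunct_tokenize(headline)]

# Keep alphabetic tokens only, dropping punctuation such as ",".
words = [token for token in tokens if token.isalpha()]

# FreqDist counts how often each word appears.
frequencies = FreqDist(words)
print(frequencies.most_common(2))
```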
Brought to you by Galvanize