CrossOver is a web scraping toolset, initiated in 2021 with funding from the European Union’s programme on the financing of Pilot Projects and Preparatory Actions in the field of “Communications Networks, Content and Technology” under the grant agreement LC-01682253, as well as from the Mozilla Technology Fund. The project was led by EU DisinfoLab in collaboration with CheckFirst, Apache, and Savoir-Devenir. For more detailed information, please visit the project website.
Here is some of the key media coverage that provides insight into the impact and relevance of the CrossOver project:
- Nieuws in de Klas: This article discusses the role of CrossOver in combating disinformation in the digital age.
- Politico: This piece highlights the geopolitical implications of the CrossOver project.
- OpenFacto: This article delves into how Google's autocomplete function can inadvertently spread disinformation, and how tools like CrossOver can help mitigate this.
- Daar Daar: This piece discusses the role of CrossOver in tracking and analyzing Russian propaganda in Belgium.
- Science Media Hub: This article reviews the technological advancements of CrossOver in the field of data science and media.
To install CrossOver, run the following command in your terminal: `pip install .`
For YouTube search emulation, you will need to install youtube-dl, a required dependency. Use the following command: `pip install -e git+https://github.com/ytdl-org/youtube-dl#egg=youtube_dl`
Please note that these commands may vary depending on your setup environment.
Use `crossover -g -i queries.csv` to load an input file containing queries. For each entry, it returns the suggested autocomplete searches provided by the Google Autocomplete API. The responses are printed to stdout in a machine-readable format, along with a screenshot.
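For context, the sketch below shows roughly what such an autocomplete lookup involves. It is a minimal illustration, not CrossOver's actual implementation: it assumes the public `suggestqueries.google.com` suggest endpoint and a `queries.csv` file holding one search term per line.

```python
# Minimal sketch: fetch Google Autocomplete suggestions for each query.
# Assumes the public suggest endpoint and one query per line in queries.csv;
# CrossOver's own implementation may differ.
import csv
import json
import requests

SUGGEST_URL = "https://suggestqueries.google.com/complete/search"

with open("queries.csv", newline="", encoding="utf-8") as f:
    for row in csv.reader(f):
        if not row:
            continue
        query = row[0]
        resp = requests.get(
            SUGGEST_URL,
            params={"client": "firefox", "q": query, "hl": "en"},
            timeout=10,
        )
        resp.raise_for_status()
        # With the "firefox" client the response is JSON: [query, [suggestion, ...]]
        suggestions = resp.json()[1]
        print(json.dumps({"query": query, "suggestions": suggestions}))
```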
Use `crossover -y -i queries.csv` to load an input file containing queries. It simulates web browsing to the YouTube search page, parses all results, and collects their metadata. It then collects metadata from the recommended videos for each search result. All metadata are printed to stdout in a machine-readable format.
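As an illustration of what collecting YouTube search metadata with youtube-dl can look like, here is a minimal sketch. It uses youtube-dl's `ytsearch` pseudo-URL rather than CrossOver's browsing emulation, and the printed fields are assumptions for the example, not the tool's exact output schema.

```python
# Minimal sketch: collect metadata for YouTube search results with youtube-dl.
# Uses the ytsearch pseudo-URL; CrossOver emulates browsing instead, so this
# only approximates the kind of metadata involved.
import json
import youtube_dl

ydl_opts = {"quiet": True, "extract_flat": True}  # metadata only, no downloads

with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    # "ytsearch5:" requests the first five results for the query.
    info = ydl.extract_info("ytsearch5:media literacy", download=False)
    for entry in info.get("entries", []):
        print(json.dumps({
            "id": entry.get("id"),
            "title": entry.get("title"),
            "url": entry.get("url"),
        }))
```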
Use `crossover -r -i queries.csv` to load an input file containing queries. It returns the trending posts of the corresponding subreddit. The returned metadata fields include ID, author, title, number of likes, number of comments, URL to the post, timestamp of content scraping, and timestamp shift (estimation of the post's age at the time of scraping).
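For reference, the fields listed above can be approximated with Reddit's public JSON listing, as in the sketch below. This is a hypothetical illustration built on the `/r/<subreddit>/hot.json` endpoint, not necessarily the method CrossOver itself uses.

```python
# Minimal sketch: fetch trending ("hot") posts of a subreddit and emit fields
# similar to those listed above. Uses Reddit's public JSON listing; CrossOver
# may collect these fields differently.
import json
import time
import requests

def trending_posts(subreddit, limit=10):
    resp = requests.get(
        f"https://www.reddit.com/r/{subreddit}/hot.json",
        params={"limit": limit},
        headers={"User-Agent": "crossover-example/0.1"},
        timeout=10,
    )
    resp.raise_for_status()
    scraped_at = int(time.time())
    for child in resp.json()["data"]["children"]:
        post = child["data"]
        yield {
            "id": post["id"],
            "author": post["author"],
            "title": post["title"],
            "likes": post["score"],
            "comments": post["num_comments"],
            "url": "https://www.reddit.com" + post["permalink"],
            "scraped_at": scraped_at,
            # Estimated age of the post at scraping time, in seconds.
            "timestamp_shift": scraped_at - int(post["created_utc"]),
        }

for post in trending_posts("europe"):
    print(json.dumps(post))
```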
As of July 2023, due to changes applied by Twitter, the Twitter feature is no longer maintained.
Please refer to the LICENSE file for licensing information.
© 2021 Check First OY