Acknowledgments

The Internet Yellow Pages could not exist without all the awesome prior research and data sources. We list all of them here, along with their corresponding licenses where available. You must comply with these licenses if you use the public instance or create a dump that includes these data sources.

Please refer to the READMEs in the respective crawler directories for more information.

Alice-LG

We retrieve route server looking glass snapshots from the following IXPs.

| Name | URL |
| --- | --- |
| AMS-IX | https://lg.ams-ix.net/ |
| BCIX | https://lg.bcix.de/ |
| DE-CIX | https://lg.de-cix.net/ |
| IX.br | https://lg.ix.br/ |
| LINX | https://alice-rs.linx.net/ |
| Megaport | https://lg.megaport.com/ |
| Netnod | https://lg.netnod.se/ |

APNIC

We use APNIC's AS population estimate.

BGPKIT

We use the as2rel, peer-stats, and pfx2as datasets from BGPKIT.

Use of this data is authorized under their Acceptable Use Agreement.

BGP.Tools

We use AS names, AS tags, and anycast prefix tags provided by BGP.Tools.

CAIDA

We use two datasets from CAIDA, whose use is authorized under their Acceptable Use Agreement.

The CAIDA AS Rank, https://doi.org/10.21986/CAIDA.DATA.AS-RANK

and

The CAIDA UCSD IXPs Dataset, https://www.caida.org/catalog/datasets/ixps

Cisco

We use the Cisco Umbrella Popularity List.

Citizen Lab

We use URL testing lists from The Citizen Lab.

Citizen Lab and Others. 2014. URL Testing Lists Intended for Discovering Website Censorship. https://github.com/citizenlab/test-lists.

This data is licensed under CC BY-NC-SA 4.0. No changes were made to the data.

Cloudflare

We use the radar/dns/top/ases, radar/dns/top/locations, radar/ranking/top, and radar/datasets endpoints of the Cloudflare Radar API.

This data is licensed under CC BY-NC 4.0. No changes were made to the data.

Emile Aben

We use AS names provided by Emile Aben and others with permission (Hi Emile!).

Internet Health Report

We use three datasets from the Internet Health Report (that's us!): Country Dependency, AS Hegemony, and Route Origin Validation.

This data is licensed under CC BY-NC-SA 4.0. No changes were made to the data.

Internet Intelligence Lab

We use the AS to organization mapping from the Internet Intelligence Lab at Georgia Tech.

Z. Chen, Z. Bischof, C. Testart, A. Dainotti, "AS to Organization Mapping", Internet Intelligence Lab at Georgia Tech, https://github.com/InetIntel/Dataset-AS-to-Organization-Mapping

Use of this data is authorized under their Acceptable Use Agreement.

Number Resource Organization

We use the extended allocation and assignment reports provided by the Number Resource Organization.

OpenINTEL

We use several datasets from OpenINTEL, a joint project of the University of Twente, SURF, SIDN Labs and NLnet Labs.

The tranco1m and umbrella1m datasets are licensed under CC BY-NC-SA 4.0. No changes were made to the data. In addition, this data is subject to OpenINTEL's Terms of Use.

The DNS Dependency Graph tool is a joint project of the University of Twente and IIJ Research Laboratory.

Other datasets are used with permission from OpenINTEL.

Packet Clearing House

We use the daily routing snapshots from Packet Clearing House.

This data is licensed under CC BY-NC-SA 3.0. No changes were made to the data.

PeeringDB

We use the fac, ix, ixlan, netfac, and org endpoints of the PeeringDB API.

Use of this data is authorized under their Acceptable Use Policy.

RIPE NCC

We use AS names, Atlas measurement information, and RPKI data from the RIPE NCC and RIPE Atlas.

SimulaMet

We use rDNS data from RIR-data.org, a joint project of SimulaMet and the University of Twente.

Alfred Arouna, Ioana Livadariu, and Mattijs Jonker. "Lowering the Barriers to Working with Public RIR-Level Data." Proceedings of the 2023 Workshop on Applied Networking Research (ANRW '23).

Stanford

We use the Stanford ASdb dataset provided by the Stanford Empirical Security Research Group.

ASdb: A System for Classifying Owners of Autonomous Systems. Maya Ziv, Liz Izhikevich, Kimberly Ruth, Katherine Izhikevich, and Zakir Durumeric. ACM Internet Measurement Conference (IMC), November 2021.

Tranco

We use the Tranco list provided by the DistriNet Research Unit (KU Leuven), TU Delft, and LIG.

The Tranco list combines lists from five providers:

  1. Cisco Umbrella
  2. Majestic (available under a CC BY 3.0 license)
  3. Farsight
  4. Chrome User Experience Report (CrUX) (available under a CC BY-SA 4.0 license)
  5. Cloudflare Radar (available under a CC BY-NC 4.0 license)

Virginia Tech

We use the RoVista dataset provided by the NetSecLab group at Virginia Tech.

RoVista: Measuring and Understanding the Route Origin Validation (ROV) in RPKI. Weitong Li, Zhexiao Lin, Md. Ishtiaq Ashiq, Emile Aben, Romain Fontugne, Amreesh Phokeer, and Taejoong Chung. ACM Internet Measurement Conference (IMC), October 2023.