Vectorized implementations of hash join algorithms on Intel Xeon Phi (KNL)
Advanced processor architectures have recently been driving new designs, implementations, and optimizations of main-memory hash join algorithms. The newly released Intel Xeon Phi many-core processor of the Knights Landing architecture (KNL) offers interesting hardware features such as many low-frequency out-of-order cores connected on a 2D mesh, and high-bandwidth multi-channel memory (MCDRAM).
In this repository, we experimentally revisit the state-of-the-art main-memory hash join algorithms to study how the new hardware features of KNL affect algorithmic design and tuning, and to identify opportunities for further performance improvement on KNL. Specifically, we implement the state-of-the-art no-partitioning hash join (denoted as NPJ) and the partitioned hash join, with and without NUMA-aware optimizations (denoted as PHJ and CPRA, respectively). Our experiments show that, although many existing optimizations remain valid on KNL with proper tuning, even the state-of-the-art algorithms severely underutilize the memory bandwidth and other hardware resources.
Our implementations are based on the code from Polychroniou et al., which was designed for the previous generation of the Intel Xeon Phi. Their publication is: Orestis Polychroniou, Arun Raghavan, and Kenneth A. Ross. 2015. Rethinking SIMD Vectorization for In-Memory Databases. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15). ACM, New York, NY, USA, 1493-1508. DOI: https://doi.org/10.1145/2723372.2747645
The Intel Xeon Phi processor of the Knights Landing architecture (KNL) is illustrated in the above figure. The KNL model we use consists of 32 tiles, 16 GB of MCDRAM, and other hardware components, all connected by a 2D mesh. Each tile tightly couples two low-frequency out-of-order x86-based cores and two 512-bit Vector Processing Units (VPUs). Each core has its own private L1 caches and shares a 1 MB L2 cache with its peer core in the same tile. Through its memory channels, KNL connects to up to 384 GB of DDR4 main memory. For more details, please refer to our publication listed below.

The above figure compares the execution time of the three hash join algorithms when executed in the flat mode (where only the DDR is used), the hybrid mode, and the cache mode. Compared with the flat mode, the hybrid mode reduces the execution time by 29%, 53%, and 58% for NPJ, PHJ, and CPRA, respectively. The cache mode further reduces the execution time by 65%, 12%, and 7% for these three algorithms, respectively. Overall, PHJ performs the best. NPJ benefits greatly from the high-bandwidth MCDRAM configured as a third-level cache, while the other two algorithms gain only minor improvements. Comparing the hybrid mode and the cache mode, we find that the larger the MCDRAM cache, the better the performance. For more results, please refer to our publication listed below.

Requirements:
- Intel Xeon Phi many-core processor of the Knights Landing architecture
- x86-based Intel CPUs with AVX-512 support (not verified yet)
- Linux
- Intel C/C++ Compiler 17.0.2 (build 20170213)
- The memkind library
Compile the hash join implementations and the data generator:
make npj
make phj
make cpra
make write
Generate the input relations:
./write [#threads] [size of the outer relation] [size of the inner relation]
After generating the relations, update the paths to the input relations in the source files of the hash joins accordingly.
Run the hash joins:
./npj [#threads] [size of the outer relation] [size of the inner relation]
./phj [#threads] [size of the outer relation] [size of the inner relation]
./cpra [#threads] [size of the outer relation] [size of the inner relation]
- Xuntao Cheng, Bingsheng He, Xiaoli Du, and Chiew Tong Lau. 2017. A Study of Main-Memory Hash Joins on Many-core Processor: A Case with Intel Knights Landing Architecture. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM '17). ACM, New York, NY, USA, 657-666. DOI: https://doi.org/10.1145/3132847.3132916
Based on our existing study, we are exploring several directions to further exploit the many-core architecture and die-stacked High Bandwidth Memory for main-memory databases. We list three major directions in the following. So far, we have explored the first two directions and submitted the resulting research manuscripts to related conferences.
- Optimizing hash join algorithms by taking advantage of the opportunities identified in existing studies
- Deploying hash tables on many-core architectures with die-stacked High Bandwidth Memory
- Optimizing query processing on such hardware platforms
For questions, please email Cheng Xuntao.