Download Hash Join Performance - Computer and Information Science - Lecture Slides and more Slides Applications of Computer Sciences in PDF only on Docsity!
Improving Hash Join Performance
through Prefetching
1
Outline
____________________________
- Overview
- Proposed Techniques
- Experimental setup
- Performance evaluation
- Conclusion
Hash Join Performance
________________________________
- Suffer from CPU Cache Stalls - Most of execution time is wasted on data cache misses
- 82% for partition, 73% for join
- Because of random access patterns in memory
Solution: Cache Prefetching
___________________________
- Cache prefetching has been successfully
applied to several types of applications.
- exploit cache prefetching to improve hash
join performance.
Overcoming These Challenges
- Evaluate two new prefetching
techniques:
Group prefetching
- try to hide cache miss latency across a group tuples
Software-pipelined prefetching
- avoid these intermittent stalls
Group Prefetching
- Hide cache miss latency across a group tuples.
- Then combine the processing of a group of tuples into a
single loop body and rearrange the probe operations into stages
- Process the tuples for a stage and then move to the next
stage
- Add prefetch instructions to the algorithm.
- issue prefetch instructions in one code stage for the
memory references in the next code stage.
Group vs. Software-Pipelined Prefetching
Hiding latency:
- Software-pipelined pref is always able to hide all latencies
Book-keeping overhead:
- Software-pipelined pref has more overhead
Code complexity:
- Group prefetching is easier to implement
- Natural group boundary provides a place to do necessary processing left (e.g. for read-write conflicts)
- A natural place to send outputs to the parent operator if pipelined operator is needed
Experimental Setup
- Use a simple schema for both the build and
probe relations
- Every tuple contains a 4 byte join attribute
and a fixed length payload
- Perform join without selections and
projections.
- Assume the join phase uses 50MB memory to
join a pair of build and probe partition
Performance Evaluation cont..
User-Mode CPU Cache Performance
This technique achieved 3.02-4.04X speedups over original hash join
Performance Evaluation cont..
Join Performance varying Memory Latency
-prefecthing techniques are effective even when the processor/memory speed gap increases dramatically
Conclusion
- Even though prefetching is a promising technique for
improving CPU cache performance, applying it to the
hash join algorithm is not straightforward
(due to the dependencies within the processing of a single tuple and the randomness of Hashing)
- Experimental results demonstrated that hash join
performance can be improved by using group
prefetching and software-pipelined prefetching
techniques.
- Several practicle issues when used on DBMS that
targets multiple architectures