Open Access
A Lowest Cost RDD Caching Strategy for Spark

DOI: 10.23977/amce.2019.005

Author(s)

Yuyang Wang, Tianlei Zhou

Corresponding Author

Yuyang Wang

ABSTRACT

Spark abstracts intermediate results as in-memory RDDs (Resilient Distributed Datasets) and manages them with an LRU (least recently used) policy to improve performance. However, because RDDs that belong to different computing tasks have different lifecycles, evicted RDDs often have to be reloaded, which incurs additional system overhead. In this paper we propose a lowest-cost replacement strategy as Spark's cache replacement strategy to address this problem. Based on a weight model, the strategy preemptively evicts RDDs with small weight values from memory. Among the candidate replacement solutions, it then selects the one with the lowest cost, which improves the efficiency of Spark. Experimental results show that the proposed strategy improves the efficiency of the whole cluster.
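
The abstract does not give the weight model itself, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical weight equal to recomputation time multiplied by the number of remaining uses, and it picks the set of cached RDDs whose eviction frees enough memory at the lowest total weight. The names CachedRDD, recompute_cost, future_refs and lowest_cost_eviction are illustrative and are not part of Spark's API.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class CachedRDD:
    """Hypothetical description of one cached RDD (not a Spark class)."""
    name: str
    size_mb: int           # memory currently held by the cached RDD
    recompute_cost: float  # assumed time to rebuild it once it is evicted
    future_refs: int       # assumed number of remaining uses in the job

    @property
    def weight(self) -> float:
        # Assumed weight model: expected cost of reloading after eviction.
        return self.recompute_cost * self.future_refs


def lowest_cost_eviction(cache, needed_mb):
    """Return the RDDs to evict so that at least needed_mb is freed
    while the total weight of the evicted RDDs is as small as possible.

    Exhaustive search over subsets, so only suitable for small caches.
    Returns an empty list if nothing needs freeing or the request
    cannot be satisfied.
    """
    if needed_mb <= 0:
        return []
    best_cost, best_set = None, ()
    for k in range(1, len(cache) + 1):
        for subset in combinations(cache, k):
            if sum(r.size_mb for r in subset) < needed_mb:
                continue  # does not free enough memory
            cost = sum(r.weight for r in subset)
            if best_cost is None or cost < best_cost:
                best_cost, best_set = cost, subset
    return list(best_set)


cache = [
    CachedRDD("A", size_mb=100, recompute_cost=4.0, future_refs=1),
    CachedRDD("B", size_mb=200, recompute_cost=5.0, future_refs=2),
    CachedRDD("C", size_mb=150, recompute_cost=20.0, future_refs=3),
]
victims = lowest_cost_eviction(cache, needed_mb=250)
print([r.name for r in victims])
```

In this example an LRU policy would evict whichever RDDs were touched least recently, regardless of how expensive they are to rebuild. The sketch instead keeps C, which is costly to recompute and still needed three times, and evicts A and B.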

KEYWORDS

RDD, Spark memory management, Memory computing, Cache strategy

All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.