By embedding the distribution of keys in the indexing structure, learned indexes can minimize index size and maximize lookup performance. Yet one of the problems with present learned indexes is their long index-building time. A conventional learned index requires a complete traversal of the entire dataset, which makes it less practical than a traditional index. This paper tackles build-time efficiency to make the learned index practical. Our approach to a build-time-efficient learned index is to employ sampled learning. In this paper, we present two error-bounded sampling schemes: Sample EB-PLA and Sample EB-Histogram. Although sampling is a simple idea, several considerations are needed to make it practical. For example, the sampling interval, error-boundedness, and index hyper-parameters are inter-related, presenting complicated trade-offs between build time, index size, accuracy, and lookup latency. Through extensive experiments over six real-world datasets, we show that our sampling schemes reduce the index-building time by more than an order of magnitude. The results reveal that sampling expands the design space of learned indexes to include build time as well as lookup performance and index size. Our Pareto analysis shows that, through sampling, a learned index can be built more efficiently than a traditional index.
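As a rough illustration of the sampled-learning idea, the sketch below fits greedy linear segments over every `stride`-th key of a sorted, duplicate-free key array, so training touches only about n/stride keys. The names (`BuildSampledPLA`, `Segment`, `stride`, `max_error`) and the simplified segment-extension check are assumptions for illustration, not the paper's EB-PLA/EB-Histogram code.

```cpp
// Minimal sketch (assumed, not the paper's implementation): greedy
// error-bounded PLA fitted over sampled keys only.
#include <cmath>
#include <cstdint>
#include <vector>

struct Segment {
  uint64_t first_key;   // key where the segment starts
  double slope;         // predicted_pos = intercept + slope * (key - first_key)
  double intercept;
};

std::vector<Segment> BuildSampledPLA(const std::vector<uint64_t>& keys,
                                     size_t stride, double max_error) {
  std::vector<Segment> segs;
  size_t start = 0;
  while (start < keys.size()) {
    size_t last = start;
    double slope = 0.0;
    // Greedily extend the segment over sampled keys while every sampled
    // position stays within max_error of the straight-line prediction.
    for (size_t j = start + stride; j < keys.size(); j += stride) {
      double cand = static_cast<double>(j - start) /
                    static_cast<double>(keys[j] - keys[start]);
      bool within = true;
      for (size_t i = start + stride; i < j && within; i += stride) {
        double pred = start + cand * static_cast<double>(keys[i] - keys[start]);
        within = std::fabs(pred - static_cast<double>(i)) <= max_error;
      }
      if (!within) break;
      slope = cand;
      last = j;
    }
    segs.push_back({keys[start], slope, static_cast<double>(start)});
    start = last + stride;   // next segment starts at the following sample
  }
  return segs;
}
```

Because only sampled positions are checked, the effective error bound at lookup time must also account for the sampling interval, which is one of the trade-offs the paper studies.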
This paper proposes a novel search mechanism, called SLR (Segmented Linear Regression) search, based on the concept of the learned index. It is motivated by our observation that many large datasets collected and used in previous studies exhibit a linearity property, meaning that keys and their stored locations show a strong linear correlation. This observation leads us to design SLR search, where we apply segmentation to the well-known machine learning algorithm, linear regression, to identify a location from a given key. We devise two segmentation techniques, equal-size and error-aware, considering both prediction accuracy and segmentation overhead. We implement our proposal in LevelDB, Google's key-value store, and verify that it can improve search performance by up to 12.7%. In addition, we find that the equal-size technique provides efficiency in training, while the error-aware one is tolerant of noisy data.
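A minimal sketch of the equal-size idea, assuming a sorted in-memory key array (the struct and function names are illustrative, not the LevelDB implementation): each fixed-size segment gets its own least-squares linear model.

```cpp
// Minimal sketch (assumed): equal-size segmentation with one linear
// regression model per segment. Very large keys may need rescaling for
// numerical stability; that detail is omitted here.
#include <algorithm>
#include <cstdint>
#include <vector>

struct LinearModel { double slope = 0.0, intercept = 0.0; };

LinearModel FitSegment(const std::vector<uint64_t>& keys,
                       size_t begin, size_t end) {  // fits [begin, end)
  double n = static_cast<double>(end - begin);
  double sx = 0, sy = 0, sxx = 0, sxy = 0;
  for (size_t i = begin; i < end; ++i) {
    double x = static_cast<double>(keys[i]), y = static_cast<double>(i);
    sx += x; sy += y; sxx += x * x; sxy += x * y;
  }
  LinearModel m;
  double denom = n * sxx - sx * sx;
  m.slope = (denom != 0.0) ? (n * sxy - sx * sy) / denom : 0.0;
  m.intercept = (sy - m.slope * sx) / n;
  return m;
}

std::vector<LinearModel> BuildEqualSizeSLR(const std::vector<uint64_t>& keys,
                                           size_t seg_size) {
  std::vector<LinearModel> models;
  for (size_t b = 0; b < keys.size(); b += seg_size)
    models.push_back(FitSegment(keys, b, std::min(b + seg_size, keys.size())));
  return models;
}
```

A lookup would map the key to its segment, evaluate that segment's model, and then scan a small window around the predicted position; the error-aware variant instead chooses segment boundaries based on the observed prediction error.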
Best Paper Award
To address the size problem of traditional indexes caused by growing data volumes, the Learned Index was proposed. RMI (Recursive Model Indexes), a representative read-only Learned Index, has a structure that partitions the data range across multiple layers of models. A key lookup in RMI passes through these layers to determine a predicted position and then reaches the final position by searching within a pre-measured error range. However, this error-range search carries an overhead comparable to the model prediction itself. In this paper, we apply Branchless Binary Search and SIMD (Single Instruction Multiple Data) Linear Search to improve RMI's error-range search, reducing search latency by up to 1.73x and 2.97x, respectively.
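For reference, a minimal sketch of a branchless binary search applied to the error window around an RMI prediction (illustrative code, not the paper's implementation). The loop advances a base pointer with a conditional move rather than a data-dependent branch, avoiding branch mispredictions inside the small window.

```cpp
// Branchless lower_bound over the error window [lo, hi), hi > lo assumed.
#include <cstddef>
#include <cstdint>

size_t BranchlessLowerBound(const uint64_t* a, size_t lo, size_t hi,
                            uint64_t key) {
  const uint64_t* base = a + lo;
  size_t n = hi - lo;
  while (n > 1) {
    size_t half = n / 2;
    // Advance base by half if the middle element is smaller than the key;
    // compilers typically emit a conditional move here instead of a branch.
    base = (base[half - 1] < key) ? base + half : base;
    n -= half;
  }
  return static_cast<size_t>(base - a) + (base[0] < key ? 1 : 0);
}
```

The SIMD linear-search alternative instead compares several keys of the window against the search key per instruction and scans forward until a match region is found.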
Best Presentation Award
In this paper, to compare updatable learned indexes fairly, we analyze ALEX and LIPP, two indexes with different memory-usage policies, and compare their performance when their index sizes are similar. We also split the work performed during key insertion into search and insert operations and analyze the internal tasks to identify the performance bottlenecks of each workload.
Indexes are a core component of database systems, and learned indexes, which use machine learning models to predict the position of a search key, have recently attracted attention. Early learned indexes offered excellent search performance and small index sizes, but they could not support update operations. Updatable learned indexes allocate extra space to support updates, and by sizing their arrays generously for insertion performance they increase the index size. Index size matters for storage capacity and cost, yet previous studies have tended to overlook it relative to other performance metrics. In this paper, we analyze the internal parameters that determine the index size of two representative updatable learned indexes, ALEX and LIPP, and compare their throughput with size taken into account. When the index sizes are equal, ALEX shows higher throughput on read-only workloads, while LIPP shows higher throughput on mixed and write-only workloads. This contrasts with previous studies, which reported that LIPP outperforms ALEX on all workloads when size is not controlled for. We further point out that index size strongly affects search performance but has little effect on insertion performance, indicating that other bottlenecks exist, and we suggest directions for improving updatable learned indexes.
The learned index is an index structure that uses machine learning models to learn the positions of keys, so the position of a requested key can be located quickly. It has attracted attention because it uses less memory and offers better search performance than traditional indexes. However, a learned index must train its internal machine learning models while the index is built, and this training makes its build time much longer than that of a traditional index. In this paper, we propose a technique that accelerates training by applying SIMD to the internal machine learning algorithm, thereby mitigating the build-time limitation of the learned index. As a result, RMI, a representative learned index structure, builds up to about 1.4x faster with SIMD-parallel training while keeping search time and error rate nearly unchanged.
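A minimal sketch of the kind of SIMD training step this implies, assuming AVX2/FMA and double-precision inputs (illustrative, not the paper's code): the least-squares sums for one linear model of an RMI node are accumulated four elements per instruction.

```cpp
// Build with e.g. -mavx2 -mfma. x holds keys, y holds positions 0..n-1.
#include <immintrin.h>
#include <cstddef>

struct LinearModel { double slope, intercept; };

LinearModel TrainLinearAVX2(const double* x, const double* y, size_t n) {
  __m256d sx = _mm256_setzero_pd(), sy = _mm256_setzero_pd();
  __m256d sxx = _mm256_setzero_pd(), sxy = _mm256_setzero_pd();
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    __m256d vx = _mm256_loadu_pd(x + i);
    __m256d vy = _mm256_loadu_pd(y + i);
    sx  = _mm256_add_pd(sx, vx);
    sy  = _mm256_add_pd(sy, vy);
    sxx = _mm256_fmadd_pd(vx, vx, sxx);   // sxx += x*x
    sxy = _mm256_fmadd_pd(vx, vy, sxy);   // sxy += x*y
  }
  double bx[4], by[4], bxx[4], bxy[4];
  _mm256_storeu_pd(bx, sx);   _mm256_storeu_pd(by, sy);
  _mm256_storeu_pd(bxx, sxx); _mm256_storeu_pd(bxy, sxy);
  double Sx = bx[0]+bx[1]+bx[2]+bx[3], Sy = by[0]+by[1]+by[2]+by[3];
  double Sxx = bxx[0]+bxx[1]+bxx[2]+bxx[3], Sxy = bxy[0]+bxy[1]+bxy[2]+bxy[3];
  for (; i < n; ++i) {                    // scalar tail
    Sx += x[i]; Sy += y[i]; Sxx += x[i]*x[i]; Sxy += x[i]*y[i];
  }
  double denom = n * Sxx - Sx * Sx;
  double slope = (denom != 0.0) ? (n * Sxy - Sx * Sy) / denom : 0.0;
  return {slope, (Sy - slope * Sx) / n};
}
```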
With the rapid growth of data, traditional indexing methods have reached their limits, leading to the development of a new type of index, the Learned Index. The Recursive Model Index (RMI) is a classic Learned Index structure that predicts the stored position of data, offering faster response times and smaller space usage than traditional indexes. SIMD (Single Instruction Multiple Data) performs parallel computation over multiple data elements with a single instruction. This paper applies SIMD to the prediction stage of RMI to improve prediction speed and evaluates the effect through performance analysis. The prediction performance of the SIMD-accelerated RMI improved by 45% to 75%, and search performance improved by 10% to 70%.
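A minimal sketch of SIMD-batched prediction, assuming AVX2 and a two-level RMI whose root is a linear model (names and interface are illustrative, not the paper's code): four keys are mapped to second-level model ids at once.

```cpp
// Build with e.g. -mavx2 -mfma. keys and leaf_id_out each hold 4 doubles.
#include <immintrin.h>

void PredictRootBatch4(const double* keys, double slope, double intercept,
                       double num_leaf_models, double* leaf_id_out) {
  __m256d k = _mm256_loadu_pd(keys);
  // pred = slope * key + intercept, computed for 4 keys at once.
  __m256d pred = _mm256_fmadd_pd(_mm256_set1_pd(slope), k,
                                 _mm256_set1_pd(intercept));
  // Clamp to [0, num_leaf_models - 1] so the result indexes a valid leaf model.
  pred = _mm256_max_pd(pred, _mm256_setzero_pd());
  pred = _mm256_min_pd(pred, _mm256_set1_pd(num_leaf_models - 1.0));
  _mm256_storeu_pd(leaf_id_out, pred);
}
```

The leaf-model prediction and the final error-range search can be vectorized in the same way or left scalar, which is where the reported 45% to 75% prediction gains come from.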
LevelDB, Google's LSM-tree based key-value database, provides a Bloom filter to improve read performance. However, Bloom-filter false positives can degrade performance, and they occur more frequently on databases with a low hit ratio. In this paper, we measure LevelDB's ReadRandom performance to identify the optimal bits-per-key value for a given database hit ratio.
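For context, this is how the bits-per-key knob is set through LevelDB's public API; the value 10 is only an example, not the optimum reported here.

```cpp
#include <cassert>
#include "leveldb/db.h"
#include "leveldb/filter_policy.h"

int main() {
  leveldb::Options options;
  options.create_if_missing = true;
  // Attach a Bloom filter with 10 bits per key; higher values lower the
  // false-positive rate at the cost of larger filter blocks.
  const leveldb::FilterPolicy* bloom = leveldb::NewBloomFilterPolicy(10);
  options.filter_policy = bloom;

  leveldb::DB* db = nullptr;
  leveldb::Status s = leveldb::DB::Open(options, "/tmp/testdb", &db);
  assert(s.ok());
  // ... ReadRandom-style workload would go here ...
  delete db;
  delete bloom;   // LevelDB does not take ownership of the filter policy
  return 0;
}
```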
LevelDB, which stores data as key-value pairs, stores data efficiently on top of the LSM-tree structure, but its level-based hierarchy degrades read performance. To speed up reads, LevelDB uses an index cache that caches SSTable metadata and a block cache that caches data blocks. In this paper, we analyze the structure of SSTables and how the caches are used when reading key-value data from them. We also vary the number of index-cache entries and the size of the block cache and observe the resulting performance changes. The experiments show that, depending on the size of the loaded data, read performance improves as the number of index-cache entries grows, and it also improves as the block cache grows.
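For context, the two knobs correspond roughly to LevelDB's public options as sketched below (values are illustrative): the block cache is set directly, while the index-cache entry count maps to the table cache, which LevelDB sizes from max_open_files.

```cpp
#include <cassert>
#include "leveldb/cache.h"
#include "leveldb/db.h"

int main() {
  leveldb::Options options;
  options.create_if_missing = true;
  options.block_cache = leveldb::NewLRUCache(64 * 1024 * 1024);  // 64 MiB block cache
  options.max_open_files = 1000;  // bounds how many SSTables stay in the table cache

  leveldb::DB* db = nullptr;
  leveldb::Status s = leveldb::DB::Open(options, "/tmp/testdb", &db);
  assert(s.ok());
  // ... read workload ...
  delete db;
  delete options.block_cache;   // the caller owns the cache object
  return 0;
}
```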
Key-value databases provide data consistency through write-ahead logging (WAL). However, data can be corrupted while being written to or recovered from the WAL, and RocksDB detects such corruption with checksums. In this paper, we analyze experimentally how enabling per key-value checksums, which provide in-memory checksum protection, affects the performance of the WAL write and recovery paths.
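As a conceptual illustration only (this is not RocksDB's code or API), a per key-value checksum covers each record individually, so corruption introduced between the write path and WAL recovery can be detected per entry rather than per log block; a simple FNV-1a hash stands in here for the real checksum function.

```cpp
#include <cstdint>
#include <string>

// Illustrative checksum primitive; RocksDB's actual scheme differs.
uint64_t Fnv1a(const std::string& data, uint64_t h = 1469598103934665603ULL) {
  for (unsigned char c : data) { h ^= c; h *= 1099511628211ULL; }
  return h;
}

// Checksum chained over key, value, and the entry-type byte at write time.
uint64_t KvChecksum(const std::string& key, const std::string& value,
                    char entry_type) {
  uint64_t h = Fnv1a(key);
  h = Fnv1a(value, h);
  return Fnv1a(std::string(1, entry_type), h);
}

// On the recovery path, the stored checksum is recomputed and compared.
bool VerifyKv(const std::string& key, const std::string& value,
              char entry_type, uint64_t stored) {
  return KvChecksum(key, value, entry_type) == stored;
}
```

The extra hashing on every write and every recovered record is the overhead whose performance impact the experiments quantify.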
As both the variety and the volume of data keep growing, Google developed LevelDB, an LSM-tree based NoSQL database. To manage its data files effectively, LevelDB performs compaction, which removes duplicate data and produces newly sorted data files, and this has a large impact on performance. Judging that the way compaction is executed would therefore affect performance, this paper examines two structures for executing compaction and observes how read performance changes depending on whether compaction has run.
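For reference, one way such an experiment can separate pre- and post-compaction read measurements is LevelDB's manual compaction API, sketched below (the load and read phases are omitted; paths and values are illustrative).

```cpp
#include <cassert>
#include "leveldb/db.h"

int main() {
  leveldb::Options options;
  options.create_if_missing = true;
  leveldb::DB* db = nullptr;
  leveldb::Status s = leveldb::DB::Open(options, "/tmp/testdb", &db);
  assert(s.ok());

  // ... load phase and a first read measurement (before compaction) ...

  // Compact the whole key range; nullptr bounds mean "everything".
  db->CompactRange(nullptr, nullptr);

  // ... second read measurement (after compaction) ...
  delete db;
  return 0;
}
```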