Improving I/O Throughput of Scientific Applications using Transparent Parallel Compression

Author(s): Tekin Bicer, Jian Yin, Gagan Agrawal
Venue: 2014 ACM/IEEE International Symposium on Cluster, Cloud and Grid Computing
Date: 2014


The increasing number of cores in parallel computer systems is allowing scientific simulations to be executed with increasing spatial and temporal granularity. However, this also leads to larger data sets that must be output, stored, managed, and then visualized and/or analyzed using a variety of methods. The researchers in this study wanted to see whether compression could help address this problem: compression decreases the size of the generated data sets and, by reducing the volume of data written to disk, can lower execution times.

The researchers looked into popular libraries in which to implement their compression algorithms. They judged both the Parallel NetCDF (PnetCDF) and HDF5 formats to be suitable, since both provide high-performance parallel I/O operations. They started with compression with sparse storage and then moved on to compression with dense storage. In sparse storage, the compressed data is written with some unused space left in between. In dense storage, the compressed data is packed together with no gaps. The experiments showed that dense storage reduced execution time by 50%, while sparse storage reduced it by 35%.
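The difference between the two layouts can be illustrated with a small in-memory sketch. This is not the paper's implementation and does not use PnetCDF or HDF5. It assumes that each chunk is compressed independently, that zlib stands in for whatever compressor the authors used, and that in the sparse case each compressed chunk stays at the offset its uncompressed form would have occupied. All function names (`make_chunks`, `sparse_layout`, `dense_layout`, `write_image`, `read_chunk`) are hypothetical.

```python
import struct
import zlib


def make_chunks(n_chunks=4, values_per_chunk=1024):
    """Build hypothetical float64 chunks with many repeated values."""
    chunks = []
    for i in range(n_chunks):
        values = [float(i + j // 64) for j in range(values_per_chunk)]
        chunks.append(struct.pack(f"{values_per_chunk}d", *values))
    return chunks


def sparse_layout(chunks):
    """Assumed sparse layout: each compressed chunk keeps the offset of its
    uncompressed form, so unused space remains after every chunk.
    Assumes no compressed chunk is larger than its original."""
    compressed = [zlib.compress(c) for c in chunks]
    index, offset = [], 0
    for raw, comp in zip(chunks, compressed):
        index.append((offset, len(comp)))
        offset += len(raw)  # advance by the uncompressed size
    return compressed, index, offset


def dense_layout(chunks):
    """Assumed dense layout: compressed chunks are packed back to back."""
    compressed = [zlib.compress(c) for c in chunks]
    index, offset = [], 0
    for comp in compressed:
        index.append((offset, len(comp)))
        offset += len(comp)  # advance by the compressed size
    return compressed, index, offset


def write_image(compressed, index, extent):
    """Place every compressed chunk at its offset in a zero-filled buffer."""
    image = bytearray(extent)
    for comp, (offset, size) in zip(compressed, index):
        image[offset:offset + size] = comp
    return bytes(image)


def read_chunk(image, index, i):
    """Locate chunk i through the index and decompress it."""
    offset, size = index[i]
    return zlib.decompress(image[offset:offset + size])


if __name__ == "__main__":
    chunks = make_chunks()
    for name, layout in (("sparse", sparse_layout), ("dense", dense_layout)):
        compressed, index, extent = layout(chunks)
        payload = sum(len(c) for c in compressed)
        print(f"{name}: payload={payload} bytes, file extent={extent} bytes")
```

In this sketch both layouts write the same number of compressed bytes; the dense layout additionally shrinks the file extent, at the cost of needing the index of compressed sizes before any chunk's offset is known.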

The researchers concluded that the study was a success and that their experiments show that compression can significantly improve the I/O throughput of applications.