Compression of Very Sparse Column Oriented Data

Autores

  • Vinicius Fulber Garcia UFSM
  • Sergio Luis Sardi Mergen Universidade Federal de Santa Maria (UFSM), Santa Maria

DOI:

https://doi.org/10.5902/2448190422772

Palavras-chave:

compression, column oriented databases

Resumo

Column oriented databases store columns contiguously on disk. The adjacency of values from the same domain leads to a reduced information entropy. Consequently, compression algorithms are able to achieve better results. Columns whose values have a high cardinality are usually compressed using variations of the LZ method. In this paper, we consider the usage of simpler methods based on run-length and symbols probability in scenarios where datasets are very sparse. Our experiments show in which cases the simple methods evaluated provide promising results.

Downloads

Não há dados estatísticos.

Referências

Abadi, D., Madden, S., e Ferreira, M. (2006). Integrating compression and execution in column-oriented database systems. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data, pages 671–682. ACM.

Abadi, D. J. et al. (2007). Column stores for wide and sparse data. In CIDR, pages 292–297.

Ailamaki, A., DeWitt, D. J., Hill, M. D., e Skounakis, M. (2001). Weaving relations for cache performance. In Proceedings of the 27th International Conference on Very Large Data Bases, VLDB ’01, pages 169–180, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

Burrows, M. e Wheeler, D. J. (1994). A block-sorting lossless data compression algorithm (technical report).

Council, T. P. P. (2008). Tpc-h benchmark specification.

Deutsch, L. P. (1996). Deflate compressed data format specification version 1.3.

Huffman, D. A. et al. (1952). A method for the construction of minimum redundancy codes. Proceedings of the IRE, 40(9):1098–1101.

Lamb, A., Fuller, M., Varadarajan, R., Tran, N., Vandiver, B., Doshi, L., e Bear, C. (2012).

The vertica analytic database: C-store 7 years later. Proc. VLDB Endow., 5(12):1790–1801.

Matei, G. e Bank, R. C. (2010). Column-oriented databases, an alternative for analytical environment. Database Systems Journal, 1(2):3–16.

Skodras, A., Christopoulos, C., e Ebrahimi, T. (2001). The jpeg 2000 still image compression standard. IEEE Signal processing magazine, 18(5):36–58.

Witten, I. H., Neal, R. M., e Cleary, J. G. (1987). Arithmetic coding for data compression. Communications of the ACM, 30(6):520–540.

Ziv, J. e Lempel, A. (1978). Compression of individual sequences via variable-rate coding. Information Theory, IEEE Transactions on, 24(5):530–536.

Zukowski, M., Heman, S., Nes, N., e Boncz, P. (2006). Super-scalar ram-cpu cache compression. In Proceedings of the 22Nd International Conference on Data Engineering, ICDE ’06, pages 59–, Washington, DC, USA. IEEE Computer Society.

Downloads

Publicado

2016-10-11

Como Citar

Garcia, V. F., & Mergen, S. L. S. (2016). Compression of Very Sparse Column Oriented Data. Revista ComInG - Communications and Innovations Gazette, 1(2), 61–73. https://doi.org/10.5902/2448190422772

Edição

Seção

Artigos científicos

Artigos mais lidos pelo mesmo(s) autor(es)