Compression of Very Sparse Column Oriented Data

Autor/innen

  • Vinicius Fulber Garcia UFSM
  • Sergio Luis Sardi Mergen UFSM

DOI:

https://doi.org/10.5902/2448190422772

Schlagworte:

compression, column oriented databases

Abstract

Column oriented databases store columns contiguously on disk. The adjacency of values from the same domain leads to a reduced information entropy. Consequently, compression algorithms are able to achieve better results. Columns whose values have a high cardinality are usually compressed using variations of the LZ method. In this paper, we consider the usage of simpler methods based on run-length and symbols probability in scenarios where datasets are very sparse. Our experiments show in which cases the simple methods evaluated provide promising results.

Downloads

Keine Nutzungsdaten vorhanden.

Literaturhinweise

Abadi, D., Madden, S., e Ferreira, M. (2006). Integrating compression and execution in column-oriented database systems. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data, pages 671–682. ACM.

Abadi, D. J. et al. (2007). Column stores for wide and sparse data. In CIDR, pages 292–297.

Ailamaki, A., DeWitt, D. J., Hill, M. D., e Skounakis, M. (2001). Weaving relations for cache performance. In Proceedings of the 27th International Conference on Very Large Data Bases, VLDB ’01, pages 169–180, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

Burrows, M. e Wheeler, D. J. (1994). A block-sorting lossless data compression algorithm (technical report).

Council, T. P. P. (2008). Tpc-h benchmark specification.

Deutsch, L. P. (1996). Deflate compressed data format specification version 1.3.

Huffman, D. A. et al. (1952). A method for the construction of minimum redundancy codes. Proceedings of the IRE, 40(9):1098–1101.

Lamb, A., Fuller, M., Varadarajan, R., Tran, N., Vandiver, B., Doshi, L., e Bear, C. (2012).

The vertica analytic database: C-store 7 years later. Proc. VLDB Endow., 5(12):1790–1801.

Matei, G. e Bank, R. C. (2010). Column-oriented databases, an alternative for analytical environment. Database Systems Journal, 1(2):3–16.

Skodras, A., Christopoulos, C., e Ebrahimi, T. (2001). The jpeg 2000 still image compression standard. IEEE Signal processing magazine, 18(5):36–58.

Witten, I. H., Neal, R. M., e Cleary, J. G. (1987). Arithmetic coding for data compression. Communications of the ACM, 30(6):520–540.

Ziv, J. e Lempel, A. (1978). Compression of individual sequences via variable-rate coding. Information Theory, IEEE Transactions on, 24(5):530–536.

Zukowski, M., Heman, S., Nes, N., e Boncz, P. (2006). Super-scalar ram-cpu cache compression. In Proceedings of the 22Nd International Conference on Data Engineering, ICDE ’06, pages 59–, Washington, DC, USA. IEEE Computer Society.

Downloads

Veröffentlicht

2016-10-11

Zitationsvorschlag

Garcia, V. F., & Mergen, S. L. S. (2016). Compression of Very Sparse Column Oriented Data. Revista ComInG - Communications and Innovations Gazette, 1(2), 61–73. https://doi.org/10.5902/2448190422772

Ausgabe

Rubrik

Artigos científicos

Am häufigsten gelesenen Artikel dieser/dieses Autor/in