Compression of Very Sparse Column Oriented Data
DOI: https://doi.org/10.5902/2448190422772
Keywords: compression, column oriented databases
Abstract
Column-oriented databases store each column contiguously on disk. Because adjacent values come from the same domain, the stored data has lower information entropy, and compression algorithms can therefore achieve better results. Columns with high-cardinality values are usually compressed with variants of the Lempel-Ziv (LZ) method. In this paper, we consider simpler methods based on run-length encoding and symbol probabilities for scenarios in which datasets are very sparse. Our experiments show in which cases these simple methods provide promising results.
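The abstract contrasts LZ-style compression with two simpler families: run-length encoding and coders driven by symbol probabilities. As a rough illustration only (a sketch under stated assumptions, not the authors' implementation), the Python code below run-length encodes a very sparse integer column and computes its order-0 Shannon entropy, the per-value bound that probability-based coders such as Huffman or arithmetic coding work against. The function names `run_length_encode`, `run_length_decode`, and `empirical_entropy`, and the sample column, are hypothetical.

```python
import math
from collections import Counter
from itertools import groupby


def run_length_encode(column):
    """Collapse each run of equal consecutive values into a (value, run_length) pair."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original list of values."""
    values = []
    for value, length in runs:
        values.extend([value] * length)
    return values


def empirical_entropy(column):
    """Order-0 Shannon entropy, in bits per value, from observed value frequencies.

    This is the per-value lower bound that a symbol-probability coder
    (e.g. Huffman or arithmetic coding) can approach when it treats values
    as independent draws from this distribution.
    """
    n = len(column)
    counts = Counter(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical very sparse column: 1,000 rows, only two non-default values.
    column = [0] * 500 + [42] + [0] * 498 + [17]

    runs = run_length_encode(column)
    print(runs)  # [(0, 500), (42, 1), (0, 498), (17, 1)]
    assert run_length_decode(runs) == column  # the encoding is lossless

    h = empirical_entropy(column)
    print(f"{len(column)} values -> {len(runs)} runs")
    print(f"order-0 entropy: {h:.4f} bits/value, about {h * len(column):.1f} bits total")
```

One reason run-length encoding is attractive for very sparse data: a Huffman code gives every value a codeword of at least one bit, so it cannot go below 1 bit per value even when the entropy is far lower, whereas collapsing runs first (or using arithmetic coding) can exploit long stretches of the default value.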
License
Accepted and published manuscripts are the property of the journal ComInG.
Original manuscripts must be accompanied by copyright transfer documents signed by the authors.
The copyright letter must be sent by e-mail to coming@inf.ufsm.br.
Submitting the manuscript, in whole or in part, to any other journal is prohibited. The authors are solely responsible for the content of their articles.
Translation into another language is prohibited without the written permission of the Editor, after consultation with the Editorial Board.