From Data Islands to Sharing Data in the Cloud: the Evolution of Data Integration in Biological Data Repositories

Authors

  • Vinicius Vielmo Cogo, LaSIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal
  • Alysson Neves Bessani, LaSIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal

DOI:

https://doi.org/10.5902/2448190421133

Abstract

Biological data repositories were often data islands with unharmonized formats, models, and protocols. Their integration has evolved over the years, and sharing data in multi-tenant infrastructures is now a reality. In this article, we illustrate this evolution by presenting real-world cases from the bioinformatics area, and we distill from these examples the best practices and current trends that future solutions should observe. Finally, we situate the platform being created by the BiobankCloud project within the landscape of biological data integration.




Published

2016-02-03

How to Cite

Cogo, V. V., & Bessani, A. N. (2016). From Data Islands to Sharing Data in the Cloud: the Evolution of Data Integration in Biological Data Repositories. Revista ComInG - Communications and Innovations Gazette, 1(1), 01–11. https://doi.org/10.5902/2448190421133

Issue

Vol. 1 No. 1

Section

Scientific Articles