From Data Islands to Sharing Data in the Cloud: the Evolution of Data Integration in Biological Data Repositories
Biological data repositories were often data islands with unharmonized formats, models, and protocols. Their integration has evolved over the years, and sharing data in multi-tenant infrastructures is now a reality. In this article, we illustrate this evolution with real-world cases from bioinformatics and, from these examples, collect the best practices and current trends that future solutions should observe. Finally, we situate the platform being created by the BiobankCloud project within the landscape of biological data integration.
Manuscripts accepted and published are the property of the journal ComInG.
The originals must be accompanied by copyright transfer documents bearing the authors' signatures.
The manuscript may not be submitted, in whole or in part, to any other journal. Responsibility for the content of the articles lies exclusively with the authors.
The article may not be translated into another language without the written permission of the Editor, after consultation with the Editorial Board.