Empirical Evaluation of Decentralized Genomic Data Computation Using Bacalhau and IPFS

  • Bagas Triaji Universitas Teknologi Digital Indonesia, Indonesia
  • Badiyanto Badiyanto Universitas Teknologi Digital Indonesia, Indonesia
  • Justivan Intifadhah Afif Universitas Teknologi Digital Indonesia, Indonesia
Keywords: Decentralized System, IPFS Cluster, Genomic, Bacalhau, Compute over data

Abstract

Large-scale genomic analysis typically relies on centralized infrastructure, which creates a conflict between the need for collaboration and data sovereignty regulations. This study addresses this dilemma by evaluating a decentralized architecture designed to enable secure, inter-institutional genomic computation without moving raw data. We integrated Bacalhau for job orchestration and IPFS Cluster with CRDT consensus for storage, with AES-256 encryption applied to the data. A quantitative evaluation was conducted on AWS using five t3.medium nodes to simulate a resource-constrained hospital network. We tested three scenarios: a centralized baseline (SSH+SCP), an ideal decentralized workflow, and a "chaos" scenario with active network fault injection. The centralized baseline was the fastest (Mean=37.69s), while the decentralized architecture incurred a manageable overhead of about 30% under ideal conditions (Mean=49.22s, SD=1.58s). Critically, under chaos fault injection, execution time increased to 90.67s (SD=17.84s), yet the system achieved a 100% job completion rate, exceeding that of the fragile centralized baseline. This research quantifies the trade-off between execution speed and system resilience in a healthcare context. We show that this architecture prioritizes data sovereignty and high availability over raw speed, offering an empirically validated model for privacy-critical Decentralized Science (DeSci) collaborations.
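The overhead figures in the abstract follow directly from the reported means. The minimal Python sketch below uses only the means and standard deviations stated in the abstract; the helper names `relative_overhead` and `coefficient_of_variation` are hypothetical illustrations, not part of the authors' tooling. It reproduces the roughly 30% ideal-case overhead and also expresses how much the run-to-run spread grew under fault injection.

```python
"""Recompute the slowdown figures reported in the abstract.

All numbers below are copied from the abstract; nothing is measured here.
`relative_overhead` and `coefficient_of_variation` are hypothetical helpers
for illustration, not part of the authors' tooling.
"""

# (mean, standard deviation) of end-to-end job time in seconds.
# The abstract gives no SD for the centralized baseline, so it is None.
SCENARIOS = {
    "baseline (SSH+SCP)": (37.69, None),
    "decentralized, ideal": (49.22, 1.58),
    "decentralized, chaos": (90.67, 17.84),
}


def relative_overhead(candidate: float, reference: float) -> float:
    """Fractional slowdown of `candidate` vs. `reference` (0.30 means 30% slower)."""
    if reference <= 0:
        raise ValueError("reference time must be positive")
    return candidate / reference - 1.0


def coefficient_of_variation(mean: float, sd: float) -> float:
    """Standard deviation as a fraction of the mean (relative run-to-run spread)."""
    return sd / mean


if __name__ == "__main__":
    base_mean, _ = SCENARIOS["baseline (SSH+SCP)"]
    for name, (mean, sd) in SCENARIOS.items():
        overhead = relative_overhead(mean, base_mean)
        line = f"{name:22s} mean={mean:6.2f}s overhead={overhead:+7.1%}"
        if sd is not None:
            line += f" CV={coefficient_of_variation(mean, sd):.1%}"
        print(line)
```

Running it gives an ideal-case overhead of about +30.6% (CV of about 3.2%) and a chaos-case overhead of about +140.6% (CV of about 19.7%) relative to the baseline mean. In other words, fault injection roughly doubled the decentralized run time and made it about six times more variable in relative terms, which is the speed-for-resilience trade-off the abstract describes.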

Published
2025-12-18
How to Cite
Triaji, B., Badiyanto, B., & Afif, J. (2025). Empirical Evaluation of Decentralized Genomic Data Computation Using Bacalhau and IPFS. Journal of Information Systems and Informatics, 7(4), 4098-4112. https://doi.org/10.63158/journalisi.v7i4.1311
Section
Articles