Twenty years ago, an internet search for the word ‘biobank’ would have returned almost nothing; today, there are millions of results. The term biobank commonly refers to a large, organised collection of well-characterised tissue samples such as surgical biopsies (fresh frozen or in paraffin sections), blood and serum samples, different cell types and DNA – all carefully collected for research purposes.1 The implication of biobanking is that the tissue samples will be collected with associated biological and medical data (such as biochemical test data and imaging data).
There is value to having large collections of high-quality samples; however, the exponential added value is in the linked data and the clinical information that relates to them. The science of biobanking is very broad and covers collections of plant, animal or human specimens, as represented in the International Society for Biological and Environmental Repositories (ISBER).2 For the purposes of this article, we will focus on human biobanks. However, it should be noted that there are major initiatives to digitise biobanking across the scientific disciplines such as the Global Genome Biodiversity Network (GGBN).3
One of the first high-profile calls for the establishment and support of biobanking initiatives came from the Organisation for Economic Co-operation and Development (OECD), which advocated the importance of biobanks but also insisted on data being an integral part of biobanking.4 Following this OECD proposal, various countries established biobanks as research infrastructure in precision medicine research.
Thus, many national biobanks and biobank networks were created in Canada, the UK, USA, China, Estonia, South Korea, Finland, Denmark, Sweden and many other countries. It has since been well demonstrated that biobanks crucially underpin and facilitate national and international medical research efforts by providing high-quality, research-ready samples together with linked clinical data, according to a set of best practices and standards.
Medical research in the era of precision medicine is based on the analysis of samples with clinical data – and, because the associations are often weak, both these samples and data are required in large quantities. This is the basic premise of introducing research approaches such as ‘big data analytics’ and ‘artificial intelligence’ in precision medicine research.
The implication is clear: the more well-characterised, high-quality samples and associated data are available through biobanks, the faster research will advance and impact the delivery of healthcare today. Therefore, there is a growing requirement on biobanks to have increased capacity and sufficient informatics capabilities in order to ensure these demands are met.
As a consequence, modern biobanking is shifting its focus from sample-driven to data-driven strategies. However, in order to fulfil the opportunities promised by precision medicine, challenges remain on the road ahead.
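The earlier point that weak associations demand large quantities of samples and data can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: it uses the standard Fisher z approximation for the number of samples needed to detect a correlation of strength r, and the function name `samples_needed`, the 5% significance level and the 80% power are choices made here, not figures from any particular biobank study.

```python
from math import atanh, ceil
from statistics import NormalDist


def samples_needed(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size to detect a correlation r (two-sided test).

    Uses the Fisher z approximation: n = ((z_alpha + z_power) / atanh(r))^2 + 3.
    The defaults for alpha and power are illustrative assumptions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return ceil(((z_alpha + z_power) / atanh(abs(r))) ** 2 + 3)


# Weaker associations (smaller r) need many more samples.
for r in (0.5, 0.3, 0.1, 0.05):
    print(f"r = {r:<4} -> about {samples_needed(r)} samples")
```

Under this approximation the required number of samples grows roughly with 1/r², so halving the strength of an association roughly quadruples the number of samples needed to detect it.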
As medical researchers ‘think bigger’ than ever before, their need for data grows ever stronger. It demands the gathering and administration of large collections of samples and related data, often from multiple sources. ‘Big data’ is an umbrella term which describes, in the simplest words, the use of new computerised technologies and software developed to extract knowledge – and ultimately interpret it into actions – from very large volumes of heterogeneous data, such as biological and medical data.
To achieve this, computerised mechanisms for high-velocity capture, discovery and processing are deployed. ‘Big data’ requires a paradigm shift in both storage requirements and data analysis – both of these are parameters that directly impact biobanking.
One of the crucial questions that still remains to be answered is the handling of all the data. For example, technological advancement has meant that the cost of genomic analyses, in order to identify patterns in the genetic materials of samples that might be linked to particular diseases or conditions, is now affordable for large numbers of samples. This availability of genomic testing has generated additional data storage requirements which are greater by several orders of magnitude than before.
Additionally, the information is produced at a much greater pace, hence biobanks that need to store genomic information need to address both speed and quantity needs – to the highest standard possible. The same is also true for other sample information, such as other ‘-omics’ technologies (proteomics, transcriptomics, etc.) as well as sample biodiversity information, such as microbiome data.
Addressing the above, many different solutions have emerged: from cloud-based storage, to creating secure, fast links to hospital-based data pools, to creating dedicated ‘safe-haven’ data storage. At the moment, emerging solutions to such operational requirements reflect for the most part local needs and as such are not necessarily ‘universal’ to the entire biobanking field.
Often the ‘big data’ approach in biobanking is challenging for individuals, single projects, or small research groups because of the high costs in time, technological resources and the funding involved. Hence, ‘virtual biobanks’ started forming with institutional collaboration and geographically distributed forms of endeavour. These can include any combination of publicly funded institutions and/or private partners. One such major collaborative effort is the Biobanking and Biomolecular Resources Research Infrastructure (BBMRI-ERIC).5
The pan-European BBMRI vision emerged from the recognition that keeping up with policies and developments elsewhere, most notably in the USA (for example the successes from Stanford and the All of Us Research Program), necessitates integrated European research. BBMRI-ERIC aims to make new treatments possible through centralised, standardised access to well-curated information and samples across the many European biobanks, in addition to providing relevant technical and legal information.
As biobanks are important sources for the provision of research-ready tissue as well as associated data, they can face a dual bottleneck of analytical laboratory harmonisation and analytical data curation. These aspects are interconnected, and both can directly affect the biobank utilisation rates.
Variations associated with collecting, processing and storing different samples and accompanying clinical data make it extremely difficult to extrapolate or to merge data from different studies. Without information on how samples and data were collected, processed and stored, it is easy to introduce invisible bias into the work, leading to irreproducible results. Therefore, the standardisation and harmonisation of biobanking practices are of paramount importance.
It is well documented that standardisation can avoid the effects of pre-analytical variables which in turn can affect the quality of samples and downstream analyses.6 Similarly, adopting metrics that assist data standardisation could potentially reduce inherent biases and facilitate comparative analyses. For example, each biobank operating within a separate NHS geographical area in the UK interacts with different clinical systems. Given the lack of standard terminology across the NHS, the inevitable consequence is that biobanks will be using partly differing clinical terms.
As such, one of the core challenges that remains is bringing harmonisation to the field in order to address the array of data standards and data terms that can be used. In terms of laboratory harmonisation, major international steps have been taken with the recent publication of the ISBER Best Practices (4th ed.) and the ISO standard in biobanking, both in 2018.7 Data harmonisation is currently lagging behind; however, there are major efforts in place to address this aspect too.
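The terminology problem described above, where biobanks linked to different clinical systems record the same condition under different terms, can be sketched in a few lines. This is a minimal illustration only: the mapping table, the concept labels (such as `DIABETES_T2`), the sample identifiers and the function names are all hypothetical, and a real harmonisation effort would map to an agreed clinical terminology rather than to an ad hoc dictionary.

```python
from collections import Counter

# Hypothetical lookup from locally used terms to a shared concept label.
# Both the local terms and the labels are invented for illustration.
SHARED_VOCABULARY = {
    "type 2 diabetes": "DIABETES_T2",
    "t2dm": "DIABETES_T2",
    "type ii diabetes mellitus": "DIABETES_T2",
    "breast carcinoma": "BREAST_CANCER",
    "ca breast": "BREAST_CANCER",
}


def normalise(term: str) -> str:
    """Lower-case a term and collapse repeated whitespace."""
    return " ".join(term.lower().split())


def harmonise(records):
    """Split records into those mapped to a shared concept and those not."""
    mapped, unmapped = [], []
    for record in records:
        concept = SHARED_VOCABULARY.get(normalise(record["diagnosis"]))
        if concept is None:
            unmapped.append(record)  # keep for manual curation
        else:
            mapped.append({**record, "concept": concept})
    return mapped, unmapped


# Two hypothetical biobanks recording the same conditions differently.
biobank_a = [
    {"sample_id": "A-001", "diagnosis": "T2DM"},
    {"sample_id": "A-002", "diagnosis": "Ca breast"},
]
biobank_b = [
    {"sample_id": "B-001", "diagnosis": "Type 2  Diabetes"},
    {"sample_id": "B-002", "diagnosis": "Essential hypertension"},
]

mapped, unmapped = harmonise(biobank_a + biobank_b)
counts = Counter(record["concept"] for record in mapped)
print(counts)
print("Needs manual review:", [record["sample_id"] for record in unmapped])
```

Keeping the unmapped records in a separate list, rather than discarding them, reflects the point that terms without an agreed equivalent need curation before collections can be merged.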
The mission of biobanks is to collect samples and data, and through this function provide support to research and underpin the discovery of new treatments, speeding up the overall rate of discovery. However, as these ‘big data’ approaches in precision or personalised medicine require access to personal and often identifiable information, they become inextricably coupled with ethical challenges.
For example, there is an increased need for information security so that sensitive information remains protected. Biobanks have been well accustomed to such needs and have been at the forefront of adapting to new legislation such as the EU General Data Protection Regulation (GDPR).8
Nevertheless, the technical complexity may impact the nature of informed consent. In other words, individuals donating samples may not be able to fully appreciate the types of research conducted. Furthermore, there is the question of the need, and the ability, to return test results to the patients who originally donated their samples, or at least to those who have requested them. Considered together, a constructive and transparent inclusion of ethical questioning in this rapidly evolving field is necessary to support the societal acceptance and responsible development of the technological advancement.
Recent advances in health research and technology have increased demands on the types of samples and data provided by biobanks. On the other hand, research-funding agencies and institutions often assume that, beyond the initial start-up operational and infrastructure costs, biobanks at some point should become ‘self-sustaining’.
This is rarely achievable and does not reflect the reality of most biobanks attached to integrated academic/health institutions, such as those assisting with rare genetic conditions research.9 Thus, new or more flexible operating and funding models are needed to support the growth of biobanking in the medium and long term, so that it can realise the full potential of the opportunities opening up as a result of precision medicine research.
The authors alone are responsible for the views expressed in this article and they do not necessarily represent the views, decisions or policies of the institutions with which they are affiliated.