The Human Genomes Platform Project (HGPP) is a nationally funded collaborative research project aiming to enhance capability for securely and responsibly sharing human genomics research data. National and international connectivity will maximise the utility of these sensitive and valuable assets. The partners on the project represent many of the largest human genome sequencing and analysis efforts in Australia.
A major challenge to human genome data sharing is navigating restrictions on secondary use. Decisions on how, and to whom, access to data should be granted require significant human effort by Data Access Committees (DACs). This manual approach is slow and burdensome. The DAC Automation sub-project aims to explore a new, automation-driven paradigm for requesting and approving data access for the national human genome research community. When a researcher or clinician applies to a DAC for access to relevant data from a participating data-holding organisation, the DAC will be able to determine quickly and easily whether access is permitted for the requested purpose. Data Owners and DACs can be hesitant about automation; understandably, this can include fears that automation may remove some of the important controls over data use. These concerns need to be taken into account as the sub-project progresses. Nevertheless, this new paradigm is expected to improve a DAC's evaluation of data access requests for any dataset and any requested purpose.
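To make "determine whether access is permitted for the requested purpose" more concrete, here is a minimal Python sketch. It assumes that each dataset's consented use and each request's purpose can be expressed as one of a small set of coded categories. The category names, the `DatasetTerms` and `AccessRequest` types, and `evaluate_request` are all hypothetical and are not part of any HGPP system or published standard. In keeping with the concern about losing control over data use, the sketch only decides clear-cut cases and refers anything it cannot classify to the human committee.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVE = "approve"
    REFER_TO_DAC = "refer_to_dac"
    DENY = "deny"


# Hypothetical purpose hierarchy: for each requested purpose, the set of
# consent categories broad enough to cover it.
PURPOSE_COVERED_BY = {
    "disease_specific": {"disease_specific", "health_medical", "general_research"},
    "health_medical": {"health_medical", "general_research"},
    "general_research": {"general_research"},
}


@dataclass
class DatasetTerms:
    consented_use: str                   # e.g. "health_medical"
    disease: Optional[str] = None        # only used when consent is disease-specific
    requires_manual_review: bool = False  # data owner insists on human review


@dataclass
class AccessRequest:
    purpose: str
    disease: Optional[str] = None


def evaluate_request(terms: DatasetTerms, request: AccessRequest) -> Decision:
    """Approve clear matches, deny clear conflicts, refer everything else to the DAC."""
    covered_by = PURPOSE_COVERED_BY.get(request.purpose)
    if covered_by is None:
        return Decision.REFER_TO_DAC  # unrecognised purpose: a human decides
    if terms.consented_use not in covered_by:
        return Decision.DENY  # consent is narrower than the requested purpose
    if terms.consented_use == "disease_specific" and terms.disease != request.disease:
        return Decision.DENY  # consent limited to a different disease
    if terms.requires_manual_review:
        return Decision.REFER_TO_DAC  # owner retains control over this dataset
    return Decision.APPROVE
```

For example, a request to study coronary artery disease against a dataset consented for health/medical research would be approved automatically, while a request with an unrecognised purpose would go to the committee. A real deployment might also route automatic denials to the DAC for confirmation.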
The initial focus of the DAC Automation sub-project team (from here on referred to as “the team”) was a discovery and recording phase to define the current state of data access requests and data sharing agreements within the community, the set of problems that need to be addressed, and key sub-project areas and their (likely) requirements.
For an Australian genomic data sharing system to be successful, widespread adoption of the new processes and systems is necessary, so any proposed system must be designed with this in mind. To build a clear picture of the current environment and its challenges, the team used several techniques to establish the current state and future needs of the national human genome research community, including: project partner interviews; synchronous (workshops, meetings) and asynchronous (communication tools, kanban boards and shared repositories) discussion and review; consultation with influential stakeholders not participating in the project (MCRI/VCGS and CSIRO); and a survey of human genome researchers to validate user stories recorded by the project team.
The DAC Automation Discovery Phase Report (this document) records: the current state of processes and tools for data access requests and data sharing across the community, national community needs, gap analysis, and identification of international projects with potential solution components for piloting in later project stages.
This document will be the reference for planning the pilot for a system that addresses prioritised requirements to create a Minimum Viable Product (MVP). The audience for this document includes the team, the HGPP stakeholders and the project reference group.
At the heart of any technology platform is identity and access management (IAM): a collection of standards, policies and technologies that enable a platform to determine whether to grant a user access. In a federated environment such as the Australian/global genomics community, IAM is the glue that enables loosely coupled systems to establish strong trust relationships for the purposes of data sharing. Trust relies on technologies such as cryptography, but also on coordinated policies outlining shared expectations between federation participants.
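As an illustration of how cryptography and shared policy combine in a federation, the following Python sketch shows a data platform deciding whether to permit access based on an assertion issued by a federation member's identity provider. It is a simplification under stated assumptions: the issuer registry, the claim names (`approved_datasets`, `exp`) and both functions are hypothetical, and a shared-secret HMAC stands in for the public-key signatures that production federations (for example, those built on OpenID Connect) would normally use.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical federation registry: issuer ID -> key agreed with this platform.
# Production federations would normally publish public keys instead.
TRUSTED_ISSUERS = {
    "https://idp.example-university.edu.au": b"demo-key-university",
}


def sign_assertion(issuer: str, claims: dict, key: bytes) -> dict:
    """Identity provider side: serialise the claims and attach a MAC."""
    payload = json.dumps({"iss": issuer, **claims}, sort_keys=True)
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": tag}


def permit_access(assertion: dict, dataset_id: str, now: Optional[float] = None) -> bool:
    """Data platform side: establish trust cryptographically, then apply policy."""
    claims = json.loads(assertion["payload"])
    key = TRUSTED_ISSUERS.get(claims.get("iss"))
    if key is None:
        return False  # issuer is not a federation participant
    expected = hmac.new(key, assertion["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, assertion["sig"]):
        return False  # assertion was altered or forged
    now = time.time() if now is None else now
    if claims.get("exp", 0) < now:
        return False  # assertion has expired
    # Policy layer (hypothetical shared rule): the issuer lists approved datasets.
    return dataset_id in claims.get("approved_datasets", [])
```

The split mirrors the paragraph above: signature verification establishes that a trusted federation member made the assertion, while the final check encodes a policy that all participants would have to agree on.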
The initial focus of the Federated IAM sub-project team (from here on known as “we”) was a discovery and recording phase to define the current state of identity and access management in the community, the set of problems that need to be addressed, and key stakeholders and their (likely) requirements.
For an Australian genomics federation to be successful, widespread adoption of the new processes and systems will be needed. To foster widespread adoption, we used a range of techniques to understand the current state and the future needs of the national human genome research community including: project partner interviews, synchronous (workshops, meetings) and asynchronous (communication tools, kanban boards and shared repositories) discussion and review, consultation with influential stakeholders not involved in the project (MCRI/VCGS and CSIRO), and a survey of human genome researchers to validate user stories developed by the project team.
The Federated IAM Discovery Phase Report (this document) records: the current state of processes and tools for identity and access management across the community, national community needs, gap analysis, and identification of international projects with components suitable to canvass and potentially pilot.
This document will be used as a reference to plan the pilot for a system that addresses prioritised requirements to create a Minimum Viable Product (MVP). The audience for this document includes the sub-project team, other HGPP stakeholders and the project reference group.
The Human Genomes Platform Project (HGPP) aims to leverage best practice technologies and global standards to accelerate FAIR human genomics data sharing in Australia. Involving Australia's human genomics research leaders, along with national computing infrastructure partners, the HGPP will break down silos and facilitate the deployment of much-needed genomic data sharing infrastructure throughout Australia. The main project themes are: virtual cohort assembly; data access committee automation; federated identity and access management; data and metadata archiving; and documentation and training. Each has strong parallels with existing and developing ELIXIR themes, platforms and communities, including the federated human data community, interoperability, REMS, AAI and training. We are also investigating emerging GA4GH standards, including Beacon v2 and Passports. Here we introduce the HGPP, describe our achievements to date, and outline our upcoming plans for the project and how they align with the ELIXIR programme.
At the Australian BioCommons we aim to identify and adopt leading technology to maximise benefit from human genomics and related data in Australia. We are doing this by 1) removing barriers between researchers, data and analysis resources; 2) facilitating data sharing across data holdings for greater scale and analytical power; 3) connecting and harmonising national and international research efforts; and 4) ensuring data is appropriately accessed within ethical, legal and privacy standards. A key foundation of this is the establishment of federated data commons that "collocate data, storage, and computing infrastructure with core services and commonly used tools and applications for managing, analyzing, and sharing data" (Grossman et al, Comput. Sci. Eng, 2016).
In this presentation we will discuss the lessons we have learnt so far in establishing a human genomics data commons for Australian coronary artery disease cohorts. We will talk about the specific technology choices made, the challenges faced, and the effort and skills required. We anticipate that many more data commons of a similar nature will be established nationally, across many disciplines, including stem cell research. This presents a scaling challenge that we hope can be addressed through the adoption of interoperable standards and reusable components, so that the lessons learnt in one context can be applied in many others.