This work is licensed under a Creative Commons Attribution 4.0 International License
Med One. 2020;5:e200001. https://doi.org/10.20900/mo20200001
1 Department of Public Health Science, Biomedical Informatics Center, Hollings Cancer Center, Medical University of South Carolina (MUSC), 135 Cannon St, Charleston, SC 29425, USA
2 Health Equity and Rural Outreach Innovation Center (HEROIC), Ralph H. Johnson Veteran Affairs Medical Center, Charleston, SC 29401, USA
3 Department of Computer Science, Tennessee Tech University (TTU), 1 William L Jones Dr, Cookeville, TN 38505, USA
Correspondence: Lewis J. Frey, Douglas A. Talbert.
This article belongs to the Virtual Special Issue "Bioinformatics and Precision Medicine"
Precision medicine informatics is a field of research that incorporates learning systems that generate new knowledge to improve individualized treatments using integrated data sets and models. Given the ever-increasing volumes of data that are relevant to patient care, artificial intelligence (AI) pipelines need to be a central component of such research to speed discovery. Applying AI methodology to complex multidisciplinary information retrieval can support efforts to discover bridging concepts within collaborating communities. This dovetails with precision medicine research, given the information rich multi-omic data that are used in precision medicine analysis pipelines. In this perspective article we define a prototype AI pipeline to facilitate discovering research connections between bioinformatics and clinical researchers. We propose building knowledge representations that are iteratively improved through AI and human-informed learning feedback loops supported through crowdsourcing. To illustrate this, we will explore the specific use case of nonalcoholic fatty liver disease, a growing health care problem. We will examine AI pipeline construction and utilization in relation to bench-to-bedside bridging concepts with interconnecting knowledge representations applicable to bioinformatics researchers and clinicians.
The following quote by Herbert A. Simon, an artificial intelligence (AI) visionary, clearly articulates the extant dilemma that biomedical researchers face with a plethora of data measured at multiple scales.
“In an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients. Hence, a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.” 
This new information-rich reality motivates the use of systems that reduce attentional overload instead of contributing to it. An example of increasing attentional overload is alert fatigue, where clinicians stop devoting attention to warning messages because of the abundance of unnecessary alerts. Translational precision medicine would benefit from AI systems that glean relevant and useful knowledge from multiple sources in both automated and semi-automated ways, through autonomous integration or crowdsourced augmentation, to address our attentional limitations. Some areas of precision medicine research, such as translational research with large and diverse research repositories, have an even greater need for AI-supported research and collaboration. Using a community-driven collaborative information retrieval framework, we will discuss discovery processes relevant to precision medicine. These processes can generalize beyond precision medicine but are particularly applicable to it, given the exponential growth of multi-omic data that can be leveraged in constructing knowledge representations.
Translational research focuses on moving discoveries made at the bench to clinical care and from clinical care to the bench. However, in practice, translational research is very hard to achieve. This is in part because translational research is multidisciplinary. Here is where an AI system can serve as a bridge between concept models from different disciplines. The specifics of an AI pipeline approach include the integration of knowledge sources, knowledge bases, knowledge linkages, and knowledge extraction that can be crowdsourced by humans and/or AI learners to improve the performance and predictive capabilities of the learned perspectives and knowledge. The sources of knowledge reside in domain experts, research journals, guidelines, and data repositories, and the AI learner could be used to assist in the search for and organization of information to lower the attentional load on researchers. This could build on extant work that extracts meaningful information from clinical practice guidelines using question answering systems such as Watson. Our proposed system is not intended to be a question answering system such as IBM Watson or solely an information retrieval system as described by Sparck. We build on concepts from never-ending learning systems [11,12] by combining AI learning with crowdsourcing to identify bridging concepts among researchers and facilitate collaborations. It is not a general AI approach, but is instead focused on building knowledge representations specific to collaborative research teams.
The goal of the proposed system is to provide a multidisciplinary forum that grows with the knowledge of experts and AI learning components, which together build knowledge graphs in specific research areas. A knowledge graph is a representation of concepts as nodes that are connected by edges representing relationships between concepts. Clinicians and bench researchers implicitly create knowledge graphs when they construct best practice clinical guidelines or mechanistic models of interacting components in cells or model systems. Knowledge graphs can be used to discover bridging concepts among collaborators in the project. Because there are data-rich repositories of high-throughput experiments along with the papers that document them, the task of biomarker discovery in precision medicine is well matched to the challenge of finding bridging concepts using knowledge graph representations. Given that ontological representations converge more quickly for information-rich knowledge spaces, the density of high-throughput data in the precision medicine space, especially in cancer, lends itself to the task of conceptual ontology space construction. The collaborative information retrieval task, combined with converting high quality data into well supported knowledge graphs, will be enhanced through the combined efforts of experts in specific domain areas and AI algorithms that scale with the size of the growing data resources through the use of crowdsourcing.
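As a minimal sketch of the node-and-edge representation described above, the following Python fragment stores relationships as labeled edges and finds the concepts shared by two small graphs. The class name `KnowledgeGraph`, the relation labels, and the example triples are illustrative assumptions and are not part of the proposed system.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy knowledge graph: concepts are nodes, labeled relations are edges."""

    def __init__(self):
        # Maps each concept to a set of (relation, target concept) pairs.
        self._edges = defaultdict(set)

    def add_relation(self, source, relation, target):
        self._edges[source].add((relation, target))
        self._edges[target]  # touching the key registers the target as a node

    def concepts(self):
        return set(self._edges)

    def relations(self, concept):
        return sorted(self._edges.get(concept, ()))


# Hypothetical triples, loosely based on the NAFLD use case in this article.
bench = KnowledgeGraph()
bench.add_relation("Fibroblast", "associated_with", "Fibrosis")
bench.add_relation("ECM", "changes_in", "Fibrosis")

clinic = KnowledgeGraph()
clinic.add_relation("NAFLD", "can_progress_to", "Fibrosis")
clinic.add_relation("BMI", "data_source_for", "Fibrosis")

# Concepts present in both graphs are candidate bridging concepts.
shared_concepts = bench.concepts() & clinic.concepts()
print(shared_concepts)
```

A set intersection only finds exact name matches; related but differently named concepts need an explicit mapping or a graph search.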
The use of crowdsourcing technology coupled with human and AI expertise supports scalable solutions, even when the AI does not have the complexity or subtlety to comprehend aspects of the problem space. The strength of crowdsourcing enables concepts of word and acronym ambiguity to be resolved through expertise that exists in the community. The idea is that a community would use the pipeline to grow a knowledge graph that is highly relevant to their area of expertise and interest. The approach is not geared to build a single knowledge graph with general knowledge, but instead many specific knowledge graphs that have active community bases that support a living repository of information.
Because there is a risk that the AI could overwhelm a knowledge graph with irrelevant and low quality data and papers, crowdsourcing with experts that rank and prioritize the encoding of information provides a check and balance to the information retrieved by the AI. Low quality data and information would result in a downgrade of the knowledge graph by experts in the field, and a reinforcement component built into the AI learner could support the AI learning from the domain experts’ feedback. The net result would be improved AI support for the collaboration, and the human interaction input improves the ability of the AI to integrate relevant information.
In this paper we describe a roadmap for researchers to build such a system to reduce the attentional strain in the application area of precision medicine translational science research. First, we describe how translational scientists can represent their research questions in a computable knowledge representation. Second, we outline a prototype AI pipeline that addresses attentional overload through computational analysis of knowledge representations. Third, we describe how the three target communities (i.e., bioinformaticians, clinicians, and AI researchers) can be engaged and contribute to its success. As an illustration, we apply these concepts to the domain of nonalcoholic fatty liver disease (NAFLD) from both a bench as well as a clinical researcher perspective [14,15]. NAFLD is related to metabolic syndrome with a constellation of comorbidities including obesity, type 2 diabetes mellitus, hypertension, and dyslipidemia [16–19]. If not properly managed, NAFLD can progress to liver fibrosis and cirrhosis with outcomes including hepatocellular carcinoma. For bench research to attract the attention of clinicians and clinical researchers, the bench researchers’ questions need to align with issues that are associated with improving the care of patients.
Bench Researcher Perspective
Our NAFLD use case is a researcher studying the proteomic signature of extracellular matrix (ECM) interaction in fibrotic tissues. Changes in ECM can be observed in liver tissue disease progression that occurs in NAFLD and in epithelial-mesenchymal transition occurring in an aggressive cancer [21,22]. Bench researchers analyze measurements collected from biopsied tissue and, in the case of proteomics, could use mass spectrometry matrix-assisted laser desorption/ionization with liquid chromatography (LC-MALDI) to obtain collagen and peptide signatures from formalin-fixed paraffin-embedded (FFPE) tissue [23–25]. The difficulty faced by translational bench researchers is knowing how to frame their preclinical questions so they are clinically relevant. This often comes down to selecting phenotypes to examine along with which experimental conditions to investigate by drawing from their own and their collaborators’ experiences, all while prioritizing time and resources. Given exponentially expanding data repositories, however, the analysis and investigation can be done without awareness of the full set of highly relevant resources. Hence, there are gaps in utilizing the vast amount of informational resources that continue to accrue daily.
Clinical Researcher Perspective
Continuing our NAFLD example, a clinical researcher is interested in factors associated with patient outcomes that include fibrotic tissue growth in the liver to the point of cirrhosis and potentially liver cancer. The guideline for diabetes care now recommends NAFLD assessment for diabetic patients with elevated liver enzymes such as alanine aminotransferase (ALT). The data sources associated with fibrosis progression are liver radiology imaging reports, liver biopsy reports, non-invasive liver fibrosis estimates (e.g., fibrosis 4 (FIB4) scores), race, ethnicity, and body mass index (BMI). The problem for the clinical researcher is how to improve their predictive risk models through non-invasive measures. A strong incentive to move forward with a translational collaboration is co-developing experiments and measures that can be validated to improve the prediction of fibrosis progression in NAFLD. Below we explore this scenario using an AI pipeline approach to facilitate the identification of such connections using a formalized knowledge graph approach.
An AI pipeline can bridge collaborations among bench and clinical researchers by identifying concepts that connect the collaborators and seed the common interests among them. To achieve the adaptive knowledge management scenario described in this paper, we envision AI support for engaged research communities that create, use, and share knowledge to collaborate and extend knowledge in precision medicine. This support would combine crowdsourcing, qualified by credential and trustworthiness measures, with AI to iteratively refine (1) collaborative information retrieval [10,13], (2) knowledge extraction and organization, and (3) concept connection identification. In this way, the AI pipeline combines the interpretive strength of experts and practitioners in the specific areas of interest while enhancing their models with data-informed AI medical models.
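One way to picture crowdsourcing combined with trustworthiness measures is a trust-weighted vote on whether a retrieved item is relevant. The sketch below is an assumption made for illustration: the function name `weighted_relevance`, the contributor names, and the trust weights are hypothetical, and the article does not prescribe a specific weighting scheme.

```python
def weighted_relevance(votes, trust, default_trust=0.3):
    """Return the trust-weighted share of votes marking an item as relevant.

    votes: list of (contributor, is_relevant) pairs.
    trust: dict mapping contributor to a weight; contributors missing from
           the dict receive default_trust.
    Returns None when the total weight is zero (e.g., no votes).
    """
    total = 0.0
    support = 0.0
    for contributor, is_relevant in votes:
        weight = trust.get(contributor, default_trust)
        total += weight
        if is_relevant:
            support += weight
    return support / total if total > 0 else None


# Hypothetical contributors and weights.
trust = {"hepatologist": 0.9, "proteomics_lab": 0.8}
votes = [
    ("hepatologist", True),
    ("proteomics_lab", True),
    ("anonymous", False),
]
score = weighted_relevance(votes, trust)  # roughly 0.85
print(score)
```

A score near 1 would support keeping the item in the knowledge graph, while a low score would mark it for review by the community.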
While the use of such a tool might not be limited to precision medicine, we believe precision medicine exhibits a number of properties that make it an ideal area in which to apply AI to help initiate collaborative research: (1) need for translational research, (2) challenge of bridging different ways of thinking (patient-centric vs. bench-centric), (3) particularly high volume of literature, (4) presence of well-developed and organized data repositories, and (5) availability of well-recognized and useful vocabularies and ontologies that can support concept identification and extraction.
Our proposed framework, which we call the Intelligent Precision Medicine Pipeline (IPMP), enables human researchers to collaborate with AI to help organize research questions at different scales of the problem. In our example, the scale of the research question for the clinical researcher is at the level of the patient, where the focus is identifying actionable predictive models to improve care decisions. For the bench researcher, the scale of the problem is at the cellular level, examining fibrosis in the ECM.
One of the challenging tasks that IPMP needs to perform is to identify bridging concepts across the different scales of the research problem. Figure 1 provides a high level diagram of the components involved in a translational human/AI collaborative pipeline. The processes for each researcher are represented in the boxes on the right and left side of the diagram. After developing their research questions, the researchers generate conceptual keywords and identify other sources of knowledge (e.g., data repositories, papers, guidelines) for extraction. There are a number of tools (e.g., Protege, owlready) [27,28] that support manual or programmatic construction and refinement of machine readable knowledge graphs that can be used in the initial knowledge extraction from the collaborating researchers. Given an initial set of small human-generated knowledge graphs, IPMP can utilize the information as input to start the human/AI collaboration.
Once the collaboration begins, the AI can assist in knowledge graph refinement to more precisely capture the relevant research interests. A knowledge graph can be pruned or grown as the collaboration provides feedback about its correctness. As the knowledge graph guides the retrieval of information, the researchers can provide feedback on the relevance of selected retrieved items. When retrieved items are marked as irrelevant, the aspect of the knowledge graph responsible for their retrieval is marked for possible refinement. Then the relevant and irrelevant items associated with each knowledge graph element can be analyzed for more precise representations that include the relevant items and exclude the irrelevant ones. Thus, the knowledge graph’s precision is improved over time by learning from examples, and IPMP’s ability to correctly retrieve relevant literature continually improves. Depending on the desired level of intelligence, IPMP could employ active learning strategies to target knowledge graph refinement.
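The feedback loop described above can be sketched as a simple tally of relevance judgments per knowledge graph element, with elements flagged once enough of their retrieved items are judged irrelevant. The class name `FeedbackLog` and both thresholds are hypothetical choices for illustration; an actual IPMP implementation might use a learned model or active learning instead.

```python
from collections import defaultdict


class FeedbackLog:
    """Tallies researcher feedback on items retrieved via each graph element."""

    def __init__(self, min_votes=3, max_irrelevant_share=0.5):
        self.min_votes = min_votes
        self.max_irrelevant_share = max_irrelevant_share
        self._counts = defaultdict(lambda: {"relevant": 0, "irrelevant": 0})

    def record(self, element, is_relevant):
        key = "relevant" if is_relevant else "irrelevant"
        self._counts[element][key] += 1

    def needs_refinement(self):
        """Return elements with enough votes and a high irrelevant share."""
        flagged = []
        for element, counts in self._counts.items():
            total = counts["relevant"] + counts["irrelevant"]
            if total < self.min_votes:
                continue  # too little feedback to judge this element
            if counts["irrelevant"] / total > self.max_irrelevant_share:
                flagged.append(element)
        return sorted(flagged)


log = FeedbackLog()
for is_relevant in (True, True, True, False):
    log.record("ECM", is_relevant)
for is_relevant in (False, False, True):
    log.record("Cancer", is_relevant)
log.record("BMI", False)  # a single vote, below min_votes

flagged = log.needs_refinement()
print(flagged)
```

Requiring a minimum number of votes keeps a single negative judgment from triggering a change to the graph.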
The pipeline is then tasked with identifying the connecting concepts between these refined perspectives and deriving a common collaborative space of extracted concepts/knowledge. To do this, knowledge is extracted from the literature and converted to a knowledge representation (knowledge graph) with edges linking the concepts to each other. One challenge during knowledge extraction will be the recognition of synonyms and abbreviations. A variety of techniques can be used to address this. Where possible, IPMP can leverage existing resources that map terms to concepts (e.g., the Unified Medical Language System (UMLS) metathesaurus). Where that is not possible, IPMP can learn to map terms to each other either by being shown that a mapping exists through crowdsourcing or by using machine learning to discover such mappings. For IPMP, this practice needs to be made explicit and represented in a machine readable format. A graph-based approach has shown some success in linking concepts across papers and finding complementary (i.e., cross-specialty) literature [33–35].
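The synonym and abbreviation handling described above can be illustrated with a small lookup that normalizes a term and queues unknown terms for crowdsourced review. This is only a sketch: the hand-written table `TERM_TO_CONCEPT` stands in for a resource such as the UMLS metathesaurus and does not use any UMLS interface, and the function name `normalize_term` is hypothetical.

```python
# Hypothetical term-to-concept table; keys are lowercased with hyphens
# replaced by spaces and whitespace collapsed.
TERM_TO_CONCEPT = {
    "nafld": "NAFLD",
    "nonalcoholic fatty liver disease": "NAFLD",
    "non alcoholic fatty liver disease": "NAFLD",
    "ecm": "ECM",
    "extracellular matrix": "ECM",
    "bmi": "BMI",
    "body mass index": "BMI",
}


def normalize_term(term, term_to_concept, review_queue):
    """Map a free-text term to a concept name, or queue it for human review."""
    key = " ".join(term.lower().replace("-", " ").split())
    concept = term_to_concept.get(key)
    if concept is None and term not in review_queue:
        # Unknown terms are left for crowdsourced mapping by domain experts.
        review_queue.append(term)
    return concept


review_queue = []
for raw in ("Extracellular  Matrix", "Non-alcoholic fatty liver disease", "FIB4"):
    print(raw, "->", normalize_term(raw, TERM_TO_CONCEPT, review_queue))
print("needs review:", review_queue)
```

Once an expert supplies a mapping for a queued term, adding it to the table makes the same term resolve automatically on later lookups.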
For our NAFLD example, the researchers can initiate this by listing keywords and subsets of publications relevant to their joint NAFLD interest. For example, the clinician would have keywords such as NAFLD, fibrosis, and BMI while the bench researcher would have keywords such as LC-MALDI, proteomics, collagen, and ECM. Figure 2 illustrates the creation of a knowledge representation, created using Protege/owlready [27,28], of the keywords for bench (Left) and clinical (Right) researchers.
Using the Python package owlready2, we constructed the word graphs in Figure 2 for words associated with bench researchers examining ECM remodeling and clinical researchers examining risk models of fibrosis progression in NAFLD. The list of keywords for each research area establishes a position in the conceptual space. For the purposes of explanation, we have made the space tightly related. The separate knowledge representations can be converted into a joint knowledge representation by running them simultaneously in owlready2 and visualizing the result with Protege (see Figure 3).
The two graphs are connected through the concept of Cancer in the words used by each group. There are places in the example where the keywords could be different but related (e.g., bio_marker at the bench and risk_model at the bedside) and hence may be important to link. In Figure 3, related concepts have been positioned close to each other. Proximity could be used by the researchers to communicate similarity to the AI while being visually informative to the human researchers. Another example is Fibrosis and Fibroblast, since fibroblast cells generate connective tissue associated with fibrosis. The AI can map into related concepts to find a link between them, with human-assessed accuracy serving as a measure of the quality of the AI reasoning. Existing terminologies and coding standards such as those in the UMLS could be used to connect concepts through the AI graph search or reasoning algorithms. Bioinformatics ontologies such as the Gene Ontology could also be integrated into the reasoning process to connect and bridge concepts at the molecular and cellular scale. Alternatively, the researchers could construct the link manually and feed it as input to the AI. For example, they might map predictive bio_marker and predictive risk_model to a bridge concept like predictive marker. Crowdsource participants could also provide links between concepts. Approaches using machine learning and crowdsourcing for real-time analysis of tweets during disaster recovery provide examples of the feasibility of supporting crowdsourced collaborative information retrieval using the Python package pybossa. The use of pybossa for this real-time task provides a template for supporting multidisciplinary teams performing collaborative information retrieval focused on specific research areas.
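To illustrate how a manually supplied bridge concept can connect the two keyword graphs, the sketch below merges bench and clinical edges, rewrites bio_marker and risk_model to a shared predictive_marker node, and runs a breadth-first search between a bench keyword and a clinical keyword. The edge lists are hypothetical and do not reproduce the edges in Figures 2 and 3, and the function names are assumptions; the article's own graphs were built with owlready2 and Protege, not with this code.

```python
from collections import deque

# Hypothetical undirected keyword links for each researcher.
bench_edges = [
    ("ECM", "collagen"),
    ("ECM", "bio_marker"),
    ("ECM", "Cancer"),
    ("proteomics", "LC-MALDI"),
    ("proteomics", "collagen"),
]
clinical_edges = [
    ("NAFLD", "Fibrosis"),
    ("NAFLD", "risk_model"),
    ("NAFLD", "Cancer"),
    ("BMI", "risk_model"),
]

# Researcher-supplied mapping of related keywords to one bridge concept.
bridge_map = {"bio_marker": "predictive_marker", "risk_model": "predictive_marker"}


def build_joint_graph(edge_lists, bridge_map):
    """Merge edge lists into one adjacency dict, applying the bridge mapping."""
    graph = {}
    for edges in edge_lists:
        for a, b in edges:
            a = bridge_map.get(a, a)
            b = bridge_map.get(b, b)
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a shortest list of concepts or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


joint = build_joint_graph([bench_edges, clinical_edges], bridge_map)
path = shortest_path(joint, "collagen", "BMI")
print(path)
```

In this toy graph the bridge concept shortens the route between collagen and BMI; without the mapping, the only connection runs through the shared Cancer node.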
In creating the small knowledge graphs integrated in Figure 3, we identified a paper that could be used to pursue future research, entitled “Identifying Nonalcoholic Fatty Liver Disease Patients with Active Fibrosis by Measuring Extracellular Matrix Remodeling Rates in Tissue and Blood”. The ability to use blood measurements of ECM in liver disease creates a potential translational bridge concept for less invasive measures of liver fibrosis progression in NAFLD, a current active topic of research. Having IPMP further investigate other potentially related papers would provide value to the ongoing research. IPMP could also be used to mark areas of research to follow or flag when related discoveries are identified.
Once the process is initiated by entering the initial keyword knowledge graph(s), the volume of information that IPMP needs to process and search requires the power and speed of AI, but the nature and complexity of the task might also necessitate human input, review, and interpretation. Thus, the success of IPMP depends on effective human/AI collaboration, as envisioned in the following steps for an IPMP search:
A summary of the steps is visualized in Figure 4.
Thus, collaboration is integrated throughout IPMP, which is intended to provide both human support of AI (e.g., crowdsourced or researcher-provided feedback) and AI support of humans (e.g., automated exploration of an area of interest for relevant papers, datasets, and guidelines).
IPMP’s AI would work as autonomously as it is able. It is human experts, however, who drive the information flow in the pipeline, and thus the inputs to and outputs from the AI are visible and can be reviewed, manipulated, and critiqued by human experts. This feedback is an important component in the AI’s learning process and is similar to the human feedback component in Mitchell’s NELL project. Additionally, IPMP will learn from the crowdsourced content that humans input, and just as they can help out the AI, human interactions with the system (e.g., entering keywords and constructing knowledge graphs) are supported and mediated by the AI.
The strength of IPMP will be to organize a large volume of information relevant to the researchers’ questions and make the connection understandable to them. The extent to which IPMP succeeds in providing relevant information will increase the chances of facilitating a collaboration among the researchers involved. Achieving this ambitious goal will require not only collaboration between biomedical researchers and IPMP, but also collaboration between biomedical researchers and AI researchers.
From the above description, it is clear that clinical and bench researchers are integral to the success of IPMP’s search. They need to be able to clearly specify their questions and must be able to logically critique IPMP’s outputs and provide timely, meaningful, and insightful feedback to shape the system’s understanding of their needs and to contribute to IPMP’s continual learning process. In contrast, AI researchers take no active role in the actual search. Their role is the design and implementation, in concert with bench and clinical researchers, of a collaborative AI that can (A) capture and extract information through dialog with human experts; (B) extract and organize knowledge from natural language narratives, structured guidelines, datasets, and terminologies; (C) understand, search, and connect concepts across the different scales in precision medicine (i.e., genes and proteins to organ systems and patients) in ways that make sense to human experts; and (D) continually learn how to get better at doing (A–C).
Available and Needed Technologies
Many of the identified capabilities and gaps are, at least partially, addressed by existing technologies. Details of these technologies are beyond the scope of this paper. There are many approaches to extracting knowledge from free text, including named-entity extraction [39–41], topic modeling [42–44], and automatic text summarization [45–47]. Techniques such as clustering [48,49], frequent pattern identification [50,51], and rule extraction [52–54] have been used to extract knowledge from data. IPMP could build additional knowledge representations such as ontologies [55–58], knowledge graphs [12,59,60], and word embeddings [61–63]. Once knowledge is extracted and represented, a variety of search and information retrieval techniques can be adapted to help find relevant connections [64–67]. The Python module pybossa keeps track of user contributions and can provide statistics on the activities of the authenticated and anonymous users on the project: top contributors, time to completion of tasks, and other metrics. It potentially could be used in conjunction with a version control system to manage provenance of knowledge graph contributions. Highlighting the existence of these technologies is not meant to trivialize the development of the IPMP. On the contrary, we anticipate that significant work will be needed to realize this AI pipeline. The work highlighted here does, however, suggest that AI researchers have a strong foundation on which to build when developing this AI pipeline.
A key component of precision medicine informatics is knowledge generation using learning systems applied to larger data sets. Repositories of biological information have been growing at exponential rates since the rate of cost reduction for genomic data is faster than Moore’s Law. To manage the accumulation of data that has discoverable patterns, we propose IPMP, a human and AI collaboration that creates hybrid knowledge structures that benefit from combined knowledge generation intrinsic to human and AI learners. Collaborations between bioinformatics and clinical researchers are complex and difficult to achieve given the differences in the nature and use of information in the two communities. In bioinformatics research, large sets of highly complex multi-omic data are often used to better understand the mechanistic behavior of model systems. Clinicians tend to use highly defined measures (e.g., images, labs) to answer specific questions pertaining to the health and survival of patients. The different emphases and risks involved in both fields result in different training regimes and different perspectives when collecting and analyzing information that affect the systems of interest.
Given the gap that exists between the two fields there is a need for approaches that identify knowledge that is relevant to both communities to increase the productivity of collaborations. Such an approach will need to be able to start from the perspective of either community and build bridges that link the disparate perspectives into a unified whole that motivates each researcher to invest resources in building a collective understanding that advances both fields. To achieve this desirable outcome there is a role for AI researchers to adapt or invent approaches and methodologies that spur on the enterprise of discovery at a faster rate with more successful outcomes for all communities involved. It is our goal in the paper to advocate for a crowdsourced multidisciplinary collaborative information retrieval framework based on AI that would enable a true collaboration involving clinical, bench, and AI researchers all working together to improve the efficacy and precision of medical care.
Both authors contributed equally to the manuscript and wrote the paper together.
The authors have declared no conflicts of interest.
The work of LJF was supported in part by VA HSR&D NAFLD grant HX002700-01A1, National Institutes of Health (NIH) grant U54-MD010706 and Health Equity and Rural Outreach Innovation Center grant CIN 13-418.
Frey L, Talbert DA. Artificial Intelligence Pipeline to Bridge the Gap between Bench Researchers and Clinical Researchers in Precision Medicine. Med One. 2020;5:e200001. https://doi.org/10.20900/mo20200001