One step closer to identifying lung cancer biomarkers



Post by Alez »

From WCG website

One step closer to identifying lung cancer biomarkers
By: The Mapping Cancer Markers research team
19 Aug 2015

Summary
Although work continues, enough data has already been processed to let the Mapping Cancer Markers team begin identifying high-scoring signatures and associating them with particular lung cancer biomarkers. The ultimate goal is to find signatures that distinguish many types of cancer, giving physicians and researchers another tool to improve detection, treatment and patient outcomes.

A new stage of MCM lung cancer biomarker discovery

After a long first stage of exploratory analysis, Mapping Cancer Markers (MCM) began a new, more targeted stage of lung cancer analysis in April 2015. Processing results from the first stage revealed a subset of approximately 1% of the biomarkers that frequently occur in high-scoring signatures. The second stage of the MCM lung cancer study will focus on signatures drawn from this subset of biomarkers.
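The selection step described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the data format, the score cutoff, and the function name `select_frequent_biomarkers` are hypothetical and are not taken from the MCM pipeline.

```python
from collections import Counter


def select_frequent_biomarkers(scored_signatures, score_cutoff, keep_fraction):
    """Return the biomarkers that occur most often in high-scoring signatures.

    scored_signatures: iterable of (biomarkers, score) pairs (hypothetical format).
    score_cutoff: signatures scoring at or above this value count as high-scoring.
    keep_fraction: fraction of all observed biomarkers to keep (e.g. 0.01 for 1%).
    """
    counts = Counter()
    all_biomarkers = set()
    for biomarkers, score in scored_signatures:
        all_biomarkers.update(biomarkers)
        if score >= score_cutoff:
            # Count each biomarker once per high-scoring signature.
            counts.update(set(biomarkers))
    n_keep = max(1, round(len(all_biomarkers) * keep_fraction))
    # Sort by descending count, then by name, so ties are broken deterministically.
    ranked = sorted(counts, key=lambda b: (-counts[b], b))
    return ranked[:n_keep]


# Toy data: four signatures over eight made-up biomarkers.
signatures = [
    (("G1", "G2", "G3"), 0.91),
    (("G1", "G4", "G5"), 0.88),
    (("G2", "G6", "G7"), 0.52),
    (("G1", "G2", "G8"), 0.93),
]
selected = select_frequent_biomarkers(signatures, score_cutoff=0.85, keep_fraction=0.25)
print(selected)
```

With a real first-stage result set, `keep_fraction=0.01` would correspond to the roughly 1% subset mentioned above.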

Among the first research questions we are aiming to answer in the second stage are those about the nature of successful signatures and the reduced signature space. Will the selected subset of biomarkers in the targeted stage perform better at distinguishing lung cancer in tissue samples? Will the effect of signature length (number of biomarkers) on signature performance that we noticed in the exploratory stage also appear in this narrowed signature space? Which patterns of biomarkers characterize the top-performing cancer signatures? Most biological function is achieved by multiple genes (or proteins) participating in a coordinated network or signaling cascade (pathway), so can we discover pairs or larger groups of biomarkers that frequently co-occur in successful signatures? Will these groups of biomarkers correspond to known biological networks, or do successful signatures necessarily draw their members from multiple networks?

Enough second-stage results have been returned to allow us to start the preliminary analysis. One main goal of the second stage is to discover high-performing cancer signatures. We used results from the first stage to narrow the field of potential biomarkers from 22,000+ to a subset of 223. Figure 1 shows how the average cancer-distinguishing ability of the stage-2 gene signatures has improved considerably, compared to signatures discovered in the initial stage.

Image

Figure 1. Distribution of signature scores, stage 1 vs. stage 2, by size. Signature frequencies are drawn in blue for stage 1, black for stage 2. Note the increase in the quality of scores in both stages between signatures of length 20 vs. length 10, as well as increase in frequency of higher quality scores.

Shorter or longer signatures?

One question the volunteer community might be asking is why we continue to focus on shorter gene signatures when the data show that longer gene signatures perform better. A larger gene signature may be more predictive, but it is not always better. One reason is practical. Much of the work in the field of biomarker identification has the ultimate goal of producing a signature that can be translated into a clinical test. Feasibility and economics will play an important role at that stage. The process of moving a research-based result through testing and approval is lengthy and complex, so a 10-gene signature is easier and cheaper to translate than a 65-gene signature. The viability of gene signature sizes has roughly guided how we define our lower and upper size search limits for the MCM project.

Biomarker pairs

One of our goals in the analysis of gene signatures is to look at smaller combinations of genes, and identify groups of genes that relate to patient outcome in a similar manner (i.e. they may provide alternative choices for the signature). This is important for a variety of reasons. From the analytical side, if we can find two genes that perform almost exactly the same, then a successful gene signature will likely have only one of those. This will help us reduce search space, but also to find alternatives. From the practical side, one of the two (or more) alternatives may be easier to bring to clinical practice. Thus, we aim to find multiple signatures, and characterize them with respect to their relationships. Another reason to look at combinations of genes involves seeing if two genes may have a biological reason for being related. Is this particular cancer affecting two genes at the same time? Is a particular biological pathway compromised? These kinds of questions might explain differences between patients or why certain people respond better to particular therapies. These are questions that are much further down the research path but we wanted to touch on them so that the community is aware of where in the pipeline your contributions have helped and also what still needs to be done. If a pair (or larger group) of genes is related by disease, signatures containing those genes or related biomarkers should perform well. Figure 2 looks at the rate at which stage-2 biomarker pairs co-occur in high-scoring signatures.

Image

Figure 2: Frequencies of stage-2 biomarker pairs. The frequency of biomarkers i and j co-occurring in high-scoring signatures is represented by the color of the row-i, column-j element of the matrix. Higher-frequency pairs are colored lighter blue. Note the horizontal and vertical stripes indicating specific biomarkers that perform well regardless of their pairing. Also, very bright single spots highlight biomarker combinations that are exceptionally promising.
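A matrix like the one in Figure 2 could be assembled roughly as follows. This is a minimal sketch, not the team's code: the function names and the toy signatures are hypothetical, and it stores raw co-occurrence counts rather than normalized frequencies.

```python
from collections import Counter
from itertools import combinations


def pair_cooccurrence(high_scoring_signatures):
    """Count how often each unordered biomarker pair shares a signature."""
    counts = Counter()
    for signature in high_scoring_signatures:
        # Sorting gives every pair one canonical (a, b) ordering.
        for pair in combinations(sorted(set(signature)), 2):
            counts[pair] += 1
    return counts


def as_matrix(counts, biomarkers):
    """Lay the pair counts out as a symmetric matrix (row i, column j)."""
    index = {b: k for k, b in enumerate(biomarkers)}
    n = len(biomarkers)
    matrix = [[0] * n for _ in range(n)]
    for (a, b), count in counts.items():
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = count
    return matrix


toy_signatures = [("A", "B", "C"), ("A", "B"), ("B", "C")]
pair_counts = pair_cooccurrence(toy_signatures)
pair_matrix = as_matrix(pair_counts, ["A", "B", "C"])
for row in pair_matrix:
    print(row)
```

A biomarker that pairs well with many partners would show up as a row and column of high counts, which is the stripe pattern noted in the caption.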

A note on run times of research tasks

Some of you may have noticed above-average run times of work units in this new stage of MCM. We are working to make run times more consistent and predictable; however, this job is made more difficult as this stage of the research requires changing work unit designs more frequently than before. The design of new work units will also depend in part on results of earlier second-stage results. Consequently, the turnaround time for benchmarking and calibrating work units may limit our success at stabilizing run times. We trust that our wonderful volunteers will be able to continue contributing results no matter what work units we provide, but we wanted to let you know what to expect. Once again, thank you for making our research possible, and please stay tuned for future announcements!

Recent publications, presentations and media coverage


Media

The Jurisica lab and the MCM1 project scientists were recently interviewed for a Drug Discovery News article about the difficulties of cancer biomarker discovery and validation: Signs of intelligent biomarkers by Randall C Willis, DDNews.
Igor was also interviewed for the NewsTalk Radio 1010 in June 2015 about the work on discovering prognostic and predictive cancer signatures.
World Community Grid was also covered by Genevieve Roberts in The Independent on June 10: In 10 years, 'crowdsourced computing' has changed the world; now it's tackling Ebola

Publications

Navab, R., Strumpf, D., Jurisica, I., Walker, C. G., Gullberg, D., Tsao, M.S. Integrin a11b1 regulates cancer stromal stiffness and promotes tumorigenicity in non-small cell lung cancer, Oncogene, 2015. In press.
Stewart, E.L., Mascaux, C., Pham, N-A, Sakashita, S., Sykes, J., Kim, L., Yanagawa, N., Allo, G., Ishizawa, K., Wang, D., Zhu, C.Q., Li, M., Ng, C., Liu, N., Pintilie, M., Martin, P., John, T., Jurisica, I., Leighl, N.B., Neel, B.G., Waddell, T.K., Shepherd, F.A., Liu, G., Tsao, M-S. Clinical Utility of Patient Derived Xenografts to Determine Biomarkers of Prognosis and Map Resistance Pathways in EGFR-Mutant Lung Adenocarcinoma, J Clin Oncol, 2015. In press. CJCO/2014/601492.
Camargo, J. F., Resende, M., Zamel, R., Klement, W., Bhimji, A., Huibner, S., Kumar, D., Humar, A., Jurisica, I., Keshavjee, S., Kaul, R., Husain, S. Potential role of CC chemokine receptor 6 (CCR6) in prediction of late-onset CMV infection following solid organ transplant. Clinical Transplantation, 2015. In press. doi: 10.1111/ctr.12531
Fortney, K., Griesman, G., Kotlyar, M., Pastrello, C., Angeli, M., Tsao, M.S., Jurisica, I. Prioritizing therapeutics for lung cancer: An integrative meta-analysis of cancer gene signatures and chemogenomic data, PLoS Comp Biol, 11(3): e1004068, 2015.
Starmans, M.H., Pintilie, M., Chan-Seng-Yue, M., Moon, N.C., Haider, S., Nguyen, F., Lau, S.K., Liu, N., Kasprzyk, A., Wouters, B.G., Der, S.D., Shepherd, F.A., Jurisica, I., Penn, L.Z., Tsao, M.S., Lambin, P., Boutros, P.C. Integrating RAS status into prognostic signatures for adenocarcinomas of the lung. Clin Cancer Res, 21(6): 1477-86, 2015.
Wong, S. W. H., Cercone, N., Jurisica, I. Comparative network analysis via differential graphlet communities, Special Issue of Proteomics dedicated to Signal Transduction, Proteomics, 15(2-3):608-17, 2015. E-pub 2014/10/07. doi: 10.1002/pmic.201400233

Editorial

Hoeng J, Peitsch MC, Meyer, P. and Jurisica, I. Where are we at regarding Species Translation? A review of the sbv IMPROVER Challenge, Bioinformatics, 31(4):451-452, 2015.

Presentations

Keynote: Life of an orphan protein, Symposium on Computational Biology, eScience approaches for biomedical data analysis, University of Southern Denmark, Odense, June 10-12
Invited presentation: High-performance computing in integrative cancer informatics. Fathoming cancer by data-driven medicine, Advanced Computing and Analytics in Medical Research Symposium, University of Ottawa, May 11-12.
Invited presentation: Scalable visual data mining. HPC and “big data” in integrative cancer informatics. OCE Discovery Conference, the Metro Toronto Convention Centre, April 28.
Invited presentation: High-performance computing in integrative cancer informatics. Challenges and opportunities in intelligent molecular medicine, Systems Biology Ireland Seminar Series, University College Dublin, The College of Health Sciences, Dublin, Ireland, March 6
Keynote presentation: Integrative cancer informatics - moving personalized medicine to preventive interventions, Cancer Care Ontario Workshop - PREVENTION INTERVENTION STUDIES TO IMPROVE THE HEALTH OF ADULT CANCER SURVIVORS.
Scalable visual data mining video and demo, Compute Ontario highlight at OCE Discovery Conference, Toronto, April 27-28
Scalable visual data mining video, High Performance Computing Conference, Montreal, June






From WCG website

Working to detect lung and ovarian cancers before they start
By: The Mapping Cancer Markers research team
15 Dec 2015

Summary
Recent stages of the Mapping Cancer Markers (MCM) project have illuminated the protein-protein interactions and biological pathways involved in lung cancer, and have also suggested surprising results about its biomarkers. Once this current stage is complete, MCM will transition to analyzing ovarian cancer. Thanks to your help, we are making discoveries and helping the international research community. Dr. Jurisica, in particular, is one of the most frequently cited researchers worldwide.

Third stage of lung cancer analysis underway

In our previous update, we announced a second, targeted stage of lung cancer signature discovery. We have since moved to a new, third stage in lung cancer analysis: targeting high-scoring, uncorrelated biomarkers. These different stages are all part of an overall effort to understand lung cancer signatures. The first stage surveyed possible lung cancer signatures drawn from the complete set of biomarkers in our lung cancer dataset. The statistics gathered in this first stage were used to narrow the list of biomarkers to explore in subsequent stages. The second and third stages explore lung cancer signatures drawn from small sets of high-performing biomarkers, chosen by two different methods. In the second stage, we focused on a 1% subset of biomarkers, selected by the frequency with which each appeared in high-scoring signatures from the initial stage. In the third stage, we selected a different subset of biomarkers that are both high-scoring and largely uncorrelated to one another.

Correlation is a measure of information shared between two data sources. Two biomarkers are correlated if they exhibit similar patterns in the cancer dataset. For example, two correlated genes might show high activity in one set of tumour samples, low activity in a second set, and average activity in a third. Including two highly-correlated biomarkers in the same signature can reduce the quality of the signature, because they would be contributing redundant information to the signature. For a fixed-size signature, a redundant biomarker would potentially displace another biomarker that has different information content.
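To make the idea concrete, here is a small sketch using the Pearson correlation coefficient. The article does not say which correlation measure MCM uses, so Pearson is an assumption, and the gene activity values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented activity levels across six tumour samples:
# two samples with high activity, two with low, two with average.
gene_a = [9.1, 8.7, 2.0, 2.4, 5.1, 5.3]
gene_b = [8.8, 9.0, 2.2, 1.9, 5.0, 5.6]  # follows the same pattern as gene_a
gene_c = [5.0, 5.2, 8.9, 2.1, 9.0, 2.3]  # follows a different pattern

print("a vs b:", round(pearson(gene_a, gene_b), 3))
print("a vs c:", round(pearson(gene_a, gene_c), 3))
```

Here gene_a and gene_b carry nearly the same information, so a fixed-size signature gains little from holding both, while gene_c contributes something different.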

As an analogy, consider the information contained in a small library of textbooks. Say there are three books, A, B, and C. If A and B are two copies of the same textbook, one of them is redundant. Removing B from the library would not change the information contained in the library, and replacing B with a different textbook (D), would increase the information in the library. If A and B were similar, but not identical books (e.g., two books on introduction to molecular biology written by different authors), there would still be some overlap in the texts, and a possible advantage to replacing B with D.

Signature performance

Because the target biomarkers in this third stage were selected to be minimally inter-correlated, every signature should be free of redundant information. We therefore hypothesized that signatures in the third stage would perform better on average than those in the second stage. Figure 1 shows the surprising results: second stage signatures (potentially containing correlated biomarkers) outperformed those from the third stage. We are analysing these results further, to determine the main reasons for the performance difference.

Image

Figure 1. Distribution of signature scores for second (black) and third stage (blue) signatures. As expected, larger signatures generally outperform smaller. Surprisingly, second stage signatures outperform third stage on average.

Size effects on biomarker rank in top signatures

Larger signatures (i.e., signatures containing more biomarkers) incorporate more information and can potentially offer better accuracy, but are more complex and expensive to implement in the clinic. All three stages of MCM thus far have explored lung cancer signatures of multiple sizes. For each signature size we considered, the target biomarker subsets for the second stage were chosen separately, based on statistics from the first stage. The set of biomarkers selected for the third stage is fixed across all signature sizes. This fixed set allows us to compare the effects of signature size on each biomarker's frequency in high-scoring signatures. Figure 2 shows the frequency change when moving from 10 biomarkers per signature to 20. Each dot in the graph represents a biomarker. The X axis represents the frequency with which biomarkers appear in size_10 signatures. The Y axis indicates frequency in size_20 signatures. Note that the biomarkers change in rank but are generally correlated. Size_10 signatures show greater biomarker frequency spread: some have relatively high frequency, and many are low-frequency. The biomarker frequencies in larger (size_20) signatures are more even.
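The comparison of frequencies across signature sizes can be sketched as below. The toy signatures are made up and far smaller than real ones (sizes 2 and 3 stand in for size_10 and size_20), and using the standard deviation as the measure of spread is an assumption for illustration.

```python
from collections import Counter
from statistics import pstdev


def biomarker_frequencies(signatures, biomarkers):
    """Fraction of signatures that contain each biomarker."""
    counts = Counter(b for signature in signatures for b in set(signature))
    total = len(signatures)
    return {b: counts[b] / total for b in biomarkers}


biomarkers = ["A", "B", "C", "D"]
# Stand-ins for high-scoring short and long signatures over the same fixed set.
short_signatures = [("A", "B"), ("A", "C"), ("A", "B"), ("A", "D")]
long_signatures = [("A", "B", "C"), ("B", "C", "D"), ("A", "C", "D"), ("A", "B", "D")]

short_freq = biomarker_frequencies(short_signatures, biomarkers)
long_freq = biomarker_frequencies(long_signatures, biomarkers)

# One (x, y) point per biomarker, as in a frequency-vs-frequency scatter plot.
for b in biomarkers:
    print(b, short_freq[b], long_freq[b])

print("spread (short):", pstdev(short_freq.values()))
print("spread (long): ", pstdev(long_freq.values()))
```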

Biomarker pairs as protein interactions?

We applied and extended the analysis of biomarker pairs described in the August 2015 update to early results from third stage data, looking specifically for pairs of biomarkers in both the second and third stages that appear surprisingly frequently in the highest-scoring lung cancer signatures. When two genes or proteins appear in signatures together with greater frequency than expected randomly, we predict a stronger cancer-related connection (interaction).
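One simple way to judge whether a pair appears "more often than expected randomly" is to compare the observed count with the count expected if the two biomarkers occurred independently. The sketch below does exactly that; the independence baseline, the function name, and the toy data are assumptions, and the article does not state which null model the team used.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_enrichment(signatures):
    """Observed / expected co-occurrence for every pair seen at least once.

    Expected counts assume the two biomarkers occur independently:
    expected = n * (count_a / n) * (count_b / n) = count_a * count_b / n.
    """
    n = len(signatures)
    single = Counter()
    pairs = Counter()
    for signature in signatures:
        members = sorted(set(signature))
        single.update(members)
        pairs.update(combinations(members, 2))
    enrichment = {}
    for (a, b), observed in pairs.items():
        expected = single[a] * single[b] / n
        enrichment[(a, b)] = observed / expected
    return enrichment


top_signatures = [("A", "B"), ("A", "B"), ("C", "D"), ("A", "C")]
ratios = cooccurrence_enrichment(top_signatures)
for pair, ratio in sorted(ratios.items()):
    print(pair, round(ratio, 2))
```

Ratios well above 1 flag pairs that travel together more than their individual frequencies would suggest. With fixed-size signatures the independence baseline is only approximate, so a permutation-based baseline may be a better choice in practice.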

We searched for any known connections (interactions) in The Integrated Interactions Database (IID), a database of known and predicted protein-protein interactions created by our lab [1]. We found several interactions in IID that mirror these cancer interactions, but the overlap was not statistically significant.

Image

Figure 2. Biomarker frequencies in size_10 vs. size_20 signatures. Points to the left of the diagonal line represent biomarkers occurring more frequently in size_20 signatures. Note the overall correlation in ranks between sizes, but greater variation in frequencies for shorter signatures.

Pathway enrichment in second and third stage targets

We also took the genes selected for the second and third stages, and searched for them in a database of biological pathways. See Figure 3. We discovered our lists of genes were enriched (present in statistically significant numbers; p ≤ 0.01) in several pathways. See Table 1.
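Enrichment p-values of this kind are commonly computed with a one-sided hypergeometric test. The article does not say which test was used, so the sketch below is an assumption, and the counts in the example are invented.

```python
from math import comb


def enrichment_p_value(universe_size, pathway_size, selected_size, overlap):
    """P(X >= overlap) when drawing selected_size genes without replacement.

    universe_size: number of genes that could have been selected.
    pathway_size: number of those genes annotated to the pathway.
    selected_size: number of genes in our list.
    overlap: number of our genes that fall in the pathway.
    """
    upper = min(pathway_size, selected_size)
    total = comb(universe_size, selected_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, selected_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Invented example: 3 genes picked from 10, all 3 landing in a 3-gene pathway.
p_small = enrichment_p_value(universe_size=10, pathway_size=3, selected_size=3, overlap=3)
print(p_small)

# Invented example at a more realistic scale.
p_large = enrichment_p_value(universe_size=22000, pathway_size=10, selected_size=223, overlap=2)
print(p_large)
```

When many pathways are tested at once, the raw p-values would normally also be corrected for multiple testing.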

Although our analysis is ongoing, we can see that two of the identified pathways are components of Mevalonate metabolism. Mevalonate pathways are already targets for many drugs such as statins and have been implicated as targets for treatment in lung cancer [2, 3]. Some of the downstream analysis will focus on how the signatures discovered by World Community Grid processing will ultimately connect to pathways and other research. We have used Mevalonate as an example, but there are many more that can be examined to assess the viability of our best signatures.

Table 1. List of biological pathways enriched with MCM's "discovered-pair" genes. P-values ≤ 0.01 indicate statistical significance.

Pathway Name                                                    p-value
Mevalonate from acetyl CoA step 2 3                             0.003236
Biotinidase Deficiency metabolite pathway                       0.004845
Biotin Metabolism                                               0.004845
Biotinidase Deficiency                                          0.004845
Multiple carboxylase deficiency neonatal or early onset form    0.004845
Mevalonate biosynthesis                                         0.004845
Synthesis of Ketone Bodies                                      0.006449
Ketone Body Metabolism                                          0.008048
Succinyl CoA 3 ketoacid CoA transferase deficiency              0.008048
Synthesis and Degradation of Ketone Bodies                      0.01
Fatty acid triacylglycerol and ketone body metabolism           0.008892
Vitamin H biotin metabolism                                     0.009643
Dermatan sulfate degradation metazoa                            0.009643

Image


Figure 3. Biological pathways enriched by biomarker targets in the second (sizes 10 and 20) and third (all sizes) stages. Some pathways are common to all three.

Transition from lung cancer to ovarian cancer analysis

The third stage is nearly complete, and will be the final piece of MCM lung cancer analysis on World Community Grid before we switch to ovarian cancer.

Ovarian cancer is a gynecologic malignancy that ranks 8th for incidence and 5th for death rate among all women's cancers. The American National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) program estimated 22,240 new cases and 14,030 deaths from ovarian cancer in 2013. Patients are usually diagnosed at an advanced stage (61% present with metastasized cancer) and have a poor prognosis (27.3 months for the metastasized stage, according to SEER).

Ovarian cancer was chosen as our next dataset because of long experience with this disease in our own lab, and in those of collaborators. We look forward to using MCM to glean new insights into ovarian cancer.

We expect the transition to ovarian cancer research to begin in early 2016, and do not anticipate any interruption in the flow of work units.

Thank you to World Community Grid members

We wish to thank World Community Grid members for their continued support and interest for this and other projects. Without you, this work would not be possible.

References

1. Kotlyar M, Pastrello C, Sheahan N, Jurisica I. Integrated interactions database: tissue-specific view of the human and model organism interactomes. Nucleic Acids Res. 2015 Oct 29

2. Hwa Young Lee, In Kyoung Kim, Hye In Lee, Hye Sun Kang, Chan Kwon Park, Jick Hwan Ha, Seung Joon Kim, Sang Haak Lee. Mevalonate pathway inhibitors as chemopreventive agents on lung cancer cell lines: p53 might be a potent regulator. [abstract]. In: Proceedings of the Eleventh Annual AACR International Conference on Frontiers in Cancer Prevention Research; 2012 Oct 16-19; Anaheim, CA. Philadelphia (PA): AACR; Cancer Prev Res 2012;5(11 Suppl):Abstract nr A48.

3. Yano K. Lipid metabolic pathways as lung cancer therapeutic targets: a computational study. Int J Mol Med. 2012 Apr;29(4):519-29. doi: 10.3892/ijmm.2011.876. Epub 2011 Dec 30.

Some additional relevant presentations and publications
In several papers we have used strategies described above and protein interaction networks to identify better prognostic markers and new treatment options:

Singh M, Garg N, Venugopal C, Hallett RM, Tokar T, McFarlane N, Arpin C, Page B, Haftchenary S, Todic A, Rosa DA, Lai P, Gómez-Biagi R, Ali AM, Lewis A, Geletu M, Mahendram S, Bakhshinyan D, Manoranjan B, Vora P, Qazi M, Murty NK, Hassell JA, Jurisica I, Gunning P, Singh SK. STAT3 pathway regulates lung-derived brain metastasis initiating cell capacity through miR-21 activation. Oncotarget (accepted June 30, 2015, ONC-2014-02546)
Navab R, Strumpf D, To C, Pasko E, Kim KS, Park CJ, Hai J, Liu J, Jonkman J, Barczyk M, Bandarchi B, Wang YH, Venkat K, Ibrahimov E, Pham NA, Ng C, Radulovich N, Zhu CQ, Pintilie M, Wang D, Lu A, Jurisica I, Walker GC, Gullberg D, Tsao MS. Integrin a11b1 regulates cancer stromal stiffness and promotes tumorigenicity in non-small cell lung cancer, Oncogene, 2015. In press.
Agostini M, Zangrando A, Pastrello C, D'Angelo E, Romano G, Giovannoni R, Giordan M, Maretto I, Bedin C, Zanon C, Digito M, Esposito G, Mescoli C, Lavitrano M, Rizzolio F, Jurisica I, Giordano A, Pucciarelli S, Nitti D. A functional biological network centered on XRCC3: a new possible marker of chemoradiotherapy resistance in rectal cancer patients, Cancer Biol Ther, 16(8):1160-71, 2015.
Agostini M, Janssen KP, Kim LJ, D'Angelo E, Pizzini S, Zangrando A, Zanon C, Pastrello C, Maretto I, Digito M, Bedin C, Jurisica I, Rizzolio F, Giordano A, Bortoluzzi S, Nitti D, Pucciarelli S. An integrative approach for the identification of prognostic and predictive biomarkers in rectal cancer. Oncotarget. 2015. Sep 2.
Stewart, E.L., Mascaux, C., Pham, N-A, Sakashita, S., Sykes, J., Kim, L., Yanagawa, N., Allo, G., Ishizawa, K., Wang, D., Zhu, C.Q., Li, M., Ng, C., Liu, N., Pintilie, M., Martin, P., John, T., Jurisica, I., Leighl, N.B., Neel, B.G., Waddell, T.K., Shepherd, F.A., Liu, G., Tsao, M-S. Clinical Utility of Patient Derived Xenografts to Determine Biomarkers of Prognosis and Map Resistance Pathways in EGFR-Mutant Lung Adenocarcinoma, J Clin Oncol, 33(22):2472-80, 2015.
Camargo, J. F., Resende, M., Zamel, R., Klement, W., Bhimji, A., Huibner, S., Kumar, D., Humar, A., Jurisica, I., Keshavjee, S., Kaul, R., Husain, S. Potential role of CC chemokine receptor 6 (CCR6) in prediction of late-onset CMV infection following solid organ transplant. Clinical Transplantation, 2015. In press. doi: 10.1111/ctr.12531
Fortney, K., Griesman, G., Kotlyar, M., Pastrello, C., Angeli, M., Tsao, M.S., Jurisica, I. Prioritizing therapeutics for lung cancer: An integrative meta-analysis of cancer gene signatures and chemogenomic data, PLoS Comp Biol, 11(3): e1004068, 2015.

Integrative analyses also help provide better explanations of experimental results and more accurate models:

Benleulmi-Chaachoua, A., Chen, L., Sokolina, K., Wong, V., Jurisica, I., Emerit, M.B., Darmon, M., Espin, A., Stagljar, I., Tafelmeyer, P., Zamponi, G.W., Delagrange, P., Maurice, P., Jockers, R. Protein interactome mining defines melatonin MT1 receptors as integral component of presynaptic protein complexes of neurons, Journal of Pineal Research, In press

Some of this work was presented at multiple meetings and institutions, including keynotes at The 14th International Conference on Machine Learning and Applications and The American Society for Blood and Marrow Transplantation, Corporate Council Meeting; and invited highlight talks at Intelligent Systems for Molecular Biology Conference and Basel Computational Biology Conference.

Media Coverage

Also, for the second year in a row, Dr. Jurisica has been included in the Thomson Reuters highly cited researcher list, out of 108 in computer science and 3,125 world-wide in 21 fields of science.




From WCG website

Mapping Cancer Markers Now Examining Ovarian Cancer
By: The Mapping Cancer Markers research team
21 Apr 2016

Summary
The Mapping Cancer Markers project now includes markers for the most common form of ovarian cancer, with a goal to understand how this disease progresses from early-stage to late-stage.

Image

The Mapping Cancer Markers project, which had been concentrating on lung cancer, now includes work units related to ovarian cancer. The researchers are seeking to identify the genes that are important in differentiating between early-stage and late-stage ovarian cancer.

The Goals of Mapping Cancer Markers

Cancer is caused by genetic or environmental changes that interfere with biological mechanisms that control cell growth. These changes can be detected in tissue samples through the presence of their unique chemical indicators, such as DNA and proteins, which together are known as "markers." Specific combinations of these markers may be associated with a given type of cancer. Additionally, the pattern of markers can determine whether an individual is susceptible to developing a specific form of cancer, and may also predict the progression of the disease, helping to suggest the best treatment for a given individual. In order to identify these markers, the project is analyzing millions of data points collected from thousands of healthy and cancerous patient tissue samples.

Why Ovarian Cancer?

Around the world, nearly 250,000 women are diagnosed with ovarian cancer each year, and it is responsible for 140,000 deaths each year. Statistics show that just 45 percent of women with ovarian cancer survive for five years. The main types of ovarian cancer are epithelial, germ cell and stromal cell, with the epithelial type accounting for roughly 85-90 percent of all cases. Ovarian cancer often goes undetected in early stages due to the disease being confined to the ovary, the subtlety of the symptoms, and the lack of an effective screening tool. Therefore, most presentations of the disease are detected in late stages or once the cancer has spread outside the ovary, making treatment less effective and less likely to succeed. It is for these reasons that we have chosen epithelial ovarian cancer as our next area of study.

Understanding the Progression of Ovarian Cancer

In the next stage of Mapping Cancer Markers, we will attempt to identify important genes in defining the differences between early and late-stage cancers. There is a strong correlation between survival time and cancer stage; patients with early-stage cancer tend to have longer lives. We will be using a curated database of ovarian cancer survival data developed by researchers around the world as a starting point.

For the purposes of this study, we are defining early-stage death as before three years after diagnosis, and late-stage death as more than four years after diagnosis. We are looking for the genes that are important in differentiating between these two classes of ovarian cancer to allow us to understand the underlying mechanisms of how cancer progresses.
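The two classes defined above amount to a simple labeling rule on survival time. The sketch below is illustrative only: the label strings and function name are hypothetical, and leaving patients in the three-to-four-year gap unlabeled is an assumption, since the article does not say how they are handled.

```python
def survival_class(years_from_diagnosis_to_death):
    """Label a patient by survival time, following the two classes in the text.

    Returns None for the gap between three and four years (an assumption:
    the article does not describe what happens to these patients).
    """
    if years_from_diagnosis_to_death < 3:
        return "early_death"  # death before three years after diagnosis
    if years_from_diagnosis_to_death > 4:
        return "late_death"   # death more than four years after diagnosis
    return None


for years in (1.5, 3.5, 6.0):
    print(years, survival_class(years))
```

Leaving a gap between the two thresholds keeps the classes well separated, which can make the distinguishing genes easier to find.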

Compared to the earlier work of Mapping Cancer Markers, where we studied lung cancer, this phase will have a larger and more complex dataset. We estimate that the number of "experiments" we can perform within a single work unit will be much smaller, as each experiment will take longer to solve. The larger dataset also means that we can apply our algorithm to many more points of data, which will hopefully return a very clear result.

We thank the thousands of World Community Grid volunteers who have supported this project since its launch in 2013, and look forward to continuing to work with you as our research progresses.







From WCG website

Mapping Cancer Markers Begins Analyzing Lung and Ovarian Cancer Data
By: The Mapping Cancer Markers research team
17 Oct 2016

Summary
The Mapping Cancer Markers researchers are analyzing the results of the lung cancer research tasks run on World Community Grid, as well as the first sets of ovarian cancer data. This update gives a detailed look at the tools and processes they are using for this analysis, plus a list of their recent publications and events in their lab.

The Mapping Cancer Markers project aims to identify chemical markers associated with various types of cancer. This will help researchers detect cancer earlier and design more personalized cancer care. Below, the research team describes how they are analyzing the lung cancer data from research tasks that were run on World Community Grid, as well as the first sets of ovarian cancer data. They also update us on happenings in their lab, and provide information on their recent publications.

Lab news

Dr. Anne-Christin Hauschild has recently joined our lab as a postdoctoral research fellow, and will be contributing to the Mapping Cancer Markers project by applying data mining and machine learning algorithms to further prioritize signatures and characterize involved genes.

Our work was recognized for the third time in a row by Thomson Reuters, who included us in the highly cited researcher list, out of 127 in computer science and 3,266 world-wide in 21 fields of science.

Transition to the ovarian dataset

Our previous update described the planned transition to an ovarian cancer dataset from the lung cancer analysis. Due to the timing of the ovarian dataset tests and their launch on World Community Grid, we had several extra, unscheduled days of lung cancer analysis. We used these extra days to explore larger lung cancer signatures (30-100 markers). Previous Mapping Cancer Markers lung cancer work units explored smaller signatures (5-25 markers). All lung cancer results have since been collected, along with the first few months of ovarian cancer results. We are hard at work analyzing the completed lung and preliminary ovarian results.

How we are processing results

As part of this work, we have overhauled how we handle and process the results we receive from World Community Grid. Specifically, we have changed our Extract-Transform-Load (ETL) system which takes the raw, packaged research tasks received from World Community Grid, and unpacks, collates, reorganizes, and recodes results into an efficient and easy-to-load format for subsequent analyses. Our previous ETL system was built into our IBM InfoSphere Streams analysis pipeline. Separating the ETL stage from the analysis benefits the project in several ways. It allows us to more efficiently store data, it simplifies our main Streams-based analysis pipeline, and most importantly, it allows direct analysis of MCM results with other tools and platforms (such as data mining and data analysis tools like R and scikit-learn).
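The general shape of a standalone ETL step can be sketched in a few lines. Everything specific here is hypothetical: the archive layout, the `score<TAB>id,id,...` line format, and the CSV output are invented for illustration and do not describe the actual MCM result files or the team's system.

```python
import csv
import io
import zipfile


def extract(archive_bytes):
    """Extract: yield the text of every file inside one packaged result."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in sorted(archive.namelist()):
            yield archive.read(name).decode("utf-8")


def transform(text):
    """Transform: recode 'score<TAB>id,id,...' lines into (score, ids) rows."""
    for line in text.splitlines():
        if not line.strip():
            continue
        score, ids = line.split("\t")
        yield float(score), tuple(sorted(int(i) for i in ids.split(",")))


def load(rows, out_file):
    """Load: write one collated CSV that other tools can read directly."""
    writer = csv.writer(out_file)
    writer.writerow(["score", "biomarker_ids"])
    for score, ids in rows:
        writer.writerow([score, " ".join(map(str, ids))])


def run_etl(archives, out_file):
    """Chain the three steps over any number of packaged results."""
    rows = (row for blob in archives for text in extract(blob) for row in transform(text))
    load(rows, out_file)


# Build a tiny in-memory archive standing in for one returned result package.
package = io.BytesIO()
with zipfile.ZipFile(package, "w") as z:
    z.writestr("task1.txt", "0.91\t3,1,2\n0.5\t7,4\n")

output = io.StringIO()
run_etl([package.getvalue()], output)
lines = output.getvalue().splitlines()
print(lines)
```

Because the output is a plain table, it could be read by R, scikit-learn, or any other tool without going through the streaming pipeline, which is the benefit described above.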

Minimizing potential bias in the ovarian cancer dataset

The Mapping Cancer Markers ovarian cancer dataset combines data from multiple, independent cancer studies. These studies did not follow identical protocols in selecting patients, tissue sample collection or preparation, or recording of clinical covariates. Combining data from multiple sources requires careful normalization (the process of adjusting data from different sources so that they can be compared on a common scale). The search for successful signatures in such a dataset is made easier if the dataset minimizes bias. We will continue to study the issue of data normalization in the ovarian cancer dataset, and may update the dataset in the future if we discover improvements, or if analysis of results reveals biases.
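As a rough illustration of what normalization across studies can look like, the sketch below standardizes one gene's values within each study (a per-study z-score). This is only one simple option chosen for illustration; the article does not say which normalization method is applied to the ovarian dataset, and the study names and values are invented.

```python
from statistics import mean, pstdev


def zscore_within_study(values_by_study):
    """Rescale one gene's measurements to mean 0 and unit spread per study."""
    normalized = {}
    for study, values in values_by_study.items():
        mu = mean(values)
        sigma = pstdev(values)
        # A constant gene has no spread; map it to 0.0 instead of dividing by zero.
        normalized[study] = [(v - mu) / sigma if sigma else 0.0 for v in values]
    return normalized


# Two invented studies that measured the same gene on very different scales.
raw = {
    "study_A": [2.0, 4.0, 6.0],
    "study_B": [200.0, 400.0, 600.0],
}
scaled = zscore_within_study(raw)
for study, values in scaled.items():
    print(study, [round(v, 3) for v in values])
```

After rescaling, the two studies line up even though their raw values differed by a factor of 100. Real batch effects are usually more subtle and need more careful methods, which is why the normalization is revisited as results come in.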
Data integration portals

Our team has developed two data integration portals to help us interpret and validate the results we receive from Mapping Cancer Markers. The functions of these two portals, called mirDIP and pathDIP, are described in detail below.
Using mirDIP to interpret Mapping Cancer Markers results


One of the projects that our group has been working on in the past several months is the MicroRNA Data Integration Portal (mirDIP). This web resource allows users to query microRNAs (miRNAs) and to investigate their interactions with messenger RNA (mRNA) targets. MicroRNAs are short, non-coding RNA molecules observed across plants, animals and viruses (1). For the most part, these short molecules bind to mRNA to control the quantity of protein produced. For example, if someone is injured and bleeding, and the body determines that there is an immediate need for more blood, miRNAs may bind to their targets and turn up the production of the protein hemoglobin. These molecules are important in the study of cancer because, in part, they control when things should be turned on or off. MiRNAs are known to target more than one gene, so understanding their relationships with genes is an important step in understanding how genes and proteins function.

One of the ways that we plan to use mirDIP in tandem with our Mapping Cancer Markers project is to look more closely at common genes identified by the two methods. Below, we give an example of how this is done. Starting from one of many publications (2), we can identify miRNAs that are known to be related to ovarian cancer. As seen in a figure from the publication, we can identify some of the key players involved. Networks such as this provide valuable information, yet by no means completely characterize the environment.

Figure 1. Oncogenic and tumor suppressor miRNAs in ovarian carcinoma. Based on their function, miRNAs can be used for diagnostics and therapeutics. Certain miRNAs such as miR-200 family, let-7 family, miR-21, miR-214, and miR-100 have strong diagnostic/prognostic potential in ovarian cancer. Figure and caption from paper by Zaman et al. (2), licensed under CC BY 2.0.

We can use mirDIP to identify other major and minor genes that may be involved in these processes. If we submit a query using the 8 example miRNAs, and limit results to those referenced in at least 5 databases, we can quickly narrow down a list of genes of interest. Here, we show an analysis using the software package we developed for visualizing and analyzing protein-protein interaction networks (NAViGaTOR 3.0) of 8 miRNAs and associated genes, where at least 2 miRNAs are targeting a gene.

Figure 2. NAViGaTOR 3.0 network of ovarian cancer-associated miRNAs and genes. Turquoise nodes indicate miRNAs and grey boxes indicate associated genes, predicted by our mirDIP portal. Two of the miRNAs, hsa-mir-34a-5p and hsa-mir-34c-5p, are related to each other and thus have a high amount of overlap across the genes they regulate. Hsa-mir-100-5p has no interacting genes; however, this network only shows interactions validated by over 5 independent databases. Hsa-mir-100-5p interactions may only have been identified by fewer sources.
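
The filtering behind a network like Figure 2 (keep interactions reported by at least 5 databases, then keep genes targeted by at least 2 miRNAs) can be sketched in a few lines of Python. The record layout, the gene names, the database counts, and the function `shared_targets` are hypothetical; this is not the mirDIP query interface, only an illustration of the two thresholds.

```python
from collections import defaultdict

# Hypothetical interaction records: (miRNA, gene, number of source
# databases reporting the interaction). Genes and counts are made up.
interactions = [
    ("hsa-mir-34a-5p", "GENE_A", 7),
    ("hsa-mir-34c-5p", "GENE_A", 6),
    ("hsa-mir-34a-5p", "GENE_B", 5),
    ("hsa-mir-21-5p", "GENE_B", 3),   # too few databases
    ("hsa-mir-100-5p", "GENE_C", 4),  # too few databases
]


def shared_targets(records, min_databases=5, min_mirnas=2):
    """Return {gene: sorted miRNAs} for well-supported, shared targets."""
    mirnas_by_gene = defaultdict(set)
    for mirna, gene, n_databases in records:
        # First threshold: drop weakly supported interactions.
        if n_databases >= min_databases:
            mirnas_by_gene[gene].add(mirna)
    # Second threshold: keep genes targeted by enough distinct miRNAs.
    return {
        gene: sorted(mirnas)
        for gene, mirnas in mirnas_by_gene.items()
        if len(mirnas) >= min_mirnas
    }


if __name__ == "__main__":
    for gene, mirnas in shared_targets(interactions).items():
        print(gene, mirnas)
```

In this toy data, hsa-mir-100-5p drops out entirely because its only interaction falls below the database threshold, which mirrors the isolated node described in the caption.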

In turn, those genes may indicate critical pathways (some of which are identified in Figure 1) or novel pathways, which may be compromised. If we compare the results of our mirDIP analysis to our highest-scoring Mapping Cancer Markers signatures, we can further identify particular genes of interest. Understanding which players (pathways, genes, proteins, miRNAs, etc.) are involved and predicting the possible mechanism will help focus further studies, and may lead to identifying targeted treatment for specific patient subgroups, which is the goal of precision medicine.
Systematic and comprehensive pathway analysis using pathDIP

This brings us to another resource we have created for comprehensive characterization of cancer profiles: pathDIP. Importantly, this public resource integrates 20 databases and enables computational prediction of pathway association, a necessary step to fully understand signaling cascades in healthy and disease conditions. Cross-validation showed that our predictions are 71% accurate, and the predictions provide novel annotations for 5,732 proteins that previously lacked pathway characterization.

Taking the results from Figure 2 (the 36 genes identified by mirDIP), the pathDIP portal identifies two significantly enriched pathways (p < 0.05): 1) MicroRNAs in cancer, and 2) Central carbon metabolism in cancer (KEGG database). The first pathway directly confirms our steps to this point while the latter pathway indicates another avenue for exploration. Indeed, the central carbon metabolism pathway includes the conversion of glucose to lactic acid, a process known as the Warburg effect, which is common in ovarian and other cancers. This process has been shown to be controlled by nitric oxide (3) as well as other miRNAs (4).
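
For readers unfamiliar with pathway enrichment, the sketch below shows one common way such a p-value is computed: a one-sided hypergeometric test that asks how likely it is to see at least the observed overlap between a gene list and a pathway by chance. The update does not state which statistic pathDIP uses, so treat this as an assumption. Only the query size of 36 comes from the text; the background size, pathway size, and overlap are placeholders.

```python
from math import comb


def hypergeom_enrichment_p(background, pathway_size, query_size, overlap):
    """P(X >= overlap) when drawing query_size genes without replacement.

    background   : number of genes that could have been selected
    pathway_size : number of background genes annotated to the pathway
    query_size   : number of genes in the list being tested
    overlap      : number of query genes found in the pathway
    """
    total = comb(background, query_size)
    upper = min(pathway_size, query_size)
    tail = sum(
        comb(pathway_size, k) * comb(background - pathway_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # 36 genes as in the text; the other three numbers are hypothetical.
    p = hypergeom_enrichment_p(
        background=20000, pathway_size=300, query_size=36, overlap=5
    )
    print(f"enrichment p-value: {p:.3g}")
```

When many pathways are tested at once, the raw p-values are normally corrected for multiple testing before applying a threshold such as p < 0.05.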
Studying broccoli’s anti-cancer properties

Returning to microRNA gene regulation: the mirDIP portal now enables us to study whether microRNAs from animals, plants and viruses could regulate human genes. This cross-species regulation mechanism opens enormous potential for understanding both increases in disease risk and disease prevention. As a first study in this direction, we have recently completed and published a paper showing that broccoli microRNAs can regulate human genes that are upregulated in lung cancer, thus providing a potential explanation of why broccoli consumption has been linked to anti-cancer properties by many epidemiological studies.

Pastrello, C., Tsay, M., McQuaid, R., Abovsky, M., Pasini, E., Shirdel, E., Angeli, M., Tokar, T., Jamnik, J., Kotlyar, M., Jurisicova, A., Kotsopoulos, J., El-Sohemy, A., Jurisica, I. Circulating plant miRNAs can regulate human gene expression in vitro. Sci Rep, 6: 32773, 2016.

Several related publications

While most of these publications are related to either cancer studies or tools and resources we created, we also continue to collaborate with other researchers and translate verified workflows from cancer informatics to help study other diseases.

Here are some of our recent publications:

Chehade, R., Pettapiece-Phillips, R., Salmena, L., Kotlyar, M., Jurisica, I., Narod, S. A., Akbari, M. R., Kotsopoulos, J. Reduced BRCA1 transcript levels in freshly isolated blood leukocytes from BRCA1 mutation carriers is mutation specific, Breast Cancer Res, 18(1): 87, 2016.
Cierna, Z., Mego, M., Jurisica, I., Machalekova, K., Chovanec, M., Miskovska, V., Svetlovska, D., Hainova, K., Kajo, K., Mardiak, J., Babal, P. Fibrillin-1 (FBN-1) a new marker of germ cell neoplasia in situ, BMC Cancer, 16: 597, 2016.
Nakamura, A., Rampersaud, R., Sharma, A., Lewis, S.J., Wu, B., Datta, P., Sundararajan, K., Endisha, H., Rossomacha, E., Rockel, J.S., Jurisica, I., Kapoor, M. Identification of microRNA-181a-5p and microRNA-4454 as mediators of facet cartilage degeneration, JCI Insight, 1(12): e86820, 2016.
Becker-Santos, D.D., Thu, K.L., English, J.C., Pikor, L.A., Chari, R., Lonergan, K.M., Martinez, V.D., Zhang, M., Vucic, E.A., Luk, M.T.Y., Carraro, A., Korbelik, J., Piga, D., Lhomme, N.M., Tsay, M.J., Yee, J., MacAulay, C.E., Lockwood, W.W., Robinson, W.P., Jurisica, I., Lam, W.L. Developmental transcription factor NFIB is a putative target of oncofetal miRNAs and is associated with tumour aggressiveness in lung adenocarcinoma, J Pathology, In press.
Konvalinka, A., Batruch, I., Tokar, T., Dimitromanolakis, A., Reid, R., Song, X., Pei, Y., Drabovich, A.P., Diamandis, E. P., Jurisica, I., Scholey, J.W. Quantification of Angiotensin II-Regulated Proteins in Urine of Patients with Polycystic and Other Chronic Kidney Diseases by Selected Reaction Monitoring, Clinical Proteomics, 13: 16, 2016.
Stojanova, A., Tu, W.B., Ponzielli, R., Kotlyar, M., Chan, P.K., Boutros, P.C., Khosravi, F., Jurisica, I., Raught, B., Penn, L.Z. MYC interaction with the tumor suppressive SWI/SNF complex member INI1 regulates transcription and cellular transformation, Cell Cycle, 15(13): 1693-705, 2016.
Li, Y-H, Tavallaee, G., Tokar, T., Nakamura, A., Sundararajan, K., Weston, A., Sharma, A., Mahomed, N. N., Gandhi, R., Jurisica, I., Kapoor, M. Identification of synovial fluid microRNA signature in knee osteoarthritis: Differentiating early- and late-stage knee Osteoarthritis. Osteoarthritis and Cartilage, 24(9): 1577-86, 2016.
Cinegaglia, N.C., Andrade, S.C.S., Tokar, T., Pinheiro, M., Severino, F. E., Oliveira, R. A., Hasimoto, E. N., Cataneo, D. C., Cataneo, A.J.M., Defaveri, J., Souza, C.P., Marques, M.M.C., Carvalho, R. F., Coutinho, L.L., Gross, J.L., Rogatto, S.R., Lam, W.L., Jurisica, I., Reis, P.P. Integrative transcriptome analysis identifies deregulated microRNA-transcription factor networks in lung adenocarcinoma, Oncotarget, 7(20): 28920-34, 2016.
Vargas, A., Angeli, M., Pastrello, C., McQuaid, R., Li, H., Jurisicova, A., Jurisica, I., Robust quantitative scratch assay, Bioinformatics, 32(9):1439-40, 2016.

Other news

We have completed the 6th annual Team Ian Ride, raising funds to support training of young researchers. These funds support the Best Student Paper Award at the annual ISMB conference, as well as summer interns in cancer informatics.

The event also supports a new direction of our research into physical activity and cancer prevention, and many Team Ian participants have already enrolled in our experimental studies. You can find images at http://www.cs.utoronto.ca/~juris/TIR2016. Please contact us if you want to get involved in 2017.

Thank you,