Goal: Demonstrate monarchr disease, genes, homologs and phenotype methods.

We’ll look for genes in model organisms that might suggest future focus for a human cancer phenotype.

To do this we need to define:

A cancer phenotype.
Find genes assoiated with that cancer phenotype.
possibly ID map those genes.
possibly subset those genes by certain attributes.
Find homologs to those genes in other organisms.
Find interactions to those genes in human and in other organisms.
Determine if we have found any novel interactions from the model organism.

We’ll try to do these using monarch as follows:

We’ll attempt to use Monarch to find a list of cancer phenotypes, and select one of these as our goal.
We’ll use Monarch to find what human genes are associated with that cancer phenotype.
We’ll use Monarch to find homologs to those genes in other model organisms.
We’ll use Monarch to find interactions to those genes in the other model organisms.
We’ll use Monarch to map the interactions back to human.
We’ll use mygene mygene.info to retrieve useful ids for monarch.
We’ll use Monarch to describe the interactions novel to the model organism network as projected on human.

In other vignettes, we’ll use some of the results we produce here to pull other related information from monarch and non-monarch resources.

Selecting a cancer to work with: Uveal Melanoma

To ensure a richly documented dataset, we have selected a cancer from selected cancer listings at TCGA for which both sample collection is completed and public data available.

Under ‘head and neck’ cancer, we found uveal melanoma. We’ll use this for our example.

Before proceeding further, we wondered: “Are there any animal models of uveal melanoma?”

A quick pubmed search did reveal mouse models for uveal melanoma.

Uveal Melanoma as a disease at Monarch Initiative.

It would be great if we could simply search for the disease by name and retrieve the top result.

Using monarchr: Find a useful ID for ‘uveal melanoma’.

Load monarchr.

library(monarchr)

Search for “uveal melanoma” and display top results.

term <- "uveal melanoma"
results <- biolink_search(term)
um <- results$search_results # keep just the results, dropping the 'response' object.
#um
head(um)

## # A tibble: 6 x 13
##   definition       category label prefix equivalent_iri  id    score edges
##   <chr>            <chr>    <chr> <chr>  <chr>           <chr> <chr> <chr>
## 1 A melanoma deri… disease  uvea… MONDO  http://purl.ob… MOND… 116.… 355  
## 2 A melanoma aris… disease  mixe… MONDO  http://purl.ob… MOND… 105.… 88   
## 3 <NA>             disease  post… MONDO  http://purl.ob… MOND… 105.… 15   
## 4 A uveal melanom… disease  necr… MONDO  http://purl.ob… MOND… 105.… 46   
## 5 A uveal melanom… disease  epit… MONDO  http://purl.ob… MOND… 105.… 77   
## 6 <NA>             disease  susc… MONDO  http://purl.ob… MOND… 105.… 105  
## # ... with 5 more variables: synonym <chr>, `_version_` <chr>,
## #   equivalent_curie <chr>, leaf <chr>, iri <chr>

Note that the results are presented in order of the ‘score’ column. This is the same ordering as we observe doing searches at the website.

# subset(um, select = c(id, score, label, category, synonym))
head(subset(um, select = c(id, score, label, category, synonym)), 10)

## # A tibble: 10 x 5
##    id            score     label      category synonym                    
##    <chr>         <chr>     <chr>      <chr>    <chr>                      
##  1 MONDO:0006486 116.12919 uveal mel… disease  uvea melanoma (disease), M…
##  2 MONDO:0003910 105.42569 mixed cel… disease  mixed cell uveal melanoma,…
##  3 MONDO:0003927 105.42569 posterior… disease  medium/large size posterio…
##  4 MONDO:0004365 105.42569 necrotic … disease  <NA>                       
##  5 MONDO:0006200 105.42569 epithelio… disease  uvea epithelioid cell mela…
##  6 MONDO:0007966 105.42569 susceptib… disease  MELANOMA, UVEAL            
##  7 MONDO:0004062 97.76348  intermedi… disease  intraocular mixed cell typ…
##  8 HP:0007716    66.65321  Intraocul… Phenoty… Uveal melanoma             
##  9 MONDO:0002661 62.40225  uveal dis… disease  uvea disease, disease of u…
## 10 MP:0010276    58.491478 increased… Phenoty… increased uveal melanoma i…

In this case, the top result is a perfect match for our phrase.

We save the ‘id’ for this top result. The id is the Monarch Disease Onotology (MONDO) ID and will be most useful for pulling other results from monarchr.

We print the definition for ‘uveal melanoma’.

um_id <- um[1,]$id
um[um$id == um_id,]$definition

## [1] "A melanoma derived from melanocytes of the uveal tract. It is the most common primary intraocular tumor in the United States and Western Europe. Similar to melanoma of the skin, it is rare in Africa and Asia. Diagnostic procedures include ophthalmoscopic exam, fluorescein angiography and ultrasound. Treatment includes: surgical excision of the eye, iridocyclectomy and tumor resection. Recent treatments also include radiotherapy or photo coagulation. Classification of uveal melanomas recognizes four cell types within these tumors: epithelioid, intermediate, mixed cell, and spindle cell types. The spindle cell type uveal melanomas are further sub-classified as spindle cell type A and spindle cell type B."

This definition is the same text as given in the ‘Overview’ tab from our search of the website in the previous section, MONDO:0006486.

Human Genes associated with uveal melanoma.

Fetching the human genes associated with the disease is simple.

um_genes <- bioentity_genes_assoc_w_disease(um_id)
um_genes <- um_genes$genes # ignore the response object.
um_genes

## # A tibble: 11 x 10
##    subject.taxon subject.id subject relation object object.id object.taxon
##  * <chr>         <chr>      <chr>   <chr>    <chr>  <chr>     <lgl>       
##  1 Homo sapiens  HGNC:18274 CYSLTR2 is mark… uveal… MONDO:00… NA          
##  2 Homo sapiens  HGNC:24308 CLPTM1L contrib… uveal… MONDO:00… NA          
##  3 Homo sapiens  HGNC:950   BAP1    has phe… uveal… MONDO:00… NA          
##  4 Homo sapiens  HGNC:3250  EIF1AX  is mark… uveal… MONDO:00… NA          
##  5 Homo sapiens  HGNC:4390  GNAQ    has phe… uveal… MONDO:00… NA          
##  6 Homo sapiens  HGNC:9059  PLCB4   is mark… uveal… MONDO:00… NA          
##  7 Homo sapiens  HGNC:4390  GNAQ    likely_… susce… MONDO:00… NA          
##  8 Homo sapiens  HGNC:4379  GNA11   likely_… susce… MONDO:00… NA          
##  9 Homo sapiens  HGNC:8101  OCA2    contrib… uveal… MONDO:00… NA          
## 10 Homo sapiens  HGNC:4379  GNA11   is mark… uveal… MONDO:00… NA          
## 11 Homo sapiens  HGNC:10768 SF3B1   is mark… uveal… MONDO:00… NA          
## # ... with 3 more variables: evidence <fct>, publications <fct>,
## #   provided_by <fct>

The HGNC IDs are most useful for further searches with monarchr. Obviously we can subset for the particular relation we want. For this vignette, we take all relation types.

HGNC derives from Human identifiers by the HUGO gene nomenclaure committee HUGO.

Note, the same gene can be returned more than once if there is more than one relation(?) or MONDO id (see notes).

um_hgnc <- um_genes$subject.id
dups <- um_genes[duplicated(um_genes$subject.id),]$subject.id
um_genes[um_genes$subject.id %in% dups, ]

## # A tibble: 4 x 10
##   subject.taxon subject.id subject relation  object object.id object.taxon
##   <chr>         <chr>      <chr>   <chr>     <chr>  <chr>     <lgl>       
## 1 Homo sapiens  HGNC:4390  GNAQ    has phen… uveal… MONDO:00… NA          
## 2 Homo sapiens  HGNC:4390  GNAQ    likely_p… susce… MONDO:00… NA          
## 3 Homo sapiens  HGNC:4379  GNA11   likely_p… susce… MONDO:00… NA          
## 4 Homo sapiens  HGNC:4379  GNA11   is marke… uveal… MONDO:00… NA          
## # ... with 3 more variables: evidence <fct>, publications <fct>,
## #   provided_by <fct>

Save the unique gene IDs.

um_genes_ids <- unique(um_genes$subject.id)

Homologs of human genes.

Find homologs to each of the human genes.

We’d like to use something like the bioentityset function ‘homologs’ to do this all at once. But it doesn’t seem to be working. I tried with our HGNC genes as well as with the NCBI example genes in their supplied example. We’ll watch as the API evolves. Until then, we have to search for each gene’s homolog one at a time.

homologs <- lapply(um_genes_ids, bioentity_homologs)
homologs <- lapply(homologs, "[[", "homologs")
homologs <- do.call('rbind', homologs)

Subset homologs to some organisms.

Subset to some of the more completely researched model organisms. Note that most or all of the mouse and rat are in 1 to 1 orthology relationships, and the other organisms a little less so.

#unique(homologs$object.taxon) # all we can choose from
orgs <- c('Mus musculus', "Rattus norvegicus", "Danio rerio", "Drosophila melanogaster", "Caenorhabditis elegans", "Saccharomyces cerevisiae S288C")
ho <- homologs[homologs$object.taxon %in% orgs,]
ho <- subset(ho, select= c(subject.taxon, subject.id, object.id, object.taxon))
ho <- ho[order(ho$object.taxon),]
split(ho, ho$object.taxon) # list these by organism.

## $`Caenorhabditis elegans`
## # A tibble: 11 x 4
##    subject.taxon subject.id object.id               object.taxon          
##    <chr>         <chr>      <chr>                   <chr>                 
##  1 Homo sapiens  HGNC:18274 WormBase:WBGene00015000 Caenorhabditis elegans
##  2 Homo sapiens  HGNC:18274 WormBase:WBGene00022584 Caenorhabditis elegans
##  3 Homo sapiens  HGNC:18274 WormBase:WBGene00018344 Caenorhabditis elegans
##  4 Homo sapiens  HGNC:18274 WormBase:WBGene00021328 Caenorhabditis elegans
##  5 Homo sapiens  HGNC:18274 WormBase:WBGene00012275 Caenorhabditis elegans
##  6 Homo sapiens  HGNC:24308 WormBase:WBGene00011761 Caenorhabditis elegans
##  7 Homo sapiens  HGNC:3250  WormBase:WBGene00019162 Caenorhabditis elegans
##  8 Homo sapiens  HGNC:4390  WormBase:WBGene00001196 Caenorhabditis elegans
##  9 Homo sapiens  HGNC:9059  WormBase:WBGene00001177 Caenorhabditis elegans
## 10 Homo sapiens  HGNC:9059  WormBase:WBGene00004037 Caenorhabditis elegans
## 11 Homo sapiens  HGNC:10768 WormBase:WBGene00011605 Caenorhabditis elegans
## 
## $`Danio rerio`
## # A tibble: 13 x 4
##    subject.taxon subject.id object.id                 object.taxon
##    <chr>         <chr>      <chr>                     <chr>       
##  1 Homo sapiens  HGNC:18274 ZFIN:ZDB-GENE-071016-2    Danio rerio 
##  2 Homo sapiens  HGNC:18274 ZFIN:ZDB-GENE-071016-1    Danio rerio 
##  3 Homo sapiens  HGNC:24308 ZFIN:ZDB-GENE-040718-75   Danio rerio 
##  4 Homo sapiens  HGNC:950   ZFIN:ZDB-GENE-050208-492  Danio rerio 
##  5 Homo sapiens  HGNC:3250  ZFIN:ZDB-GENE-030131-1319 Danio rerio 
##  6 Homo sapiens  HGNC:3250  ZFIN:ZDB-GENE-041010-185  Danio rerio 
##  7 Homo sapiens  HGNC:4390  ZFIN:ZDB-GENE-081104-25   Danio rerio 
##  8 Homo sapiens  HGNC:9059  NCBIGene:100537538        Danio rerio 
##  9 Homo sapiens  HGNC:9059  ZFIN:ZDB-GENE-060627-1    Danio rerio 
## 10 Homo sapiens  HGNC:4379  ZFIN:ZDB-GENE-050208-597  Danio rerio 
## 11 Homo sapiens  HGNC:4379  ZFIN:ZDB-GENE-041121-9    Danio rerio 
## 12 Homo sapiens  HGNC:8101  ZFIN:ZDB-GENE-070718-4    Danio rerio 
## 13 Homo sapiens  HGNC:10768 ZFIN:ZDB-GENE-040827-3    Danio rerio 
## 
## $`Drosophila melanogaster`
## # A tibble: 14 x 4
##    subject.taxon subject.id object.id           object.taxon           
##    <chr>         <chr>      <chr>               <chr>                  
##  1 Homo sapiens  HGNC:18274 FlyBase:FBgn0039595 Drosophila melanogaster
##  2 Homo sapiens  HGNC:24308 FlyBase:FBgn0030456 Drosophila melanogaster
##  3 Homo sapiens  HGNC:950   FlyBase:FBgn0262166 Drosophila melanogaster
##  4 Homo sapiens  HGNC:3250  FlyBase:FBgn0026250 Drosophila melanogaster
##  5 Homo sapiens  HGNC:4390  FlyBase:FBgn0050054 Drosophila melanogaster
##  6 Homo sapiens  HGNC:4390  FlyBase:FBgn0033756 Drosophila melanogaster
##  7 Homo sapiens  HGNC:4390  FlyBase:FBgn0004435 Drosophila melanogaster
##  8 Homo sapiens  HGNC:9059  FlyBase:FBgn0262738 Drosophila melanogaster
##  9 Homo sapiens  HGNC:8101  FlyBase:FBgn0041150 Drosophila melanogaster
## 10 Homo sapiens  HGNC:8101  FlyBase:FBgn0036329 Drosophila melanogaster
## 11 Homo sapiens  HGNC:8101  FlyBase:FBgn0051693 Drosophila melanogaster
## 12 Homo sapiens  HGNC:8101  FlyBase:FBgn0035332 Drosophila melanogaster
## 13 Homo sapiens  HGNC:8101  FlyBase:FBgn0031649 Drosophila melanogaster
## 14 Homo sapiens  HGNC:10768 FlyBase:FBgn0031266 Drosophila melanogaster
## 
## $`Mus musculus`
## # A tibble: 9 x 4
##   subject.taxon subject.id object.id   object.taxon
##   <chr>         <chr>      <chr>       <chr>       
## 1 Homo sapiens  HGNC:18274 MGI:1917336 Mus musculus
## 2 Homo sapiens  HGNC:24308 MGI:2442892 Mus musculus
## 3 Homo sapiens  HGNC:950   MGI:1206586 Mus musculus
## 4 Homo sapiens  HGNC:3250  MGI:1913485 Mus musculus
## 5 Homo sapiens  HGNC:4390  MGI:95776   Mus musculus
## 6 Homo sapiens  HGNC:9059  MGI:107464  Mus musculus
## 7 Homo sapiens  HGNC:4379  MGI:95766   Mus musculus
## 8 Homo sapiens  HGNC:8101  MGI:97454   Mus musculus
## 9 Homo sapiens  HGNC:10768 MGI:1932339 Mus musculus
## 
## $`Rattus norvegicus`
## # A tibble: 9 x 4
##   subject.taxon subject.id object.id   object.taxon     
##   <chr>         <chr>      <chr>       <chr>            
## 1 Homo sapiens  HGNC:18274 RGD:619797  Rattus norvegicus
## 2 Homo sapiens  HGNC:24308 RGD:1307896 Rattus norvegicus
## 3 Homo sapiens  HGNC:950   RGD:1311938 Rattus norvegicus
## 4 Homo sapiens  HGNC:3250  RGD:1560198 Rattus norvegicus
## 5 Homo sapiens  HGNC:4390  RGD:620770  Rattus norvegicus
## 6 Homo sapiens  HGNC:9059  RGD:3345    Rattus norvegicus
## 7 Homo sapiens  HGNC:4379  RGD:619749  Rattus norvegicus
## 8 Homo sapiens  HGNC:8101  RGD:2318412 Rattus norvegicus
## 9 Homo sapiens  HGNC:10768 RGD:620148  Rattus norvegicus
## 
## $`Saccharomyces cerevisiae S288C`
## # A tibble: 2 x 4
##   subject.taxon subject.id object.id      object.taxon                  
##   <chr>         <chr>      <chr>          <chr>                         
## 1 Homo sapiens  HGNC:3250  SGD:S000004873 Saccharomyces cerevisiae S288C
## 2 Homo sapiens  HGNC:10768 SGD:S000004901 Saccharomyces cerevisiae S288C

Get interactions of the orthologs.

Monarch Initiative has indicated that while only 51% of human genes have an annotation associated with them, if one includes annotations of orthologs to those genes in other organisms, 89% have some annotation Monarch Initiative Nucleic Acids Research paper.

A possible inference from that is that by looking for interactions in other organisms, we might extend what we know about interactions for humans.

Of course we also want interactions in humans.

Find all interactions within each species.

all_genes <- c(um_genes_ids, ho$object.id)
all_intxns <- lapply(all_genes, bioentity_interactions_assoc_w_gene)
all_intxns <- lapply(all_intxns, "[[", "interactions")
all_intxns <- do.call('rbind', all_intxns)
all_intxns <- all_intxns[all_intxns$subject.taxon ==  all_intxns$object.taxon, ]

# unique(all_intxns$relation) # "interacts with" "genetically interacts with"

# nrow(all_intxns) # 3456
# length(all_intxns$object.id) # 3456
# length(unique(c(all_intxns$object.id, all_intxns$subject.id))) # 2677
# length(unique(all_intxns$object.id)) # 2646

all_intxns_split <- split(all_intxns, all_intxns$subject.taxon)

lapply(all_intxns_split, nrow)

## $`Caenorhabditis elegans`
## [1] 172
## 
## $`Danio rerio`
## [1] 738
## 
## $`Drosophila melanogaster`
## [1] 519
## 
## $`Homo sapiens`
## [1] 720
## 
## $`Mus musculus`
## [1] 679
## 
## $`Rattus norvegicus`
## [1] 570

Note, sometimes interactions are reported across species. We dropped those.

3484 interactions were found, 720 in human. Including both the subject and object genes, we have a total set of 2755 genes. We didn’t find interactions for yeast. We don’t know if the lack of yeast interactions is a monarch shortcoming or indicative of there really being no interactions for these particular yeast genes.

(TODO: evaluate counts inline as they’ll change with future updates to monarch initiative’s data sources.)

Orthologs to the interactions.

This is getting to be a large dataset. Let’s continue by focusing on only one of the original human uveal melanoma genes and its orthologs and interactors of those.

We’ll focus on the human gene ‘BAP1’, which has the unique id ‘HGNC:950’.

bap1 <- "HGNC:950"
bap1_homologs <- homologs[homologs$subject.id == bap1,] # includes paralogs.
# all_intxns[all_intxns$subject.id %in% bap1_homologs$object.id,] # does not include interactions to human bap1
# Include interacations to human bap1 and all orthologs of bap1.
bap1_homolog_intxns <- all_intxns[all_intxns$subject.id %in% c(bap1, bap1_homologs$object.id),]
bap1_homolog_intxns_split <- split(bap1_homolog_intxns, bap1_homolog_intxns$object.taxon)
lapply(bap1_homolog_intxns_split, nrow)

## $`Danio rerio`
## [1] 64
## 
## $`Drosophila melanogaster`
## [1] 100
## 
## $`Homo sapiens`
## [1] 97
## 
## $`Mus musculus`
## [1] 85
## 
## $`Rattus norvegicus`
## [1] 50

We found 97 interactions in humans and 50-100 in other organisms.

Note! Some of these counts are suspiciously close to our default max_rows used to query monarch. We could go back and change the all_intxns to allow for more rows, but we’ll continue on, understanding we are limiting our results, for now.

TODO: yeah, we should go back and redo that, but we only need BAP1 interactions.

Each of these tibbles represent graphs with a central node of BAP1 (or BAP1’s orthologs) connected to each of its interactors.

We can continue from here to look at qualities of these BAP1 centralized network in data from each organism. That is left as an exercise.

We try to fix our limited results, and proceed only with mouse and human BAP1, as there are lots of interactions.

bap1_and_mouse_bap1 <- c(bap1, "MGI:1206586")
hm_bap1_homolog_intxns <- lapply(bap1_and_mouse_bap1, bioentity_interactions_assoc_w_gene, rows = 3000)  
hm_bap1_intxns <- lapply(hm_bap1_homolog_intxns, "[[", "interactions")
hm_bap1_intxns <- do.call('rbind', hm_bap1_intxns)
split(hm_bap1_intxns, hm_bap1_intxns$subject.id)

## $`HGNC:950`
## # A tibble: 148 x 10
##    subject.taxon subject.id subject relation object object.id object.taxon
##    <chr>         <chr>      <chr>   <chr>    <chr>  <chr>     <chr>       
##  1 Homo sapiens  HGNC:950   BAP1    interac… IPO4   HGNC:194… Homo sapiens
##  2 Homo sapiens  HGNC:950   BAP1    interac… USP28  HGNC:126… Homo sapiens
##  3 Homo sapiens  HGNC:950   BAP1    interac… MAP3K1 HGNC:6848 Homo sapiens
##  4 Homo sapiens  HGNC:950   BAP1    interac… XRCC3  HGNC:128… Homo sapiens
##  5 Homo sapiens  HGNC:950   BAP1    interac… PSMB2  HGNC:9539 Homo sapiens
##  6 Homo sapiens  HGNC:950   BAP1    interac… AHCYL2 HGNC:222… Homo sapiens
##  7 Homo sapiens  HGNC:950   BAP1    interac… BMI1   HGNC:1066 Homo sapiens
##  8 Homo sapiens  HGNC:950   BAP1    interac… TSNAX  HGNC:123… Homo sapiens
##  9 Homo sapiens  HGNC:950   BAP1    interac… PHC1   HGNC:3182 Homo sapiens
## 10 Homo sapiens  HGNC:950   BAP1    interac… HIST2… HGNC:4736 Homo sapiens
## # ... with 138 more rows, and 3 more variables: evidence <fct>,
## #   publications <fct>, provided_by <fct>
## 
## $`MGI:1206586`
## # A tibble: 116 x 10
##    subject.taxon subject.id subject relation object object.id object.taxon
##    <chr>         <chr>      <chr>   <chr>    <chr>  <chr>     <chr>       
##  1 Mus musculus  MGI:12065… Bap1    interac… Phc1   MGI:1032… Mus musculus
##  2 Mus musculus  MGI:12065… Bap1    interac… Mpnd   MGI:1915… Mus musculus
##  3 Mus musculus  MGI:12065… Bap1    interac… Psma3  MGI:1048… Mus musculus
##  4 Mus musculus  MGI:12065… Bap1    interac… Psmd6  MGI:1913… Mus musculus
##  5 Mus musculus  MGI:12065… Bap1    interac… Gm5239 MGI:3648… Mus musculus
##  6 Mus musculus  MGI:12065… Bap1    interac… Brca1  MGI:1045… Mus musculus
##  7 Mus musculus  MGI:12065… Bap1    interac… DNAAF5 HGNC:260… Homo sapiens
##  8 Mus musculus  MGI:12065… Bap1    interac… Foxk1  MGI:1347… Mus musculus
##  9 Mus musculus  MGI:12065… Bap1    interac… Hist2… MGI:2448… Mus musculus
## 10 Mus musculus  MGI:12065… Bap1    interac… Psma4  MGI:1347… Mus musculus
## # ... with 106 more rows, and 3 more variables: evidence <fct>,
## #   publications <fct>, provided_by <fct>

We found all the possible BAP1 interactions as neither mouse nor human exceed our row limits.

We restrict to only within organism matches.

hm_bap1_intxns <- hm_bap1_intxns[hm_bap1_intxns$subject.taxon == hm_bap1_intxns$object.taxon,]

Map the interactions in the other organisms back to human.

Now, we get a little crazy, and take the point of view that each organism’s BAP1 centralized network represents a version of all networks which might be relevant (or not) to human BAP1.

We project the interactions back to human to get their human orthologs.

We already have all the human interactors to BAP1, we just save them with a new name.

We need to project the mouse interactions back to human.

bap1_human_intxns <- hm_bap1_intxns[hm_bap1_intxns$subject.taxon == "Homo sapiens",]
bap1_orthos_mouse <- hm_bap1_intxns[hm_bap1_intxns$subject.taxon == "Mus musculus",]

# we add back the mouse bap1 homolog to all the other mouse genes.
bap1_orthos_mouse_ids <- unique(c(bap1_orthos_mouse$subject.id, 
                                  bap1_orthos_mouse$object.id))
# x <- lapply(bm, bioentity_homologs, homolog_taxon="NCBITaxon:9606")
# Nicer to server with sleep.
bap1_orthos_mouse2human <- lapply(bap1_orthos_mouse_ids, 
                                  function(x) {
                                    r <- bioentity_homologs(gene = x, homolog_taxon = "NCBITaxon:9606", rows = 3000)
                                    Sys.sleep(0.1)
                                    r
                                    })
m2h <- lapply(bap1_orthos_mouse2human, "[[", "homologs")
m2h <- do.call('rbind', m2h)
nrow(m2h) # 105

## [1] 105

length(unique(m2h$object.id)) # 102

## [1] 102

We note imperfect 1 to 1 orthology led to the discrepancy in our counts above. Some genes have multiple orthology relationships.

m2h[m2h$object.id %in% m2h[duplicated(m2h$object.id),]$object.id,]

## # A tibble: 6 x 10
##   subject.taxon subject.id  subject relation object object.id object.taxon
##   <chr>         <chr>       <chr>   <chr>    <chr>  <chr>     <chr>       
## 1 Mus musculus  MGI:1195985 Cbx4    in 1 to… CBX4   HGNC:1554 Homo sapiens
## 2 Mus musculus  MGI:3649015 Gm4950  in orth… PSMB3  HGNC:9540 Homo sapiens
## 3 Mus musculus  MGI:3642386 Gm9774  in orth… ADRM1  HGNC:157… Homo sapiens
## 4 Mus musculus  MGI:1196439 Cbx7    in orth… CBX4   HGNC:1554 Homo sapiens
## 5 Mus musculus  MGI:1929289 Adrm1   in 1 to… ADRM1  HGNC:157… Homo sapiens
## 6 Mus musculus  MGI:1347014 Psmb3   in 1 to… PSMB3  HGNC:9540 Homo sapiens
## # ... with 3 more variables: evidence <fct>, publications <fct>,
## #   provided_by <fct>

When we project these mouse interactors to BAP1 back to human, we have a projected network of 102 unique human genes.

Recall we had a network of 98 human genes using only the human interactions. (97 interactions + BAP1 as the hub)

Compare human network and human network as projected from mouse.

What is the difference between the human network and the projected mouse network?

Note, we’re switching to working with the gene symbols rather than HGNC IDs.

h_genes <- c(bap1_human_intxns$object, "BAP1")
m2h_genes <- m2h$object

Common to both.

length(intersect(h_genes, m2h_genes))

## [1] 87

Altogether.

length(union(h_genes, m2h_genes))

## [1] 160

Unique within the human network.

nrow(setdiff(h_genes, m2h_genes))

## NULL

unique within the mouse genes.

uniq_mouse <- setdiff(m2h_genes, h_genes)
uniq_mouse

##  [1] "KIAA2012"    "BCORP1"      "COMMD3-BMI1" "HIST2H2AA4"  "PRKCA"      
##  [6] "NFRKB"       "OTUD4"       "SMOX"        "KDM1A"       "PAOX"       
## [11] "YY2"         "PIP5K1C"     "ANKRD2"      "ANKRD1"      "ASB4"

We expand the potential network of genes interact with BAP1 by considering interactions in mouse, and projecting those interactions back to human.

To the extent anything is known of the new genes (the unique to mouse genes) vis a vis cancer or melanoma, these genes might be useful to pursue further. Alternatively, if nothing is known with respect to cancer for these genes, they might be interesting as novel targets for discovery.

Are the novel mouse genes associated with melanoma?

In the appendix, we note that MONDO:0005105 is the identifier for Melanoma.

We ask if any of the genes unique to the mouse network (as projected to human) are noted as melanoma genes.

melanoma <- bioentity_genes_assoc_w_disease("MONDO:0005105")
mel_genes <- unique(melanoma$genes$subject) # gene symbols
length(mel_genes)

## [1] 52

intersect(mel_genes, uniq_mouse)

## character(0)

No new melanoma genes from the mouse network. This isn’t too surprising as relatively few genes of any sort have been associated with melanoma according to the MONDO ontology.

Nonetheless, the genes might be useful for futher study with regards to melanoma.

Are the novel mouse genes associated with cancer?

Cancer has MONDO id ‘MONDO:0004992’.

We suspect there are lots of genes tagged for cancer.

Indeed, https://monarchinitiative.org/disease/MONDO:0004992 shows 3055.

We need them all instead of the default 100 that monarchr provides. But we need to play nice with the server (either because of server limits or because of maximum size limits for the json, or both.)

Note, we get far more than 3035 genes. The Monarch Initiative website count included a handfulfrom non-human organims. We narrowed them to human, and took only the unique results.

#TODO: user shouldn't have to do this, wrap it in all methods instead.
check_for_more = TRUE
start_row = 0
group = 1
results = list()
while(check_for_more == TRUE) {
  # print(paste("fetching", group, "start", start_row))   # Monitor progress when using interactively.
  cancer <- bioentity_genes_assoc_w_disease("MONDO:0004992", 
                                            rows = 100, 
                                            start=start_row, 
                                            fetch_objects = FALSE)
  if (nrow(cancer$genes) < 100) {
    check_for_more <- FALSE
  }
  start_row <- start_row + 100
  group <- group + 1
  results <- c(results, cancer$genes[cancer$genes$subject.taxon == "Homo sapiens", ]$subject)
  Sys.sleep(0.5)
}
cancer_genes <- unique(unlist(results))

intersect(cancer_genes, uniq_mouse)

## [1] "PRKCA" "KDM1A"

These 2 genes are BAP1 interactors in mouse. They are not noted as BAP1 interactors in human. In human, they are noted as having some association with cancer.

Get more info on the 2 new genes identified from projecting mouse to human.

Because we now have gene symbols, instead of gene IDs, further searches with monarchr are difficult.

mygene.info is handy to convert symbols back to useful monarch ids.

We use mygene.info to get back to an HGNC identifier.

suppressMessages(library(mygene))
new_genes <- intersect(cancer_genes, uniq_mouse)
# queryMany(new_genes, scopes = "symbol", species = "human", fields = "all")
new_genes_myinfo <- queryMany(new_genes, scopes = "symbol", species = "human", fields = "HGNC")$HGNC

## Finished

new_genes_hgnc <- paste0("HGNC:", new_genes_myinfo)
# We need to give some time for mygene.info to respond.
Sys.sleep(1)

List disease and phenotype associations (including cancer) for our 2 candidates.

Then we explore how these candidate human BAP1 interactors are annotated for disease.

new_genes <- lapply(new_genes_hgnc, bioentity_diseases_assoc_w_gene)
new_genes <- lapply(new_genes, "[[", "diseases")
new_genes <- do.call(rbind, new_genes)
new_genes

## # A tibble: 9 x 10
##   subject.taxon subject.id subject relation  object object.id object.taxon
##   <chr>         <chr>      <chr>   <chr>     <chr>  <chr>     <lgl>       
## 1 Homo sapiens  HGNC:9393  PRKCA   contribu… osteo… MONDO:00… NA          
## 2 Homo sapiens  HGNC:9393  PRKCA   is marke… intes… MONDO:00… NA          
## 3 Homo sapiens  HGNC:9393  PRKCA   contribu… gout   MONDO:00… NA          
## 4 Homo sapiens  HGNC:9393  PRKCA   is marke… nephr… MONDO:00… NA          
## 5 Homo sapiens  HGNC:9393  PRKCA   contribu… post-… MONDO:00… NA          
## 6 Homo sapiens  HGNC:29079 KDM1A   is marke… hyper… MONDO:00… NA          
## 7 Homo sapiens  HGNC:29079 KDM1A   has phen… palat… MONDO:00… NA          
## 8 Homo sapiens  HGNC:29079 KDM1A   is marke… non-s… MONDO:00… NA          
## 9 Homo sapiens  HGNC:29079 KDM1A   is marke… colon… MONDO:00… NA          
## # ... with 3 more variables: evidence <fct>, publications <fct>,
## #   provided_by <fct>

Of the new genes of interest, KDM1A has the most cancer disease terms associated with it.

What are some of the phenotypes associated with either of the new genes.

new_genes <- lapply(new_genes_hgnc, bioentity_phenotypes_assoc_w_gene)
new_genes <- lapply(new_genes, "[[", "phenotypes")
new_genes <- do.call("rbind", new_genes)
new_genes

## # A tibble: 41 x 10
##    subject.taxon subject.id subject relation object object.id object.taxon
##    <chr>         <chr>      <chr>   <chr>    <chr>  <chr>     <lgl>       
##  1 Homo sapiens  HGNC:9393  PRKCA   has phe… Neopl… HP:00117… NA          
##  2 Homo sapiens  HGNC:9393  PRKCA   contrib… neuro… EFO:0005… NA          
##  3 Homo sapiens  HGNC:9393  PRKCA   contrib… coron… EFO:0004… NA          
##  4 Homo sapiens  HGNC:29079 KDM1A   has phe… Promi… HP:00112… NA          
##  5 Homo sapiens  HGNC:29079 KDM1A   has phe… Synop… HP:00006… NA          
##  6 Homo sapiens  HGNC:29079 KDM1A   has phe… Blue … HP:00005… NA          
##  7 Homo sapiens  HGNC:29079 KDM1A   has phe… Crypt… HP:00000… NA          
##  8 Homo sapiens  HGNC:29079 KDM1A   has phe… Highl… HP:00025… NA          
##  9 Homo sapiens  HGNC:29079 KDM1A   has phe… Cereb… HP:00124… NA          
## 10 Homo sapiens  HGNC:29079 KDM1A   has phe… Antev… HP:00004… NA          
## # ... with 31 more rows, and 3 more variables: evidence <fct>,
## #   publications <fct>, provided_by <fct>

With a neoplasm listed for PKCA, maybe we were too hasty to suggest focusing more on KDM1A.

We think this sufficiently demonstrates the value of looking at the multiple types of annotations as are readily available from Monarch Initiative.

We note that both of these genes are subjects of intensive study in a variety of organisms. What is novel to our discovery here is that in the case of BAP1, we might want to consider each of these (PKCA, KDM1A) as possible interactors in human. And by extension of BAP1’s role in uveal melanoma, possible roles in uveal melanoma.

Finally, we looked at interactions of only 1 (BAP1) of the 9 ‘melanoma’ genes in detail. And we only projected interactors from one organism (mouse) back to human. If we wanted to cast as wide a net as possible, we could repeat these steps using all model organisms and interactions to all of the 9 melanoma genes. Of course we could also go in the other direction, identifying new research targets or questions for model organisms based on orthology to humans.

Summary

We used monarchr to find genes associated with a disease, uveal melanoma.
We found model organism orthologs of these genes.
We then focused on the human gene BAP1, and its mouse ortholog, identifying genes that interact with these.
We projected the mouse interactors to their human orthologs.
We compared the networks (centered on the single hub, BAP1)
We found there were some interactors that were unique to the mouse network (projected back to humans).
We described that the novel mouse projection candidates had no annotations for melanoma, but 2 did for cancer.
We further described the 2 novel BAP1 human interaction candidates by searching Monarch disease and phenotype terms.

Appendix

The website equivalent of searching for ‘uveal melanoma’.

If we were using the Monarch Initiative Website Directly, we would:

search Monarch Initiative for ‘uveal melanoma’ by entering the term in the search box and clicking ‘Go’
take the first disease shown in the results, it is named ‘uveal melanoma’.
Clicking it, we get to an overview page and other tabs for ‘uveal melanoma’ for the MONDO ID 0006486

Note the following tabs and counts at the website:

overview
Phenotypes (17)
Genes (9)
Models (6)
Variants (9)
Pathways (3)
Compare

We’ll be interested in the overview, to make sure we found the disease we are targeting. And we’ll want the genes to move onto additional steps.

In addition, we see that uveal melanoma is classified under ‘melanoma’ which is classified under ‘cancer’. We make note of these MONDO IDs for validation or analysis steps in this or other vignettes.

cancer, MONDO:0004992
melanoma, MONDO:0005105
uveal melanoma, MONDO:0006486

Note, the entire MONDO ontology can be found here: MONDO.

Note, when we searched using monarchr the (label, category, synonym) columns in the tibble correspond to (Term, Category, Matching String) columns as observed at the website result. The exception is that the website displayed only one of possibly many synonyms until we drilled further into the results. (TODO(?): How is the particular synonym chosen by monarch? It is not obvious.).

Note, the number of unique genes (9) for uveal melanoma using monarchr is the same as listed under the Genes tab at the website.

Monarch returns all genes at and below a disease ontology’s level (i.e. node or leaf).

When we searched for MONDO:0006486, we also returned genes for MONDO:0007966. Searching monarch’s website for MONDO:0007966 we see it is directly below the MONDO:0006486 ‘uveal melanoma’ term. So this makes sense, monarchr is returning both the genes at the disease ontology level we requested, and all those underneath it as well. You can confirm this by clicking on each of the nodes and leafs at the website below uveal melonoma. The only leaf that shows genes is ‘susceptibility to melanoma’, all other genes are classified solely at the higher level of ‘uveal melanoma’. (That isn’t the entire story, we do see some variants and phenotypes at the other nodes and leaves.)

We need to check server limits, or at least play nice.

TODO: We haven’t seen any server limits in the headers from our TODO: previous requests, but it is still on us to play nice.

response$content$objects might be a handy shortcut for some searches.

We note that in the json when we searched for cancer genes using the MONDO term, cancer$response$content$objects" gets all 3055 unique objects, even when asking for a limited number of rows. This suggests a shortcut, where we don’t have to query for all information.

However, these “cancer$response$content$objects” are in form of gene IDs rather than gene names. Therefore, we had to loop through all the detailed results.

disease to model organism

Charles Carey

09 May, 2018