vignettes/disease_to_model_organism.Rmd
disease_to_model_organism.Rmd
We’ll look for genes in model organisms that might suggest future focus for a human cancer phenotype.
To do this we need to define:
We’ll try to do these using monarch as follows:
In other vignettes, we’ll use some of the results we produce here to pull other related information from monarch and non-monarch resources.
To ensure a richly documented dataset, we have selected a cancer from selected cancer listings at TCGA for which both sample collection is completed and public data available.
Under ‘head and neck’ cancer, we found uveal melanoma. We’ll use this for our example.
Before proceeding further, we wondered: “Are there any animal models of uveal melanoma?”
A quick pubmed search did reveal mouse models for uveal melanoma.
It would be great if we could simply search for the disease by name and retrieve the top result.
Load monarchr.
library(monarchr)
Search for “uveal melanoma” and display top results.
term <- "uveal melanoma"
results <- biolink_search(term)
um <- results$search_results # keep just the results, dropping the 'response' object.
#um
head(um)
## # A tibble: 6 x 13
## definition category label prefix equivalent_iri id score edges
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 A melanoma deri… disease uvea… MONDO http://purl.ob… MOND… 116.… 355
## 2 A melanoma aris… disease mixe… MONDO http://purl.ob… MOND… 105.… 88
## 3 <NA> disease post… MONDO http://purl.ob… MOND… 105.… 15
## 4 A uveal melanom… disease necr… MONDO http://purl.ob… MOND… 105.… 46
## 5 A uveal melanom… disease epit… MONDO http://purl.ob… MOND… 105.… 77
## 6 <NA> disease susc… MONDO http://purl.ob… MOND… 105.… 105
## # ... with 5 more variables: synonym <chr>, `_version_` <chr>,
## # equivalent_curie <chr>, leaf <chr>, iri <chr>
Note that the results are presented in order of the ‘score’ column. This is the same ordering as we observe doing searches at the website.
# subset(um, select = c(id, score, label, category, synonym))
head(subset(um, select = c(id, score, label, category, synonym)), 10)
## # A tibble: 10 x 5
## id score label category synonym
## <chr> <chr> <chr> <chr> <chr>
## 1 MONDO:0006486 116.12919 uveal mel… disease uvea melanoma (disease), M…
## 2 MONDO:0003910 105.42569 mixed cel… disease mixed cell uveal melanoma,…
## 3 MONDO:0003927 105.42569 posterior… disease medium/large size posterio…
## 4 MONDO:0004365 105.42569 necrotic … disease <NA>
## 5 MONDO:0006200 105.42569 epithelio… disease uvea epithelioid cell mela…
## 6 MONDO:0007966 105.42569 susceptib… disease MELANOMA, UVEAL
## 7 MONDO:0004062 97.76348 intermedi… disease intraocular mixed cell typ…
## 8 HP:0007716 66.65321 Intraocul… Phenoty… Uveal melanoma
## 9 MONDO:0002661 62.40225 uveal dis… disease uvea disease, disease of u…
## 10 MP:0010276 58.491478 increased… Phenoty… increased uveal melanoma i…
In this case, the top result is a perfect match for our phrase.
We save the ‘id’ for this top result. The id is the Monarch Disease Onotology (MONDO) ID and will be most useful for pulling other results from monarchr.
We print the definition for ‘uveal melanoma’.
um_id <- um[1,]$id
um[um$id == um_id,]$definition
## [1] "A melanoma derived from melanocytes of the uveal tract. It is the most common primary intraocular tumor in the United States and Western Europe. Similar to melanoma of the skin, it is rare in Africa and Asia. Diagnostic procedures include ophthalmoscopic exam, fluorescein angiography and ultrasound. Treatment includes: surgical excision of the eye, iridocyclectomy and tumor resection. Recent treatments also include radiotherapy or photo coagulation. Classification of uveal melanomas recognizes four cell types within these tumors: epithelioid, intermediate, mixed cell, and spindle cell types. The spindle cell type uveal melanomas are further sub-classified as spindle cell type A and spindle cell type B."
This definition is the same text as given in the ‘Overview’ tab from our search of the website in the previous section, MONDO:0006486.
Fetching the human genes associated with the disease is simple.
um_genes <- bioentity_genes_assoc_w_disease(um_id)
um_genes <- um_genes$genes # ignore the response object.
um_genes
## # A tibble: 11 x 10
## subject.taxon subject.id subject relation object object.id object.taxon
## * <chr> <chr> <chr> <chr> <chr> <chr> <lgl>
## 1 Homo sapiens HGNC:18274 CYSLTR2 is mark… uveal… MONDO:00… NA
## 2 Homo sapiens HGNC:24308 CLPTM1L contrib… uveal… MONDO:00… NA
## 3 Homo sapiens HGNC:950 BAP1 has phe… uveal… MONDO:00… NA
## 4 Homo sapiens HGNC:3250 EIF1AX is mark… uveal… MONDO:00… NA
## 5 Homo sapiens HGNC:4390 GNAQ has phe… uveal… MONDO:00… NA
## 6 Homo sapiens HGNC:9059 PLCB4 is mark… uveal… MONDO:00… NA
## 7 Homo sapiens HGNC:4390 GNAQ likely_… susce… MONDO:00… NA
## 8 Homo sapiens HGNC:4379 GNA11 likely_… susce… MONDO:00… NA
## 9 Homo sapiens HGNC:8101 OCA2 contrib… uveal… MONDO:00… NA
## 10 Homo sapiens HGNC:4379 GNA11 is mark… uveal… MONDO:00… NA
## 11 Homo sapiens HGNC:10768 SF3B1 is mark… uveal… MONDO:00… NA
## # ... with 3 more variables: evidence <fct>, publications <fct>,
## # provided_by <fct>
The HGNC IDs are most useful for further searches with monarchr. Obviously we can subset for the particular relation we want. For this vignette, we take all relation types.
HGNC derives from Human identifiers by the HUGO gene nomenclaure committee HUGO.
Note, the same gene can be returned more than once if there is more than one relation(?) or MONDO id (see notes).
um_hgnc <- um_genes$subject.id
dups <- um_genes[duplicated(um_genes$subject.id),]$subject.id
um_genes[um_genes$subject.id %in% dups, ]
## # A tibble: 4 x 10
## subject.taxon subject.id subject relation object object.id object.taxon
## <chr> <chr> <chr> <chr> <chr> <chr> <lgl>
## 1 Homo sapiens HGNC:4390 GNAQ has phen… uveal… MONDO:00… NA
## 2 Homo sapiens HGNC:4390 GNAQ likely_p… susce… MONDO:00… NA
## 3 Homo sapiens HGNC:4379 GNA11 likely_p… susce… MONDO:00… NA
## 4 Homo sapiens HGNC:4379 GNA11 is marke… uveal… MONDO:00… NA
## # ... with 3 more variables: evidence <fct>, publications <fct>,
## # provided_by <fct>
Save the unique gene IDs.
um_genes_ids <- unique(um_genes$subject.id)
Find homologs to each of the human genes.
We’d like to use something like the bioentityset function ‘homologs’ to do this all at once. But it doesn’t seem to be working. I tried with our HGNC genes as well as with the NCBI example genes in their supplied example. We’ll watch as the API evolves. Until then, we have to search for each gene’s homolog one at a time.
homologs <- lapply(um_genes_ids, bioentity_homologs)
homologs <- lapply(homologs, "[[", "homologs")
homologs <- do.call('rbind', homologs)
Subset to some of the more completely researched model organisms. Note that most or all of the mouse and rat are in 1 to 1 orthology relationships, and the other organisms a little less so.
#unique(homologs$object.taxon) # all we can choose from
orgs <- c('Mus musculus', "Rattus norvegicus", "Danio rerio", "Drosophila melanogaster", "Caenorhabditis elegans", "Saccharomyces cerevisiae S288C")
ho <- homologs[homologs$object.taxon %in% orgs,]
ho <- subset(ho, select= c(subject.taxon, subject.id, object.id, object.taxon))
ho <- ho[order(ho$object.taxon),]
split(ho, ho$object.taxon) # list these by organism.
## $`Caenorhabditis elegans`
## # A tibble: 11 x 4
## subject.taxon subject.id object.id object.taxon
## <chr> <chr> <chr> <chr>
## 1 Homo sapiens HGNC:18274 WormBase:WBGene00015000 Caenorhabditis elegans
## 2 Homo sapiens HGNC:18274 WormBase:WBGene00022584 Caenorhabditis elegans
## 3 Homo sapiens HGNC:18274 WormBase:WBGene00018344 Caenorhabditis elegans
## 4 Homo sapiens HGNC:18274 WormBase:WBGene00021328 Caenorhabditis elegans
## 5 Homo sapiens HGNC:18274 WormBase:WBGene00012275 Caenorhabditis elegans
## 6 Homo sapiens HGNC:24308 WormBase:WBGene00011761 Caenorhabditis elegans
## 7 Homo sapiens HGNC:3250 WormBase:WBGene00019162 Caenorhabditis elegans
## 8 Homo sapiens HGNC:4390 WormBase:WBGene00001196 Caenorhabditis elegans
## 9 Homo sapiens HGNC:9059 WormBase:WBGene00001177 Caenorhabditis elegans
## 10 Homo sapiens HGNC:9059 WormBase:WBGene00004037 Caenorhabditis elegans
## 11 Homo sapiens HGNC:10768 WormBase:WBGene00011605 Caenorhabditis elegans
##
## $`Danio rerio`
## # A tibble: 13 x 4
## subject.taxon subject.id object.id object.taxon
## <chr> <chr> <chr> <chr>
## 1 Homo sapiens HGNC:18274 ZFIN:ZDB-GENE-071016-2 Danio rerio
## 2 Homo sapiens HGNC:18274 ZFIN:ZDB-GENE-071016-1 Danio rerio
## 3 Homo sapiens HGNC:24308 ZFIN:ZDB-GENE-040718-75 Danio rerio
## 4 Homo sapiens HGNC:950 ZFIN:ZDB-GENE-050208-492 Danio rerio
## 5 Homo sapiens HGNC:3250 ZFIN:ZDB-GENE-030131-1319 Danio rerio
## 6 Homo sapiens HGNC:3250 ZFIN:ZDB-GENE-041010-185 Danio rerio
## 7 Homo sapiens HGNC:4390 ZFIN:ZDB-GENE-081104-25 Danio rerio
## 8 Homo sapiens HGNC:9059 NCBIGene:100537538 Danio rerio
## 9 Homo sapiens HGNC:9059 ZFIN:ZDB-GENE-060627-1 Danio rerio
## 10 Homo sapiens HGNC:4379 ZFIN:ZDB-GENE-050208-597 Danio rerio
## 11 Homo sapiens HGNC:4379 ZFIN:ZDB-GENE-041121-9 Danio rerio
## 12 Homo sapiens HGNC:8101 ZFIN:ZDB-GENE-070718-4 Danio rerio
## 13 Homo sapiens HGNC:10768 ZFIN:ZDB-GENE-040827-3 Danio rerio
##
## $`Drosophila melanogaster`
## # A tibble: 14 x 4
## subject.taxon subject.id object.id object.taxon
## <chr> <chr> <chr> <chr>
## 1 Homo sapiens HGNC:18274 FlyBase:FBgn0039595 Drosophila melanogaster
## 2 Homo sapiens HGNC:24308 FlyBase:FBgn0030456 Drosophila melanogaster
## 3 Homo sapiens HGNC:950 FlyBase:FBgn0262166 Drosophila melanogaster
## 4 Homo sapiens HGNC:3250 FlyBase:FBgn0026250 Drosophila melanogaster
## 5 Homo sapiens HGNC:4390 FlyBase:FBgn0050054 Drosophila melanogaster
## 6 Homo sapiens HGNC:4390 FlyBase:FBgn0033756 Drosophila melanogaster
## 7 Homo sapiens HGNC:4390 FlyBase:FBgn0004435 Drosophila melanogaster
## 8 Homo sapiens HGNC:9059 FlyBase:FBgn0262738 Drosophila melanogaster
## 9 Homo sapiens HGNC:8101 FlyBase:FBgn0041150 Drosophila melanogaster
## 10 Homo sapiens HGNC:8101 FlyBase:FBgn0036329 Drosophila melanogaster
## 11 Homo sapiens HGNC:8101 FlyBase:FBgn0051693 Drosophila melanogaster
## 12 Homo sapiens HGNC:8101 FlyBase:FBgn0035332 Drosophila melanogaster
## 13 Homo sapiens HGNC:8101 FlyBase:FBgn0031649 Drosophila melanogaster
## 14 Homo sapiens HGNC:10768 FlyBase:FBgn0031266 Drosophila melanogaster
##
## $`Mus musculus`
## # A tibble: 9 x 4
## subject.taxon subject.id object.id object.taxon
## <chr> <chr> <chr> <chr>
## 1 Homo sapiens HGNC:18274 MGI:1917336 Mus musculus
## 2 Homo sapiens HGNC:24308 MGI:2442892 Mus musculus
## 3 Homo sapiens HGNC:950 MGI:1206586 Mus musculus
## 4 Homo sapiens HGNC:3250 MGI:1913485 Mus musculus
## 5 Homo sapiens HGNC:4390 MGI:95776 Mus musculus
## 6 Homo sapiens HGNC:9059 MGI:107464 Mus musculus
## 7 Homo sapiens HGNC:4379 MGI:95766 Mus musculus
## 8 Homo sapiens HGNC:8101 MGI:97454 Mus musculus
## 9 Homo sapiens HGNC:10768 MGI:1932339 Mus musculus
##
## $`Rattus norvegicus`
## # A tibble: 9 x 4
## subject.taxon subject.id object.id object.taxon
## <chr> <chr> <chr> <chr>
## 1 Homo sapiens HGNC:18274 RGD:619797 Rattus norvegicus
## 2 Homo sapiens HGNC:24308 RGD:1307896 Rattus norvegicus
## 3 Homo sapiens HGNC:950 RGD:1311938 Rattus norvegicus
## 4 Homo sapiens HGNC:3250 RGD:1560198 Rattus norvegicus
## 5 Homo sapiens HGNC:4390 RGD:620770 Rattus norvegicus
## 6 Homo sapiens HGNC:9059 RGD:3345 Rattus norvegicus
## 7 Homo sapiens HGNC:4379 RGD:619749 Rattus norvegicus
## 8 Homo sapiens HGNC:8101 RGD:2318412 Rattus norvegicus
## 9 Homo sapiens HGNC:10768 RGD:620148 Rattus norvegicus
##
## $`Saccharomyces cerevisiae S288C`
## # A tibble: 2 x 4
## subject.taxon subject.id object.id object.taxon
## <chr> <chr> <chr> <chr>
## 1 Homo sapiens HGNC:3250 SGD:S000004873 Saccharomyces cerevisiae S288C
## 2 Homo sapiens HGNC:10768 SGD:S000004901 Saccharomyces cerevisiae S288C
Monarch Initiative has indicated that while only 51% of human genes have an annotation associated with them, if one includes annotations of orthologs to those genes in other organisms, 89% have some annotation Monarch Initiative Nucleic Acids Research paper.
A possible inference from that is that by looking for interactions in other organisms, we might extend what we know about interactions for humans.
Of course we also want interactions in humans.
Find all interactions within each species.
all_genes <- c(um_genes_ids, ho$object.id)
all_intxns <- lapply(all_genes, bioentity_interactions_assoc_w_gene)
all_intxns <- lapply(all_intxns, "[[", "interactions")
all_intxns <- do.call('rbind', all_intxns)
all_intxns <- all_intxns[all_intxns$subject.taxon == all_intxns$object.taxon, ]
# unique(all_intxns$relation) # "interacts with" "genetically interacts with"
# nrow(all_intxns) # 3456
# length(all_intxns$object.id) # 3456
# length(unique(c(all_intxns$object.id, all_intxns$subject.id))) # 2677
# length(unique(all_intxns$object.id)) # 2646
all_intxns_split <- split(all_intxns, all_intxns$subject.taxon)
lapply(all_intxns_split, nrow)
## $`Caenorhabditis elegans`
## [1] 172
##
## $`Danio rerio`
## [1] 738
##
## $`Drosophila melanogaster`
## [1] 519
##
## $`Homo sapiens`
## [1] 720
##
## $`Mus musculus`
## [1] 679
##
## $`Rattus norvegicus`
## [1] 570
Note, sometimes interactions are reported across species. We dropped those.
3484 interactions were found, 720 in human. Including both the subject and object genes, we have a total set of 2755 genes. We didn’t find interactions for yeast. We don’t know if the lack of yeast interactions is a monarch shortcoming or indicative of there really being no interactions for these particular yeast genes.
(TODO: evaluate counts inline as they’ll change with future updates to monarch initiative’s data sources.)
This is getting to be a large dataset. Let’s continue by focusing on only one of the original human uveal melanoma genes and its orthologs and interactors of those.
We’ll focus on the human gene ‘BAP1’, which has the unique id ‘HGNC:950’.
bap1 <- "HGNC:950"
bap1_homologs <- homologs[homologs$subject.id == bap1,] # includes paralogs.
# all_intxns[all_intxns$subject.id %in% bap1_homologs$object.id,] # does not include interactions to human bap1
# Include interacations to human bap1 and all orthologs of bap1.
bap1_homolog_intxns <- all_intxns[all_intxns$subject.id %in% c(bap1, bap1_homologs$object.id),]
bap1_homolog_intxns_split <- split(bap1_homolog_intxns, bap1_homolog_intxns$object.taxon)
lapply(bap1_homolog_intxns_split, nrow)
## $`Danio rerio`
## [1] 64
##
## $`Drosophila melanogaster`
## [1] 100
##
## $`Homo sapiens`
## [1] 97
##
## $`Mus musculus`
## [1] 85
##
## $`Rattus norvegicus`
## [1] 50
We found 97 interactions in humans and 50-100 in other organisms.
Note! Some of these counts are suspiciously close to our default max_rows used to query monarch. We could go back and change the all_intxns to allow for more rows, but we’ll continue on, understanding we are limiting our results, for now.
TODO: yeah, we should go back and redo that, but we only need BAP1 interactions.
Each of these tibbles represent graphs with a central node of BAP1 (or BAP1’s orthologs) connected to each of its interactors.
We can continue from here to look at qualities of these BAP1 centralized network in data from each organism. That is left as an exercise.
We try to fix our limited results, and proceed only with mouse and human BAP1, as there are lots of interactions.
bap1_and_mouse_bap1 <- c(bap1, "MGI:1206586")
hm_bap1_homolog_intxns <- lapply(bap1_and_mouse_bap1, bioentity_interactions_assoc_w_gene, rows = 3000)
hm_bap1_intxns <- lapply(hm_bap1_homolog_intxns, "[[", "interactions")
hm_bap1_intxns <- do.call('rbind', hm_bap1_intxns)
split(hm_bap1_intxns, hm_bap1_intxns$subject.id)
## $`HGNC:950`
## # A tibble: 148 x 10
## subject.taxon subject.id subject relation object object.id object.taxon
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Homo sapiens HGNC:950 BAP1 interac… IPO4 HGNC:194… Homo sapiens
## 2 Homo sapiens HGNC:950 BAP1 interac… USP28 HGNC:126… Homo sapiens
## 3 Homo sapiens HGNC:950 BAP1 interac… MAP3K1 HGNC:6848 Homo sapiens
## 4 Homo sapiens HGNC:950 BAP1 interac… XRCC3 HGNC:128… Homo sapiens
## 5 Homo sapiens HGNC:950 BAP1 interac… PSMB2 HGNC:9539 Homo sapiens
## 6 Homo sapiens HGNC:950 BAP1 interac… AHCYL2 HGNC:222… Homo sapiens
## 7 Homo sapiens HGNC:950 BAP1 interac… BMI1 HGNC:1066 Homo sapiens
## 8 Homo sapiens HGNC:950 BAP1 interac… TSNAX HGNC:123… Homo sapiens
## 9 Homo sapiens HGNC:950 BAP1 interac… PHC1 HGNC:3182 Homo sapiens
## 10 Homo sapiens HGNC:950 BAP1 interac… HIST2… HGNC:4736 Homo sapiens
## # ... with 138 more rows, and 3 more variables: evidence <fct>,
## # publications <fct>, provided_by <fct>
##
## $`MGI:1206586`
## # A tibble: 116 x 10
## subject.taxon subject.id subject relation object object.id object.taxon
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Mus musculus MGI:12065… Bap1 interac… Phc1 MGI:1032… Mus musculus
## 2 Mus musculus MGI:12065… Bap1 interac… Mpnd MGI:1915… Mus musculus
## 3 Mus musculus MGI:12065… Bap1 interac… Psma3 MGI:1048… Mus musculus
## 4 Mus musculus MGI:12065… Bap1 interac… Psmd6 MGI:1913… Mus musculus
## 5 Mus musculus MGI:12065… Bap1 interac… Gm5239 MGI:3648… Mus musculus
## 6 Mus musculus MGI:12065… Bap1 interac… Brca1 MGI:1045… Mus musculus
## 7 Mus musculus MGI:12065… Bap1 interac… DNAAF5 HGNC:260… Homo sapiens
## 8 Mus musculus MGI:12065… Bap1 interac… Foxk1 MGI:1347… Mus musculus
## 9 Mus musculus MGI:12065… Bap1 interac… Hist2… MGI:2448… Mus musculus
## 10 Mus musculus MGI:12065… Bap1 interac… Psma4 MGI:1347… Mus musculus
## # ... with 106 more rows, and 3 more variables: evidence <fct>,
## # publications <fct>, provided_by <fct>
We found all the possible BAP1 interactions as neither mouse nor human exceed our row limits.
We restrict to only within organism matches.
hm_bap1_intxns <- hm_bap1_intxns[hm_bap1_intxns$subject.taxon == hm_bap1_intxns$object.taxon,]
Now, we get a little crazy, and take the point of view that each organism’s BAP1 centralized network represents a version of all networks which might be relevant (or not) to human BAP1.
We project the interactions back to human to get their human orthologs.
We already have all the human interactors to BAP1, we just save them with a new name.
We need to project the mouse interactions back to human.
bap1_human_intxns <- hm_bap1_intxns[hm_bap1_intxns$subject.taxon == "Homo sapiens",]
bap1_orthos_mouse <- hm_bap1_intxns[hm_bap1_intxns$subject.taxon == "Mus musculus",]
# we add back the mouse bap1 homolog to all the other mouse genes.
bap1_orthos_mouse_ids <- unique(c(bap1_orthos_mouse$subject.id,
bap1_orthos_mouse$object.id))
# x <- lapply(bm, bioentity_homologs, homolog_taxon="NCBITaxon:9606")
# Nicer to server with sleep.
bap1_orthos_mouse2human <- lapply(bap1_orthos_mouse_ids,
function(x) {
r <- bioentity_homologs(gene = x, homolog_taxon = "NCBITaxon:9606", rows = 3000)
Sys.sleep(0.1)
r
})
m2h <- lapply(bap1_orthos_mouse2human, "[[", "homologs")
m2h <- do.call('rbind', m2h)
nrow(m2h) # 105
## [1] 105
length(unique(m2h$object.id)) # 102
## [1] 102
We note imperfect 1 to 1 orthology led to the discrepancy in our counts above. Some genes have multiple orthology relationships.
m2h[m2h$object.id %in% m2h[duplicated(m2h$object.id),]$object.id,]
## # A tibble: 6 x 10
## subject.taxon subject.id subject relation object object.id object.taxon
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Mus musculus MGI:1195985 Cbx4 in 1 to… CBX4 HGNC:1554 Homo sapiens
## 2 Mus musculus MGI:3649015 Gm4950 in orth… PSMB3 HGNC:9540 Homo sapiens
## 3 Mus musculus MGI:3642386 Gm9774 in orth… ADRM1 HGNC:157… Homo sapiens
## 4 Mus musculus MGI:1196439 Cbx7 in orth… CBX4 HGNC:1554 Homo sapiens
## 5 Mus musculus MGI:1929289 Adrm1 in 1 to… ADRM1 HGNC:157… Homo sapiens
## 6 Mus musculus MGI:1347014 Psmb3 in 1 to… PSMB3 HGNC:9540 Homo sapiens
## # ... with 3 more variables: evidence <fct>, publications <fct>,
## # provided_by <fct>
When we project these mouse interactors to BAP1 back to human, we have a projected network of 102 unique human genes.
Recall we had a network of 98 human genes using only the human interactions. (97 interactions + BAP1 as the hub)
What is the difference between the human network and the projected mouse network?
Note, we’re switching to working with the gene symbols rather than HGNC IDs.
h_genes <- c(bap1_human_intxns$object, "BAP1")
m2h_genes <- m2h$object
Common to both.
length(intersect(h_genes, m2h_genes))
## [1] 87
Altogether.
length(union(h_genes, m2h_genes))
## [1] 160
Unique within the human network.
nrow(setdiff(h_genes, m2h_genes))
## NULL
unique within the mouse genes.
uniq_mouse <- setdiff(m2h_genes, h_genes)
uniq_mouse
## [1] "KIAA2012" "BCORP1" "COMMD3-BMI1" "HIST2H2AA4" "PRKCA"
## [6] "NFRKB" "OTUD4" "SMOX" "KDM1A" "PAOX"
## [11] "YY2" "PIP5K1C" "ANKRD2" "ANKRD1" "ASB4"
We expand the potential network of genes interact with BAP1 by considering interactions in mouse, and projecting those interactions back to human.
To the extent anything is known of the new genes (the unique to mouse genes) vis a vis cancer or melanoma, these genes might be useful to pursue further. Alternatively, if nothing is known with respect to cancer for these genes, they might be interesting as novel targets for discovery.
In the appendix, we note that MONDO:0005105 is the identifier for Melanoma.
We ask if any of the genes unique to the mouse network (as projected to human) are noted as melanoma genes.
melanoma <- bioentity_genes_assoc_w_disease("MONDO:0005105")
mel_genes <- unique(melanoma$genes$subject) # gene symbols
length(mel_genes)
## [1] 52
intersect(mel_genes, uniq_mouse)
## character(0)
No new melanoma genes from the mouse network. This isn’t too surprising as relatively few genes of any sort have been associated with melanoma according to the MONDO ontology.
Nonetheless, the genes might be useful for futher study with regards to melanoma.
Cancer has MONDO id ‘MONDO:0004992’.
We suspect there are lots of genes tagged for cancer.
Indeed, https://monarchinitiative.org/disease/MONDO:0004992 shows 3055.
We need them all instead of the default 100 that monarchr provides. But we need to play nice with the server (either because of server limits or because of maximum size limits for the json, or both.)
Note, we get far more than 3035 genes. The Monarch Initiative website count included a handfulfrom non-human organims. We narrowed them to human, and took only the unique results.
#TODO: user shouldn't have to do this, wrap it in all methods instead.
check_for_more = TRUE
start_row = 0
group = 1
results = list()
while(check_for_more == TRUE) {
# print(paste("fetching", group, "start", start_row)) # Monitor progress when using interactively.
cancer <- bioentity_genes_assoc_w_disease("MONDO:0004992",
rows = 100,
start=start_row,
fetch_objects = FALSE)
if (nrow(cancer$genes) < 100) {
check_for_more <- FALSE
}
start_row <- start_row + 100
group <- group + 1
results <- c(results, cancer$genes[cancer$genes$subject.taxon == "Homo sapiens", ]$subject)
Sys.sleep(0.5)
}
cancer_genes <- unique(unlist(results))
intersect(cancer_genes, uniq_mouse)
## [1] "PRKCA" "KDM1A"
These 2 genes are BAP1 interactors in mouse. They are not noted as BAP1 interactors in human. In human, they are noted as having some association with cancer.
Because we now have gene symbols, instead of gene IDs, further searches with monarchr are difficult.
We use mygene.info to get back to an HGNC identifier.
suppressMessages(library(mygene))
new_genes <- intersect(cancer_genes, uniq_mouse)
# queryMany(new_genes, scopes = "symbol", species = "human", fields = "all")
new_genes_myinfo <- queryMany(new_genes, scopes = "symbol", species = "human", fields = "HGNC")$HGNC
## Finished
new_genes_hgnc <- paste0("HGNC:", new_genes_myinfo)
# We need to give some time for mygene.info to respond.
Sys.sleep(1)
Then we explore how these candidate human BAP1 interactors are annotated for disease.
new_genes <- lapply(new_genes_hgnc, bioentity_diseases_assoc_w_gene)
new_genes <- lapply(new_genes, "[[", "diseases")
new_genes <- do.call(rbind, new_genes)
new_genes
## # A tibble: 9 x 10
## subject.taxon subject.id subject relation object object.id object.taxon
## <chr> <chr> <chr> <chr> <chr> <chr> <lgl>
## 1 Homo sapiens HGNC:9393 PRKCA contribu… osteo… MONDO:00… NA
## 2 Homo sapiens HGNC:9393 PRKCA is marke… intes… MONDO:00… NA
## 3 Homo sapiens HGNC:9393 PRKCA contribu… gout MONDO:00… NA
## 4 Homo sapiens HGNC:9393 PRKCA is marke… nephr… MONDO:00… NA
## 5 Homo sapiens HGNC:9393 PRKCA contribu… post-… MONDO:00… NA
## 6 Homo sapiens HGNC:29079 KDM1A is marke… hyper… MONDO:00… NA
## 7 Homo sapiens HGNC:29079 KDM1A has phen… palat… MONDO:00… NA
## 8 Homo sapiens HGNC:29079 KDM1A is marke… non-s… MONDO:00… NA
## 9 Homo sapiens HGNC:29079 KDM1A is marke… colon… MONDO:00… NA
## # ... with 3 more variables: evidence <fct>, publications <fct>,
## # provided_by <fct>
Of the new genes of interest, KDM1A has the most cancer disease terms associated with it.
What are some of the phenotypes associated with either of the new genes.
new_genes <- lapply(new_genes_hgnc, bioentity_phenotypes_assoc_w_gene)
new_genes <- lapply(new_genes, "[[", "phenotypes")
new_genes <- do.call("rbind", new_genes)
new_genes
## # A tibble: 41 x 10
## subject.taxon subject.id subject relation object object.id object.taxon
## <chr> <chr> <chr> <chr> <chr> <chr> <lgl>
## 1 Homo sapiens HGNC:9393 PRKCA has phe… Neopl… HP:00117… NA
## 2 Homo sapiens HGNC:9393 PRKCA contrib… neuro… EFO:0005… NA
## 3 Homo sapiens HGNC:9393 PRKCA contrib… coron… EFO:0004… NA
## 4 Homo sapiens HGNC:29079 KDM1A has phe… Promi… HP:00112… NA
## 5 Homo sapiens HGNC:29079 KDM1A has phe… Synop… HP:00006… NA
## 6 Homo sapiens HGNC:29079 KDM1A has phe… Blue … HP:00005… NA
## 7 Homo sapiens HGNC:29079 KDM1A has phe… Crypt… HP:00000… NA
## 8 Homo sapiens HGNC:29079 KDM1A has phe… Highl… HP:00025… NA
## 9 Homo sapiens HGNC:29079 KDM1A has phe… Cereb… HP:00124… NA
## 10 Homo sapiens HGNC:29079 KDM1A has phe… Antev… HP:00004… NA
## # ... with 31 more rows, and 3 more variables: evidence <fct>,
## # publications <fct>, provided_by <fct>
With a neoplasm listed for PKCA, maybe we were too hasty to suggest focusing more on KDM1A.
We think this sufficiently demonstrates the value of looking at the multiple types of annotations as are readily available from Monarch Initiative.
We note that both of these genes are subjects of intensive study in a variety of organisms. What is novel to our discovery here is that in the case of BAP1, we might want to consider each of these (PKCA, KDM1A) as possible interactors in human. And by extension of BAP1’s role in uveal melanoma, possible roles in uveal melanoma.
Finally, we looked at interactions of only 1 (BAP1) of the 9 ‘melanoma’ genes in detail. And we only projected interactors from one organism (mouse) back to human. If we wanted to cast as wide a net as possible, we could repeat these steps using all model organisms and interactions to all of the 9 melanoma genes. Of course we could also go in the other direction, identifying new research targets or questions for model organisms based on orthology to humans.
If we were using the Monarch Initiative Website Directly, we would:
Note the following tabs and counts at the website:
We’ll be interested in the overview, to make sure we found the disease we are targeting. And we’ll want the genes to move onto additional steps.
In addition, we see that uveal melanoma is classified under ‘melanoma’ which is classified under ‘cancer’. We make note of these MONDO IDs for validation or analysis steps in this or other vignettes.
Note, the entire MONDO ontology can be found here: MONDO.
Note, when we searched using monarchr the (label, category, synonym) columns in the tibble correspond to (Term, Category, Matching String) columns as observed at the website result. The exception is that the website displayed only one of possibly many synonyms until we drilled further into the results. (TODO(?): How is the particular synonym chosen by monarch? It is not obvious.).
Note, the number of unique genes (9) for uveal melanoma using monarchr is the same as listed under the Genes tab at the website.
When we searched for MONDO:0006486, we also returned genes for MONDO:0007966. Searching monarch’s website for MONDO:0007966 we see it is directly below the MONDO:0006486 ‘uveal melanoma’ term. So this makes sense, monarchr is returning both the genes at the disease ontology level we requested, and all those underneath it as well. You can confirm this by clicking on each of the nodes and leafs at the website below uveal melonoma. The only leaf that shows genes is ‘susceptibility to melanoma’, all other genes are classified solely at the higher level of ‘uveal melanoma’. (That isn’t the entire story, we do see some variants and phenotypes at the other nodes and leaves.)
TODO: We haven’t seen any server limits in the headers from our TODO: previous requests, but it is still on us to play nice.
We note that in the json when we searched for cancer genes using the MONDO term, cancer$response$content$objects" gets all 3055 unique objects, even when asking for a limited number of rows. This suggests a shortcut, where we don’t have to query for all information.
However, these “cancer$response$content$objects” are in form of gene IDs rather than gene names. Therefore, we had to loop through all the detailed results.