Application Ontologies (APO)

The Semantic Systems Biology group (NTNU, Norway) hosts four APOs (http://www.semantic-systems-biology.org/apo):
the Cell Cycle Ontology (CCO),
the Gene Expression Ontology (GeXO),
the Regulation of Gene Expression Ontology (ReXO),
the Regulation of Transcription Ontology (ReTO).

Unlike domain ontologies, these ontologies also incorporate data pertinent to the domain of discourse.

The ontological core

All four share a small Upper Level Ontology (ULO) built from a limited number of terms taken from well-established ontologies.

Below is the complete list of terms (term_id => [ label, parent_id ]):
'SIO:000000' => [ 'entity', 'SIO:000000' ],
'SIO:000003' => [ 'physical entity', 'SIO:000000' ],
'SIO:000260' => [ 'abstract entity', 'SIO:000000' ],
'SIO:000002' => [ 'processual entity', 'SIO:000003' ],
'SIO:000004' => [ 'material entity', 'SIO:000003' ],
'SIO:000614' => [ 'attribute', 'SIO:000260' ],
'SIO:000006' => [ 'process', 'SIO:000002' ],
'SIO:010004' => [ 'chemical entity', 'SIO:000004' ],
'SIO:010046' => [ 'biological entity', 'SIO:000004' ],
'SIO:000340' => [ 'realizable entity', 'SIO:000614' ],
'SIO:011125' => [ 'molecule', 'SIO:010004' ],
'SIO:010441' => [ 'submolecule', 'SIO:010004' ],
'SIO:000112' => [ 'capability', 'SIO:000340' ],
'SIO:010001' => [ 'cell', 'SIO:010046' ],
'SIO:010074' => [ 'amino acid residue', 'SIO:010441' ],
'SIO:000017' => [ 'function', 'SIO:000112' ],
'SIO:000014' => [ 'disposition', 'SIO:000112' ],
'MI:0190' => [ 'interaction type', 'SIO:000006' ],
'GO:0005575' => [ 'cellular component', 'SIO:010046' ],
'GO:0008150' => [ 'biological process', 'SIO:000006' ],
'GO:0003674' => [ 'molecular function', 'SIO:000017' ],
'SIO:010043' => [ 'protein', 'SIO:011125' ],
'PR:000025513' => [ 'modified amino-acid residue', 'SIO:010074' ],
'SIO:010035' => [ 'gene', 'SIO:010441' ],
'SIO:010000' => [ 'organism', 'SIO:010046' ],
'OGMS:0000031' => [ 'disease', 'SIO:000014' ]
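
The list above is effectively a child-to-parent map, so the position of any term in the ULO can be recovered by following parent IDs up to the root. The following is a minimal Python sketch of that idea, not part of the APO tooling: the helper name path_to_root is hypothetical, only a subset of the terms is copied in, and it assumes that the root is the one term listed as its own parent (as 'SIO:000000' is above).

```python
# Subset of the ULO list above: term_id -> (label, parent_id).
ULO = {
    'SIO:000000': ('entity', 'SIO:000000'),
    'SIO:000260': ('abstract entity', 'SIO:000000'),
    'SIO:000614': ('attribute', 'SIO:000260'),
    'SIO:000340': ('realizable entity', 'SIO:000614'),
    'SIO:000112': ('capability', 'SIO:000340'),
    'SIO:000017': ('function', 'SIO:000112'),
    'GO:0003674': ('molecular function', 'SIO:000017'),
}


def path_to_root(term_id, terms=ULO):
    """Return the labels from term_id up to the root (hypothetical helper)."""
    path = []
    seen = set()
    while term_id not in seen:
        seen.add(term_id)
        label, parent_id = terms[term_id]
        path.append(label)
        if parent_id == term_id:  # the root lists itself as its parent
            return path
        term_id = parent_id
    raise ValueError('cycle detected in the parent map')


print(' -> '.join(path_to_root('GO:0003674')))
```

Extending the dictionary with the remaining terms from the list gives the same lookup for the whole ULO.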

The scope of each APO is determined by the Gene Ontology Biological Process terms listed below, which are included in the APOs together with all their descendants (along ALL the relationship types used in the Gene Ontology).

CCO:
'GO:0007049' => 'cell cycle',
'GO:0051301' => 'cell division',
'GO:0008283' => 'cell proliferation',
'GO:0006261' => 'DNA-dependent DNA replication'

GeXO:
'GO:0010467' => 'gene expression'

ReXO:
'GO:0010468' => 'regulation of gene expression process'

ReTO:
'GO:0006355' => 'regulation of transcription, DNA-dependent'
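
Selecting a scope term "together with all its descendants along all relationship types" amounts to a graph traversal that ignores the edge type. The sketch below illustrates this in Python under stated assumptions: the function name scope_terms and the edge layout (child, relationship type, parent) are hypothetical, and the 'EX:' identifiers are placeholders rather than real GO terms. It is not the code used to build the APOs.

```python
from collections import defaultdict, deque

# Toy edges: (child, relationship type, parent).
# The 'EX:' IDs are placeholders for illustration only.
EDGES = [
    ('EX:0000001', 'is_a', 'GO:0010467'),
    ('EX:0000002', 'part_of', 'GO:0010467'),
    ('EX:0000003', 'regulates', 'EX:0000002'),
    ('EX:0000004', 'is_a', 'EX:0000099'),  # not under the scope term
]


def scope_terms(roots, edges):
    """Collect the roots plus every descendant, whatever the relationship type."""
    children = defaultdict(set)
    for child, _relationship, parent in edges:
        children[parent].add(child)

    selected = set(roots)
    queue = deque(roots)
    while queue:
        term = queue.popleft()
        for child in children[term]:
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


print(sorted(scope_terms(['GO:0010467'], EDGES)))
```

For CCO the same call would simply take the four root IDs listed above instead of one.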

All four APOs include the complete Molecular Function and Cellular Component branches of the Gene Ontology and the Interaction Type branch of the Molecular Interactions Ontology.

Original IDs are re-used throughout instead of minting APO-specific ones.

The data

With respect to the data, all four APOs are protein-centric. They import data from the following sources: GOA, IntAct, UniProt and Entrez.

Additionally, orthology relations are computed with the OrthAgogue utility.

In each case the data are filtered by the scope of the ontology as defined above. The initial set of proteins imported from GOA is further extended by applying the 'guilt-by-association' principle to the IntAct and orthology data.
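
The extension step can be pictured as follows. This Python sketch is only one possible reading of the 'guilt-by-association' principle: it assumes a single, non-transitive step in which any protein that interacts with, or is orthologous to, a protein in the initial set is added. The function name extend_by_association and the protein IDs P1 to P5 are hypothetical placeholders.

```python
def extend_by_association(seed, interactions, orthologs):
    """Add direct interaction partners and orthologs of the seed proteins.

    seed         -- set of protein IDs (e.g. those imported from GOA)
    interactions -- iterable of (protein_a, protein_b) pairs
    orthologs    -- iterable of (protein_a, protein_b) pairs
    """
    seed = set(seed)
    extended = set(seed)
    for a, b in list(interactions) + list(orthologs):
        # Both kinds of pair are treated as symmetric associations.
        if a in seed:
            extended.add(b)
        if b in seed:
            extended.add(a)
    return extended


# Placeholder data, not taken from any of the sources named above.
seed = {'P1'}
interactions = [('P1', 'P2'), ('P3', 'P4')]
orthologs = [('P5', 'P1')]
print(sorted(extend_by_association(seed, interactions, orthologs)))
```

Whether the actual build repeats this step or applies further filters is not stated here, so the sketch stops at one pass.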

The APOs include data for the following biological species (term_id => [term_name, term_def]):
CCO:
'NCBITaxon:559292' => [ 'Saccharomyces cerevisiae', 'An organism of the species Saccharomyces cerevisiae']
'NCBITaxon:284812' => [ 'Schizosaccharomyces pombe', 'An organism of the species Schizosaccharomyces pombe']
'NCBITaxon:3702' => [ 'Arabidopsis thaliana', 'An organism of the species Arabidopsis thaliana']
'NCBITaxon:6239' => [ 'Caenorhabditis elegans', 'An organism of the species Caenorhabditis elegans']
'NCBITaxon:7227' => [ 'Drosophila melanogaster', 'An organism of the species Drosophila melanogaster']
'NCBITaxon:8364' => [ 'Xenopus tropicalis', 'An organism of the species Xenopus tropicalis']
'NCBITaxon:9606' => [ 'Homo sapiens', 'An organism of the species Homo sapiens']
'NCBITaxon:10090' => [ 'Mus musculus', 'An organism of the species Mus musculus']

GeXO, ReXO, ReTO:
'NCBITaxon:9606' => [ 'Homo sapiens', 'An organism of the species Homo sapiens']
'NCBITaxon:10090' => [ 'Mus musculus', 'An organism of the species Mus musculus']
'NCBITaxon:10116' => [ 'Rattus norvegicus', 'An organism of the species Rattus norvegicus']

Original IDs are re-used throughout, with the single exception of modified amino-acid residue terms. The following namespaces are used to form term IDs:
IntAct => interaction terms
NCBIGene => gene terms
NCBITaxon => biological species terms
OMIM => disease terms
SSB => modified amino-acid residue terms
UniProt => protein terms
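
As an illustration of how these namespaces could be used, the Python sketch below builds and inspects term IDs. It assumes the 'namespace:local_id' form seen in IDs such as 'NCBITaxon:9606' applies to every namespace in the table; the helper names make_id and term_kind are hypothetical.

```python
# Namespace -> kind of term, copied from the table above.
NAMESPACES = {
    'IntAct': 'interaction',
    'NCBIGene': 'gene',
    'NCBITaxon': 'biological species',
    'OMIM': 'disease',
    'SSB': 'modified amino-acid residue',
    'UniProt': 'protein',
}


def make_id(namespace, local_id):
    """Join a known namespace and a local ID with ':' (assumed separator)."""
    if namespace not in NAMESPACES:
        raise KeyError(f'unknown namespace: {namespace}')
    return f'{namespace}:{local_id}'


def term_kind(term_id):
    """Return the kind of term implied by the namespace prefix of term_id."""
    namespace, _sep, _local_id = term_id.partition(':')
    return NAMESPACES[namespace]


print(make_id('NCBITaxon', 9606))    # NCBITaxon:9606
print(term_kind('NCBITaxon:10090'))  # biological species
```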

Terms are related to each other using properties from RO, BFO and SIO, as shown below:
'protein term' => ['RO:0002331', 'involved in'] => 'GO term, biological_process'
'protein term' => ['BFO:0000050', 'part of'] => 'GO term, cellular_component'
'protein term' => ['RO:0002327', 'enables'] => 'GO term, molecular_function'
'protein interaction term' => ['SIO:000139', 'has agent'] => 'protein term'
'protein term' => ['SIO:000558', 'is orthologous to'] => 'protein term'
'protein term' => ['SIO:000630', 'is paralogous to'] => 'protein term'
'protein term' => ['RO:0000053', 'bearer of'] => 'modified amino-acid residue term'
'protein term' => ['RO:0002331', 'involved in'] => 'disease term'
'protein term' => ['RO:0000052', 'inheres in'] => 'biological species term'
'gene term' => ['RO:0000052', 'inheres in'] => 'biological species term'
'gene term' => ['SIO:010078', 'encodes'] => 'protein term'
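
The table above can be read as a small schema of (subject kind, property, object kind) triples. The Python sketch below encodes it that way and looks up which properties may link two kinds of term. The names SCHEMA and properties_between are hypothetical, the kind labels are shortened from the table, and only the property IDs are carried over.

```python
# (subject kind, property ID, object kind), copied from the table above.
SCHEMA = [
    ('protein', 'RO:0002331', 'GO biological_process'),
    ('protein', 'BFO:0000050', 'GO cellular_component'),
    ('protein', 'RO:0002327', 'GO molecular_function'),
    ('protein interaction', 'SIO:000139', 'protein'),
    ('protein', 'SIO:000558', 'protein'),
    ('protein', 'SIO:000630', 'protein'),
    ('protein', 'RO:0000053', 'modified amino-acid residue'),
    ('protein', 'RO:0002331', 'disease'),
    ('protein', 'RO:0000052', 'biological species'),
    ('gene', 'RO:0000052', 'biological species'),
    ('gene', 'SIO:010078', 'protein'),
]


def properties_between(subject_kind, object_kind):
    """List the property IDs the schema allows from subject_kind to object_kind."""
    return [
        prop for subj, prop, obj in SCHEMA
        if subj == subject_kind and obj == object_kind
    ]


print(properties_between('protein', 'protein'))  # orthology and paralogy
print(properties_between('gene', 'protein'))     # encodes
```

An empty result, for example for ('disease', 'protein'), indicates that the table defines no property in that direction.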