InterMine installations accept queries over their data in a custom format known as Path-Queries. This is a graph-based query format which inherits some of its semantics and terminology from SQL.
The core concept of Path-Queries is naturally enough the Path, examples of which are:
In the XML serialization of path-queries, all paths must be completely qualified. In the JSON format a prefix can be specified with the from or root property.
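As a minimal sketch of how a prefixed JSON path relates to a completely qualified XML path, the helper below expands a relative path against a root class. The function name `qualify` is hypothetical, and the rule that a path already starting with the root is left unchanged is an assumption inferred from the examples in this document, not a statement of how InterMine resolves paths.

```python
def qualify(root: str, path: str) -> str:
    """Return `path` completely qualified against the class `root`.

    Assumption: a path that already starts with the root class is
    considered qualified and is returned unchanged.
    """
    if path == root or path.startswith(root + "."):
        return path
    return f"{root}.{path}"


print(qualify("Organism", "name"))
print(qualify("Gene", "Gene.symbol"))
```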
Queries associate paths with various parts of the query:
To define what is retrieved from the data-store, a view is defined. This is simply a list of paths; any information in the data-store graph that matches these paths and satisfies the constraints (see below) will be included in the results.
eg:
<query model="genomic" view="Organism.name Organism.taxonId"/>
{from: "Organism", select: ["name", "taxonId"]}
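The correspondence between the two forms above can be sketched in Python: the view attribute of the XML form is the space-separated list of completely qualified paths. The function `view_query_xml` is a hypothetical helper built on the standard library, not part of any InterMine client.

```python
import xml.etree.ElementTree as ET


def view_query_xml(root, select, model="genomic"):
    """Build a <query> element whose view lists the qualified paths."""
    # Each relative path is prefixed with the root class, then the
    # qualified paths are joined with single spaces.
    view = " ".join(f"{root}.{path}" for path in select)
    return ET.Element("query", {"model": model, "view": view})


query = view_query_xml("Organism", ["name", "taxonId"])
print(ET.tostring(query, encoding="unicode"))
```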
Any reference in a long path such as Gene.sequence.residues or Gene.proteins.proteinDomains.name may be null. Two behaviours are supported for dealing with null references (i.e. where a gene does not have any sequence attached, or it has no proteins, or those proteins have no protein domains).
There are some consequences of using outer joins:
eg:
<query model="genomic" view="Gene.symbol Gene.pathways.identifier">
<join path="Gene.pathways" style="OUTER"/>
</query>
{from: "Gene", select: ["symbol", "pathways.identifier"], joins: ["pathways"]}
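The difference between the two join styles can be illustrated on a small in-memory data set. This is only a sketch of the semantics, assuming that an outer join keeps a gene with no pathways and reports the missing value as null, as an SQL outer join would; the data and the function `rows` are invented for illustration.

```python
# Two hypothetical genes: one with a pathway, one without.
genes = [
    {"symbol": "eve", "pathways": [{"identifier": "P1"}]},
    {"symbol": "zen", "pathways": []},
]


def rows(genes, outer):
    """Return (symbol, pathway identifier) rows for the given join style."""
    result = []
    for gene in genes:
        if gene["pathways"]:
            for pathway in gene["pathways"]:
                result.append((gene["symbol"], pathway["identifier"]))
        elif outer:
            # Outer join: keep the gene, with None for the empty reference.
            result.append((gene["symbol"], None))
        # Inner join: a gene with no pathways contributes no rows.
    return result


print(rows(genes, outer=False))
print(rows(genes, outer=True))
```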
By default, all values of a given type match a query unless they are excluded by empty references on an inner joined path. To restrict the result set, constraints can be used.
The following are examples of constraints on attributes in the data store:
<constraint path="Gene.symbol" op="=" value="eve"/>
<constraint path="Gene.length" op=">" value="12345"/>
<constraint path="Gene.homologues.homologue.organism.taxonId" op="!=" value="7227"/>
<constraint path="Gene.description" op="CONTAINS" value="some term"/>
The JSON format allows a few different mechanisms for describing constraints:
{
select: ["Gene.symbol"],
where: {
"symbol": "eve",
"length": {gt: 12345},
"homologues.homologue.organism.taxonId": {"!=": 7227},
"description": {contains: "some term"}
}
}
or:
{
select: ["Gene.symbol"],
where: [
{path: "symbol", op: "=", value: "eve"},
{path: "length", op: ">", value: 12345},
{path: "homologues.homologue.organism.taxonId", op: "!=", value: 7227},
{path: "description", op: "CONTAINS", value: "some term"}
]
}
or:
{
select: ["Gene.symbol"],
where: [
[ "symbol", "=", "eve" ],
[ "length", ">", 12345 ],
[ "homologues.homologue.organism.taxonId", "!=", 7227 ],
[ "description", "CONTAINS", "some term" ]
]
}
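To show that the three JSON forms above describe the same constraints, the following sketch reduces each of them to a list of (path, operator, value) triples. The function `normalise` and the alias table are hypothetical; the table covers only the shorthand operators that appear in the examples above and is not a complete list of what InterMine accepts.

```python
# Shorthand operators seen in the examples above (not exhaustive).
ALIASES = {"gt": ">", "contains": "CONTAINS"}


def normalise(where):
    """Reduce any of the three `where` forms to (path, op, value) triples."""
    triples = []
    if isinstance(where, dict):
        # Form 1: {path: value} or {path: {op: value}}.
        for path, cond in where.items():
            if isinstance(cond, dict):
                for op, value in cond.items():
                    triples.append((path, ALIASES.get(op, op), value))
            else:
                triples.append((path, "=", cond))
        return triples
    for item in where:
        if isinstance(item, dict):
            # Form 2: {"path": ..., "op": ..., "value": ...}.
            triples.append((item["path"], item["op"], item["value"]))
        else:
            # Form 3: [path, op, value].
            path, op, value = item
            triples.append((path, op, value))
    return triples


as_object = {"symbol": "eve", "length": {"gt": 12345}}
as_records = [
    {"path": "symbol", "op": "=", "value": "eve"},
    {"path": "length", "op": ">", "value": 12345},
]
as_arrays = [["symbol", "=", "eve"], ["length", ">", 12345]]
print(normalise(as_object) == normalise(as_records) == normalise(as_arrays))
```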
One can specify that a path resolve to a value matching one (or none) of a set of values:
<constraint path="Gene.symbol" op="ONE OF">
<value>eve</value>
<value>bib</value>
<value>zen</value>
</constraint>
{
select: ["Gene.proteins.name"],
where: {
symbol: ["eve", "bib", "zen"]
}
}
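The XML form of a multi-value constraint can be generated from a plain list of values, as in this sketch. The helper `multi_value_constraint` is hypothetical and uses only the standard library; it emits one value child element per entry, mirroring the XML example above.

```python
import xml.etree.ElementTree as ET


def multi_value_constraint(path, values, op="ONE OF"):
    """Build a <constraint> element with one <value> child per entry."""
    constraint = ET.Element("constraint", {"path": path, "op": op})
    for value in values:
        ET.SubElement(constraint, "value").text = str(value)
    return constraint


elem = multi_value_constraint("Gene.symbol", ["eve", "bib", "zen"])
print(ET.tostring(elem, encoding="unicode"))
```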
A special sub-type of this kind of constraint is the range constraint:
<constraint path="Gene.chromosomeLocation" op="OVERLAPS">
<value>X:12345..45678</value>
<value>2L:12345..45678</value>
<value>3R:12345</value>
</constraint>
{
select: ["Gene.symbol"],
where: {
chromosomeLocation: {OVERLAPS: ["X:12345..45678", "2L:34567..78654", "3R:12345"]}
}
}
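The range strings used above can be parsed and tested for overlap as in the following sketch. It assumes that coordinates are inclusive and that a single coordinate such as 3R:12345 denotes a range of one position; neither assumption is confirmed by this document, and the names `Range`, `parse_range`, `overlaps` and `matches_any` are illustrative only.

```python
from typing import NamedTuple


class Range(NamedTuple):
    chromosome: str
    start: int
    end: int


def parse_range(text):
    """Parse 'X:12345..45678' or 'X:12345' into a Range."""
    chromosome, _, span = text.partition(":")
    start, sep, end = span.partition("..")
    # Assumption: a lone coordinate is a range covering that one position.
    return Range(chromosome, int(start), int(end) if sep else int(start))


def overlaps(a, b):
    """True if both ranges are on one chromosome and share a position."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


def matches_any(feature, ranges):
    """True if the feature overlaps at least one of the range strings."""
    return any(overlaps(feature, parse_range(r)) for r in ranges)


ranges = ["X:12345..45678", "2L:12345..45678", "3R:12345"]
print(matches_any(Range("X", 40000, 50000), ranges))
```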
Lookup constraints allow convenient constraints over multiple attributes of a value, or querying when you don’t know the particular attribute you wish to constrain:
<constraint path="Gene" op="LOOKUP" value="eve"/>
{
select: ["Gene.symbol"],
where: [[ "Gene", "LOOKUP", "eve"]]
}
An extra disambiguating value can be supplied. Its meaning depends on context; for example, the following limits genes to a particular organism:
<constraint path="Gene" op="LOOKUP" value="eve" extraValue="D. melanogaster"/>
{
select: ["Gene.symbol"],
where: [[ "Gene", "LOOKUP", "eve", "D. melanogaster"]]
}
Nodes in the query graph can be constrained by membership in a stored list. This type of constraint is similar to multi-value constraints, in that we are looking at membership in a set, and also similar to lookup constraints in that we treat entities as subjects of the constraints, rather than values of any of the attributes of the entities. A simple example is selecting all the proteins for genes in a given list:
<constraint path="Protein.genes" op="IN" value="a given list"/>
<!-- Or to exclude those records -->
<constraint path="Protein.genes" op="NOT IN" value="a given list"/>
{
select: ["Protein.*"],
where: [["genes", "IN", "a given list"]]
}
The only relationships that may be asserted are “IN” and “NOT IN”.
Queries can require that two nodes in the query graph refer (or do not refer) to the same entity. This kind of constraint is termed a “Loop” constraint. An example of this would be to request all the genes in the pathways a given gene is in, so long as they are (or are not) one of the orthologues of the gene in question.
A loop constraint is composed of two paths, and either = or !=.
<constraint path="Gene.homologues.homologue" op="=" value="Gene.pathways.genes"/>
<!-- or -->
<constraint path="Gene.homologues.homologue" op="!=" value="Gene.pathways.genes"/>
{
select: ["Gene.homologues.homologue.*", "Gene.pathways.genes.*"],
where: [
["Gene.symbol", "=", "x"],
["Gene.homologues.homologue", "=", "Gene.pathways.genes"]
]
}
Loop constraints must link paths that are not separated by outer joins.
Type constraints, in addition to limiting the returned results, have the side-effect of type-casting the references in their paths to the given type, enabling other paths to reference fields that would otherwise be inaccessible.
<constraint path="Gene.overlappingFeatures" type="ChromosomeStructureVariation"/>
{
from: "Gene",
select: ["symbol", "overlappingFeatures.element1.primaryIdentifier"],
where: {
overlappingFeatures: "ChromosomeStructureVariation"
}
}
Type constraints may not participate in the constraint logic, and as such never have a code associated with them.
The order of the results can be determined through the sort order:
<query model="genomic" view="Gene.symbol" sortOrder="Gene.length DESC Gene.name ASC"/>
{select: ["Gene.symbol"], sortOrder: [["length", "DESC"], ["name", "ASC"]]}
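The relationship between the two sort-order forms above can be sketched as a small conversion: each (path, direction) pair becomes a qualified path followed by its direction, and the pairs are joined with spaces. The helper `sort_order_attribute` is hypothetical, and the check that directions are limited to ASC and DESC is an assumption based on the examples shown.

```python
def sort_order_attribute(root, sort_order):
    """Turn [[path, direction], ...] into the XML sortOrder string."""
    parts = []
    for path, direction in sort_order:
        direction = direction.upper()
        # Assumption: only ASC and DESC are valid directions.
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"unknown sort direction: {direction}")
        parts.append(f"{root}.{path} {direction}")
    return " ".join(parts)


print(sort_order_attribute("Gene", [["length", "DESC"], ["name", "ASC"]]))
```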