InterMine comes with several data converters for homologue data, e.g. TreeFam, PANTHER, OrthoDB and HomoloGene. Follow the instructions below to include these datasets in your InterMine.
By default, bio-InterMine puts the MOD (model organism database) identifier, e.g. MGI:XXX or ZDB-GENE-XXX, in the primaryIdentifier field. This is tricky for homologue data because some homologue sources use Ensembl identifiers instead, and Ensembl identifiers belong in the Gene.crossReferences collection.
To solve this problem, each homologue source uses the NCBI identifier resolver. This resolver takes the Ensembl ID and replaces it with the corresponding MOD identifier.
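The replacement step can be illustrated with a short Python sketch. This is not InterMine's own resolver code; it is a minimal illustration that assumes a tab-separated, gene_info-style file whose sixth column holds pipe-separated cross-references of the form `Database:identifier`. The function names, the `mod_prefixes` parameter and the sample rows (including every identifier in them) are made up for the example.

```python
import csv
import io

# Made-up sample rows in an assumed gene_info-like layout (tab-separated).
# Columns assumed here: tax_id, GeneID, Symbol, LocusTag, Synonyms, dbXrefs.
SAMPLE = (
    "10090\t1\tgeneA\t-\t-\tMGI:MGI:0000001|Ensembl:ENSMUSG_EXAMPLE_1\n"
    "10090\t2\tgeneB\t-\t-\tMGI:MGI:0000002\n"
    "7955\t3\tgeneC\t-\t-\tZFIN:ZDB-GENE-EXAMPLE-1|Ensembl:ENSDARG_EXAMPLE_1\n"
)


def build_ensembl_to_mod(handle, mod_prefixes=("MGI", "ZFIN")):
    """Return a dict mapping each Ensembl ID to the MOD identifier on its row."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, header/comment lines and rows that are too short.
        if not row or row[0].startswith("#") or len(row) < 6:
            continue
        xrefs = {}
        for xref in row[5].split("|"):
            # Split on the first colon only: "MGI:MGI:1" -> ("MGI", "MGI:1").
            db, _, value = xref.partition(":")
            xrefs[db] = value
        ensembl = xrefs.get("Ensembl")
        mod = next((xrefs[p] for p in mod_prefixes if p in xrefs), None)
        # Rows lacking either identifier cannot contribute a mapping.
        if ensembl and mod:
            mapping[ensembl] = mod
    return mapping


def resolve(identifier, mapping):
    """Return the MOD identifier for an Ensembl ID, or None if unresolved."""
    return mapping.get(identifier)


MAPPING = build_ensembl_to_mod(io.StringIO(SAMPLE))
print(resolve("ENSMUSG_EXAMPLE_1", MAPPING))
```

In this sketch an identifier that cannot be resolved yields `None`, so the caller can decide whether to skip the record or keep the original value.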
Make sure the permissions on the file are correct so that the build process can read it.
$ cd /DATA_DIR/idresolver/
$ ln -s /DATA_DIR/ncbi/gene_info entrez
$ ln -s /DATA_DIR/human/identifiers humangene
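A quick way to confirm that the linked files are in place and readable is a small check like the one below. It is only a sketch: the helper name `unreadable_files` is hypothetical, the demonstration runs in a temporary directory rather than the real idresolver directory, and `os.access` tests the permissions of the user running the script, so it should be run as the same user as the build process.

```python
import os
import tempfile


def unreadable_files(directory, names):
    """Return the names under `directory` that are missing or not readable."""
    problems = []
    for name in names:
        path = os.path.join(directory, name)
        # os.path.isfile follows symlinks, so a dangling link is reported too.
        if not (os.path.isfile(path) and os.access(path, os.R_OK)):
            problems.append(name)
    return problems


# Demonstration in a throwaway directory; for a real check, pass the
# idresolver directory instead.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "entrez"), "w") as fh:
        fh.write("placeholder\n")
    RESULT = unreadable_files(demo_dir, ["entrez", "humangene"])
    print(RESULT)
```

Here `humangene` is reported because it was never created in the demonstration directory; an empty list means every expected file can be read.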
See Id Resolvers for details on how ID resolvers work in InterMine.
The entrez identifiers file appears to have only the sequence identifier for worm, not the WBgene identifier.
Alternatively, you can load identifier sources.
Here are the download scripts we use at InterMine:
We use WormMart but are happy to hear of a better source for worm identifiers.
Here are the project XML entries used by FlyMine: