Chado
================================
We have developed an InterMine data source that can use a GMOD Chado database as a source for an InterMine warehouse. The eventual aim is to allow import of any Chado database with some configuration. This will provide a web environment to perform rapid, complex queries on Chado databases with minimal development effort.
Converter
----------
The converter for this source is the `ChadoDBConverter` class. This class controls which `ChadoProcessor` classes are run. Each `ChadoProcessor` class corresponds to a Chado module. For example, the sequence module is processed by the `SequenceProcessor` and the stock module is processed by the `StockProcessor`.
Chado tables
--------------------
The `chado-db` source is able to integrate objects from a Chado database. Currently, only tables from the Chado sequence module and the Chado stock module are read.
These tables are queried from the chado database:

`feature`
  used to create objects in the ObjectStore

  * The default configuration only supports features that have a Sequence Ontology type (e.g. `gene`, `exon`, `chromosome`)
  * Each new feature in InterMine will be a sub-class of `SequenceFeature`.

`featureloc`
  used to create `Location` objects to set the `chromosomeLocation` reference in each `SequenceFeature`

`feature_relationship`
  used to find `part_of` relationships between features

  * this information is used to create parent-child references and collections
  * examples include setting the `transcripts` collection in the `Exon` objects and the `gene` reference in the `Transcript` class.

`dbxref` and `feature_dbxref`
  used to create `Synonym` objects for external identifiers of features

  * the `Synonym` objects will be added to the `synonyms` collection of the relevant `SequenceFeature`

`featureprop`
  used to set fields in features based on properties

  * an example from the FlyBase database: the `SequenceFeature.cytoLocation` field is set using the `cyto_range` property from `featureprop`

`synonym` and `feature_synonym`
  used to create extra `Synonym` objects for Chado synonyms and to set fields in features

  * the `Synonym` objects will be added to the `synonyms` collection of the relevant `SequenceFeature`

`cvterm` and `feature_cvterm`
  used to set fields in features and to create synonyms based on CV terms

`pub`, `feature_pub` and `db`
  used to set the `publications` collection in the new `SequenceFeature` objects.

Additionally, the `StockProcessor` class reads the tables from the Chado stock module, e.g. `stockcollection`, `stock` and `stock_genotype`.
Default configuration
----------------------
By default, `ChadoDBConverter` queries the `feature` table for only a limited list of feature types. The list can be changed by sub-classing the `ChadoDBConverter` class and overriding the `getFeatureList()` method. The `featureloc`, `feature_relationship` and `pub` tables will then be queried to create locations, parent-child relationships and publications (respectively).
Converter configuration
----------------------------------------
Sub-classes can control how the Chado tables are used by overriding the `getConfig()` method and returning a configuration map.
Source configuration
---------------------
Example source configuration for reading from the *C. elegans* Chado database:
.. code-block:: xml
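
As a rough illustration, a `chado-db` source entry in a project file might look like the sketch below. The source name, the property names (`source.db.name`, `genus`, `species`, `taxonId`, `dataSourceName`, `dataSetTitle`) and all values are assumptions for illustration, not taken from this page, so check them against the `chado-db` source in your InterMine version before use.

.. code-block:: xml

  <!-- Hypothetical source entry; names and values are illustrative assumptions -->
  <source name="chado-db-wormbase-c_elegans" type="chado-db">
    <!-- Assumed: alias of the Chado database connection defined in the mine's properties -->
    <property name="source.db.name" value="wormbase"/>
    <!-- Assumed: organism whose features should be read -->
    <property name="genus" value="Caenorhabditis"/>
    <property name="species" value="elegans"/>
    <property name="taxonId" value="6239"/>
    <!-- Assumed: labels attached to the loaded data -->
    <property name="dataSourceName" value="WormBase"/>
    <property name="dataSetTitle" value="WormBase C. elegans data set"/>
  </source>
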
Sub-classing the converter
----------------------------------------
The processor classes can be sub-classed to allow organism-specific or database-specific configuration. To do that, create your class (perhaps in `bio/sources/chado-db/main/src/`) and set the `processors` property in your source element. For example, for reading the FlyBase Chado database there is a `FlyBaseProcessor`, which can be configured like this:
.. code-block:: xml
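
A minimal sketch of such a source element is shown below. Only the `processors` property and the `FlyBaseProcessor` class name come from the text above. The source name, the package `org.intermine.bio.dataconversion`, and the remaining property names and values are assumptions for illustration and may differ in your InterMine version.

.. code-block:: xml

  <!-- Hypothetical source entry; everything except "processors" is an assumption -->
  <source name="chado-db-flybase-dmel" type="chado-db">
    <!-- Assumed: alias of the Chado database connection -->
    <property name="source.db.name" value="flybase"/>
    <!-- Assumed: label attached to the loaded data -->
    <property name="dataSourceName" value="FlyBase"/>
    <!-- Selects the processor sub-class to run; the package name is assumed -->
    <property name="processors" value="org.intermine.bio.dataconversion.FlyBaseProcessor"/>
  </source>
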