Data integration works by using keys for each class of object to define equivalence for objects of that class. For example:
For each Gene object loaded, a query is performed in the database to find any existing Gene objects with the same primaryIdentifier. If any are found, fields from both objects are merged and the resulting object stored.
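The lookup-and-merge step described above can be sketched in a few lines of Python. This is an illustration only, not the system's actual implementation: the `integrate` function, the in-memory `store` list, and the rule that non-null incoming values overwrite existing ones are all assumptions made for the example.

```python
from typing import Any, Dict, List


def integrate(store: List[Dict[str, Any]],
              incoming: Dict[str, Any],
              key_fields: List[str]) -> Dict[str, Any]:
    """Merge `incoming` into an equivalent stored object, or store it as new.

    `store` stands in for the database (hypothetical, in-memory).
    """
    key = tuple(incoming.get(f) for f in key_fields)
    # A key with a missing value cannot establish equivalence.
    if None not in key:
        for existing in store:
            if tuple(existing.get(f) for f in key_fields) == key:
                # Assumed merge rule for this sketch: non-null incoming
                # values overwrite or extend the existing object's fields.
                for field, value in incoming.items():
                    if value is not None:
                        existing[field] = value
                return existing
    # No equivalent object found: store a copy of the incoming one.
    store.append(dict(incoming))
    return store[-1]


store: List[Dict[str, Any]] = []
integrate(store, {"primaryIdentifier": "gene-1", "symbol": "abc"},
          ["primaryIdentifier"])
merged = integrate(store, {"primaryIdentifier": "gene-1", "length": 1200},
                   ["primaryIdentifier"])
```

After the second call, `store` still holds a single Gene whose fields combine both inputs.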
Many performance optimisations are applied to this process. We don’t actually run a query for each object loaded: requests are batched, and queries can be avoided completely if the system can work out that no integration will be needed.
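As a rough sketch of the two optimisations mentioned, the Python below issues one lookup for a whole batch of objects and skips the lookup entirely when the caller already knows nothing can match. The names `find_existing`, `lookup_many`, and the `target_may_contain_class` flag are hypothetical, and how the real system decides that no integration is needed is not described here.

```python
from typing import Any, Callable, Dict, List, Set

Obj = Dict[str, Any]


def find_existing(batch: List[Obj],
                  key_field: str,
                  lookup_many: Callable[[Set[Any]], Dict[Any, Obj]],
                  target_may_contain_class: bool) -> Dict[Any, Obj]:
    """Return stored objects keyed by `key_field` value for a whole batch."""
    # Assumed shortcut: if nothing of this class can exist yet, skip the query.
    if not target_may_contain_class:
        return {}
    values = {obj[key_field] for obj in batch if obj.get(key_field) is not None}
    if not values:
        return {}
    # One query for the whole batch instead of one per object.
    return lookup_many(values)


# Hypothetical stand-in for the database, recording each query it receives.
queries: List[List[str]] = []
database = {"gene-1": {"primaryIdentifier": "gene-1", "symbol": "abc"}}


def lookup_many(values: Set[Any]) -> Dict[Any, Obj]:
    queries.append(sorted(values))
    return {v: database[v] for v in values if v in database}


batch = [{"primaryIdentifier": "gene-1"}, {"primaryIdentifier": "gene-2"}]
found = find_existing(batch, "primaryIdentifier", lookup_many, True)
skipped = find_existing(batch, "primaryIdentifier", lookup_many, False)
```

Here two objects cost a single query, and the second call costs none.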
We may also load data from another source that provides information about genes but doesn’t use the identifier scheme we have chosen for primaryIdentifier, and instead only knows about the symbol. In that case we would want that source to use the symbol to define equivalence for Gene.
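To make the idea of per-source keys concrete, here is a minimal Python sketch in which each source declares which Gene fields define equivalence. The source names, the `SOURCE_KEYS` mapping, and `find_equivalent` are invented for illustration and do not reflect the real configuration format.

```python
from typing import Any, Dict, List, Optional

Obj = Dict[str, Any]

# Hypothetical per-source key configuration: class name -> key fields.
SOURCE_KEYS: Dict[str, Dict[str, List[str]]] = {
    "identifier-source": {"Gene": ["primaryIdentifier"]},
    "symbol-only-source": {"Gene": ["symbol"]},
}


def find_equivalent(store: List[Obj], source: str, class_name: str,
                    incoming: Obj) -> Optional[Obj]:
    """Find a stored object equivalent to `incoming` under the source's key."""
    key_fields = SOURCE_KEYS[source][class_name]
    key = tuple(incoming.get(f) for f in key_fields)
    # An object lacking a key field cannot be matched with that key.
    if None in key:
        return None
    for existing in store:
        if tuple(existing.get(f) for f in key_fields) == key:
            return existing
    return None


store = [{"primaryIdentifier": "gene-1", "symbol": "abc"}]
incoming = {"symbol": "abc", "description": "example gene"}

# The symbol-only source matches the stored Gene through its symbol.
match = find_equivalent(store, "symbol-only-source", "Gene", incoming)
# Under the primaryIdentifier key the same object cannot be matched.
no_match = find_equivalent(store, "identifier-source", "Gene", incoming)
```

The same incoming object is equivalent to the stored Gene under one source's key and unmatched under the other's, which is why the key is chosen per source.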
Important points:
See Primary Keys for more information.