RFC007

TITLE: Segmental mapping for DAS/2
Author: Thomas Down, Matthew Pocock ({td2,mrp}@sanger.ac.uk)
Dependencies: Sequence communication, Directory services (RFC2)
Version: 1
Date: 15 August 2001

Requirements
------------

Two very commonly requested features in the DAS world are:

  - Making annotation robust against versioning changes in the
    underlying assembly. (Requested by almost everyone at Hinxton).

  - Support for data integration across genomes, e.g. using
    annotation from one species as a baseline for investigating
    another.

In addition, a third, closely related issue has arisen on a number
of occasions:

  - Mapping annotation from proteins onto genomic coordinates. This
    is particularly important in the case of data providers who work
    on domain or active-site annotation.

The existing DAS/1 technology is based around the concept of a
browser (or other client-side application) mapping annotation into a
common coordinate system for browsing. We believe that DAS/2 should
extend this approach to address the issues described above.

Mechanisms
----------

There are two issues involved in all versions of the
annotation-mapping problem:

  - Deriving a set of corresponding segments between two (or more)
    coordinate systems.

  - Communicating this information to a segment-mapping-aware
    client, which can retrieve it and apply it to transform
    annotation between systems.

The first part falls well outside the scope of any DAS protocol,
since a wide variety of methods could be used depending on the
nature of the problem and the level of accuracy required. DAS
clients should not need to know anything about the method used,
except that metadata should be available to present to the user if
requested.

Once segmental mappings have been derived, they should be
communicated using a simple XML format, which should ideally work
well with the feature-table and querying technologies chosen for the
communication of annotation in DAS/2.
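
As an illustration only, the client side of this process might look
something like the sketch below. This RFC does not define the XML
format, so the element and attribute names used here (mapping,
segment, source_start, and so on) are invented for the example, as
are the Segment class and the parse_segments, map_position and
map_feature functions. The sketch assumes 1-based inclusive
coordinates and a per-segment orientation flag.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Tuple

# HYPOTHETICAL document: the RFC does not specify the XML format, so
# every element and attribute name below is an assumption.
EXAMPLE_XML = """
<mapping source="assembly_v1" target="assembly_v2">
  <segment source_start="1" source_end="1000"
           target_start="501" target_end="1500" strand="+"/>
  <segment source_start="2001" source_end="3000"
           target_start="3001" target_end="4000" strand="-"/>
</mapping>
"""


@dataclass
class Segment:
    """One pair of corresponding, equal-length regions."""
    source_start: int
    source_end: int
    target_start: int
    target_end: int
    strand: str  # "+" if same orientation, "-" if reversed


def parse_segments(xml_text: str) -> List[Segment]:
    """Read the hypothetical mapping document into Segment objects."""
    root = ET.fromstring(xml_text.strip())
    return [
        Segment(
            int(e.get("source_start")),
            int(e.get("source_end")),
            int(e.get("target_start")),
            int(e.get("target_end")),
            e.get("strand", "+"),
        )
        for e in root.findall("segment")
    ]


def find_segment(segments: List[Segment], pos: int) -> Optional[Segment]:
    """Return the segment whose source range contains pos, if any."""
    for seg in segments:
        if seg.source_start <= pos <= seg.source_end:
            return seg
    return None


def map_position(segments: List[Segment], pos: int) -> Optional[int]:
    """Map one source position to the target system, or None if unmapped."""
    seg = find_segment(segments, pos)
    if seg is None:
        return None
    offset = pos - seg.source_start
    if seg.strand == "+":
        return seg.target_start + offset
    return seg.target_end - offset


def map_feature(
    segments: List[Segment], start: int, end: int
) -> Optional[Tuple[int, int, str]]:
    """Map a feature only if both ends fall inside the same segment."""
    seg = find_segment(segments, start)
    if seg is None or not (seg.source_start <= end <= seg.source_end):
        return None
    a = map_position([seg], start)
    b = map_position([seg], end)
    return (min(a, b), max(a, b), seg.strand)
```

A feature that spans a segment boundary, or that falls in a region
with no corresponding segment, is reported here as unmappable. A
real client might prefer to split such a feature or flag it to the
user.
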
In addition to the case of `simply' mapping annotation between
coordinate systems, this information should make it possible to
provide powerful comparative genomics displays, such as the Artemis
Comparison Tool (ACT), backed by the DAS/2 data cloud.

The problem of mapping annotation is, comparatively, quite simple.
There are three important cases to consider:

  - Colinear segments (e.g. regions of agreement between two
    assemblies of the same genome).

  - `Fuzzy' segments, such as regions of synteny between species.
    Handling these is essentially equivalent to handling colinear
    segments, but they should be recognized specially so that the
    user can be warned when a fuzzy mapping is in use.

  - Defined-relationship mappings. The particular case that is
    important here is mapping between genomic and protein
    coordinates.

DAS clients already include code for performing a type of
colinear-segment mapping when handling assembled genomes. We believe
that it should be relatively simple to extend this to cover the
other cases.

Relationship to other DAS/2 services
------------------------------------

Segmental mapping information offers a bridge between two or more
reference services, each of which provides a particular coordinate
system for a particular genome. Servers offering segmental mapping
between a set of reference services provide a separate service class
within the DAS/2 cloud.

A directory service mechanism should store information about
available mapping services and the reference genomes that they
link. This allows a client to automatically build a mapping path
between any two coordinate systems (so long as the data is
available), and to provide the highest level of data integration
without user intervention.
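
As a rough sketch of how a client might build such a mapping path,
the directory contents can be treated as a graph whose nodes are
coordinate systems and whose edges are mapping services, and
searched breadth-first for the shortest chain. The function name
find_mapping_path, the tuple layout of the directory entries, and
the service and coordinate-system names are all hypothetical. The
sketch also assumes that every mapping service can be used in both
directions, which this RFC does not state.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Each directory entry links two coordinate systems via one mapping
# service: (system_a, system_b, service_name). Layout is an assumption.
Link = Tuple[str, str, str]
# One step of a path: (from_system, service_name, to_system).
Step = Tuple[str, str, str]


def find_mapping_path(
    links: List[Link], start: str, goal: str
) -> Optional[List[Step]]:
    """Return the shortest chain of mapping services from start to goal.

    Returns an empty list if start == goal, and None if no chain exists.
    """
    # Build an undirected adjacency list (assumes mappings are invertible).
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for a, b, service in links:
        graph.setdefault(a, []).append((b, service))
        graph.setdefault(b, []).append((a, service))

    # Breadth-first search, remembering how each system was reached.
    came_from: Dict[str, Optional[Tuple[str, str]]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back to the start to reconstruct the path.
            path: List[Step] = []
            while came_from[node] is not None:
                prev, service = came_from[node]
                path.append((prev, service, node))
                node = prev
            path.reverse()
            return path
        for nxt, service in graph.get(node, []):
            if nxt not in came_from:
                came_from[nxt] = (node, service)
                queue.append(nxt)
    return None


# Hypothetical directory contents, for illustration only.
EXAMPLE_LINKS: List[Link] = [
    ("human_v1", "human_v2", "assembly_map"),
    ("human_v2", "mouse_v1", "synteny_map"),
    ("human_v2", "protein_x", "protein_map"),
]
```

With the example links above, a path from human_v1 to mouse_v1
passes through human_v2 using two services in turn. A client
following this approach could also inspect each step and warn the
user when any service on the path provides a `fuzzy' mapping.
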