TITLE: A Coordinate Mapping Service for DAS/2
Author: Lincoln Stein (lstein@cshl.org)
Date: 13 August 2001

The single most frequently requested feature at Hinxton was the ability to transform a set of coordinates in one genomic assembly into a set of coordinates in another using a DAS-like service. There actually seem to be two things that people want:

1) transforming coordinates from one assembly to another, predicated on having the same set of raw sequence identifiers

This is the easier of the two. Given two assemblies that use the same basic substrates, project an interval from one coordinate system onto the other. The canonical example is the UCSC assembly versus the NCBI assembly. Provided that both assemblies used the same substrate (for example, a given draft "freeze"), this service can be accomplished at either the server or the client side. The basic algorithm is:

  a) Using assembly A, map the two ends of the interval onto smallest-physical-unit (SPU) coordinates (e.g. GenBank accessions).

  b) Using assembly B, map SPU coordinates up to top-level (TL) coordinates (e.g. chromosomal coordinates).

  c) Still using assembly B, map TL coordinates into SPU coordinates.

  d) Report changed coordinates in assembly B's SPU coordinates. Report an exception if the interval size or orientation has changed, or if (b) could not be accomplished because the SPU reference sequence is not shared between A and B.

2) transforming coordinates from one assembly to another, where the raw sequence identifiers are not shared

Typical of this situation is comparing the public assembly to the Celera assembly, or comparing two assemblies taken from different freezes of the human draft. In my view, this problem imposes the requirement that there be a big reference server in place that can discover the relationship between two arbitrary chunks of DNA, whether they be two versions of a GenBank accession or two chunks of an assembly.
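
Returning to case 1 for a moment, steps (a) through (d) could be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed DAS/2 interface: the Placement table, the function names, the made-up identifiers (SPU_1 and so on), and the choice of 1-based inclusive coordinates are all hypothetical, and each assembly is assumed to be a simple list of SPU pieces placed on a TL sequence. Returning B's TL coordinates alongside its SPU coordinates is also my own choice for the sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


class MappingError(Exception):
    """Raised when an interval cannot be carried from one assembly to the other."""


@dataclass(frozen=True)
class Placement:
    """One row of a hypothetical assembly table (1-based, inclusive coordinates)."""
    tl: str         # top-level (TL) sequence, e.g. a chromosome
    tl_start: int
    tl_end: int
    spu: str        # smallest-physical-unit (SPU) identifier, e.g. an accession
    spu_start: int
    spu_end: int
    strand: int     # +1 if the SPU is placed forward on the TL, -1 if reversed


def tl_to_spu(assembly: List[Placement], tl: str, pos: int) -> Tuple[str, int]:
    """Map a TL position down to the first SPU piece that covers it."""
    for p in assembly:
        if p.tl == tl and p.tl_start <= pos <= p.tl_end:
            offset = pos - p.tl_start
            return p.spu, (p.spu_start + offset if p.strand > 0 else p.spu_end - offset)
    raise MappingError(f"{tl}:{pos} is not covered by any SPU")


def spu_to_tl(assembly: List[Placement], spu: str, pos: int) -> Tuple[str, int]:
    """Map an SPU position up to TL coordinates."""
    for p in assembly:
        if p.spu == spu and p.spu_start <= pos <= p.spu_end:
            offset = pos - p.spu_start if p.strand > 0 else p.spu_end - pos
            return p.tl, p.tl_start + offset
    raise MappingError(f"{spu}:{pos} is not placed in this assembly")


def remap(asm_a: List[Placement], asm_b: List[Placement], tl: str, start: int, end: int):
    """Carry the interval tl:start..end (start <= end) from assembly A to assembly B."""
    # (a) Assembly A: both ends of the interval down to SPU coordinates.
    ends_spu = [tl_to_spu(asm_a, tl, pos) for pos in (start, end)]
    # (b) Assembly B: the same SPU coordinates up to TL coordinates.
    #     spu_to_tl raises MappingError if B does not share the SPU.
    (tl_s, b_start), (tl_e, b_end) = [spu_to_tl(asm_b, spu, pos) for spu, pos in ends_spu]
    if tl_s != tl_e:
        raise MappingError("interval ends land on different TL sequences in B")
    if b_start > b_end:
        raise MappingError("interval orientation changed")
    if b_end - b_start != end - start:
        raise MappingError("interval size changed")
    # (c) Assembly B: TL coordinates back down to B's SPU coordinates.
    b_spu = tuple(tl_to_spu(asm_b, tl_s, pos) for pos in (b_start, b_end))
    # (d) Report the interval in B's coordinates.
    return tl_s, b_start, b_end, b_spu


# Two toy assemblies built from the same SPUs; B has an extra piece in front.
ASM_A = [
    Placement("chr1", 1, 1000, "SPU_1", 1, 1000, +1),
    Placement("chr1", 1001, 1800, "SPU_2", 201, 1000, +1),
]
ASM_B = [
    Placement("chr1", 1, 500, "SPU_3", 1, 500, +1),
    Placement("chr1", 501, 1500, "SPU_1", 1, 1000, +1),
    Placement("chr1", 1501, 2300, "SPU_2", 201, 1000, +1),
]

print(remap(ASM_A, ASM_B, "chr1", 900, 1100))
# ('chr1', 1400, 1600, (('SPU_1', 900), ('SPU_2', 300)))
```

In this toy case the interval simply shifts by 500 bases. If assembly B had placed SPU_2 in the opposite orientation, or did not contain one of the SPUs at all, remap would raise MappingError instead of returning coordinates, which corresponds to the exception described in step (d).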

For the reference server in case 2, I can envision having a BLAT or SSAHA server around that will take the DNA defined by an interval on assembly A, align it to assembly B, find the breakpoints, and then do the appropriate coordinate transformation. This could be done on the fly, or done in advance using some huge lookup table. Either way, though, it sounds like a server-side service to me. The problem with this approach is that it (i) uses lots of computation; and (ii) is a subset of the assembly problem itself, and likely to be subject to the usual assembly artefacts.

I think that DAS/2 should support type (1) coordinate mapping, and that it should be in the core set of services offered. I'm not so sure about type (2). Should arbitrary assembly transformations be on the list of requested features?

Lincoln

--
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org                                  Cold Spring Harbor, NY
Positions available at my lab: see http://stein.cshl.org/#hire
========================================================================