DAS/2.1/Spec

From BioDAS
Revision as of 03:18, 30 October 2008 by SteveC (talk) (Importing text from the original das2_protocol.html, with some mods)

Overview

The DAS 2.0 specification addresses several shortcomings of the DAS 1.x protocol. It adds:

  • Better support for hierarchical structures (e.g. gene -> transcripts -> exons)
  • Ontology-based feature annotations
  • Support for multiple formats, including formats appropriate only for some feature types
  • A lock-based editing protocol for curation clients
  • An extensible namespacing system that allows annotations in non-genomic coordinates (e.g. UniProt protein coordinates or PDB structure coordinates)

DAS is a protocol for sharing biological data. This version of the specification, DAS 2.0, describes features located on the genomic sequence. Future versions will add support for sharing annotations of protein sequences, expression data, 3D structures and ontologies. The genomic DAS interface is deliberately designed so there will be a large core shared with the protein sequence DAS.

A DAS 2.0 annotation server provides feature information about one or more genome sources. Each source may have one or more versions. Different versions are usually based on different assemblies. As an implementation detail, an assembly and its corresponding sequence data may be served from a different machine, which is called the reference server.
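The relationship between sources, versions, and reference servers can be sketched as a small data model. This is an illustration only: the class names, field names, and example URLs below are assumptions and are not defined by the specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Version:
    # All field names here are hypothetical, chosen for illustration.
    name: str
    assembly: str
    reference_server: str  # URL of the machine serving the assembly and sequence


@dataclass
class Source:
    name: str
    versions: List[Version] = field(default_factory=list)

    def versions_for_assembly(self, assembly: str) -> List[Version]:
        """Return the versions of this source built on the given assembly."""
        return [v for v in self.versions if v.assembly == assembly]


# Example data; the names and URLs are invented.
human = Source(
    name="human",
    versions=[
        Version("v1", "assembly-A", "http://reference.example.org/das2/A"),
        Version("v2", "assembly-B", "http://reference.example.org/das2/B"),
    ],
)
```

Keeping the reference server as a field of the version, rather than of the source, reflects the statement that different versions are usually based on different assemblies.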

Annotations are located on the genomic sequence with a start and end position. The range may be specified multiple times if there are alternate coordinate systems. An annotation may contain multiple non-contiguous parts, making it the parent of those parts. Some parts may have more than one parent. Annotations have a type based on terms in SOFA (Sequence Ontology for Feature Annotation). Stylesheets contain a set of properties used to depict a given type.
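A minimal sketch of this annotation model follows, assuming a simple in-memory representation. The class names, field names, and identifiers are hypothetical and do not come from the DAS XML schema. It shows one annotation located in two coordinate systems and an exon shared by two parent transcripts.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Location:
    # Hypothetical fields: a coordinate-system name plus a start and end position.
    coordinates: str
    start: int
    end: int


@dataclass
class Feature:
    uri: str
    type: str  # a type name drawn from an ontology such as SOFA
    locations: List[Location] = field(default_factory=list)
    parents: List[str] = field(default_factory=list)  # URIs of parent features
    parts: List[str] = field(default_factory=list)    # URIs of child features


def link(parent: Feature, part: Feature) -> None:
    """Record the parent/part relationship on both features."""
    parent.parts.append(part.uri)
    part.parents.append(parent.uri)


def build_example() -> Dict[str, Feature]:
    # One gene located in two (invented) coordinate systems.
    gene = Feature("gene1", "gene", [
        Location("assembly-A/chr1", 100, 900),
        Location("assembly-B/chr1", 150, 950),
    ])
    tx1 = Feature("tx1", "mRNA", [Location("assembly-A/chr1", 100, 600)])
    tx2 = Feature("tx2", "mRNA", [Location("assembly-A/chr1", 400, 900)])
    exon1 = Feature("exon1", "exon", [Location("assembly-A/chr1", 100, 200)])
    exon2 = Feature("exon2", "exon", [Location("assembly-A/chr1", 400, 600)])
    exon3 = Feature("exon3", "exon", [Location("assembly-A/chr1", 800, 900)])

    link(gene, tx1)
    link(gene, tx2)
    link(tx1, exon1)
    link(tx1, exon2)
    link(tx2, exon2)  # exon2 is shared, so it has two parents
    link(tx2, exon3)

    return {f.uri: f for f in (gene, tx1, tx2, exon1, exon2, exon3)}
```

Storing parents and parts as lists of URIs, rather than nesting objects, lets a part belong to more than one parent without duplicating it.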

Annotations can be searched by range, type, and a properties table associated with each annotation. These are called feature filters.
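The three kinds of feature filter can be illustrated with a small in-memory search. This is a sketch under stated assumptions: the `Annotation` class, the `filter_features` function, and the rule that a range filter matches any overlapping annotation are all hypothetical, and the sketch does not show how a server encodes filters in a request.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Annotation:
    name: str
    type: str
    start: int
    end: int
    properties: Dict[str, str] = field(default_factory=dict)


def filter_features(
    features: Iterable[Annotation],
    overlaps: Optional[Tuple[int, int]] = None,
    type_name: Optional[str] = None,
    properties: Optional[Dict[str, str]] = None,
) -> List[Annotation]:
    """Keep the annotations that satisfy every filter that was supplied."""
    selected = []
    for f in features:
        if overlaps is not None:
            lo, hi = overlaps
            # Assumption: ranges are inclusive and any overlap counts as a match.
            if f.end < lo or f.start > hi:
                continue
        if type_name is not None and f.type != type_name:
            continue
        if properties and any(f.properties.get(k) != v for k, v in properties.items()):
            continue
        selected.append(f)
    return selected


# Invented sample data.
SAMPLE = [
    Annotation("exon1", "exon", 100, 200, {"curator": "alice"}),
    Annotation("exon2", "exon", 500, 600, {"curator": "bob"}),
    Annotation("gene1", "gene", 100, 600, {"curator": "alice"}),
]
```

For example, `filter_features(SAMPLE, overlaps=(150, 300), type_name="exon")` keeps only `exon1`, because filters combine with logical AND in this sketch.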

DAS 2.0 is implemented using a REST architecture. Each document (also called an entity or object) has a name, which is a URL. Fetching the URL returns information about the document. The DAS-specific documents are all in XML. Other data types have existing, widely used formats, and sometimes more than one for the same data. A DAS server may provide a distinct document for each of these formats, along with information about which formats are available.
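As a rough illustration of format selection, the sketch below parses an XML document that advertises available formats and picks the first one the client prefers. The XML layout, the element and attribute names (`DOCUMENT`, `FORMAT`, `name`, `mimetype`), the URL, and the MIME types are assumptions made for this example and should not be read as the actual DAS schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Hypothetical response body; in practice this would be fetched from the document's URL.
SAMPLE_XML = """<DOCUMENT uri="http://example.org/das2/source/1/features">
  <FORMAT name="xml" mimetype="application/xml"/>
  <FORMAT name="fasta" mimetype="text/x-fasta"/>
</DOCUMENT>"""


def available_formats(xml_text: str) -> Dict[str, str]:
    """Map each advertised format name to its MIME type."""
    root = ET.fromstring(xml_text)
    return {f.get("name"): f.get("mimetype") for f in root.iter("FORMAT")}


def choose_format(available: Dict[str, str], preferred: List[str]) -> Optional[str]:
    """Return the first preferred format the server offers, or None."""
    for name in preferred:
        if name in available:
            return name
    return None
```

A client that prefers a compact format but can fall back to XML would call `choose_format(available_formats(SAMPLE_XML), ["gff", "fasta", "xml"])` and then request the document in the format that was returned.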

Retrieval of genomic sequence and annotation

DAS/2/Spec/Get-Genomic

Stylesheet

Writeback

Region locking

Assay Retrievals

Ontology Retrievals