Difference between revisions of "DAS/1"

From BioDAS
(Changed Dazzle link to go to biojava page)
(About the DAS Protocol)
 
(17 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
[[Category:Documentation]]  
 
[[Category:Documentation]]  
  
<big>'''The DAS/1 Protocol'''</big>
+
== About the DAS Protocol ==
  
== About DAS/1 ==
+
The DAS protocol was written by Lincoln Stein, Sean Eddy, and Robin Dowell in 2000, and is the basis for a large number of software implementations. Around 650 public DAS data sources are currently running worldwide, with many more private data sources known to be in use. A number of  websites and software applications function as DAS clients.
  
The original version 1 specification, written by Lincoln Stein, Sean Eddy, and Robin Dowell, is the basis for a number of clients and servers. Around 400 public DAS/1 servers are currently running worldwide including [http://www.wormbase.org/ WormBase], [http://www.flybase.org/ FlyBase], [http://www.ensembl.org/ Ensembl], [http://www.tigr.org/ TIGR],  [http://genome.ucsc.edu/ UCSC], and [http://www.ebi.ac.uk/das-srv/uniprot/das UniProt]. Many more private DAS sources are known to be in use. A number of websites and software applications are based on DAS.
+
As of January 2013, the latest version of the DAS protocol is [[DAS1.6|DAS 1.6]].
  
The official DAS/1 version 1.53 specification is available at http://www.biodas.org/documents/spec.html
+
The specification is being actively supported, and continues to be extended in order to cater for the needs of its existing users and expand its applicability to additional arenas. For example, though originally focussed on genomic annotation, the latest version includes support for protein sequences, structures and annotations. In addition, extensions have enabled DAS to be used to distribute alignment and molecular interaction data. These and other extensions are listed in the unofficial [[DAS1.6E|DAS 1.6E]] specification.
  
 +
<b>Note:</b> This protocol should not be confused with that of [[DAS/2]], an entirely separate project.
  
The specification is being actively supported, and continues to be extended in order to cater for the needs of its existing users and expand its applicability to additional arenas. For example, though originally focussed on genomic annotation, extensions have enabled DAS to be used to distribute alignment, structural and molecular interaction data.
+
The [[DAS/1/Overview|DAS Overview]] provides a glossary and list of concepts. The [[DAS Plans]] page describes some details of possible future improvements to DAS. See the [[DAS specification]] page for more information on the different versions of DAS.
  
The unofficial DAS/1 version 1.53E (extended) specification is available at http://www.dasregistry.org/spec_1.53E.jsp
+
== Architecture ==
  
 +
The Distributed Annotation System comprises three types of component:
  
[[DAS/1/Overview]] provides a glossary and list of concepts.
+
=== Data source ===
  
== DAS/1 Clients ==
+
A data source provides programmatic access to data over a network, typically the internet. Each <i>data source</i> contains a set of data from one provider, for example Pfam domains. One or more <i>data sources</i> may be hosted by a single <i>DAS server</i>.
  
* [http://www.ensembl.org/info/using/external_data/das/index.html Ensembl]
+
=== Registry ===
* [http://www.efamily.org.uk/software/dasclients/spice/ Spice]
 
* [http://www.ebi.ac.uk/dasty/ Dasty]
 
* [http://pfam.sanger.ac.uk/ Pfam]
 
  
* Older (still maintained?):
+
The [[DasRegistry| DAS Registry]] lists and describes public <i>data sources</i> and the types of data that may be communicated using DAS. It is accessible programmatically.
** [http://biodas.org/geodesic/ Geodesic]
 
** [http://sourceforge.net/project/showfiles.php?group_id=28453&release_id=60810 OmniDAS/OmniGene]
 
  
== DAS/1 Servers ==
+
=== Client ===
  
A more exhaustive list of servers is available from the [[DasRegistry|DAS Registry]].
+
A client consumes and integrates the data contained within one or more <i>data sources</i>. It may also communicate with a <i>server</i> or <i>registry</i> to obtain information about available data sources.
  
* [http://netaffxdas.affymetrix.com/das/ Affymetrix]
+
== How to set up a DAS server ==
* [http://www.biosapiens.info/page.php?page=biosapiensdir BioSapiens servers]
 
* [http://www.ensembl.org/info/using/external_data/das/index.html Ensembl server]
 
* [http://das.hgc.jp/ KEGG DAS]
 
* [http://das.sanger.ac.uk/das/dsn Sanger DAS server]
 
* [http://www.ebi.ac.uk/das-srv/genomicdas/das/sources EBI Genomic DAS server]
 
* [http://www.ebi.ac.uk/das-srv/proteindas/das/sources EBI Protein DAS server]
 
* [http://www.ebi.ac.uk/das-srv/uniprot/das/dsn Uniprot DAS server]
 
* [http://www.tigr.org/tdb/DAS/das_server_list.html TIGR's listing of servers]
 
* [http://genome.ucsc.edu/FAQ/FAQdownloads#download23 UCSC server]
 
  
== How to set up a DAS/1 server ==
+
=== Implementation ===
  
In general it is quite easy to set up a DAS server. All the server implementations are easy to set up.
+
In principle, implementing a DAS server from scratch is straightforward once you know the protocol. However, several well-established, multi-purpose server implementations are designed to be as flexible and easy to set up as possible, and using one of these existing packages is strongly recommended. Many distributions contain ready-made data adaptors (e.g. for GFF files), and for custom data, simple plugins can be written to quickly provide your data via DAS. The available implementations include:
Most server implementations allow easy setup using ready provided data-adaptors (e.g. for GFF files).
 
For custom data simple plugins can be written to quickly provide your data via DAS.
 
  
DAS server implementations are available in several programming languages:
+
* [http://www.sanger.ac.uk/proserver/ ProServer] is a well established Perl server, supports all DAS extensions and is the most heavily used.
 +
* [http://biodas.org/servers/LDAS.html LDAS] is an older Perl server which is easy to set up but lacks support for the latest DAS features.
 +
* [http://www.biojava.org/wiki/Dazzle Dazzle] is a well-established Java server tied to the BioJava framework; it supports most extensions.
 +
* [http://code.google.com/p/mydas/ MyDas] is a newer, more streamlined Java server but does not yet support all extensions.
 +
* [http://www.ebi.ac.uk/panda-srv/easydas/ EasyDAS] is a wrapper around ProServer facilitating access to the EBI's compute power for those that don't have access to a server of their own.
  
* Perl
+
=== Validation ===
[http://www.sanger.ac.uk/proserver/ Proserver]
 
[http://biodas.org/servers/LDAS.html LDAS]
 
  
* Java
+
Servers should endeavour to conform strictly to the DAS data formats as this maximises compatibility across the network and minimises the maintenance required as a result of evolving client software. RelaxNG documents are now available from the [[DasRegistry|DAS Registry]] to help validate the XML produced by your servers. These are used by the registry to help servers conform to the DAS specification before being registered.
[http://www.biojava.org/wiki/Dazzle Dazzle]
 
[http://code.google.com/p/mydas/ MyDas]
 
  
== Publishing and Discovery of DAS/1 sources ==
+
=== Publishing and Discovery of DAS sources ===
  
See [[DasRegistry]]
+
Publishing your source in the DAS Registry allows you to advertise its availability, and is described in the [[DasRegistry|DAS Registry]] documentation. Registered sources are also easier for users to visualise in clients that are capable of interacting with the DAS Registry, such as [http://www.ensembl.org/ Ensembl], [http://www.ebi.ac.uk Dasty] and [http://www.efamily.org.uk/software/dasclients/spice/ SPICE].
 +
 
 +
== Training in DAS ==
 +
 
 +
DAS tutorials, talks and hackathons take place almost every year at the [[DASWorkshop2011|DAS Workshop]], held at the Genome Campus, UK. The next workshop will be in March 2011. Events, courses and workshops involving DAS are often listed on the [[Current_events|Current events]] page.

Latest revision as of 11:35, 25 January 2013


About the DAS Protocol

The DAS protocol was written by Lincoln Stein, Sean Eddy, and Robin Dowell in 2000, and is the basis for a large number of software implementations. Around 650 public DAS data sources are currently running worldwide, with many more private data sources known to be in use. A number of websites and software applications function as DAS clients.

As of January 2013, the latest version of the DAS protocol is DAS 1.6.

The specification is being actively supported, and continues to be extended in order to cater for the needs of its existing users and expand its applicability to additional arenas. For example, though originally focussed on genomic annotation, the latest version includes support for protein sequences, structures and annotations. In addition, extensions have enabled DAS to be used to distribute alignment and molecular interaction data. These and other extensions are listed in the unofficial DAS 1.6E specification.

Note: This protocol should not be confused with that of DAS/2, an entirely separate project.

The DAS Overview provides a glossary and list of concepts. The DAS Plans page describes some details of possible future improvements to DAS. See the DAS specification page for more information on the different versions of DAS.

Architecture

The Distributed Annotation System comprises three types of component:

Data source

A data source provides programmatic access to data over a network, typically the internet. Each data source contains a set of data from one provider, for example Pfam domains. One or more data sources may be hosted by a single DAS server.
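As a rough illustration of what "programmatic access" means here, a DAS/1 request is an ordinary HTTP URL that names the server, the data source and a command such as features. The sketch below only builds such a URL; the host das.example.org and the source name pfam_domains are hypothetical, and the exact set of commands and arguments a given source supports should be checked against the specification.

```python
from urllib.parse import urlencode


def das_request_url(server_base, source, command, **params):
    """Build a DAS/1-style URL of the form <server_base>/<source>/<command>?<params>."""
    url = f"{server_base.rstrip('/')}/{source}/{command}"
    if params:
        # Leave ':' and ',' unescaped because segment arguments use them.
        url += "?" + urlencode(params, safe=":,")
    return url


# Hypothetical server and data source names, for illustration only.
features_url = das_request_url(
    "http://das.example.org/das",
    "pfam_domains",
    "features",
    segment="1:1000,2000",
)
print(features_url)
```

Because one server may host several data sources, only the source part of the path changes between requests to the same server.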

Registry

The DAS Registry lists and describes public data sources and the types of data that may be communicated using DAS. It is accessible programmatically.
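A minimal sketch of programmatic registry access might look like the following. It parses an inline sample rather than contacting a live registry; the element and attribute names in the sample (SOURCES, SOURCE, VERSION, CAPABILITY, query_uri) are assumptions modelled on the DAS sources command, and every URI in it is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative listing; the structure is an assumption, the entries are made up.
SAMPLE_SOURCES = """<SOURCES>
  <SOURCE uri="example_pfam" title="Example Pfam domains">
    <VERSION uri="example_pfam" created="2010-01-01">
      <CAPABILITY type="das1:features"
                  query_uri="http://das.example.org/das/example_pfam/features"/>
    </VERSION>
  </SOURCE>
  <SOURCE uri="example_genome" title="Example genome sequence">
    <VERSION uri="example_genome" created="2010-01-01">
      <CAPABILITY type="das1:sequence"
                  query_uri="http://das.example.org/das/example_genome/sequence"/>
      <CAPABILITY type="das1:features"
                  query_uri="http://das.example.org/das/example_genome/features"/>
    </VERSION>
  </SOURCE>
</SOURCES>"""


def list_sources(sources_xml, capability=None):
    """Return one dict per SOURCE, optionally keeping only those with a capability."""
    root = ET.fromstring(sources_xml)
    found = []
    for source in root.iter("SOURCE"):
        # Map each advertised capability type to the URL that serves it.
        caps = {c.get("type"): c.get("query_uri") for c in source.iter("CAPABILITY")}
        if capability is None or capability in caps:
            found.append({
                "uri": source.get("uri"),
                "title": source.get("title"),
                "capabilities": caps,
            })
    return found


for entry in list_sources(SAMPLE_SOURCES, capability="das1:sequence"):
    print(entry["uri"], entry["capabilities"]["das1:sequence"])
```

Filtering by capability is one way a client could narrow a registry listing to the sources it is able to display.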

Client

A client consumes and integrates the data contained within one or more data sources. It may also communicate with a server or registry to obtain information about available data sources.
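The integration step a client performs can be sketched as parsing the feature documents returned by several data sources and merging them into one ordered list. The two inline responses below stand in for real network replies; the source names are hypothetical, and the element names (DASGFF, GFF, SEGMENT, FEATURE, TYPE, START, END) are assumed to follow the DAS/1 features format and should be checked against the specification.

```python
import xml.etree.ElementTree as ET

# Illustrative responses from two hypothetical data sources for the same segment.
RESPONSE_A = """<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/source_a/features">
    <SEGMENT id="1" start="1000" stop="2000">
      <FEATURE id="a1" label="Domain A">
        <TYPE id="domain">domain</TYPE>
        <START>1500</START>
        <END>1600</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

RESPONSE_B = """<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/source_b/features">
    <SEGMENT id="1" start="1000" stop="2000">
      <FEATURE id="b1" label="Exon 1">
        <TYPE id="exon">exon</TYPE>
        <START>1200</START>
        <END>1300</END>
      </FEATURE>
      <FEATURE id="b2" label="Exon 2">
        <TYPE id="exon">exon</TYPE>
        <START>1550</START>
        <END>1700</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text, source_name):
    """Extract a flat list of feature records from one features document."""
    features = []
    root = ET.fromstring(xml_text)
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            features.append({
                "source": source_name,
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "type": feature.findtext("TYPE", default=""),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
            })
    return features


def integrate(*feature_lists):
    """Merge features from several sources, ordered by segment and position."""
    merged = [f for features in feature_lists for f in features]
    return sorted(merged, key=lambda f: (f["segment"], f["start"], f["end"]))


merged = integrate(
    parse_features(RESPONSE_A, "source_a"),
    parse_features(RESPONSE_B, "source_b"),
)
for f in merged:
    print(f["segment"], f["start"], f["end"], f["type"], f["id"], f["source"])
```

Keeping the originating source on every record lets a client display the merged features as separate tracks while still sharing one coordinate axis.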

How to set up a DAS server

Implementation

In principle, implementing a DAS server from scratch is straightforward once you know the protocol. However, several well-established, multi-purpose server implementations are designed to be as flexible and easy to set up as possible, and using one of these existing packages is strongly recommended. Many distributions contain ready-made data adaptors (e.g. for GFF files), and for custom data, simple plugins can be written to quickly provide your data via DAS. The available implementations include:

  • ProServer is a well-established Perl server that supports all DAS extensions and is the most heavily used.
  • LDAS is an older Perl server which is easy to set up but lacks support for the latest DAS features.
  • Dazzle is a well-established Java server tied to the BioJava framework; it supports most extensions.
  • MyDas is a newer, more streamlined Java server but does not yet support all extensions.
  • EasyDAS is a wrapper around ProServer that gives access to the EBI's compute power to those who do not have a server of their own.

Validation

Servers should endeavour to conform strictly to the DAS data formats as this maximises compatibility across the network and minimises the maintenance required as a result of evolving client software. RelaxNG documents are now available from the DAS Registry to help validate the XML produced by your servers. These are used by the registry to help servers conform to the DAS specification before being registered.
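Before running full RelaxNG validation, a server developer might apply a quick sanity check to the XML a server emits. The sketch below is not RelaxNG validation and does not replace the registry's schemas; it only tests well-formedness and a few structural expectations, and the specific rules chosen (a DASGFF root, numeric START and END, START not greater than END) are assumptions for illustration rather than rules quoted from the specification.

```python
import xml.etree.ElementTree as ET


def check_features_document(xml_text):
    """Return a list of problems found in a features document (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != "DASGFF":
        problems.append(f"unexpected root element: {root.tag}")

    for feature in root.iter("FEATURE"):
        fid = feature.get("id", "<no id>")
        start = feature.findtext("START")
        end = feature.findtext("END")
        if start is None or end is None:
            problems.append(f"feature {fid}: missing START or END")
        elif not (start.strip().isdigit() and end.strip().isdigit()):
            problems.append(f"feature {fid}: START and END must be integers")
        elif int(start) > int(end):
            problems.append(f"feature {fid}: START is greater than END")
    return problems


# Illustrative document with one deliberately inverted feature.
SAMPLE = """<DASGFF><GFF version="1.0"><SEGMENT id="1" start="1" stop="100">
  <FEATURE id="ok"><START>10</START><END>20</END></FEATURE>
  <FEATURE id="inverted"><START>20</START><END>10</END></FEATURE>
</SEGMENT></GFF></DASGFF>"""

for problem in check_features_document(SAMPLE):
    print(problem)
```

A check like this catches gross mistakes early, while the schema-based validation described above remains the authoritative test of conformance.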

Publishing and Discovery of DAS sources

Publishing your source in the DAS Registry allows you to advertise its availability, and is described in the DAS Registry documentation. Registered sources are also easier for users to visualise in clients that are capable of interacting with the DAS Registry, such as Ensembl, Dasty and SPICE.

Training in DAS

DAS tutorials, talks and hackathons take place almost every year at the DAS Workshop, held at the Genome Campus, UK. The next workshop will be in March 2011. Events, courses and workshops involving DAS are often listed on the Current events page.