Difference between revisions of "Everything DAS"

From BioDAS
(New page: <div id="main"> <p><em>Last Updated 5th Feb 2009</em></p> <p>The intention of this document is to bring together and add to all the documentation available on the WWW for the DAS system. ...)
 
Line 1: Line 1:
 
<div id="main">
 
<div id="main">
<p><em>Last Updated 5th Feb 2009</em></p>
 
  
<p>The intention of this document is to bring together and add to all the documentation available on the WWW for the DAS system.
+
''Last Updated 5th Feb 2009''
The content on these pages draws on many sources of information and thus has many contributors. Eventually, the intention is that this document will be a set of instructions that you can print out and use as reference documentation or simply as a good read. If you find any errors on these pages or on the pages they link to, then please
 
contact me (<a href="mailto:jw12@sanger.ac.uk">Jonathan Warren</a>) to let me know. Suggestions and contributions are also welcome.</p>
 
  
<h1>Index:</h1>
+
The intention of this document is to bring together, and add to, all the documentation available on the WWW for the DAS system. The content on these pages draws on many sources of information and thus has many contributors. Eventually, the intention is that this document will be a set of instructions that you can print out and use as reference documentation or simply as a good read. If you find any errors on these pages or on the pages they link to, please contact me ([mailto:jw12@sanger.ac.uk Jonathan Warren]) to let me know. Suggestions and contributions are also welcome.
<h2><a href="#what">What is DAS?</a></h2>
 
<h3><a href="#currentStatus">Current Status</a></h3>
 
<h2><a href="#settingUp">Setting up a DAS Server</a></h2>
 
<h3><a href="#serversList">Servers available</a></h3>
 
<h4><a href="#dazzle">Dazzle</a></h4>
 
<h5><a href="#gettingDazzle">Getting Dazzle</a></h5>
 
<h5><a href="#readyPlugins">Using ready made plugins for datasources</a></h5>
 
<h5><a href="#ownPlugins">Writing your own plugin</a></h5>
 
<h5><a href="#ensemblRef">Deploying an Ensembl Reference Server</a></h5>
 
  
<h4><a href="#myDas">MyDas</a></h4>
+
= Index: =
<h4><a href="#proserver">Proserver</a></h4>
 
  
<h3><a href="#impLatest">Implementing the latest specs</a></h3>
+
== [#what What is DAS?] ==
<h3><a href="#testingServer">Testing your implementation</a></h3>
 
<h3><a href="#validation">Validation and Registering of your Server</a></h3>
 
<h4><a href="#relaxNG">RelaxNG and other validation in the Registry</a></h4>
 
  
<h2><a href="#dasregistry">The DAS Registry</a></h2>
+
=== [#currentStatus Current Status] ===
<h3><a href="#dasRegGenInfo">Introduction to the DAS Registry</a></h3>
 
<h3><a href="#connectingToReg">Connecting to the Registry Programmatically</a></h3>
 
 
  
<h2><a href="#settingUpClient">Setting up a DAS client</a></h2>
+
== [#settingUp Setting up a DAS Server] ==
<h3><a href="#clientsList">Currently Available DAS Clients - table?</a></h3>
 
<h3><a href="#writingClients">Writing your own DAS client</a></h3>
 
<h4><a href="#dasobert">A Java DAS Client Library - Dasobert</a></h4>
 
<h4><a href="#perlWalking">Example of walking a DAS source using perl</a></h4>
 
<h2><a href="#acknowledgements">Acknowledgments</a></h2>
 
  
<h1>Content:</h1>
+
=== [#serversList Servers available] ===
<h2><a name="what"></a>What is DAS?</h2>
 
With the advent of high-throughput technologies such as sequencers and microarray chips, biological databases are becoming so large that it is increasingly difficult to download

all the data relevant for a research team. DAS gets around this by keeping the data stored with its originators and allowing users around the world to access just the relevant parts they need at any one time.
 
Put another way: by making use of DAS you can take advantage of being able to view integrated information from multiple sources, without these sources needing to be aware of each other. You can also add your own DAS data source, perhaps privately in your own institution and then view the information served from this source in the context of features from other institutions.
 
  
 +
==== [#dazzle Dazzle] ====
  
DAS stands for Distributed Annotation System. It was originally set up to be used with genomic information, where annotations/features are layered on top of a reference sequence, usually a genome. The idea is that a genome browser such as Ensembl or GBrowse (both DAS clients in this scenario) can be used to look at annotations from data sources
+
===== [#gettingDazzle Getting Dazzle] =====
that exist on the same server/machine the browser is running on and, in the same view, to display annotations from data sources (data served by DAS servers) that could be on the other side of the world (communicating via the WWW). The DAS system consists of the DAS Registry (www.dasregistry.org) as well as DAS servers and clients. The Registry is there
 
to enable people and computers to easily find the DAS data sources available around the world and also to help these data sources conform to the specifications. It is important that data served by DAS servers conforms to the specifications, because this is what makes different clients and servers around the world interoperable.
 
<a href="http://www.biodas.org/wiki/DAS1.6"> The 1.6 spec</a> is the latest and soon-to-be-official DAS spec. It mainly focuses on genomic annotations but also refers to the <em>E</em>xtensions specified in the 1.53<em>E</em> spec below.
 
<a href="http://www.dasregistry.org/spec_1.53E.jsp"> The 1.53E spec</a> contains up-to-date specifications for servers and clients that exchange data over DAS that is not genome-centric. Types of data include protein structures and alignments, molecular interactions and volume map data.
 
  
<h3><a name="currentStatus">Current Status/ DAS specifications 1.5, 1.53E, 1.6, 2.0 and Future Intentions</a></h3>
+
===== [#readyPlugins Using ready made plugins for datasources] =====
DAS 1.5, together with 1.53E, is currently the most widely used and supported version. DAS 2.0 is quite different and is effectively being developed in parallel with the other two versions; it is hoped that over the next few years these versions will converge into one (the 1.6 spec already includes some of the commands from the 2.0 spec).
 
If you wish your data to be widely accessible, use <a href="http://www.biodas.org/wiki/DAS1.6">the 1.6 spec</a> and <a href="http://www.dasregistry.org/spec_1.53E.jsp">the 1.53E spec</a> as your guide. If your main priority is using the most recent technology and external libraries, then <a href="http://biodas.org/documents/das2/das2_protocol.html">DAS 2.0</a> may be of most interest to you.
 
<!--The DAS registry will soon be updated to accept only new data sources that conform to these 2 specifications.-->
 
  
 +
===== [#ownPlugins Writing your own plugin] =====
  
<h2><a name="settingUp"></a>Setting up a DAS Server</h2>
+
===== [#ensemblRef Deploying an Ensembl Reference Server] =====
There are several options available for setting up a DAS server. All are written in either Perl or Java.
 
<h3><a name="serversList"></a>Servers available</h3>
 
<!-- put table here containing server name, language written in, advantages, disadvantages -->
 
<%@include file="sangertablestart.jsp"%><tr><th>Name</th><th>Programming Language</th><th>advantages</th><th>disadvantages</th></tr>
 
  <tr><td>Dazzle</td><td>Java</td><td>Standard implementation, includes support for extensions (structure, interaction, vol)</td><td>Some people say it can be hard to configure and deploy if you are not used to Java web development</td></tr>
 
  <tr><td>Proserver</td><td>PERL</td><td>Standard implementation includes support for extensions (structure, interaction, vol)</td><td></td></tr>
 
  <tr><td>MyDAS</td><td>Java</td><td>Some people say it's easier to set up and configure than Dazzle</td><td>Doesn't support extensions currently</td></tr>
 
  <tr><td>LDAS</td><td>PERL</td><td>Very Easy to set up?</td><td>Limited support for DAS functionality and sources</td></tr>
 
  <%@include file="sangertableend.jsp" %>
 
<h4><a name="dazzle"></a>Dazzle</h4>
 
Dazzle is currently the standard/default implementation for Java users; however, MyDas (described below) is also popular.
 
<h4>Dazzle Eclipse Tutorial</h4>
 
<a href="DazzzleTutorial.jsp">Dazzle Eclipse Tutorial</a>: this tutorial takes you through setting up Dazzle in Eclipse and then shows you how to add your own plugins.
 
<h5><a name="gettingDazzle"></a>Getting Dazzle</h5>
 
<a href="http://biojava.org/wiki/Dazzle#Getting_Dazzle">http://biojava.org/wiki/Dazzle#Getting_Dazzle</a>
 
The latest, cutting-edge source code is available from Subversion here: <a href="http://www.derkholm.net/svn/repos/dazzle/"> http://www.derkholm.net/svn/repos/dazzle/</a>
 
  
<h5><a name="readyPlugins"></a>Using ready made plugins for datasources</h5>
+
==== [#myDas MyDas] ====
<a href="http://biojava.org/wiki/Dazzle:plugins">http://biojava.org/wiki/Dazzle:plugins</a>
 
<font color="red">More examples needed here and tips for using mysql etc?</font>
 
<a href="http://biojava.org/wiki/Dazzle:deployment">http://biojava.org/wiki/Dazzle:deployment</a>
 
<h5><a name="ownPlugins"></a>Writing your own plugin</h5>
 
<a href="http://biojava.org/wiki/Dazzle:writeplugin">http://biojava.org/wiki/Dazzle:writeplugin</a><br/>
 
<a href="http://biojava.org/wiki/Dazzle:eclipse"> How to write a plugin using eclipse</a>
 
  
 +
==== [#proserver Proserver] ====
  
 +
=== [#impLatest Implementing the latest specs] ===
  
<font color="red">more on what interfaces need to be implemented and give a full example that implements all needed functionality such as sources.cmd and coordinate system etc.</font>
+
=== [#testingServer Testing your implementation] ===
<h5><a name="ensemblRef"></a>Deploying an Ensembl Reference Server</h5>
 
<a href="http://biojava.org/wiki/Dazzle:Ensembl"> link to Ensembl reference server instructions</a>
 
<h4><a name="myDas"></a>MyDas</h4>
 
<a href="http://code.google.com/p/mydas/">Information about MyDas can be found here.</a>
 
  
<h4><a name="proserver"></a>Proserver</h4>
+
=== [#validation Validation and Registering of your Server] ===
<a href="http://www.sanger.ac.uk/Software/analysis/proserver/">Proserver Page at the Sanger Institute.</a><br>
 
<a href="http://proserver.svn.sourceforge.net/viewvc/proserver/trunk/doc/proserver_tutorial.html">Proserver Tutorial</a><br>
 
<a href="http://proserver.svn.sourceforge.net/viewvc/proserver/trunk/doc/proserver_guide.html">Guide to Proserver</a><br>
 
  
<h3><a name="impLatest"></a>Implementing the latest specs</h3>
+
==== [#relaxNG RelaxNG and other validation in the Registry] ====
Proserver example of config to implement sources cmd:
 
<pre>
 
coordinates = TAIR_8,Chromosome,Arabidopsis thaliana -> 1:2000,3000
 
properties  = key1 -> value1 ; key2 -> value2
 
mapmaster  =  
 
http://www.gramene.org/das/Arabidopsis_thaliana.TAIR8.reference
 
capabilities = features -> 1.0
 
</pre>
 
The coordinates data is taken from the
 
coordinates/registry_coordinates.xml file, which is an archived copy of
 
the list of coordinates available in the DAS registry. Specifying the
 
name (or URI, actually) and test range is enough; ProServer will pick up
 
the rest from the XML file.
 
  
If the full data is not picked up, you may need to update the
+
== [#dasregistry The DAS Registry] ==
coordinates XML file from the registry
 
(http://www.dasregistry.org/das/coordinatesystem). If your coordinate
 
system is not in the Registry, an admin can add it for you.
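To make the shape of the coordinates setting concrete, here is a minimal Python sketch that splits a value like the one in the config example into its coordinate-system name and test range. It is not part of ProServer: the function names are hypothetical, and the assumption that several entries would be separated by ";" is borrowed from the properties line above rather than confirmed for coordinates.

```python
def parse_coordinates(value):
    """Split a ProServer-style 'coordinates' value into (name, test_range) pairs.

    Assumption: multiple entries are separated by ';', as in the
    'properties' line of the example config.
    """
    pairs = []
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        # Everything left of '->' is the coordinate system name (or URI),
        # everything right of it is the test range.
        name, _, test_range = item.partition("->")
        pairs.append((name.strip(), test_range.strip()))
    return pairs


def parse_test_range(test_range):
    """Parse a test range of the form 'segment:start,end'."""
    segment, _, span = test_range.partition(":")
    start, end = (int(x) for x in span.split(","))
    return segment, start, end


if __name__ == "__main__":
    value = "TAIR_8,Chromosome,Arabidopsis thaliana -> 1:2000,3000"
    for name, test_range in parse_coordinates(value):
        print(name, parse_test_range(test_range))
```

Run against the example value, the name part is the string ProServer would look up in the coordinates XML file, and the test range identifies segment 1 from position 2000 to 3000.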
 
<h4><a name="ontologies"></a>Protein Annotations and Ontologies</h4>
 
<a href="extension_ontology.jsp">explanation of ontologies for proteins usage in DAS</a>
 
<h3><a name="testingServer"></a>Testing your implementation</h3>
 
<h3><a name="validation"></a>Validation and Registering of your Server</h3>
 
<h4><a name="relaxNG"></a>RelaxNG and other validation in the Registry</h4>
 
The DAS Registry uses <a href="http://relaxng.org/">RelaxNG</a> to validate the XML responses from DAS servers before allowing them to register as a valid DAS source. A RelaxNG schema is essentially a document like a DTD, except that it uses an XML syntax that is

easy to learn quickly. The registry uses the schema documents found at <a href="http://www.dasregistry.org/validation/">http://www.dasregistry.org/validation/</a> and has one document for each

of the DAS commands (note that you may need to right-click and choose "view source" to see anything on these pages in a web browser): <a href="http://www.dasregistry.org/validation/features.rng">features.rng</a>,
 
<a href="http://www.dasregistry.org/validation/sources.rng">sources.rng</a>,
 
<a href="http://www.dasregistry.org/validation/alignments.rng">alignments.rng</a>,
 
<a href="http://www.dasregistry.org/validation/structure.rng">structure.rng</a>,
 
<a href="http://www.dasregistry.org/validation/entry_points.rng">entry_points.rng</a>,
 
<a href="http://www.dasregistry.org/validation/interaction.rng">interaction.rng</a>,
 
<a href="http://www.dasregistry.org/validation/sequence.rng">sequence.rng</a> and
 
<a href="http://www.dasregistry.org/validation/types.rng">types.rng</a>.
 
  
 +
=== [#dasRegGenInfo Introduction to the DAS Registry] ===
  
<h2><a name="dasregistry"></a>The DAS Registry</h2>
+
=== Connecting to the Registry Programmatically ===
<h3><a name="dasRegGenInfo"></a>Introduction to the DAS Registry</h3>
 
<h3><a name="connectingToReg"></a>Connecting to the Registry Programmatically</h3>
 
Several commands can be used to query the registry, including

the sources command, which takes the following optional parameters:

label, organism, authority, capability, type and the unique source_id. You can also use the organism, coordinatesystem and lastmodified commands. For examples see

<a href="http://www.dasregistry.org/help_scripting.jsp">Scripting</a>.

An example of a Java class written using Dasobert to access the Registry is here: <a href="http://www.derkholm.net/svn/repos/dasobert/trunk/doc/examples/ContactRegistry.java">http://www.derkholm.net/svn/repos/dasobert/trunk/doc/examples/ContactRegistry.java</a>
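As a rough illustration of a sources query with optional parameters, the Python sketch below assembles a request URL from a set of filters. It only builds the URL and does not contact the registry. The base URL is an assumption modelled on the coordinatesystem address given elsewhere on this page, the query parameter names are assumed to match the names listed above, and the function name is hypothetical; check the Scripting page for the real forms.

```python
from urllib.parse import urlencode

# Filter names taken from the parameter list above (assumed to be the
# literal query parameter names).
ALLOWED_FILTERS = ("label", "organism", "authority", "capability", "type")


def registry_sources_url(base="http://www.dasregistry.org/das", **filters):
    """Build a URL for the registry 'sources' command with optional filters."""
    unknown = set(filters) - set(ALLOWED_FILTERS)
    if unknown:
        raise ValueError("unsupported filter(s): " + ", ".join(sorted(unknown)))
    # Sort the filters so the resulting URL is deterministic.
    query = urlencode(sorted((k, v) for k, v in filters.items() if v is not None))
    url = base.rstrip("/") + "/sources"
    return url + "?" + query if query else url


if __name__ == "__main__":
    print(registry_sources_url(organism="Homo sapiens", capability="features"))
```

Rejecting unknown filter names early is a deliberate choice here: a misspelt parameter would otherwise be silently ignored by most servers and return an unfiltered list.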
 
  
 +
== [#settingUpClient Setting up a DAS client] ==
  
+
=== [#clientsList Currently Available DAS Clients - table?] ===
  
<h2><a name="settingUpClient"></a>Setting up a DAS client</h2>
+
=== [#writingClients Writing your own DAS client] ===
<h3><a name="clientsList"></a>Currently Available DAS Clients - table?</h3>
 
<%@include file="sangertablestart.jsp"%><tr><th>Name</th><th>Description</th><th>Programming Language</th><th>Links</th></tr>
 
 
  <tr><td>GBrowse</td><td>Quote from the GBrowse website: "GBrowse is the most popular viewer in GMOD. For a list of GBrowse and GMOD installations see the GMOD Users page. For a demo of its features, try the <a href="http://www.wormbase.org/db/seq/gbrowse/wormbase/">WormBase</a>, <a href="http://flybase.org/cgi-bin/gbrowse/dmel">FlyBase</a>, or <a href="http://projects.tcag.ca/cgi-bin/duplication/dupbrowse/human_b35">Human Genome Segmental Duplication Database</a> web sites." Spec: DAS 1.53E, with 1.6 soon.</td><td>PERL</td><td><a href="http://gmod.org/wiki/Gbrowse">http://gmod.org/wiki/Gbrowse</a></td></tr>
 
  <tr><td>EnsEMBL</td><td>EnsEMBL is a web based genome browser and database system which supports DAS 1.53E and soon 1.6</td><td>PERL</td><td><a href="http://www.ensembl.org">http://www.ensembl.org/</a></td></tr>
 
  <tr><td>IGB</td><td>An application built upon the GenoViz SDK and Genometry for visualization and exploration of genomes and corresponding annotations from multiple data sources</td><td>Java</td><td><a href="http://genoviz.sourceforge.net/">http://genoviz.sourceforge.net/</a></td></tr>
 
  <tr><td>Jalview</td><td>A multiple sequence alignment editor &amp; viewer</td><td>Java</td><td><a href="http://www.jalview.org/">http://www.jalview.org/</a></td></tr>
 
  <tr><td>Dasty2</td><td>Dasty is a protein DAS client for visualising protein sequence feature information. The client can connect to a reference server and to one or many DAS servers. It merges the data from all the servers and displays sequence information, as well as annotated feature information from all the available DAS servers, in a very user-friendly way.</td><td>PERL and AJAX</td><td><a href="http://www.ebi.ac.uk/dasty/">http://www.ebi.ac.uk/dasty/</a></td></tr>
 
  <font color="red">Add clients from the workshop</font>
 
  <%@include file="sangertableend.jsp" %>
 
<h3><a name="writingClients"></a>Writing your own DAS client</h3>
 
<h4><a name="dasobert"></a>A Java DAS Client Library - Dasobert</h4>
 
Examples of client code written in Java using Dasobert can be found here: <a href="http://www.derkholm.net/svn/repos/dasobert/trunk/doc/examples/">http://www.derkholm.net/svn/repos/dasobert/trunk/doc/examples/</a>
 
<p>There is also a tutorial for using Dasobert within Eclipse (it follows on from the Dazzle Eclipse tutorial): <a href="DasobertTutorial.jsp">Dasobert Eclipse Tutorial</a></p>
 
  
<h4><a name="perlWalking"></a>Example of walking a DAS source using perl</h4>
+
==== [#dasobert A Java DAS Client Library - Dasobert] ====
  
This example was kindly provided by Felix Kokocinski:
+
==== [#perlWalking Example of walking a DAS source using perl] ====
  
You can specify a region or let it walk through all regions if the
+
== [#acknowledgements Acknowledgments] ==
server can supply entry points with lengths. This is done in e.g. 20 Mb (20,000,000-base)
 
slices. It takes quite some time, but works nicely.
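The slicing idea can be sketched independently of any DAS library. The Python below is an illustration only, not a translation of the Perl script: it cuts a region into consecutive, non-overlapping chunks of at most a given length and formats each one in the "segment:start,end" style used for DAS segment requests. The function names are hypothetical.

```python
def iter_chunks(start, end, max_len=20_000_000):
    """Yield (chunk_start, chunk_end) pairs covering start..end inclusive.

    Chunks are consecutive and non-overlapping, each at most max_len long.
    """
    if start > end:
        raise ValueError(f"Coordinates wrong: {start} > {end}")
    chunk_start = start
    while chunk_start <= end:
        # Clamp the last chunk so it never runs past the requested end.
        chunk_end = min(chunk_start + max_len - 1, end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end + 1


def segment_param(chromosome, start, end):
    """Format a region as 'chromosome:start,end'."""
    return f"{chromosome}:{start},{end}"


if __name__ == "__main__":
    for chunk_start, chunk_end in iter_chunks(1, 50, max_len=20):
        print(segment_param("1", chunk_start, chunk_end))
```

A feature that spans a chunk boundary can be returned by the server for both chunks, so a client walking a source this way typically keeps a record of the feature IDs it has already seen and skips repeats.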
 
<PRE>
 
<FONT COLOR="#555500"><I># Example script that reads genomic data from DAS server
 
</I></FONT><FONT COLOR="#555500"><I># using a defined chunk size
 
</I></FONT><FONT COLOR="#555500"><I># writing the data out to a gff file.
 
</I></FONT>
 
<FONT COLOR="#555500"><I># fsk@sanger.ac.uk, 2008
 
</I></FONT>
 
<FONT COLOR="#000055"><B>use</B></FONT> strict;
 
<FONT COLOR="#000055"><B>use</B></FONT> Bio::Das::Lite;
 
<FONT COLOR="#000055"><B>use</B></FONT> Getopt::Long;
 
  
<FONT COLOR="#555500"><I>#default DAS server address
+
= Content: =
  
</I></FONT><FONT COLOR="#000055"><B>my</B></FONT> $server = "http://das.sanger.ac.uk/das";
+
== What is DAS? ==
<FONT COLOR="#555500"><I>#default DAS source name
 
</I></FONT><FONT COLOR="#000055"><B>my</B></FONT> $source = 'otter_das';
 
<FONT COLOR="#555500"><I>#proxy name
 
</I></FONT><FONT COLOR="#000055"><B>my</B></FONT> $http_proxy = <FONT COLOR="#000055"><B>undef</B></FONT>;
 
<FONT COLOR="#555500"><I>#genomic chunk size to query
 
</I></FONT><FONT COLOR="#000055"><B>my</B></FONT> $max_len    = 20000000;
 
  
 +
With the advent of high-throughput technologies such as sequencers and microarray chips, biological databases are becoming so large that it is increasingly difficult to download all the data relevant for a research team. DAS gets around this by keeping the data stored with its originators and allowing users around the world to access just the relevant parts they need at any one time. Put another way: by using DAS you can view integrated information from multiple sources, without these sources needing to be aware of each other. You can also add your own DAS data source, perhaps privately in your own institution, and then view the information served from this source in the context of features from other institutions.

DAS stands for Distributed Annotation System. It was originally set up to be used with genomic information, where annotations/features are layered on top of a reference sequence, usually a genome. The idea is that a genome browser such as Ensembl or GBrowse (both DAS clients in this scenario) can be used to look at annotations from data sources that exist on the same server/machine the browser is running on and, in the same view, to display annotations from data sources (data served by DAS servers) that could be on the other side of the world (communicating via the WWW). The DAS system consists of the DAS Registry (www.dasregistry.org) as well as DAS servers and clients. The Registry is there to enable people and computers to easily find the DAS data sources available around the world and also to help these data sources conform to the specifications. It is important that data served by DAS servers conforms to the specifications, because this is what makes different clients and servers around the world interoperable.

[http://www.biodas.org/wiki/DAS1.6  The 1.6 spec] is the latest and soon-to-be-official DAS spec. It mainly focuses on genomic annotations but also refers to the ''E''xtensions specified in the 1.53''E'' spec below.

[http://www.dasregistry.org/spec_1.53E.jsp  The 1.53E spec] contains up-to-date specifications for servers and clients that exchange data over DAS that is not genome-centric. Types of data include protein structures and alignments, molecular interactions and volume map data.
  
<FONT COLOR="#000055"><B>my</B></FONT> $chromosome = <FONT COLOR="#000055"><B>undef</B></FONT>;
+
=== Current Status/ DAS specifications 1.5, 1.53E, 1.6, 2.0 and Future Intentions ===
<FONT COLOR="#000055"><B>my</B></FONT> $start      = 0;
 
<FONT COLOR="#000055"><B>my</B></FONT> $end        = 0;
 
<FONT COLOR="#000055"><B>my</B></FONT> $gff_file  = <FONT COLOR="#000055"><B>undef</B></FONT>;
 
<FONT COLOR="#000055"><B>my</B></FONT> %transcripts = ();
 
  
<FONT COLOR="#000055"><B>my</B></FONT> $type;
+
DAS 1.5, together with 1.53E, is currently the most widely used and supported version. DAS 2.0 is quite different and is effectively being developed in parallel with the other two versions; it is hoped that over the next few years these versions will converge into one (the 1.6 spec already includes some of the commands from the 2.0 spec). If you wish your data to be widely accessible, use [http://www.biodas.org/wiki/DAS1.6 the 1.6 spec] and [http://www.dasregistry.org/spec_1.53E.jsp the 1.53E spec] as your guide. If your main priority is using the most recent technology and external libraries, then [http://biodas.org/documents/das2/das2_protocol.html DAS 2.0] may be of most interest to you.
  
&amp;GetOptions(
+
== Setting up a DAS Server ==
    'file=<FONT COLOR="#000055"><B>s</B></FONT>'                => \$gff_file,
 
    'chromosome=<FONT COLOR="#000055"><B>s</B></FONT>'          => \$chromosome,
 
    'start=<FONT COLOR="#000055"><B>s</B></FONT>'                => \$start,
 
    'end=<FONT COLOR="#000055"><B>s</B></FONT>'                  => \$end,
 
            'server=<FONT COLOR="#000055"><B>s</B></FONT>'              => \$server,
 
            'source=<FONT COLOR="#000055"><B>s</B></FONT>'              => \$source,
 
  );
 
  
<FONT COLOR="#555500"><I>#connect to DAS server
+
There are several options available for setting up a DAS server. All are written in either Perl or Java.
  
</I></FONT><FONT COLOR="#000055"><B>my</B></FONT> $das = connect_das("$server/$source", $http_proxy);
+
=== Servers available ===
  
<FONT COLOR="#555500"><I>#get entry point list/lengths
+
&lt;%@include file="sangertablestart.jsp"%&gt;
</I></FONT><FONT COLOR="#555500"><I>#requires the DAS server to support the entry-points function
 
</I></FONT><FONT COLOR="#000055"><B>my</B></FONT> $chrom_lens = get_entry_points();
 
  
<FONT COLOR="#000055"><B>open</B></FONT>(GFF, ">$gff_file") or <FONT COLOR="#000055"><B>die</B></FONT> "Can't <FONT COLOR="#000055"><B>open</B></FONT> file $gff_file.\n";
+
{|
 +
! Name
 +
! Programming Language
 +
! advantages
 +
! disadvantages
 +
|-
 +
| Dazzle
 +
| Java
 +
| Standard implementation, includes support for extensions (structure, interaction, vol)
 +
| Some people say it can be hard to configure and deploy if you are not used to Java web development
 +
|-
 +
| Proserver
 +
| PERL
 +
| Standard implementation includes support for extensions (structure, interaction, vol)
 +
|
 +
|-
 +
| MyDAS
 +
| Java
 +
| Some people say it's easier to set up and configure than Dazzle
 +
| Doesn't support extensions currently
 +
|-
 +
| LDAS
 +
| PERL
 +
| Very Easy to set up?
 +
| Limited support for DAS functionality and sources
 +
|-
 +
|
 +
&lt;%@include file="sangertableend.jsp" %&gt;
  
<FONT COLOR="#000055"><B>if</B></FONT>($chromosome){
+
==== Dazzle ====
  <FONT COLOR="#555500"><I>#query specific region
 
  
</I></FONT>  get_region($chromosome, $start, $end);
+
Dazzle is currently the standard/default implementation for Java users; however, MyDas (described below) is also popular.
}
 
<FONT COLOR="#000055"><B>else</B></FONT>{
 
  <FONT COLOR="#555500"><I>#go through all chromosomes
 
</I></FONT> <FONT COLOR="#000055"><B>foreach</B></FONT> <FONT COLOR="#000055"><B>my</B></FONT> $chrom (<FONT COLOR="#000055"><B>keys</B></FONT> %$chrom_lens){
 
<FONT COLOR="#000055"><B>print</B></FONT> "getting $chrom\n";
 
get_region($chrom, <FONT COLOR="#000055"><B>undef</B></FONT>, <FONT COLOR="#000055"><B>undef</B></FONT>);
 
%transcripts = ();
 
  }
 
}
 
  
 +
==== Dazzle Eclipse Tutorial ====
  
<FONT COLOR="#000055"><B>close</B></FONT>(GFF)or <FONT COLOR="#000055"><B>die</B></FONT> "Can't <FONT COLOR="#000055"><B>close</B></FONT> file $gff_file.\n";
+
[DazzzleTutorial.jsp Dazzle Eclipse Tutorial]: this tutorial takes you through setting up Dazzle in Eclipse and then shows you how to add your own plugins.
  
 +
===== Getting Dazzle =====
  
  <FONT COLOR="#555500"><I>################################################
+
http://biojava.org/wiki/Dazzle#Getting_Dazzle The latest, cutting-edge source code is available from Subversion here: [http://www.derkholm.net/svn/repos/dazzle/  http://www.derkholm.net/svn/repos/dazzle/]
</I></FONT>
 
  
<FONT COLOR="#555500"><I>#connect to DAS server
+
===== Using ready made plugins for datasources =====
</I></FONT><FONT COLOR="#000055"><B>sub</B></FONT> connect_das {
 
  <FONT COLOR="#000055"><B>my</B></FONT> ($dsn, $proxy) = @_;
 
  
  <FONT COLOR="#000055"><B>my</B></FONT> $das = Bio::Das::Lite->new({
+
http://biojava.org/wiki/Dazzle:plugins <font color="red">More examples needed here and tips for using mysql etc?</font> http://biojava.org/wiki/Dazzle:deployment
'timeout'    => 10000,
 
'dsn'        => $dsn,
 
'http_proxy' => $proxy,
 
}) or <FONT COLOR="#000055"><B>die</B></FONT> "can't <FONT COLOR="#000055"><B>connect</B></FONT> to DAS server!\n";
 
  
  <FONT COLOR="#000055"><B>return</B></FONT> $das;
+
===== Writing your own plugin =====
}
 
  
 +
http://biojava.org/wiki/Dazzle:writeplugin<br />[http://biojava.org/wiki/Dazzle:eclipse  How to write a plugin using eclipse] <font color="red">more on what interfaces need to be implemented and give a full example that implements all needed functionality such as sources.cmd and coordinate system etc.</font>
  
 +
===== Deploying an Ensembl Reference Server =====
  
<FONT COLOR="#555500"><I>#look at the region requested
+
[http://biojava.org/wiki/Dazzle:Ensembl  link to Ensembl reference server instructions]
</I></FONT><FONT COLOR="#000055"><B>sub</B></FONT> get_region {
 
  <FONT COLOR="#000055"><B>my</B></FONT> ($chromosome, $start, $end) = @_;
 
  
  <FONT COLOR="#000055"><B>my</B></FONT> $chrom_len    = $chrom_lens->{$chromosome};
+
==== MyDas ====
  <FONT COLOR="#000055"><B>my</B></FONT> $region      = "";
 
  
  <FONT COLOR="#000055"><B>if</B></FONT>( $start and $end){
+
[http://code.google.com/p/mydas/ Information about MyDas can be found here.]
    <FONT COLOR="#000055"><B>if</B></FONT>($start > $end){
 
      <FONT COLOR="#000055"><B>die</B></FONT> "Coordinates wrong: $start > $end!\n";
 
    }
 
    <FONT COLOR="#000055"><B>if</B></FONT>( ($end - $start) &lt;= $max_len ){
 
      <FONT COLOR="#555500"><I>#get entire region
 
  
</I></FONT>      <FONT COLOR="#000055"><B>my</B></FONT> $region = ":".$start.",".$end;
+
==== Proserver ====
      get_transcripts($region, $chromosome);
 
    }
 
    <FONT COLOR="#000055"><B>else</B></FONT>{
 
      go_through_chunks($start, $end, $chromosome, $chrom_len);
 
    }
 
  }
 
  <FONT COLOR="#000055"><B>elsif</B></FONT>( $chrom_len &lt;= $max_len ){
 
    <FONT COLOR="#555500"><I>#get entire chromosome
 
</I></FONT>    get_transcripts($region, $chromosome);
 
  }
 
  <FONT COLOR="#000055"><B>else</B></FONT>{
 
    go_through_chunks(1, $chrom_len, $chromosome, $chrom_len);
 
  }
 
  
}
+
[http://www.sanger.ac.uk/Software/analysis/proserver/ Proserver Page at the Sanger Institute.]<br />[http://proserver.svn.sourceforge.net/viewvc/proserver/trunk/doc/proserver_tutorial.html Proserver Tutorial]<br />[http://proserver.svn.sourceforge.net/viewvc/proserver/trunk/doc/proserver_guide.html Guide to Proserver]<br />
  
 +
=== Implementing the latest specs ===
  
<FONT COLOR="#555500"><I>#go through a region in chunks
+
Proserver example of config to implement sources cmd:
</I></FONT><FONT COLOR="#000055"><B>sub</B></FONT> go_through_chunks {
 
  <FONT COLOR="#000055"><B>my</B></FONT> ($chunk_start, $chunk_end, $chromosome, $chrom_len) = @_;
 
  
  <FONT COLOR="#000055"><B>my</B></FONT> ($region_start, $region_end);
+
   <FONT COLOR="#000055"><B>my</B></FONT> %ids_seen;
+
coordinates = TAIR_8,Chromosome,Arabidopsis thaliana -&gt; 1:2000,3000
 +
properties  = key1 -&gt; value1 ; key2 -&gt; value2
 +
mapmaster   =
 +
http://www.gramene.org/das/Arabidopsis_thaliana.TAIR8.reference
 +
capabilities = features -&gt; 1.0
  
  <FONT COLOR="#555500"><I>#loop through regions until all is covered
+
The coordinates data is taken from the coordinates/registry_coordinates.xml file, which is an archived copy of the list of coordinates available in the DAS registry. Specifying the name (or URI, actually) and test range is enough; ProServer will pick up the rest from the XML file. If the full data is not picked up, you may need to update the coordinates XML file from the registry (http://www.dasregistry.org/das/coordinatesystem). If your coordinate system is not in the Registry, an admin can add it for you.
  
</I></FONT>  <FONT COLOR="#555500"><I>#keep track of genes to avoid duplicates!
+
==== Protein Annotations and Ontologies ====
</I></FONT>  <FONT COLOR="#000055"><B>for</B></FONT>($region_start = $chunk_start, $region_end = $region_start + $max_len;
 
      $region_start &lt; $chunk_end;
 
      $region_start = $region_end + 1, $region_end += $max_len){
 
  
    <FONT COLOR="#000055"><B>if</B></FONT>($region_end > $chrom_len){
+
[extension_ontology.jsp explanation of ontologies for proteins usage in DAS]
      $region_end = $chrom_len;
 
    }<FONT COLOR="#000055"><B>elsif</B></FONT>($region_end > $chunk_end){
 
      $region_end = $chunk_end;
 
    }
 
    <FONT COLOR="#000055"><B>my</B></FONT> $region = ":".$region_start.",".$region_end;
 
  
    <FONT COLOR="#555500"><I>#get all transcripts from chunk
+
=== Testing your implementation ===
</I></FONT>    <FONT COLOR="#000055"><B>my</B></FONT> $new_ids = get_transcripts($region, $chromosome, \%ids_seen);
 
    %ids_seen = (%ids_seen, %$new_ids);
 
  }
 
  
}
+
=== Validation and Registering of your Server ===
  
 +
==== RelaxNG and other validation in the Registry ====
  
 +
The DAS Registry uses [http://relaxng.org/ RelaxNG] to validate the XML responses from DAS servers before allowing them to register as a valid DAS source. A RelaxNG schema is essentially a document like a DTD, except that it uses an XML syntax that is easy to learn quickly. The registry uses the schema documents found at http://www.dasregistry.org/validation/ and has one document for each of the DAS commands (note that you may need to right-click and choose "view source" to see anything on these pages in a web browser): [http://www.dasregistry.org/validation/features.rng features.rng], [http://www.dasregistry.org/validation/sources.rng sources.rng], [http://www.dasregistry.org/validation/alignments.rng alignments.rng], [http://www.dasregistry.org/validation/structure.rng structure.rng], [http://www.dasregistry.org/validation/entry_points.rng entry_points.rng], [http://www.dasregistry.org/validation/interaction.rng interaction.rng], [http://www.dasregistry.org/validation/sequence.rng sequence.rng] and [http://www.dasregistry.org/validation/types.rng types.rng].
  
<FONT COLOR="#555500"><I>#fetch all available entry-points (chromosomes) and their lengths from server
+
== The DAS Registry ==
</I></FONT><FONT COLOR="#000055"><B>sub</B></FONT> get_entry_points {
 
  
  <FONT COLOR="#000055"><B>my</B></FONT> %chrom_lens;
+
=== Introduction to the DAS Registry ===
  
  <FONT COLOR="#000055"><B>my</B></FONT> $entry_points = $das->entry_points();
+
=== Connecting to the Registry Programmatically ===
  
  <FONT COLOR="#000055"><B>foreach</B></FONT> <FONT COLOR="#000055"><B>my</B></FONT> $k (<FONT COLOR="#000055"><B>keys</B></FONT> %$entry_points){
+
Several commands can be used to query the registry. The sources command accepts the optional parameters label, organism, authority, capability, type and a unique source_id. You can also use the organism, coordinatesystem and lastmodified commands. For examples see [http://www.dasregistry.org/help_scripting.jsp Scripting]. An example of a Java class written using Dasobert to access the Registry is here: http://www.derkholm.net/svn/repos/dasobert/trunk/doc/examples/ContactRegistry.java
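As a rough illustration of how the optional parameters combine, the sketch below builds a query URL for the sources command from a set of filters. The base URL and the helper names (<code>build_sources_url</code>, <code>uri_escape_value</code>) are assumptions made for this example only; check the Scripting help page for the address and parameter spelling the registry actually expects.

```perl
use strict;
use warnings;

# ASSUMPTION: hypothetical base URL for the registry "sources" command.
my $REGISTRY_SOURCES = "http://www.dasregistry.org/das/sources";

# Percent-encode a value so it is safe inside a query string.
sub uri_escape_value {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
    return $value;
}

# Build a sources query URL from optional filters such as
# label, organism, authority, capability and type.
# Empty or undefined filters are skipped.
sub build_sources_url {
    my ($base, %filters) = @_;
    my @pairs;
    foreach my $key (sort keys %filters) {
        next unless defined $filters{$key} && length $filters{$key};
        push @pairs, $key . "=" . uri_escape_value($filters{$key});
    }
    return @pairs ? $base . "?" . join("&", @pairs) : $base;
}

print build_sources_url(
    $REGISTRY_SOURCES,
    organism   => "Homo sapiens",
    capability => "features",
), "\n";
```

The resulting URL can then be fetched with any HTTP client and the XML response parsed as for any other DAS command.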
<FONT COLOR="#000055"><B>foreach</B></FONT> <FONT COLOR="#000055"><B>my</B></FONT> $l (@{$entry_points->{$k}}){
 
<FONT COLOR="#000055"><B>foreach</B></FONT> <FONT COLOR="#000055"><B>my</B></FONT> $segment (@{ $l->{"segment"} }){
 
$chrom_lens{ $segment->{"segment_id"} } = $segment->{"segment_size"};
 
}
 
  }
 
  }
 
  
  <FONT COLOR="#000055"><B>return</B></FONT> \%chrom_lens;
+
== Setting up a DAS client ==
}
 
  
 +
=== Currently Available DAS Clients - table? ===
  
 +
&lt;%@include file="sangertablestart.jsp"%&gt;
 +
|-
 +
! Name
 +
! Description
 +
! Programming Language
 +
! Links
 +
|-
 +
| GBrowse
 +
|
 +
Quote from the GBrowse website: "GBrowse is the most popular viewer in GMOD. For a list of GBrowse and GMOD installations see the GMOD Users page. For a demo of its features, try the [http://www.wormbase.org/db/seq/gbrowse/wormbase/ WormBase], [http://flybase.org/cgi-bin/gbrowse/dmel FlyBase], or [http://projects.tcag.ca/cgi-bin/duplication/dupbrowse/human_b35 Human Genome Segmental Duplication Database] web sites." Spec: DAS 1.53E, and 1.6 soon.
 +
| PERL
 +
|
 +
http://gmod.org/wiki/Gbrowse
 +
|-
 +
| EnsEMBL
 +
| EnsEMBL is a web-based genome browser and database system that supports DAS 1.53E and will soon support 1.6
 +
| PERL
 +
|
 +
[http://www.ensembl.org http://www.ensembl.org/]
 +
|-
 +
| IGB
 +
| IGB is an application built upon the GenoViz SDK and Genometry for visualization and exploration of genomes and corresponding annotations from multiple data sources
 +
| Java
 +
| http://genoviz.sourceforge.net/
 +
|-
 +
| Jalview
 +
| A multiple sequence alignment editor &amp; viewer
 +
| Java
 +
|
 +
http://www.jalview.org/
 +
|-
 +
| Dasty2
 +
| Dasty is a protein DAS client for visualising protein sequence feature information. The client can connect to a reference server and one or many DAS servers. It merges the data from all the servers and displays sequence information, as well as annotated feature information from all the available DAS servers, in a very user-friendly way.
 +
| PERL and AJAX
 +
|
 +
http://www.ebi.ac.uk/dasty/
 +
|-
 +
|
 +
<font color="red">Add clients from the workshop</font> &lt;%@include file="sangertableend.jsp" %&gt;
  
<FONT COLOR="#555500"><I>#fetch the data and process it.
+
=== Writing your own DAS client ===
</I></FONT><FONT COLOR="#555500"><I>#note that this function is quite specific to the way your DAS source is set-up.
 
</I></FONT><FONT COLOR="#555500"><I>#the idea is to get together all exons, etc that belong to a transcript and all transcripts
 
</I></FONT><FONT COLOR="#555500"><I>#that belong to a gene.
 
</I></FONT><FONT COLOR="#000055"><B>sub</B></FONT> get_transcripts {
 
  <FONT COLOR="#000055"><B>my</B></FONT> ( $region, $chromosome, $previous_genes ) = @_;
 
  
  <FONT COLOR="#000055"><B>print</B></FONT> <FONT COLOR="#000055"><B>STDERR</B></FONT> "have <FONT COLOR="#000055"><B>chr</B></FONT> $chromosome$region\n";
+
==== A Java DAS Client Library - Dasobert ====
  
  <FONT COLOR="#000055"><B>my</B></FONT> %genes = ();
+
Examples of client code written in Java using Dasobert can be found here: http://www.derkholm.net/svn/repos/dasobert/trunk/doc/examples/
  <FONT COLOR="#000055"><B>my</B></FONT> %new_features = ();
 
  <FONT COLOR="#000055"><B>my</B></FONT> $response = <FONT COLOR="#000055"><B>undef</B></FONT>;
 
  
  <FONT COLOR="#555500"><I>#fetch DAS features
+
There is also a tutorial for using Dasobert within Eclipse (it follows on from the Dazzle Eclipse tutorial): [DasobertTutorial.jsp Dasobert Eclipse Tutorial]
  
</I></FONT>  $response = $das->features({
+
==== Example of walking a DAS source using perl ====
      'segment' => $chromosome.$region,
 
      'type'    => $type,
 
    });
 
  
  <FONT COLOR="#000055"><B>while</B></FONT> (<FONT COLOR="#000055"><B>my</B></FONT> ($url, $features) = <FONT COLOR="#000055"><B>each</B></FONT> %$response) {
+
This example was kindly provided by Felix Kokocinski. You can specify a region, or let the script walk through all regions if the server can supply entry points with lengths. The walk is done in slices of, e.g., 20 Mb. It takes quite some time, but works nicely.
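The slicing idea can be sketched on its own, without any network access. The helper below is a minimal illustration, not the script's own loop: <code>chunk_region</code> is a hypothetical name, and it produces non-overlapping slices. Because a feature that spans a slice boundary can be returned by both neighbouring requests, a real client still needs to remember which IDs it has already seen, as the full script does.

```perl
use strict;
use warnings;

# Split the inclusive range [$start, $end] into consecutive slices,
# each covering at most $max_len positions.
# Returns a list of [slice_start, slice_end] array references.
sub chunk_region {
    my ($start, $end, $max_len) = @_;
    die "max_len must be positive\n" unless $max_len > 0;
    my @chunks;
    for (my $s = $start; $s <= $end; $s += $max_len) {
        my $e = $s + $max_len - 1;
        $e = $end if $e > $end;    # clamp the last slice to the region end
        push @chunks, [$s, $e];
    }
    return @chunks;
}

# Print DAS-style segment strings (id:start,end) for a 50 Mb region
# walked in 20 Mb slices.
foreach my $chunk (chunk_region(1, 50_000_000, 20_000_000)) {
    my ($s, $e) = @$chunk;
    print "1:$s,$e\n";
}
```

Each printed string has the same form as the <code>segment</code> argument that the full script passes to the features request.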
  
    <FONT COLOR="#000055"><B>if</B></FONT>(<FONT COLOR="#000055"><B>ref</B></FONT> $features <FONT COLOR="#000055"><B>eq</B></FONT> "ARRAY"){
+
      <FONT COLOR="#000055"><B>print</B></FONT> <FONT COLOR="#000055"><B>STDERR</B></FONT> "Received ".<FONT COLOR="#000055"><B>scalar</B></FONT> @$features." features.\n";
+
<font color="#555500">''<nowiki># Example script that reads genomic data from DAS server </nowiki>''</font>
<font color="#555500">''<nowiki># using a defined chunk size </nowiki>''</font>
<font color="#555500">''<nowiki># writing the data out to a gff file. </nowiki>''</font>
 
+
<font color="#555500">''<nowiki># fsk@sanger.ac.uk, 2008 </nowiki>''</font>
    FEATURES:
+
<font color="#000055">'''use'''</font> strict;
      <FONT COLOR="#000055"><B>foreach</B></FONT> <FONT COLOR="#000055"><B>my</B></FONT> $feature (@$features) {
+
<font color="#000055">'''use'''</font> Bio::Das::Lite;
 
+
<font color="#000055">'''use'''</font> Getopt::Long;
<FONT COLOR="#000055"><B>my</B></FONT> %notes = ();
+
 
+
<font color="#555500">''<nowiki>#default DAS server address </nowiki>''</font>
<font color="#000055">'''my'''</font> $server = "http://das.sanger.ac.uk/das";
<FONT COLOR="#000055"><B>my</B></FONT> $grouphash = $feature->{'group'}->[0];
+
<font color="#555500">''<nowiki>#default DAS source name </nowiki>''</font>
<font color="#000055">'''my'''</font> $source = 'otter_das';
 
+
<font color="#555500">''<nowiki>#proxy name </nowiki>''</font>
<font color="#000055">'''my'''</font> $http_proxy = <font color="#000055">'''undef'''</font><nowiki>;
<FONT COLOR="#555500"><I>#get other notes
+
</nowiki><font color="#555500">''<nowiki>#genomic chunk size to query </nowiki>''</font>
<font color="#000055">'''my'''</font> $max_len    = 20000000;
 
+
</I></FONT> <FONT COLOR="#000055"><B>my</B></FONT> $i = 0;
+
<FONT COLOR="#000055"><B>my</B></FONT> $morenote_entry = '';
+
<font color="#000055">'''my'''</font> $chromosome = <font color="#000055">'''undef'''</font><nowiki>;
<FONT COLOR="#000055"><B>while</B></FONT>(<FONT COLOR="#000055"><B>defined</B></FONT>($feature->{'note'}->[$i])){
+
</nowiki><font color="#000055">'''my'''</font> $start      = 0;
  <FONT COLOR="#000055"><B>my</B></FONT> $morenotes = $feature->{'note'}->[$i];
+
<font color="#000055">'''my'''</font> $end        = 0;
  <FONT COLOR="#000055"><B>my</B></FONT> ($morenotes_type, $morenotes_value) = <FONT COLOR="#000055"><B>split</B></FONT>('=', $morenotes);
+
<font color="#000055">'''my'''</font> $gff_file  = <font color="#000055">'''undef'''</font><nowiki>;
  $morenotes_value =~ <FONT COLOR="#000055"><B>s</B></FONT>/\&amp;\<FONT COLOR="#555500"><I>#39\;/\'/g;
+
</nowiki><font color="#000055">'''my'''</font> %transcripts = ();
 
+
</I></FONT>   $notes{$morenotes_type} = $morenotes_value;
+
<font color="#000055">'''my'''</font> $type;
  $i++;
+
}
+
&amp;GetOptions(
 
+
    'file=<font color="#000055">'''s'''</font>'                =&gt; \$gff_file,
<FONT COLOR="#555500"><I>#remove duplicates from overlapping regions
+
    'chromosome=<font color="#000055">'''s'''</font>'          =&gt; \$chromosome,
</I></FONT> <FONT COLOR="#000055"><B>if</B></FONT>(<FONT COLOR="#000055"><B>defined</B></FONT> $previous_genes and <FONT COLOR="#000055"><B>exists</B></FONT>($previous_genes->{$grouphash->{'group_type'}})){
+
    'start=<font color="#000055">'''s'''</font>'                =&gt; \$start,
  <FONT COLOR="#000055"><B>next</B></FONT> FEATURES;
+
    'end=<font color="#000055">'''s'''</font>'                  =&gt; \$end,
}
+
            'server=<font color="#000055">'''s'''</font>'              =&gt; \$server,
 
+
            'source=<font color="#000055">'''s'''</font>'              =&gt; \$source,
<FONT COLOR="#555500"><I>#you could do some filtering of the response at this point
+
  );
</I></FONT>
+
<FONT COLOR="#000055"><B>my</B></FONT> %gff_element;
+
<font color="#555500">''<nowiki>#connect to DAS server </nowiki>''</font>
<font color="#000055">'''my'''</font> $das = connect_das("$server/$source", $http_proxy);
 
+
<FONT COLOR="#555500"><I>#build structure for exons and general items
+
<font color="#555500">''<nowiki>#get entry point list/lengths </nowiki>''</font>
<font color="#555500">''<nowiki>#requires the DAS server to support the entry-points function </nowiki>''</font>
<font color="#000055">'''my'''</font> $chrom_lens = get_entry_points();
 
+
</I></FONT> <FONT COLOR="#555500"><I>#find type
+
<font color="#000055">'''open'''</font>(GFF, "&gt;$gff_file") or <font color="#000055">'''die'''</font> "Can't <font color="#000055">'''open'''</font> file $gff_file.\n";
</I></FONT> <FONT COLOR="#000055"><B>my</B></FONT> $element_type = $feature->{'type'} || "exon";
+
$element_type    =~ <FONT COLOR="#000055"><B>m</B></FONT>/((intron)|(UTR)|(exon))/g;
+
<font color="#000055">'''if'''</font>($chromosome){
<FONT COLOR="#000055"><B>if</B></FONT>($1){ $element_type = $1 }
+
  <font color="#555500">''<nowiki>#query specific region </nowiki>''</font>
  get_region($chromosome, $start, $end);
 
+
}
<FONT COLOR="#000055"><B>my</B></FONT> $group_type  = $grouphash->{'group_type'};
+
<font color="#000055">'''else'''</font>{
 
+
  <font color="#555500">''<nowiki>#go through all chromosomes </nowiki>''</font>
  <font color="#000055">'''foreach'''</font> <font color="#000055">'''my'''</font> $chrom (<font color="#000055">'''keys'''</font> %$chrom_lens){
<FONT COLOR="#000055"><B>my</B></FONT> $strand      = $feature->{'orientation'};
+
<font color="#000055">'''print'''</font> "getting $chrom\n";
<FONT COLOR="#000055"><B>if</B></FONT>($feature->{'orientation'}    =~ /^(\+|\-|\.)$/) { }
+
get_region($chrom, <font color="#000055">'''undef'''</font>, <font color="#000055">'''undef'''</font>);
<FONT COLOR="#000055"><B>elsif</B></FONT>($feature->{'orientation'} ==  1){ $strand = '+' }
+
%transcripts = ();
<FONT COLOR="#000055"><B>elsif</B></FONT>($feature->{'orientation'} == -1){ $strand = '-' }
+
  }
<FONT COLOR="#000055"><B>elsif</B></FONT>($feature->{'orientation'} ==  0){ $strand = '.' }
+
}
<FONT COLOR="#000055"><B>else</B></FONT>{ <FONT COLOR="#000055"><B>die</B></FONT> "INVALID STRAND SYMBOL: ".$feature->{'orientation'}."\n"; }
+
 
+
<FONT COLOR="#000055"><B>my</B></FONT> $phase        = ".";
+
<font color="#000055">'''close'''</font>(GFF)or <font color="#000055">'''die'''</font> "Can't <font color="#000055">'''close'''</font> file $gff_file.\n";
<FONT COLOR="#000055"><B>if</B></FONT>($feature->{'phase'}){
+
  $phase = $feature->{'phase'};
+
}
+
  <font color="#555500">''<nowiki>################################################ </nowiki>''</font>
<FONT COLOR="#000055"><B>elsif</B></FONT>($element_type <FONT COLOR="#000055"><B>eq</B></FONT> "exon"){
+
  $phase = "0";
+
<font color="#555500">''<nowiki>#connect to DAS server </nowiki>''</font>
<font color="#000055">'''sub'''</font> connect_das {
}
+
  <font color="#000055">'''my'''</font> ($dsn, $proxy) = @_;
 
+
<FONT COLOR="#000055"><B>if</B></FONT>(!$notes{"Transcriptstatus"}){
+
  <font color="#000055">'''my'''</font> $das = Bio::Das::Lite-&gt;new({
  <FONT COLOR="#000055"><B>die</B></FONT> "PROBLEM: $element_type, ".$feature->{'feature_id'}."\n";
+
'timeout'    =&gt; 10000,
}
+
'dsn'        =&gt; $dsn,
 
+
'http_proxy' =&gt; $proxy,
$gff_element{'seqid'}      = $chromosome;
+
}) or <font color="#000055">'''die'''</font> "can't <font color="#000055">'''connect'''</font> to DAS server!\n";
$gff_element{'source'}     = $notes{"Transcripttype"};
+
$gff_element{'type'}      = $element_type;
+
  <font color="#000055">'''return'''</font> $das;
$gff_element{'start'}      = $feature->{'start'};
+
}
$gff_element{'end'}        = $feature->{'end'};
+
$gff_element{'score'}      = ".";
+
$gff_element{'strand'}    = $strand;
+
$gff_element{'phase'}     = $phase;
+
<font color="#555500">''<nowiki>#look at the region requested </nowiki>''</font>
<font color="#000055">'''sub'''</font> get_region {
 
+
  <font color="#000055">'''my'''</font> ($chromosome, $start, $end) = @_;
<FONT COLOR="#555500"><I>#check for some missing values
+
 
+
  <font color="#000055">'''my'''</font> $chrom_len    = $chrom_lens-&gt;{$chromosome};
</I></FONT> <FONT COLOR="#000055"><B>if</B></FONT>(!<FONT COLOR="#000055"><B>exists</B></FONT> $feature->{'feature_id'}){
+
  <font color="#000055">'''my'''</font> $region      = "";
  <FONT COLOR="#000055"><B>print</B></FONT> <FONT COLOR="#000055"><B>STDERR</B></FONT> "Missing value <FONT COLOR="#000055"><B>for</B></FONT> Parent-feature_id\n";
+
  $feature->{'feature_id'} = "0";
+
  <font color="#000055">'''if'''</font>( $start and $end){
}
+
    <font color="#000055">'''if'''</font>($start &gt; $end){
<FONT COLOR="#000055"><B>if</B></FONT>(!<FONT COLOR="#000055"><B>exists</B></FONT> $notes{"Transcriptstatus"}){
+
      <font color="#000055">'''die'''</font> "Coordinates wrong: $start &gt; $end!\n";
  <FONT COLOR="#000055"><B>print</B></FONT> <FONT COLOR="#000055"><B>STDERR</B></FONT> "Missing value <FONT COLOR="#000055"><B>for</B></FONT> Transcriptstatus\n";
+
    }
  $notes{"Transcriptstatus"} = "-";
+
    <font color="#000055">'''if'''</font>( ($end - $start) &lt;= $max_len ){
}
+
      <font color="#555500">''<nowiki>#get entire region </nowiki>''</font>
      <font color="#000055">'''my'''</font> $region = ":".$start.",".$end;
<FONT COLOR="#000055"><B>if</B></FONT>(!<FONT COLOR="#000055"><B>exists</B></FONT> $notes{"Created"}){
+
      get_transcripts($region, $chromosome);
  <FONT COLOR="#000055"><B>print</B></FONT> <FONT COLOR="#000055"><B>STDERR</B></FONT> "Missing value <FONT COLOR="#000055"><B>for</B></FONT> Created\n";
+
    }
  $notes{"Created"} = 0;
+
    <font color="#000055">'''else'''</font>{
}
+
      go_through_chunks($start, $end, $chromosome, $chrom_len);
<FONT COLOR="#000055"><B>if</B></FONT>(!<FONT COLOR="#000055"><B>exists</B></FONT> $notes{"Lastmod"}){
+
    }
  <FONT COLOR="#000055"><B>print</B></FONT> <FONT COLOR="#000055"><B>STDERR</B></FONT> "Missing value <FONT COLOR="#000055"><B>for</B></FONT> Lastmod\n";
+
  }
  $notes{"Lastmod"} = 0;
+
  <font color="#000055">'''elsif'''</font>( $chrom_len &lt;= $max_len ){
}
+
    <font color="#555500">''<nowiki>#get entire chromosome </nowiki>''</font>
    get_transcripts($region, $chromosome);
$gff_element{'attributes'} = "Parent=".$feature->{'feature_id'}.
+
  }
                            ";Status=".$notes{"Transcriptstatus"}.
+
  <font color="#000055">'''else'''</font>{
    ";CREATED=".$notes{"Created"}.
+
    go_through_chunks(1, $chrom_len, $chromosome, $chrom_len);
    ";LASTMOD=".$notes{"Lastmod"};
+
  }
 
+
<FONT COLOR="#000055"><B>if</B></FONT>(!<FONT COLOR="#000055"><B>exists</B></FONT> $genes{ $group_type }){
+
}
  $genes{ $group_type } = 1;
+
  <FONT COLOR="#000055"><B>my</B></FONT> %gff_gene;
+
 
+
<font color="#555500">''<nowiki>#go through a region in chunks </nowiki>''</font>
<font color="#000055">'''sub'''</font> go_through_chunks {
          <FONT COLOR="#000055"><B>my</B></FONT> $gene_region = $feature->{'target'};
+
  <font color="#000055">'''my'''</font> ($chunk_start, $chunk_end, $chromosome, $chrom_len) = @_;
          <FONT COLOR="#000055"><B>my</B></FONT> ($gs, $gene_loc) = <FONT COLOR="#000055"><B>split</B></FONT>('\=', $gene_region);
+
  <FONT COLOR="#000055"><B>my</B></FONT> ($gene_start, $gene_end) = <FONT COLOR="#000055"><B>split</B></FONT>('\-', $gene_loc);
+
  <font color="#000055">'''my'''</font> ($region_start, $region_end);
 
+
  <font color="#000055">'''my'''</font> %ids_seen;
  <FONT COLOR="#555500"><I>#build structure for gene
+
 
+
  <font color="#555500">''<nowiki>#loop through regions until all is covered </nowiki>''</font>
  <font color="#555500">''<nowiki>#keep track of genes to avoid duplicates! </nowiki>''</font>
  <font color="#000055">'''for'''</font>($region_start = $chunk_start, $region_end = $region_start + $max_len;
</I></FONT>   $gff_gene{'seqid'}      = $chromosome;
+
      $region_start &lt; $chunk_end;
  $gff_gene{'source'}    = $notes{"Genetype"};
+
      $region_start = $region_end + 1, $region_end += $max_len){
  $gff_gene{'type'}      = "gene";
+
  $gff_gene{'start'}      = $gene_start;
+
    <font color="#000055">'''if'''</font>($region_end &gt; $chrom_len){
  $gff_gene{'end'}        = $gene_end;
+
      $region_end = $chrom_len;
  $gff_gene{'score'}      = ".";
+
    }<font color="#000055">'''elsif'''</font>($region_end &gt; $chunk_end){
  $gff_gene{'strand'}    = $strand;
+
      $region_end = $chunk_end;
  $gff_gene{'phase'}      = ".";
+
    }
 
+
    <font color="#000055">'''my'''</font> $region = ":".$region_start.",".$region_end;
  <FONT COLOR="#555500"><I>#get gene description
+
</I></FONT>   <FONT COLOR="#000055"><B>my</B></FONT> $description = "";
+
    <font color="#555500">''<nowiki>#get all transcripts from chunk </nowiki>''</font>
    <font color="#000055">'''my'''</font> $new_ids = get_transcripts($region, $chromosome, \%ids_seen);
  <FONT COLOR="#000055"><B>foreach</B></FONT> <FONT COLOR="#000055"><B>my</B></FONT> $gnote (@{$grouphash->{'note'}}){
+
    %ids_seen = (%ids_seen, %$new_ids);
    <FONT COLOR="#000055"><B>my</B></FONT> ($gnote_s, $gnote_string) = <FONT COLOR="#000055"><B>split</B></FONT>('=', $gnote);
+
  }
    <FONT COLOR="#000055"><B>if</B></FONT>($gnote_s <FONT COLOR="#000055"><B>eq</B></FONT> "DESCR"){
+
      $description = ";Description=".$gnote_string;
+
}
    }
+
  }
+
  $gff_gene{'attributes'} = "ID=".$grouphash->{'group_type'}.
+
                            $description.
+
<font color="#555500">''<nowiki>#fetch all available entry-points (chromosomes) and their lengths from server </nowiki>''</font>
<font color="#000055">'''sub'''</font> get_entry_points {
    ";Status=".$notes{"Genestatus"}.
+
                            ";CREATED=".$notes{"Created"}.
+
  <font color="#000055">'''my'''</font> %chrom_lens;
    ";LASTMOD=".$notes{"Lastmod"};
+
 
+
  <font color="#000055">'''my'''</font> $entry_points = $das-&gt;entry_points();
  <FONT COLOR="#555500"><I>#print entry for transcript
+
 
+
  <font color="#000055">'''foreach'''</font> <font color="#000055">'''my'''</font> $k (<font color="#000055">'''keys'''</font> %$entry_points){
</I></FONT>   print_gff_line(\%gff_gene);
+
<font color="#000055">'''foreach'''</font> <font color="#000055">'''my'''</font> $l (@{$entry_points-&gt;{$k}}){
  %gff_gene = ();
+
<font color="#000055">'''foreach'''</font> <font color="#000055">'''my'''</font> $segment (@{ $l-&gt;{"segment"} }){
 
+
$chrom_lens{ $segment-&gt;{"segment_id"} } = $segment-&gt;{"segment_size"};
  $new_features{$grouphash->{'group_type'}} = 1;
+
}
 
+
  }
}
+
  }
 
+
<FONT COLOR="#000055"><B>if</B></FONT>(!<FONT COLOR="#000055"><B>exists</B></FONT> $transcripts{ $feature->{'feature_id'} }){
+
  <font color="#000055">'''return'''</font> \%chrom_lens;
  $transcripts{ $feature->{'feature_id'} } = 1;
+
}
  <FONT COLOR="#000055"><B>my</B></FONT> %gff_transcript;
+
 
+
  <FONT COLOR="#555500"><I>#build structure for transcript
+
</I></FONT>   $gff_transcript{'seqid'}      = $chromosome;
+
<font color="#555500">''<nowiki>#fetch the data and process it. </nowiki>''</font>
<font color="#555500">''<nowiki>#note that this function is quite specific to the way your DAS source is set-up. </nowiki>''</font>
<font color="#555500">''<nowiki>#the idea is to get together all exons, etc that belong to a transcript and all transcripts </nowiki>''</font>
<font color="#555500">''<nowiki>#that belong to a gene. </nowiki>''</font>
<font color="#000055">'''sub'''</font> get_transcripts {
  $gff_transcript{'source'}    = $notes{"Transcripttype"};
+
  <font color="#000055">'''my'''</font> ( $region, $chromosome, $previous_genes ) = @_;
  $gff_transcript{'type'}      = "transcript";
+
  $gff_transcript{'start'}      = $feature->{'target_start'};
+
  <font color="#000055">'''print'''</font> <font color="#000055">'''STDERR'''</font> "have <font color="#000055">'''chr'''</font> $chromosome$region\n";
  $gff_transcript{'end'}        = $feature->{'target_stop'};
+
  $gff_transcript{'score'}      = ".";
+
  <font color="#000055">'''my'''</font> %genes = ();
  $gff_transcript{'strand'}    = $strand;
+
  <font color="#000055">'''my'''</font> %new_features = ();
  $gff_transcript{'phase'}      = ".";
+
  <font color="#000055">'''my'''</font> $response = <font color="#000055">'''undef'''</font><nowiki>;
  $gff_transcript{'attributes'} = "ID=".$feature->{'feature_id'}.";Alias1=".$feature->{'target_id'}.
+
                                  ";Parent=".$grouphash->{'group_type'}.
+
  </nowiki><font color="#555500">''<nowiki>#fetch DAS features </nowiki>''</font>
  $response = $das-&gt;features({
  ";CREATED=".$notes{"Created"}.
+
      'segment' =&gt; $chromosome.$region,
  ";LASTMOD=".$notes{"Lastmod"}.
+
      'type'    =&gt; $type,
  ";Status=".$notes{"Transcriptstatus"};
+
    });
 
+
  <FONT COLOR="#555500"><I>#print entry for transcript
+
  <font color="#000055">'''while'''</font> (<font color="#000055">'''my'''</font> ($url, $features) = <font color="#000055">'''each'''</font> %$response) {
</I></FONT>   print_gff_line(\%gff_transcript);
+
  %gff_transcript = ();
+
    <font color="#000055">'''if'''</font>(<font color="#000055">'''ref'''</font> $features <font color="#000055">'''eq'''</font> "ARRAY"){
}
+
      <font color="#000055">'''print'''</font> <font color="#000055">'''STDERR'''</font> "Received ".<font color="#000055">'''scalar'''</font> @$features." features.\n";
<FONT COLOR="#555500"><I>#else{ print STDERR "_" }
+
 
+
    FEATURES:
</I></FONT>
+
      <font color="#000055">'''foreach'''</font> <font color="#000055">'''my'''</font> $feature (@$features) {
<FONT COLOR="#555500"><I>#print entry for exons, etc.
+
</I></FONT> <FONT COLOR="#000055"><B>if</B></FONT>($feature->{'type_category'} =~ /error/){
+
<font color="#000055">'''my'''</font> %notes = ();
  <FONT COLOR="#000055"><B>print</B></FONT> <FONT COLOR="#000055"><B>STDERR</B></FONT> "Found an error feature:\n";
+
  <FONT COLOR="#000055"><B>print</B></FONT> <FONT COLOR="#000055"><B>STDERR</B></FONT> $gff_element{'seqid'}."\t";
+
<font color="#000055">'''my'''</font> $grouphash = $feature-&gt;{'group'}-&gt;[0];
  <FONT COLOR="#000055"><B>print</B></FONT> <FONT COLOR="#000055"><B>STDERR</B></FONT> $gff_element{'source'}."\t";
+
  <FONT COLOR="#000055"><B>print</B></FONT> <FONT COLOR="#000055"><B>STDERR</B></FONT> $gff_element{'type'}."\t";
+
<font color="#555500">''<nowiki>#get other notes </nowiki>''</font>
<font color="#000055">'''my'''</font> $i = 0;
  <FONT COLOR="#000055"><B>print</B></FONT> <FONT COLOR="#000055"><B>STDERR</B></FONT> $gff_element{'start'}."\t";
+
<font color="#000055">'''my'''</font><nowiki> $morenote_entry = '';
  <FONT COLOR="#000055"><B>print</B></FONT> <FONT COLOR="#000055"><B>STDERR</B></FONT> $gff_element{'end'}."\t";
+
</nowiki><font color="#000055">'''while'''</font>(<font color="#000055">'''defined'''</font>($feature-&gt;{'note'}-&gt;[$i])){
  <FONT COLOR="#000055"><B>print</B></FONT> <FONT COLOR="#000055"><B>STDERR</B></FONT> $gff_element{'score'}."\t";
+
  <font color="#000055">'''my'''</font> $morenotes = $feature-&gt;{'note'}-&gt;[$i];
  <FONT COLOR="#000055"><B>print</B></FONT> <FONT COLOR="#000055"><B>STDERR</B></FONT> $gff_element{'strand'}."\t";
+
  <font color="#000055">'''my'''</font> ($morenotes_type, $morenotes_value) = <font color="#000055">'''split'''</font>('=', $morenotes);
  <FONT COLOR="#000055"><B>print</B></FONT> <FONT COLOR="#000055"><B>STDERR</B></FONT> $gff_element{'phase'}."\t";
+
  $morenotes_value =~ <font color="#000055">'''s'''</font>/\&amp;\<font color="#555500">''<nowiki>#39\;/\'/g; </nowiki>''</font>
  $notes{$morenotes_type} = $morenotes_value;
  <FONT COLOR="#000055"><B>print</B></FONT> <FONT COLOR="#000055"><B>STDERR</B></FONT> $gff_element{'attributes'}."\n";
+
  $i++;
} <FONT COLOR="#000055"><B>else</B></FONT> {
+
}
  print_gff_line(\%gff_element);
+
  %gff_element = ();
+
<font color="#555500">''<nowiki>#remove duplicates from overlapping regions </nowiki>''</font>
<font color="#000055">'''if'''</font>(<font color="#000055">'''defined'''</font> $previous_genes and <font color="#000055">'''exists'''</font>($previous_genes-&gt;{$grouphash-&gt;{'group_type'}})){
}
+
  <font color="#000055">'''next'''</font> FEATURES;
 
+
}
$feature = <FONT COLOR="#000055"><B>undef</B></FONT>;
+
       }
+
<font color="#555500">''<nowiki>#you could do some filtering of the response at this point </nowiki>''</font>
      @$features = ();
+
<font color="#000055">'''my'''</font> %gff_element;
      $features  = <FONT COLOR="#000055"><B>undef</B></FONT>;
+
    }
+
<font color="#555500">''<nowiki>#build structure for exons and general items </nowiki>''</font>
<font color="#555500">''<nowiki>#find type </nowiki>''</font>
<font color="#000055">'''my'''</font> $element_type = $feature-&gt;{'type'} || "exon";
  }
+
$element_type    =~ <font color="#000055">'''m'''</font>/((intron)|(UTR)|(exon))/g;
 
+
<font color="#000055">'''if'''</font>($1){ $element_type = $1 }
  <FONT COLOR="#000055"><B>return</B></FONT> \%new_features;
+
}
+
<font color="#000055">'''my'''</font> $group_type  = $grouphash-&gt;{'group_type'};
 
+
 
+
<font color="#000055">'''my'''</font> $strand       = $feature-&gt;{'orientation'};
 
+
<font color="#000055">'''if'''</font>($feature-&gt;{'orientation'}    =~ /^(\+|\-|\.)$/) {  }
<FONT COLOR="#555500"><I>#print the different data types as GFF
+
<font color="#000055">'''elsif'''</font>($feature-&gt;{'orientation'} ==  1){ $strand = '+' }
</I></FONT><FONT COLOR="#000055"><B>sub</B></FONT> print_gff_line {
+
<font color="#000055">'''elsif'''</font>($feature-&gt;{'orientation'} == -1){ $strand = '-' }
  <FONT COLOR="#000055"><B>my</B></FONT> ($element) = @_;
+
<font color="#000055">'''elsif'''</font>($feature-&gt;{'orientation'} ==  0){ $strand = '.' }
 
+
<font color="#000055">'''else'''</font>{ <font color="#000055">'''die'''</font> "INVALID STRAND SYMBOL: ".$feature-&gt;{'orientation'}."\n"; }
  <FONT COLOR="#000055"><B>print</B></FONT> GFF $element->{'seqid'}."\t";
+
  <FONT COLOR="#000055"><B>print</B></FONT> GFF $element->{'source'}."\t";
+
<font color="#000055">'''my'''</font> $phase        = ".";
  <FONT COLOR="#000055"><B>print</B></FONT> GFF $element->{'type'}."\t";
+
<font color="#000055">'''if'''</font>($feature-&gt;{'phase'}){
  <FONT COLOR="#000055"><B>print</B></FONT> GFF $element->{'start'}."\t";
+
  $phase = $feature-&gt;{'phase'};
  <FONT COLOR="#000055"><B>print</B></FONT> GFF $element->{'end'}."\t";
+
}
  <FONT COLOR="#000055"><B>print</B></FONT> GFF $element->{'score'}."\t";
+
<font color="#000055">'''elsif'''</font>($element_type <font color="#000055">'''eq'''</font> "exon"){
  <FONT COLOR="#000055"><B>print</B></FONT> GFF $element->{'strand'}."\t";
+
  $phase = "0";
  <FONT COLOR="#000055"><B>print</B></FONT> GFF $element->{'phase'}."\t";
+
}
  <FONT COLOR="#000055"><B>print</B></FONT> GFF $element->{'attributes'}."\n";
+
}
+
<font color="#000055">'''if'''</font>(!$notes{"Transcriptstatus"}){
 
+
  <font color="#000055">'''die'''</font> "PROBLEM: $element_type, ".$feature-&gt;{'feature_id'}."\n";
 
+
}
</PRE>
+
 +
$gff_element{'seqid'}      = $chromosome;
 +
$gff_element{'source'}    = $notes{"Transcripttype"};
 +
$gff_element{'type'}      = $element_type;
 +
$gff_element{'start'}      = $feature-&gt;{'start'};
 +
$gff_element{'end'}        = $feature-&gt;{'end'};
 +
$gff_element{'score'}      = ".";
 +
$gff_element{'strand'}    = $strand;
 +
$gff_element{'phase'}      = $phase;
 +
 +
<font color="#555500">''<nowiki>#check for some missing values </nowiki>''</font>
<font color="#000055">'''if'''</font>(!<font color="#000055">'''exists'''</font> $feature-&gt;{'feature_id'}){
 +
  <font color="#000055">'''print'''</font> <font color="#000055">'''STDERR'''</font> "Missing value <font color="#000055">'''for'''</font> Parent-feature_id\n";
 +
  $feature-&gt;{'feature_id'} = "0";
 +
}
 +
<font color="#000055">'''if'''</font>(!<font color="#000055">'''exists'''</font> $notes{"Transcriptstatus"}){
 +
  <font color="#000055">'''print'''</font> <font color="#000055">'''STDERR'''</font> "Missing value <font color="#000055">'''for'''</font> Transcriptstatus\n";
 +
  $notes{"Transcriptstatus"} = "-";
 +
}
 +
<font color="#000055">'''if'''</font>(!<font color="#000055">'''exists'''</font> $notes{"Created"}){
 +
  <font color="#000055">'''print'''</font> <font color="#000055">'''STDERR'''</font> "Missing value <font color="#000055">'''for'''</font> Created\n";
 +
  $notes{"Created"} = 0;
 +
}
 +
<font color="#000055">'''if'''</font>(!<font color="#000055">'''exists'''</font> $notes{"Lastmod"}){
 +
  <font color="#000055">'''print'''</font> <font color="#000055">'''STDERR'''</font> "Missing value <font color="#000055">'''for'''</font> Lastmod\n";
 +
  $notes{"Lastmod"} = 0;
 +
}
 +
$gff_element{'attributes'} = "Parent=".$feature-&gt;{'feature_id'}.
 +
                            ";Status=".$notes{"Transcriptstatus"}.
 +
    ";CREATED=".$notes{"Created"}.
 +
    ";LASTMOD=".$notes{"Lastmod"};
 +
 +
<font color="#000055">'''if'''</font>(!<font color="#000055">'''exists'''</font> $genes{ $group_type }){
 +
  $genes{ $group_type } = 1;
 +
  <font color="#000055">'''my'''</font> %gff_gene;
 +
 +
          <font color="#000055">'''my'''</font> $gene_region = $feature-&gt;{'target'};
 +
          <font color="#000055">'''my'''</font> ($gs, $gene_loc) = <font color="#000055">'''split'''</font>('\=', $gene_region);
 +
  <font color="#000055">'''my'''</font> ($gene_start, $gene_end) = <font color="#000055">'''split'''</font>('\-', $gene_loc);
 +
 +
  <font color="#555500">''<nowiki>#build structure for gene </nowiki>''</font>
  $gff_gene{'seqid'}      = $chromosome;
 +
  $gff_gene{'source'}    = $notes{"Genetype"};
 +
  $gff_gene{'type'}      = "gene";
 +
  $gff_gene{'start'}      = $gene_start;
 +
  $gff_gene{'end'}        = $gene_end;
 +
  $gff_gene{'score'}      = ".";
 +
  $gff_gene{'strand'}    = $strand;
 +
  $gff_gene{'phase'}      = ".";
 +
 +
  <font color="#555500">''<nowiki>#get gene description </nowiki>''</font>
  <font color="#000055">'''my'''</font> $description = "";
 +
  <font color="#000055">'''foreach'''</font> <font color="#000055">'''my'''</font> $gnote (@{$grouphash-&gt;{'note'}}){
 +
    <font color="#000055">'''my'''</font> ($gnote_s, $gnote_string) = <font color="#000055">'''split'''</font>('=', $gnote);
 +
    <font color="#000055">'''if'''</font>($gnote_s <font color="#000055">'''eq'''</font> "DESCR"){
 +
      $description = ";Description=".$gnote_string;
 +
    }
 +
  }
 +
  $gff_gene{'attributes'} = "ID=".$grouphash-&gt;{'group_type'}.
 +
                            $description.
 +
    ";Status=".$notes{"Genestatus"}.
 +
                            ";CREATED=".$notes{"Created"}.
 +
    ";LASTMOD=".$notes{"Lastmod"};
 +
 +
  <font color="#555500">''<nowiki>#print entry for gene </nowiki>''</font>
  print_gff_line(\%gff_gene);
 +
  %gff_gene = ();
 +
 +
  $new_features{$grouphash-&gt;{'group_type'}} = 1;
 +
 +
}
 +
 +
<font color="#000055">'''if'''</font>(!<font color="#000055">'''exists'''</font> $transcripts{ $feature-&gt;{'feature_id'} }){
 +
  $transcripts{ $feature-&gt;{'feature_id'} } = 1;
 +
  <font color="#000055">'''my'''</font> %gff_transcript;
 +
 +
  <font color="#555500">''<nowiki>#build structure for transcript </nowiki>''</font>   $gff_transcript{'seqid'}      = $chromosome;
 +
  $gff_transcript{'source'}    = $notes{"Transcripttype"};
 +
  $gff_transcript{'type'}      = "transcript";
 +
  $gff_transcript{'start'}      = $feature-&gt;{'target_start'};
 +
  $gff_transcript{'end'}        = $feature-&gt;{'target_stop'};
 +
  $gff_transcript{'score'}      = ".";
 +
  $gff_transcript{'strand'}    = $strand;
 +
  $gff_transcript{'phase'}      = ".";
 +
  $gff_transcript{'attributes'} = "ID=".$feature-&gt;{'feature_id'}.";Alias1=".$feature-&gt;{'target_id'}.
 +
                                  ";Parent=".$grouphash-&gt;{'group_type'}.
 +
  ";CREATED=".$notes{"Created"}.
 +
  ";LASTMOD=".$notes{"Lastmod"}.
 +
  ";Status=".$notes{"Transcriptstatus"};
 +
 +
  <font color="#555500">''<nowiki>#print entry for transcript </nowiki>''</font>   print_gff_line(\%gff_transcript);
 +
  %gff_transcript = ();
 +
}
 +
<font color="#555500">''<nowiki>#else{ print STDERR "_" } </nowiki>''</font>
 +
<font color="#555500">''<nowiki>#print entry for exons, etc. </nowiki>''</font> <font color="#000055">'''if'''</font>($feature-&gt;{'type_category'} =~ /error/){
 +
  <font color="#000055">'''print'''</font> <font color="#000055">'''STDERR'''</font> "Found an error feature:\n";
 +
  <font color="#000055">'''print'''</font> <font color="#000055">'''STDERR'''</font> $gff_element{'seqid'}."\t";
 +
  <font color="#000055">'''print'''</font> <font color="#000055">'''STDERR'''</font> $gff_element{'source'}."\t";
 +
  <font color="#000055">'''print'''</font> <font color="#000055">'''STDERR'''</font> $gff_element{'type'}."\t";
 +
  <font color="#000055">'''print'''</font> <font color="#000055">'''STDERR'''</font> $gff_element{'start'}."\t";
 +
  <font color="#000055">'''print'''</font> <font color="#000055">'''STDERR'''</font> $gff_element{'end'}."\t";
 +
  <font color="#000055">'''print'''</font> <font color="#000055">'''STDERR'''</font> $gff_element{'score'}."\t";
 +
  <font color="#000055">'''print'''</font> <font color="#000055">'''STDERR'''</font> $gff_element{'strand'}."\t";
 +
  <font color="#000055">'''print'''</font> <font color="#000055">'''STDERR'''</font> $gff_element{'phase'}."\t";
 +
  <font color="#000055">'''print'''</font> <font color="#000055">'''STDERR'''</font> $gff_element{'attributes'}."\n";
 +
} <font color="#000055">'''else'''</font> {
 +
  print_gff_line(\%gff_element);
 +
  %gff_element = ();
 +
}
 +
 +
$feature = <font color="#000055">'''undef'''</font><nowiki>;
 +
      }
 +
      @$features = ();
 +
      $features  = </nowiki><font color="#000055">'''undef'''</font><nowiki>;
 +
    }
 +
  }
 +
 +
  </nowiki><font color="#000055">'''return'''</font> \%new_features;
 +
}
 +
 +
 +
 +
<font color="#555500">''<nowiki>#print the different data types as GFF </nowiki>''</font><font color="#000055">'''sub'''</font> print_gff_line {
 +
  <font color="#000055">'''my'''</font> ($element) = @_;
 +
 +
  <font color="#000055">'''print'''</font> GFF $element-&gt;{'seqid'}."\t";
 +
  <font color="#000055">'''print'''</font> GFF $element-&gt;{'source'}."\t";
 +
  <font color="#000055">'''print'''</font> GFF $element-&gt;{'type'}."\t";
 +
  <font color="#000055">'''print'''</font> GFF $element-&gt;{'start'}."\t";
 +
  <font color="#000055">'''print'''</font> GFF $element-&gt;{'end'}."\t";
 +
  <font color="#000055">'''print'''</font> GFF $element-&gt;{'score'}."\t";
 +
  <font color="#000055">'''print'''</font> GFF $element-&gt;{'strand'}."\t";
 +
  <font color="#000055">'''print'''</font> GFF $element-&gt;{'phase'}."\t";
 +
  <font color="#000055">'''print'''</font> GFF $element-&gt;{'attributes'}."\n";
 +
}
 +
 +
  
 +
== [acknoledgments ]Acknowledgments ==
  
 +
(some of this document may have been cut and pasted from documentation contributed by the following people):
  
<h2><a href="acknoledgments"></a>Acknowledgments</h2> (some of this document may have been cut and pasted from documentation contributed by the following people):
+
* Andreas Prlic
<ul>
+
* Andy Jenkinson
<li>Andreas Prlic</li>
+
* Phil Jones
<li>Andy Jenkinson</li>
+
* Tim Hubbard
<li>Phil Jones</li>
+
* Lincoln Stein
<li>Tim Hubbard</li>
+
* Thomas Down
<li>Lincoln Stein</li>
+
|}
<li>Thomas Down</li>
 
</ul>
 
  
 
</div>
 
</div>

Revision as of 04:14, 21 April 2009

Last Updated 5th Feb 2009

The intention of this document is to bring together, and add to, all the documentation available on the WWW for the DAS system. The content on these pages draws from many sources of information and thus has many contributors. Eventually the intention is that this document will be a set of instructions that you can print out and use as reference documentation or as a good read. If you find any errors on these pages, or on the pages they link to, please contact me (Jonathan Warren) to let me know. Suggestions and contributions are also welcome.

Index:

[#what What is DAS?]

[#currentStatus Current Status]

[#settingUp Setting up a DAS Server]

[#serversList Servers available]

[#dazzle Dazzle]

[#gettingDazzle Getting Dazzle]
[#readyPlugins Using ready made plugins for datasources]
[#ownPlugins Writing your own plugin]
[#ensemblRef Deploying an Ensembl Reference Server]

[#myDas MyDas]

[#proserver Proserver]

[#impLatest Implementing the latest specs]

[#testingServer Testing your implementation]

[#validation Validation and Registering of your Server]

[#relaxNG RelaxNG and other validation in the Registry]

[#dasregistry The DAS Registry]

[#dasRegGenInfo Introduction to the DAS Registry]

Connecting to the Registry Programmatically

[#settingUpClient Setting up a DAS client]

[#clientsList Currently Available DAS Clients - table?]

[#writingClients Writing your own DAS client]

[#dasobert A Java DAS Client Library - Dasobert]

[#perlWalking Example of walking a DAS source using perl]

[#acknowledgements Acknowledgments]

Content:

What is DAS?

As biological databases grow ever larger with the advent of high-throughput technologies such as sequencers and microarray chips, it is becoming increasingly difficult to download all the data relevant to a research team. DAS gets around this by keeping the data stored with its originators and allowing users around the world to access just the parts they need at any one time. Put another way: by making use of DAS you can view integrated information from multiple sources, without these sources needing to be aware of each other. You can also add your own DAS data source, perhaps privately within your own institution, and then view the information served from this source in the context of features from other institutions.

DAS stands for Distributed Annotation System. It was originally set up to be used with genomic information, where annotations/features are layered on top of a reference sequence, usually a genome. The idea is that a genome browser such as Ensembl or GBrowse (both DAS clients in this scenario) can display, in the same view, annotations from data sources on the same server/machine the browser is running on and annotations from data sources (data served by DAS servers) that could be on the other side of the world, communicating via the WWW.

The DAS system consists of the DAS Registry (www.dasregistry.org) as well as DAS servers and clients. The Registry is there to enable people and computers to easily find the DAS data sources available around the world, and also to help these data sources conform to the specifications. It is important that data served by DAS servers conforms to the specifications, so that different clients and servers around the world can interoperate.

The 1.6 spec is the latest, and soon to be official, DAS spec. It mainly focuses on genomic annotations but also refers to the extensions specified in the 1.53E spec below.
The 1.53E spec contains up-to-date specifications for servers and clients that exchange information via DAS that is not genome-centric. Types of data include proteins (structures and alignments), molecular interactions and volume map data.

Current Status/ DAS specifications 1.5, 1.53E, 1.6, 2.0 and Future Intentions

Currently DAS 1.5 is the most widely used and supported version, together with 1.53E. DAS 2.0 is quite different and is really running in parallel to the other two versions of DAS. It is hoped that in the next few years these versions will become one version (the 1.6 spec already includes some of the commands from the 2.0 spec). If you wish your data to be widely accessible, use the 1.6 spec and the 1.53E spec documents as your guide. If your main priority is using the most recent technology and external libraries, then DAS 2.0 may be of most interest to you.

Setting up a DAS Server

There are several different options available for setting up a DAS server. All are written in either Perl or Java.

Servers available

{|
! Name !! Programming language !! Advantages !! Disadvantages
|-
| Dazzle || Java || Standard implementation; includes support for extensions (structure, interaction, volume map) || Some people say it can be hard to configure and deploy if you are not used to Java web development
|-
| Proserver || Perl || Standard implementation; includes support for extensions (structure, interaction, volume map) ||
|-
| MyDas || Java || Some people say it is easier to set up and configure than Dazzle || Does not currently support extensions
|-
| LDAS || Perl || Very easy to set up? || Limited support for DAS functionality and sources
|}

Dazzle

Dazzle is currently the standard (default) implementation for Java users; however, MyDas (described below) is also popular.

Dazzle Eclipse Tutorial

[DazzzleTutorial.jsp Dazzle Eclipse Tutorial] This tutorial takes you through setting up Dazzle in Eclipse and then shows you how to add your own plugins.

Getting Dazzle

http://biojava.org/wiki/Dazzle#Getting_Dazzle The latest cutting-edge source code is available from Subversion: http://www.derkholm.net/svn/repos/dazzle/

Using ready made plugins for datasources

http://biojava.org/wiki/Dazzle:plugins More examples needed here and tips for using mysql etc? http://biojava.org/wiki/Dazzle:deployment

Writing your own plugin

http://biojava.org/wiki/Dazzle:writeplugin
Still needed here: how to write a plugin using Eclipse, more on which interfaces need to be implemented, and a full example that implements all the needed functionality, such as the sources cmd and coordinate systems.

Deploying an Ensembl Reference Server

link to Ensembl reference server instructions

MyDas

Information about MyDas can be found here.

Proserver

Proserver Page at the Sanger Institute.
Proserver Tutorial
Guide to Proserver

Implementing the latest specs

Example Proserver configuration for implementing the sources cmd:


coordinates  = TAIR_8,Chromosome,Arabidopsis thaliana -> 1:2000,3000
properties   = key1 -> value1 ; key2 -> value2
mapmaster    = http://www.gramene.org/das/Arabidopsis_thaliana.TAIR8.reference
capabilities = features -> 1.0

The coordinates data is taken from the coordinates/registry_coordinates.xml file, which is an archived copy of the list of coordinates available in the DAS registry. Specifying the name (or URI, actually) and test range is enough; ProServer will pick up the rest from the XML file. If the full data is not picked up, you may need to update the coordinates XML file from the registry (http://www.dasregistry.org/das/coordinatesystem). If your coordinate system is not in the Registry, an admin can add it for you.

Protein Annotations and Ontologies

[extension_ontology.jsp explanation of ontologies for proteins usage in DAS]

Testing your implementation

Validation and Registering of your Server

RelaxNG and other validation in the Registry

The DAS Registry uses RelaxNG to validate the XML responses from DAS servers before allowing them to register as a valid DAS source. A RelaxNG schema is essentially a document like a DTD, except that it uses an XML syntax that is quick to learn. The Registry uses the documents found at http://www.dasregistry.org/validation/ and has one document for each of the DAS commands: features.rng, sources.rng, alignments.rng, structure.rng, entry_points.rng, interaction.rng, sequence.rng and types.rng. (Note that you may need to right click and "view the source" to see anything on these pages in a web browser.)

The DAS Registry

Introduction to the DAS Registry

Connecting to the Registry Programmatically

There are several commands that can be used to query the Registry, including the sources cmd with the optional parameters label, organism, authority, capability, type and unique source_id. You can also use the organism, coordinatesystem and lastmodified commands. For examples see Scripting. An example of a Java class written using Dasobert to access the Registry is here: http://www.derkholm.net/svn/repos/dasobert/trunk/doc/examples/ContactRegistry.java
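
As a rough illustration of how the optional parameters of the sources cmd might be combined into a request, here is a minimal Perl sketch that only builds the query URL. The base URL http://www.dasregistry.org/das/sources, the '&' separator and the helper name registry_sources_url are assumptions made for this example (this page only gives the /das/coordinatesystem URL), and the parameter values are invented, so check the Registry documentation for the exact form.

```perl
use strict;
use warnings;

# Assumed base URL of the Registry sources command (not confirmed by this page).
my $REGISTRY_SOURCES = 'http://www.dasregistry.org/das/sources';

# Percent-encode a value for use in a URL query string.
sub url_escape {
  my ($value) = @_;
  $value =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
  return $value;
}

# Hypothetical helper: build a sources query from optional filter parameters
# (label, organism, authority, capability, type, ...).
# Keys are sorted so that the resulting URL is reproducible.
sub registry_sources_url {
  my (%params) = @_;
  my @pairs = map  { $_ . '=' . url_escape($params{$_}) }
              grep { defined $params{$_} }
              sort keys %params;
  return @pairs ? $REGISTRY_SOURCES . '?' . join('&', @pairs)
                : $REGISTRY_SOURCES;
}

# Example values are made up for illustration.
print registry_sources_url(organism => 'Homo sapiens', capability => 'features'), "\n";
# prints http://www.dasregistry.org/das/sources?capability=features&organism=Homo%20sapiens
```

The resulting URL could then be fetched with any HTTP client and the returned XML parsed for the matching sources.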

Setting up a DAS client

Currently Available DAS Clients - table?

{|
! Name !! Description !! Programming language !! Links
|-
| GBrowse || Quote from the GBrowse website: "GBrowse is the most popular viewer in GMOD. For a list of GBrowse and GMOD installations see the GMOD Users page. For a demo of its features, try the WormBase, FlyBase, or Human Genome Segmental Duplication Database web sites." Spec: DAS 1.53E, and 1.6 soon. || Perl || http://gmod.org/wiki/Gbrowse
|-
| EnsEMBL || A web-based genome browser and database system which supports DAS 1.53E and, soon, 1.6. || Perl || http://www.ensembl.org/
|-
| IGB || An application built upon the GenoViz SDK and Genometry for visualization and exploration of genomes and corresponding annotations from multiple data sources. || Java || http://genoviz.sourceforge.net/
|-
| Jalview || A multiple sequence alignment editor and viewer. || Java || http://www.jalview.org/
|-
| Dasty2 || A protein DAS client for visualising protein sequence feature information. The client can connect to a reference server and one or many DAS servers. It merges the data from all the servers and displays sequence information as well as annotated feature information from all the available DAS servers in a very user-friendly way. || Perl and AJAX || http://www.ebi.ac.uk/dasty/
|}

Add clients from the workshop

Writing your own DAS client

A Java DAS Client Library - Dasobert

Examples of client code written in Java using Dasobert can be found here: http://www.derkholm.net/svn/repos/dasobert/trunk/doc/examples/

There is also a tutorial for using Dasobert within Eclipse (it follows on from the Dazzle Eclipse tutorial): [DasobertTutorial.jsp Dasobert Eclipse Tutorial]

Example of walking a DAS source using perl

This example was kindly provided by Felix Kokocinski. You can specify a region, or let it walk through all regions if the server can supply entry points with lengths. This is done in slices of, e.g., 20 Mb. It takes quite some time, but works nicely.


# Example script that reads genomic data from a DAS server
# using a defined chunk size,
# writing the data out to a GFF file.
# fsk@sanger.ac.uk, 2008
use strict;
use Bio::Das::Lite;
use Getopt::Long;

#default DAS server address
my $server = "http://das.sanger.ac.uk/das";
#default DAS source name
my $source = 'otter_das';
#proxy name
my $http_proxy = undef;
#genomic chunk size to query
my $max_len    = 20000000;


my $chromosome  = undef;
my $start       = 0;
my $end         = 0;
my $gff_file    = undef;
my %transcripts = ();

my $type;

&GetOptions(
	    'file=s'                 => \$gff_file,
	    'chromosome=s'           => \$chromosome,
	    'start=s'                => \$start,
	    'end=s'                  => \$end,
            'server=s'               => \$server,
            'source=s'               => \$source,
	   );

#connect to DAS server
my $das = connect_das("$server/$source", $http_proxy);

#get entry point list/lengths
#requires the DAS server to support the entry-points function
my $chrom_lens = get_entry_points();

open(GFF, ">$gff_file") or die "Can't open file $gff_file.\n";

if($chromosome){
  #query specific region
  get_region($chromosome, $start, $end);
}
else{
  #go through all chromosomes
  foreach my $chrom (keys %$chrom_lens){
    print "getting $chrom\n";
    get_region($chrom, undef, undef);
    %transcripts = ();
  }
}


close(GFF) or die "Can't close file $gff_file.\n";


  ################################################ 

#connect to DAS server
sub connect_das {
  my ($dsn, $proxy) = @_;

  my $das = Bio::Das::Lite->new({
				 'timeout'    => 10000,
				 'dsn'        => $dsn,
				 'http_proxy' => $proxy,
				}) or die "cant connect to DAS server!\n";

  return $das;
}



#look at the region requested
sub get_region {
  my ($chromosome, $start, $end) = @_;

  my $chrom_len    = $chrom_lens->{$chromosome};
  my $region       = "";

  if( $start and $end){
    if($start > $end){
      die "Coordinates wrong: $start > $end!\n";
    }
    if( ($end - $start) <= $max_len ){
      #get entire region
      my $region = ":".$start.",".$end;
      get_transcripts($region, $chromosome);
    }
    else{
      go_through_chunks($start, $end, $chromosome, $chrom_len);
    }
  }
  elsif( $chrom_len <= $max_len ){
    #get entire chromosome
    get_transcripts($region, $chromosome);
  }
  else{
    go_through_chunks(1, $chrom_len, $chromosome, $chrom_len);
  }

}


#go through a region in chunks
sub go_through_chunks {
  my ($chunk_start, $chunk_end, $chromosome, $chrom_len) = @_;

  my ($region_start, $region_end);
  my %ids_seen;

  #loop through regions until all is covered
  #keep track of genes to avoid duplicates!
  for($region_start = $chunk_start, $region_end = $region_start + $max_len;
      $region_start < $chunk_end;
      $region_start = $region_end + 1, $region_end += $max_len){

    if($region_end > $chrom_len){
      $region_end = $chrom_len;
    }elsif($region_end > $chunk_end){
      $region_end = $chunk_end;
    }
    my $region = ":".$region_start.",".$region_end;

    #get all transcripts from chunk
    my $new_ids = get_transcripts($region, $chromosome, \%ids_seen);
    %ids_seen = (%ids_seen, %$new_ids);
  }

}



#fetch all available entry-points (chromosomes) and their lengths from server
sub get_entry_points {

  my %chrom_lens;

  my $entry_points = $das->entry_points();

  foreach my $k (keys %$entry_points){
	foreach my $l (@{$entry_points->{$k}}){
		foreach my $segment (@{ $l->{"segment"} }){
			$chrom_lens{ $segment->{"segment_id"} } = $segment->{"segment_size"};
		}
  	}
  }

  return \%chrom_lens;
}



#fetch the data and process it.
#note that this function is quite specific to the way your DAS source is set up.
#the idea is to get together all exons, etc. that belong to a transcript and all transcripts
#that belong to a gene.
sub get_transcripts {
  my ( $region, $chromosome, $previous_genes ) = @_;

  print STDERR "have chr $chromosome$region\n";

  my %genes = ();
  my %new_features = ();
  my $response = undef;
 
  #fetch DAS features
  $response = $das->features({
			      'segment' => $chromosome.$region,
			      'type'    => $type,
			     });

  while (my ($url, $features) = each %$response) {

    if(ref $features eq "ARRAY"){
      print STDERR "Received ".scalar @$features." features.\n";

    FEATURES:
      foreach my $feature (@$features) {

	my %notes = ();

	my $grouphash = $feature->{'group'}->[0];

	#get other notes
	my $i = 0;
	my $morenote_entry = '';
	while(defined($feature->{'note'}->[$i])){
	  my $morenotes = $feature->{'note'}->[$i];
	  my ($morenotes_type, $morenotes_value) = split('=', $morenotes);
	  $morenotes_value =~ s/\&\#39\;/\'/g;
	  $notes{$morenotes_type} = $morenotes_value;
	  $i++;
	}

	#remove duplicates from overlapping regions
	if(defined $previous_genes and exists($previous_genes->{$grouphash->{'group_type'}})){
	  next FEATURES;
	}

	#you could do some filtering of the response at this point 
	my %gff_element;

	#build structure for exons and general items
	#find type
	my $element_type = $feature->{'type'} || "exon";
	$element_type    =~ m/((intron)|(UTR)|(exon))/g;
	if($1){ $element_type = $1 }

	my $group_type   = $grouphash->{'group_type'};

	my $strand       = $feature->{'orientation'};
	if($feature->{'orientation'}    =~ /^(\+|\-|\.)$/) {  }
	elsif($feature->{'orientation'} ==  1){ $strand = '+' }
	elsif($feature->{'orientation'} == -1){ $strand = '-' }
	elsif($feature->{'orientation'} ==  0){ $strand = '.' }
	else{ die "INVALID STRAND SYMBOL: ".$feature->{'orientation'}."\n"; }

	my $phase        = ".";
	if($feature->{'phase'}){
	  $phase = $feature->{'phase'};
	}
	elsif($element_type eq "exon"){
	  $phase = "0";
	}

	if(!$notes{"Transcriptstatus"}){
	  die "PROBLEM: $element_type, ".$feature->{'feature_id'}."\n";
	}

	$gff_element{'seqid'}      = $chromosome;
	$gff_element{'source'}     = $notes{"Transcripttype"};
	$gff_element{'type'}       = $element_type;
	$gff_element{'start'}      = $feature->{'start'};
	$gff_element{'end'}        = $feature->{'end'};
	$gff_element{'score'}      = ".";
	$gff_element{'strand'}     = $strand;
	$gff_element{'phase'}      = $phase;

	#check for some missing values
	if(!exists $feature->{'feature_id'}){
	  print STDERR "Missing value for Parent-feature_id\n";
	  $feature->{'feature_id'} = "0";
	}
	if(!exists $notes{"Transcriptstatus"}){
	  print STDERR "Missing value for Transcriptstatus\n";
	  $notes{"Transcriptstatus"} = "-";
	}
	if(!exists $notes{"Created"}){
	  print STDERR "Missing value for Created\n";
	  $notes{"Created"} = 0;
	}
	if(!exists $notes{"Lastmod"}){
	  print STDERR "Missing value for Lastmod\n";
	  $notes{"Lastmod"} = 0;
	}
	$gff_element{'attributes'} = "Parent=".$feature->{'feature_id'}.
	                             ";Status=".$notes{"Transcriptstatus"}.
				     ";CREATED=".$notes{"Created"}.
				     ";LASTMOD=".$notes{"Lastmod"};

	if(!exists $genes{ $group_type }){
	  $genes{ $group_type } = 1;
	  my %gff_gene;

          my $gene_region = $feature->{'target'};
          my ($gs, $gene_loc) = split('\=', $gene_region);
	  my ($gene_start, $gene_end) = split('\-', $gene_loc);

	  #build structure for gene
	  $gff_gene{'seqid'}      = $chromosome;
	  $gff_gene{'source'}     = $notes{"Genetype"};
	  $gff_gene{'type'}       = "gene";
	  $gff_gene{'start'}      = $gene_start;
	  $gff_gene{'end'}        = $gene_end;
	  $gff_gene{'score'}      = ".";
	  $gff_gene{'strand'}     = $strand;
	  $gff_gene{'phase'}      = ".";

	  #get gene description
	  my $description = "";
	  foreach my $gnote (@{$grouphash->{'note'}}){
	    my ($gnote_s, $gnote_string) = split('=', $gnote);
	    if($gnote_s eq "DESCR"){
	      $description = ";Description=".$gnote_string;
	    }
	  }
	  $gff_gene{'attributes'} = "ID=".$grouphash->{'group_type'}.
	                            $description.
				    ";Status=".$notes{"Genestatus"}.
	                            ";CREATED=".$notes{"Created"}.
				    ";LASTMOD=".$notes{"Lastmod"};

	  #print entry for gene
	  print_gff_line(\%gff_gene);
	  %gff_gene = ();

	  $new_features{$grouphash->{'group_type'}} = 1;

	}

	if(!exists $transcripts{ $feature->{'feature_id'} }){
	  $transcripts{ $feature->{'feature_id'} } = 1;
	  my %gff_transcript;

	  #build structure for transcript
	  $gff_transcript{'seqid'}      = $chromosome;
	  $gff_transcript{'source'}     = $notes{"Transcripttype"};
	  $gff_transcript{'type'}       = "transcript";
	  $gff_transcript{'start'}      = $feature->{'target_start'};
	  $gff_transcript{'end'}        = $feature->{'target_stop'};
	  $gff_transcript{'score'}      = ".";
	  $gff_transcript{'strand'}     = $strand;
	  $gff_transcript{'phase'}      = ".";
	  $gff_transcript{'attributes'} = "ID=".$feature->{'feature_id'}.";Alias1=".$feature->{'target_id'}.
	                                  ";Parent=".$grouphash->{'group_type'}.
					  ";CREATED=".$notes{"Created"}.
					  ";LASTMOD=".$notes{"Lastmod"}.
					  ";Status=".$notes{"Transcriptstatus"};

	  #print entry for transcript
	  print_gff_line(\%gff_transcript);
	  %gff_transcript = ();
	}
	#else{ print STDERR "_" } 
	#print entry for exons, etc.
	if($feature->{'type_category'} =~ /error/){
	  print STDERR "Found an error feature:\n";
	  print STDERR $gff_element{'seqid'}."\t";
	  print STDERR $gff_element{'source'}."\t";
	  print STDERR $gff_element{'type'}."\t";
	  print STDERR $gff_element{'start'}."\t";
	  print STDERR $gff_element{'end'}."\t";
	  print STDERR $gff_element{'score'}."\t";
	  print STDERR $gff_element{'strand'}."\t";
	  print STDERR $gff_element{'phase'}."\t";
	  print STDERR $gff_element{'attributes'}."\n";
	} else {
	  print_gff_line(\%gff_element);
	  %gff_element = ();
	}

	$feature = undef;
       }
       @$features = ();
       $features  = undef;
     }
   }
 
   return \%new_features;
}



#print the different data types as GFF
sub print_gff_line {
  my ($element) = @_;

  print GFF $element->{'seqid'}."\t";
  print GFF $element->{'source'}."\t";
  print GFF $element->{'type'}."\t";
  print GFF $element->{'start'}."\t";
  print GFF $element->{'end'}."\t";
  print GFF $element->{'score'}."\t";
  print GFF $element->{'strand'}."\t";
  print GFF $element->{'phase'}."\t";
  print GFF $element->{'attributes'}."\n";
}
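
The slicing step used by the script above can also be tried on its own, without a DAS server. The following is a simplified sketch rather than the author's loop: the helper name chunk_ranges is hypothetical, and each slice here covers at most $max_len positions, whereas the script's slices span $max_len + 1 positions.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of [start, end] pairs that cover
# $from..$to (inclusive) in consecutive, non-overlapping slices of at
# most $max_len positions each.
sub chunk_ranges {
  my ($from, $to, $max_len) = @_;
  die "max_len must be positive\n" unless $max_len > 0;

  my @chunks;
  for (my $start = $from; $start <= $to; $start += $max_len) {
    my $end = $start + $max_len - 1;
    $end = $to if $end > $to;    # clamp the last slice to the region end
    push @chunks, [$start, $end];
  }
  return @chunks;
}

# Print each slice in the ":start,end" form that the script appends
# to the segment name when querying features.
foreach my $chunk (chunk_ranges(1, 50_000_000, 20_000_000)) {
  print ':' . $chunk->[0] . ',' . $chunk->[1] . "\n";
}
# prints:
# :1,20000000
# :20000001,40000000
# :40000001,50000000
```

Because the slices do not overlap, a feature that crosses a slice boundary is still returned by both queries, which is why the script keeps track of the gene IDs it has already seen.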


Acknowledgments

(some of this document may have been cut and pasted from documentation contributed by the following people):
  • Andreas Prlic
  • Andy Jenkinson
  • Phil Jones
  • Tim Hubbard
  • Lincoln Stein
  • Thomas Down