RegistryTutorial2010
Tutorial: An Introduction to the DAS Registry
Finding DAS sources – the DAS Registry Service at the Sanger Institute
“The purpose of the DAS registration service is to keep track which DAS services are around, which DAS commands they are understanding and about the coordinate systems of their data.”
The DAS Registration Server allows DAS clients to discover DAS services. At the same time it provides an elegant browsable web interface that allows you to search for DAS sources, test their status, examine their reliability and learn more about what they offer. At the time of writing, a total of 754 servers from 44 institutions in 17 different countries are included in the registry.
The following activities will familiarize you with the content of the registry, both the 'human' interface and also the web-service interface that can be queried by DAS clients (and possibly your client code in the exercises this afternoon).
First of all, a few exercises to familiarize you with the DAS Registry web site:
Navigate to The DAS Registration Server
Follow the "list" ... "list sources" menu item at the top of this page. On the resulting page, find all DAS annotation servers that use UniProt as their 'authority'.
Select any one of these services and follow the information link ('i' in a blue circle) to find out details of this server including how reliable it has been recently.
For the same DAS service, use the DAS Registration Server to test its current capabilities to find out if it is supporting all of the functionality that it claims to at the present time. (Start with the green tick).
The DAS Registration Server allows DAS sources to be organised into projects, which can also be used at the client level to request annotation from logically associated DAS sources, such as DAS sources that are associated with the BioSapiens Project.
To view all the projects currently described in the DAS Registration Server, take a look at: http://www.dasregistry.org/listProjects.jsp
Of course different DAS data sources serve sequences and features based upon numerous different accession and coordinate systems. The DAS Registration Server provides functionality to keep track of this and to ensure a common nomenclature.
Look at the response to the DAS Registry coordsys command: http://www.dasregistry.org/coordsys/CS_DS6. On this page you are presented with a description of a specific accession system, namely the UniProt coordinate system in this case. You will also find a link that allows you to view a list of all the DAS servers that implement this coordinate / accession system.
DAS web service
The next few exercises will allow you to examine some of the web service capabilities of the DAS Registration Server.
Note that the DAS Registration Service provides both a simple DAS-style web service, which lets you build a simple URL request and receive an XML response, and a SOAP (Simple Object Access Protocol) web service. Here we will focus on the DAS-style web service.
Look at the response to the DAS Registry sources command: http://www.dasregistry.org/das1/sources.
You should see that the XML response is a <SOURCES> document containing any number of <SOURCE> elements. Each <SOURCE> element describes a single data source, providing parsable information including:
- The source URI as defined in the registry, of the form DS_NNN;
- The title and a description of the data source;
- Details of the datasource maintainer;
- The coordinate system used by the data source (note that this is defined in terms of DAS Registry registered coordinate systems, e.g. "UniProt" has been assigned the URI CS_DS6);
- The capabilities (implemented commands) of the data source;
- A range of additional properties for the service.
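The fields listed above can be pulled out of a sources response with a few lines of Python. The following is a minimal sketch only: the XML fragment is hand-written for illustration (DS_000 and the maintainer address are made up), and the element and attribute names (MAINTAINER, COORDINATES, CAPABILITY, PROP) are assumptions about the response layout that you should check against a real response before relying on them.

```python
import xml.etree.ElementTree as ET

# Hand-written illustrative fragment, NOT a real registry response.
# Element and attribute names are assumptions; compare with a live response.
SAMPLE_SOURCES_XML = """\
<SOURCES>
  <SOURCE uri="DS_000" title="example" description="Hypothetical annotation source">
    <MAINTAINER email="maintainer@example.org"/>
    <VERSION uri="DS_000" created="2010-02-01T00:00:00+0000">
      <COORDINATES uri="http://www.dasregistry.org/coordsys/CS_DS6"
                   source="Protein Sequence" authority="UniProt">UniProt,Protein Sequence</COORDINATES>
      <CAPABILITY type="das1:features"/>
      <CAPABILITY type="das1:types"/>
      <PROP name="label" value="example"/>
    </VERSION>
  </SOURCE>
</SOURCES>
"""


def parse_sources(xml_text):
    """Return one dict per <SOURCE> element found in a sources document."""
    root = ET.fromstring(xml_text)
    sources = []
    for src in root.iter("SOURCE"):
        maintainer = src.find("MAINTAINER")
        sources.append({
            "uri": src.get("uri"),
            "title": src.get("title"),
            "description": src.get("description"),
            "maintainer": maintainer.get("email") if maintainer is not None else None,
            # Coordinate systems are identified by their registry URI.
            "coordinates": [c.get("uri") for c in src.iter("COORDINATES")],
            # Capabilities are the DAS commands the source implements.
            "capabilities": [c.get("type") for c in src.iter("CAPABILITY")],
            "properties": {p.get("name"): p.get("value") for p in src.iter("PROP")},
        })
    return sources


sources = parse_sources(SAMPLE_SOURCES_XML)
for source in sources:
    print(source["uri"], source["title"], source["capabilities"])
```

To try this against the live service, you could fetch the sources URL with urllib.request.urlopen and pass the decoded body to parse_sources.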
You can also perform the sources query for a single datasource: http://www.dasregistry.org/das1/DS_409
If you go back to the list of servers page, you can search for data sources with specific types. For example, type exon in the text area close to the top of the page and click the "search types only" button.
Finally, you may like to take a look at the DAS Registry Server Help with Scripting Page that provides more information about how to use the available web services in your own scripts.
coordinate systems
You can list the coordinate systems by using http://www.dasregistry.org/das1/coordinatesystem
sources
The main queries accept parameters that narrow the results. For example, http://www.dasregistry.org/dasregistry/das1/sources?organism=9606&authority=NCBI&version=36&type=chromosome will return a list of sources that can all be mapped to the one coordinate system. You may also use this query to find all sources that can be mapped to human: http://www.dasregistry.org/das1/sources?organism=9606 To list 1.5 sources only, use http://www.dasregistry.org/das1.5/sources/ and to list 1.6 sources only, use http://www.dasregistry.org/das1.6/sources/
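Filtered queries like these are easy to assemble in a script. The sketch below builds the query string with Python's standard urllib.parse module; the helper name build_sources_url is made up for this example, and it assumes the parameter names shown in the URLs above (organism, authority, version, type).

```python
from urllib.parse import urlencode

# Base URL of the sources command, as used earlier in this tutorial.
SOURCES_URL = "http://www.dasregistry.org/das1/sources"


def build_sources_url(**filters):
    """Build a sources query URL from keyword filters (hypothetical helper)."""
    if not filters:
        return SOURCES_URL
    # urlencode escapes the values and joins the pairs with '&'.
    return SOURCES_URL + "?" + urlencode(filters)


human_url = build_sources_url(organism=9606)
ncbi36_url = build_sources_url(
    organism=9606, authority="NCBI", version=36, type="chromosome"
)
print(human_url)
print(ncbi36_url)
```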
organisms
Note: http://www.dasregistry.org/das1/organism no longer works as it would have to return 400,000 results in xml format!
More on validation
The validation in the registry uses rules that are set up by RelaxNG documents found here: http://www.dasregistry.org/validation1.6/
If you select a link and then choose "view", "source" in your browser (the exact menu is browser dependent), you can see the rules in an XML-like format.
The tags, such as <zeroOrMore> or <optional>, are fairly self-explanatory and apply to the elements nested within them. The registry uses other rules as well, but reading these documents will help you overcome most validation problems.
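To see how the <optional> and <zeroOrMore> wrappers read in practice, the sketch below walks a small RelaxNG fragment and reports, for each element or attribute, the wrapper it sits in. The fragment is hand-written for illustration and is not copied from the registry's validation documents. The script only summarises the rules; it does not validate anything.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Hand-written illustrative schema fragment, NOT one of the registry's documents.
SAMPLE_RNG = """\
<element name="SOURCE" xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="uri"/>
  <optional>
    <attribute name="description"/>
  </optional>
  <element name="MAINTAINER"><attribute name="email"/></element>
  <zeroOrMore>
    <element name="PROP"><empty/></element>
  </zeroOrMore>
</element>
"""


def occurrence_rules(node, rule="required"):
    """List (kind, name, rule) for every element/attribute pattern below node."""
    found = []
    for child in node:
        kind = child.tag.replace("{%s}" % RNG_NS, "")
        if kind in ("optional", "zeroOrMore", "oneOrMore"):
            # The wrapper applies to the patterns nested directly inside it.
            found.extend(occurrence_rules(child, kind))
        elif kind in ("element", "attribute"):
            found.append((kind, child.get("name"), rule))
            # Contents of an element start again as required.
            found.extend(occurrence_rules(child, "required"))
    return found


rules = occurrence_rules(ET.fromstring(SAMPLE_RNG))
for kind, name, rule in rules:
    print(kind, name, rule)
```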
When registering or updating a source, the capabilities you select must be valid; otherwise you will not be able to register or update your source.
Registering/Updating a source in the registry
Log in with your OpenID account (having an account makes managing your DAS sources easier), then:
- Choose "register new". (If updating, select "sources", then "show mine", and click on the pencil icon to edit; the procedure is then similar to registering a new source.)
- Fill in the fields appropriately.
- Click "update registration".
- Fill in a test region for your source that returns features, e.g. 1:10000,110000, meaning chromosome 1 followed by start and stop positions.
If your new or updated source is valid, the update will go ahead. If your DAS source is invalid, the registry will tell you and will not allow the update; follow the link shown to try validating your source and see where the problems are.
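Before submitting a test region, it can help to check that it is well formed. The sketch below splits a region of the form used in the example (segment, colon, start, comma, stop) into its parts; the function name parse_test_region is made up for this illustration, and the check that start must not exceed stop is an assumption rather than a documented registry rule.

```python
def parse_test_region(region):
    """Split 'segment:start,stop' into (segment, start, stop).

    Raises ValueError if the text does not have that shape.
    """
    segment, colon, span = region.partition(":")
    if not colon or not segment:
        raise ValueError("expected 'segment:start,stop', got %r" % region)
    start_text, comma, stop_text = span.partition(",")
    if not comma:
        raise ValueError("expected 'start,stop' after the colon, got %r" % span)
    # int() raises ValueError itself if either position is not a number.
    start, stop = int(start_text), int(stop_text)
    if start > stop:
        # Assumption: a region should run from a lower to a higher position.
        raise ValueError("start %d is greater than stop %d" % (start, stop))
    return segment, start, stop


example = parse_test_region("1:10000,110000")
print(example)
```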
Jonathan Warren, Philip Jones, Sanger & EMBL-EBI, Hinxton, UK. February 2010