RFC010
Title: Normalizing Groups
Author: Lincoln Stein
Version: 1
Date: 29 August 2001

It is not uncommon for genomic features to be grouped, and for the groups themselves to be grouped into higher-order structures. Familiar examples include exons that are grouped into alternative transcripts, transcripts that are grouped into genes, and HSPs that are grouped into gapped alignments.

The obvious way to handle these groupings is with a denormalized, hierarchical schema like this one:

The problem with this is that the information for exon1 appears twice, once under each transcript that contains it. If there is additional information associated with exon1 (for example, external links), the client cannot know that it has collected all the information about exon1 until it has collected all the transcripts that contain it.

Another way to handle this is to normalize the groups:

In this way, all the low-level features are declared explicitly and given an ID that is unique (or at least unique within the scope of the XML document). The low-level features are then grouped explicitly by ID. Group IDs can themselves be grouped, for example:

The drawback of this approach is that it requires the client to do more bookkeeping. However, in the long run I think it makes the system more stable.
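The extra client-side bookkeeping that normalized groups require can be sketched in Python. This is a minimal illustration, not part of the proposal. It assumes the features and groups have already been parsed out of the XML into dictionaries keyed by ID. The IDs (exon1, transcript1, gene1), the coordinate fields, and the function names `leaf_features` and `containing_groups` are all hypothetical. Each feature is stored once, and a group may list either feature IDs or other group IDs.

```python
from typing import Dict, List, Optional, Set

# Hypothetical parsed input: every low-level feature is declared exactly once,
# keyed by an ID that is unique within the document.
features: Dict[str, dict] = {
    "exon1": {"start": 100, "end": 200},
    "exon2": {"start": 300, "end": 400},
    "exon3": {"start": 500, "end": 600},
}

# Hypothetical groups: members may be feature IDs or other group IDs,
# so groups can themselves be grouped (here, transcripts into a gene).
groups: Dict[str, List[str]] = {
    "transcript1": ["exon1", "exon2"],
    "transcript2": ["exon1", "exon3"],  # exon1 is referenced, not copied
    "gene1": ["transcript1", "transcript2"],
}


def leaf_features(group_id: str,
                  groups: Dict[str, List[str]],
                  features: Dict[str, dict],
                  visiting: Optional[Set[str]] = None) -> List[str]:
    """Resolve a group, recursively, to the sorted IDs of its low-level features."""
    if visiting is None:
        visiting = set()
    if group_id in visiting:
        raise ValueError(f"cycle detected at group {group_id!r}")
    visiting.add(group_id)
    result: Set[str] = set()
    for member in groups[group_id]:
        if member in features:
            result.add(member)
        elif member in groups:
            result |= set(leaf_features(member, groups, features, visiting))
        else:
            raise KeyError(f"undeclared ID {member!r} in group {group_id!r}")
    # Leave the current path so a group shared by two parents is not a cycle.
    visiting.discard(group_id)
    return sorted(result)


def containing_groups(feature_id: str, groups: Dict[str, List[str]]) -> List[str]:
    """Return the IDs of groups that list feature_id directly as a member."""
    return sorted(g for g, members in groups.items() if feature_id in members)


if __name__ == "__main__":
    print(leaf_features("gene1", groups, features))  # ['exon1', 'exon2', 'exon3']
    print(containing_groups("exon1", groups))        # ['transcript1', 'transcript2']
```

Because exon1 is stored once and only referenced by ID, any extra information about it lives in a single place. The cost is the lookup and cycle checking shown above, which the client must now do itself.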