src/hg/makeDb/trackDb/snp125.html 1.21

1.21 2009/07/16 16:39:23 ann
added paragraph about the new feature of displaying the function and coding differences relative to particular gene sets.
Index: src/hg/makeDb/trackDb/snp125.html
===================================================================
RCS file: /projects/compbio/cvsroot/kent/src/hg/makeDb/trackDb/snp125.html,v
retrieving revision 1.20
retrieving revision 1.21
diff -b -B -U 1000000 -r1.20 -r1.21
--- src/hg/makeDb/trackDb/snp125.html	29 Nov 2008 20:00:33 -0000	1.20
+++ src/hg/makeDb/trackDb/snp125.html	16 Jul 2009 16:39:23 -0000	1.21
@@ -1,164 +1,177 @@
 <H2>Description</H2>
 <P>
 This track contains
 <A HREF="http://www.ncbi.nlm.nih.gov/SNP/" target=_blank>dbSNP</A>
 build 125, available from
 <A HREF="ftp://ftp.ncbi.nih.gov/snp/organisms" target=_blank>ftp.ncbi.nih.gov/snp</A>.
 </P>
 
 <H2>Interpreting and Configuring the Graphical Display</H2>
 <P>
   Variants are shown as single tick marks at most zoom levels.
   When viewing the track at or near base-level resolution, the displayed
   width of the SNP corresponds to the width of the variant in the reference
   sequence. Insertions are indicated by a single tick mark displayed between
   two nucleotides, single nucleotide polymorphisms are displayed as the width 
   of a single base, and multiple nucleotide variants are represented by a 
   block that spans two or more bases.
 </P>
 <P>
   The configuration categories reflect the following definitions (not all categories apply
   to this assembly):
 </P>
   <UL>
 
     <LI>
       <A name="LocType"></A>
       <B>Location Type</B>: Describes the alignment of the flanking sequence<BR>
       <UL>
         <LI><B>Range</B> - the flank alignments leave a gap of 2 or more bases in the reference assembly
         <LI><B>Exact</B> - the flank alignments leave exactly one base between them
         <LI><B>Between</B> - the flank alignments are contiguous; the variation is an insertion
         <LI><B>RangeInsertion</B> - the flank alignments surround a distinct polymorphism between
             the submitted sequence and reference assembly; the submitted sequence is shorter
         <LI><B>RangeSubstitution</B> - the flank alignments surround a distinct polymorphism between
             the submitted sequence and reference assembly; the submitted sequence and the
             reference assembly sequence are of equal length
         <LI><B>RangeDeletion</B> - the flank alignments surround a distinct polymorphism between
             the submitted sequence and reference assembly; the submitted sequence is longer
       </UL>
     </LI>
     <LI>
       <A name="Class"></A>
       <B>Class</B>: Describes the observed alleles<BR>
       <UL>
         <LI><B>Single</B> - single nucleotide variation: all observed alleles are single nucleotides
 	    (can have 2, 3 or 4 alleles)
         <LI><B>In-del</B> - insertion/deletion (applies to RangeInsertion, RangeSubstitution, RangeDeletion)
         <LI><B>Heterozygous</B> - heterozygous (undetermined) variation: allele contains string '(heterozygous)'
         <LI><B>Microsatellite</B> - the observed allele from dbSNP is a variation in the number of short tandem repeats
         <LI><B>Named</B> - the observed allele from dbSNP is given as a text name
         <LI><B>No Variation</B> - no variation asserted for sequence
         <LI><B>Mixed</B> - the cluster contains submissions from multiple classes
         <LI><B>Multiple Nucleotide Polymorphism</B> - all alleles have the same length, that length is greater than 1, and they contain only the bases {A,T,C,G}
         <LI><B>Insertion</B> - the polymorphism is an insertion relative to the reference assembly
         <LI><B>Deletion</B> - the polymorphism is a deletion relative to the reference assembly
         <LI><B>Unknown</B> - no classification provided by the data contributor
       </UL>
     </LI>
 
 
     <LI>
       <A name="Valid"></A>
       <B><A HREF="http://www.ncbi.nlm.nih.gov/SNP/snp_legend.cgi?legend=validation" 
 	target="_blank">Validation</A></B>: Method used to validate
 	the variant (<I>each variant may be validated by more than one method</I>)<BR>
         <UL>
         <LI><B>By Frequency</B> - at least one submitted SNP in the cluster has frequency data
         <LI><B>By Cluster</B> - the cluster has at least 2 submissions, at least one of which was assayed with a non-computational method
         <LI><B>By Submitter</B> - at least one submitter SNP in the cluster was validated by an independent assay
         <LI><B>By 2 Hit/2 Allele</B> - all alleles have been observed in at least 2 chromosomes
         <LI><B>By HapMap</B> - validated by the HapMap project
         <LI><B>Unknown</B> - no validation has been reported for this variant
       </UL>
     </LI>
     <LI>
       <A name="Func"></A>
       <B>Function</B>: Predicted functional role 
 	(<I>each variant may have more than one functional role</I>)<BR>
       <UL>
         <LI><B>Locus Region</B> - variation within 2000 bases of a gene, but not in the transcript
         <LI><B>Coding - Synonymous</B> - no change in the peptide for the allele with respect to the reference assembly
         <LI><B>Coding - Non-Synonymous</B> - change in the peptide for the allele with respect to the reference assembly
         <LI><B>Untranslated</B> - variation in the transcript, but not in the coding region interval
         <LI><B>Intron</B> - variation in an intron, but not in the first two or last two bases of the intron
         <LI><B>Splice Site</B> - variation in the first two or last two bases of an intron
         <LI><B>Reference</B> - allele observed in a coding region of the reference sequence
         <LI><B>Unknown</B> - no known functional classification
       </UL>
     </LI>
     <LI>
       <A name="MolType"></A>
       <B>Molecule Type</B>: Sample used to find this variant<BR>
       <UL>
         <LI><B>Genomic</B> - variant discovered using a genomic template
         <LI><B>cDNA</B> - variant discovered using a cDNA template
         <LI><B>Unknown</B> - sample type not known
       </UL>
     </LI>
     <LI>
       <A name="AvHet"></A>
       <B>Average heterozygosity</B>: Calculated by dbSNP as described 
       <A HREF="http://www.ncbi.nlm.nih.gov/SNP/Hetfreq.html" target=_blank>here</A>
       <UL>
       <LI> Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.
       </UL>
     </LI>
     <LI>
       <A name="Weight"></A>
       <B>Weight</B>: Alignment count<BR>
       <UL>
       <LI>Weight can be 1, 2, 3 or 10.   
       <LI>Variants with weight = 10 are excluded from the data set.
       <LI>A filter on the maximum weight value is supported; it defaults to 3.
       <LI>Alignments to chrN_random are not included.
       </UL>
     </LI>
   </UL>
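<P>
The Location Type and Class definitions above amount to simple decision rules. The Python
sketch that follows restates them, assuming the gap is given as the number of reference
bases left between the two flank alignments and that observed alleles are a slash-separated
string such as 'A/G'. The function names are hypothetical; this is not dbSNP's or UCSC's code.
</P>

```python
from typing import Optional

# Hypothetical helpers restating the Location Type and Class definitions;
# not the dbSNP or UCSC implementation.

VALID_BASES = set("ACGT")


def location_type(gap: int) -> str:
    """Location type from the number of reference bases between the flank alignments."""
    if gap < 0:
        raise ValueError("gap must be non-negative")
    if gap == 0:
        return "Between"  # contiguous flanks: the variation is an insertion
    if gap == 1:
        return "Exact"    # exactly one base between the flanks
    return "Range"        # a gap of 2 or more bases


def range_subtype(submitted_len: int, reference_len: int) -> str:
    """Range* type when the flanks surround a distinct polymorphism."""
    if submitted_len < reference_len:
        return "RangeInsertion"     # submitted sequence is shorter
    if submitted_len == reference_len:
        return "RangeSubstitution"  # equal lengths
    return "RangeDeletion"          # submitted sequence is longer


def allele_class(observed: str) -> Optional[str]:
    """'Single' or 'Multiple Nucleotide Polymorphism' for alleles such as 'A/G';
    None for anything these two rules do not cover."""
    alleles = observed.upper().split("/")
    # Every allele must be non-empty and made only of A, C, G, T.
    if len(alleles) < 2 or not all(a and set(a) <= VALID_BASES for a in alleles):
        return None
    lengths = {len(a) for a in alleles}
    if lengths == {1}:
        return "Single"
    if len(lengths) == 1:
        return "Multiple Nucleotide Polymorphism"
    return None
```

<P>
Alleles such as '-/A' or 'A/GT' return None here; in the track those fall into other
classes (for example In-del or Mixed) that need more context than the alleles alone.
</P>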
 
+ <P>
+ You can configure this track so that the details page displays
+ the function and coding differences relative to
+ particular gene sets.  Choose the gene sets from the list on the SNP
+ configuration page displayed beneath this heading: <EM>On details page,
+ show function and coding differences relative to</EM>.
+ When one or more gene tracks are selected, the SNP details page
+ lists all genes that the SNP overlaps (or is close to), using the same keywords
+ as the <A HREF="#Func">function</A> category.  The function usually
+ agrees with NCBI's function, but can sometimes give more detail
+ (e.g., how close a near-gene SNP is to the nearby gene).
+ </P>
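<P>
The function keywords can be computed one gene at a time from the gene's transcript,
coding region, and exon coordinates. The following minimal sketch applies the Function
definitions to a single-base variant. The GeneModel class and classify_function function
are hypothetical, coordinates are assumed to be 0-based and half-open (as in UCSC gene
tables), and codons are not translated, so it reports 'Coding' without deciding synonymous
versus non-synonymous. It is not the code behind the details page.
</P>

```python
from dataclasses import dataclass
from typing import List, Tuple

LOCUS_WINDOW = 2000  # Locus Region: within 2000 bases of the gene


@dataclass
class GeneModel:
    """Hypothetical gene record; 0-based, half-open coordinates."""
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: List[Tuple[int, int]]  # (start, end) pairs sorted by start


def classify_function(pos: int, gene: GeneModel) -> str:
    """Function keyword for a single-base variant at 0-based position pos."""
    if pos < gene.tx_start or pos >= gene.tx_end:
        # Distance in bases to the nearest end of the transcript.
        if pos < gene.tx_start:
            distance = gene.tx_start - pos
        else:
            distance = pos - gene.tx_end + 1
        return "Locus Region" if distance <= LOCUS_WINDOW else "Unknown"
    for start, end in gene.exons:
        if start <= pos < end:
            if gene.cds_start <= pos < gene.cds_end:
                # Synonymous vs. non-synonymous would need the alleles and reading frame.
                return "Coding"
            return "Untranslated"
    # Inside the transcript but in no exon: find the intron between consecutive exons.
    for (_, left_end), (right_start, _) in zip(gene.exons, gene.exons[1:]):
        if left_end <= pos < right_start:
            if pos < left_end + 2 or pos >= right_start - 2:
                return "Splice Site"  # first two or last two bases of the intron
            return "Intron"
    return "Unknown"
```

<P>
For example, with exons (1000, 2000) and (3000, 5000), position 2001 lies in the first two
bases of the intron and is reported as 'Splice Site', while position 2002 is reported as 'Intron'.
</P>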
+
 <H2>Insertions/Deletions</H2>
 <P>
 dbSNP uses a class called 'in-del'. In this track, 'in-del' is split into the 'insertion' and
 'deletion' categories based on location type. In-dels with location type 'range' or 'exact' are
 deletions relative to the reference assembly; in-dels with location type 'between' are insertions
 relative to the reference assembly. For the new location types (RangeInsertion, RangeSubstitution,
 and RangeDeletion), the class 'in-del' is preserved.
 </P>
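<P>
The split of dbSNP's 'in-del' class can be expressed as a small mapping. This is a sketch
only: display_class is a hypothetical name, and location-type names are compared
case-insensitively on the assumption that 'range' and 'Range' denote the same type.
</P>

```python
def display_class(dbsnp_class: str, loc_type: str) -> str:
    """Map dbSNP's class to the class shown in this track (hypothetical helper)."""
    if dbsnp_class != "in-del":
        return dbsnp_class      # only 'in-del' is split
    loc = loc_type.lower()
    if loc in ("range", "exact"):
        return "deletion"       # deletion relative to the reference assembly
    if loc == "between":
        return "insertion"      # insertion relative to the reference assembly
    return "in-del"             # other location types, e.g. RangeDeletion, keep 'in-del'
```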
 
 <H2>UCSC Annotations</H2>
 <P>
   In addition to presenting the dbSNP data, the following annotations are provided:
 </P>
   <UL>
   <LI>The size of the dbSNP reference allele is checked to see if it matches the coordinate
       span; exceptions are noted.</LI>
   <LI>The dbSNP reference allele is compared to the UCSC reference allele, and a note is made
       if the dbSNP reference allele is the reverse complement of the UCSC reference allele.</LI>
   <LI>Single-base substitutions are noted where the alignments of the
       flanking sequences are adjacent or have a gap of more than one base.</LI>
   <LI>A note is made if the observed alleles are not available from the rs_fasta files.</LI>
   <LI>Observed alleles with an unexpected format are noted.</LI>
   <LI>The length of the observed alleles is checked for consistency with location types;
       exceptions are noted.</LI>
   <LI>Single-base substitutions are checked to see that one of the observed alleles matches
       the reference allele; exceptions are noted.</LI>
   <LI>Simple deletions are checked to see that the observed allele matches the reference allele;
       exceptions are noted.</LI>
   <LI>Tri-allelic and quad-allelic single-base substitutions are noted.</LI>
   <LI>Variants that have multiple mappings are noted.</LI>
   </UL>
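<P>
Several of these annotations are straightforward comparisons. The sketch below implements
four of them (reference allele size versus coordinate span, reverse-complemented reference
allele, reference allele missing from the observed alleles, and tri-/quad-allelic
substitutions) under stated assumptions: 0-based half-open coordinates, '-' for an empty
allele, and observed alleles reported on the reference strand. The function names and note
strings are hypothetical and are not the track's actual exception names.
</P>

```python
from typing import List

# Hypothetical note labels; the track's real exception names are not given here.
WRONG_SIZE = "reference allele size does not match coordinate span"
REV_COMP = "dbSNP reference allele is reverse complement of UCSC reference allele"
NOT_OBSERVED = "reference allele not among observed alleles"
MULTI_ALLELIC = "tri- or quad-allelic single-base substitution"

BASES = {"A", "C", "G", "T"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def annotate(chrom_start: int, chrom_end: int, dbsnp_ref: str,
             ucsc_ref: str, observed: str) -> List[str]:
    """Return notes for one variant (assumes 0-based, half-open coordinates)."""
    notes = []
    span = chrom_end - chrom_start
    ref_len = 0 if dbsnp_ref == "-" else len(dbsnp_ref)  # '-' is an empty allele
    if ref_len != span:
        notes.append(WRONG_SIZE)
    if dbsnp_ref != ucsc_ref and reverse_complement(dbsnp_ref) == ucsc_ref:
        notes.append(REV_COMP)
    alleles = observed.upper().split("/")
    # Single-base substitution: one-base span and every allele is one of A, C, G, T.
    if span == 1 and all(a in BASES for a in alleles):
        if ucsc_ref.upper() not in alleles:
            notes.append(NOT_OBSERVED)
        if len(alleles) in (3, 4):
            notes.append(MULTI_ALLELIC)
    return notes
```

<P>
A real implementation would also need to reverse-complement the observed alleles for
variants reported on the minus strand; that step is omitted here.
</P>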
 
 <H2>Data Sources</H2>
   <UL>
   <LI>Coordinates, orientation, location type and dbSNP reference allele 
       data were obtained from b125_SNPContigLoc.bcp.gz.  
   <LI>b125_SNPMapInfo.bcp.gz provided the alignment weights; alignments with 
       weight = 10 were filtered out.
   <LI>Functional classification information was obtained from b125_SNPContigLocusId.bcp.gz.
   <LI>Validation status and heterozygosity were obtained from SNP.bcp.gz.
   <LI>The header lines in the rs_fasta files provided the class,
       observed polymorphism, and molecule type.
   </UL>
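<P>
The weight rules (alignments with weight = 10 filtered out of the data set, plus a display
filter on maximum weight that defaults to 3) and the exclusion of chrN_random alignments can
be pictured as two list filters. In this sketch the SnpAlignment record and both function
names are hypothetical, and parsing of the .bcp.gz files is not shown because their column
layout is not described on this page.
</P>

```python
from dataclasses import dataclass
from typing import Iterable, List

EXCLUDED_WEIGHT = 10    # alignments with this weight are filtered out of the data set
DEFAULT_MAX_WEIGHT = 3  # default of the track's maximum-weight filter


@dataclass
class SnpAlignment:
    """Hypothetical record for one alignment of a variant."""
    name: str
    chrom: str
    weight: int


def load_filter(rows: Iterable[SnpAlignment]) -> List[SnpAlignment]:
    """Drop weight-10 alignments and alignments to chrN_random sequences."""
    return [r for r in rows
            if r.weight != EXCLUDED_WEIGHT and not r.chrom.endswith("_random")]


def display_filter(rows: Iterable[SnpAlignment],
                   max_weight: int = DEFAULT_MAX_WEIGHT) -> List[SnpAlignment]:
    """Keep alignments whose weight does not exceed the configured maximum."""
    return [r for r in rows if r.weight <= max_weight]
```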