0906b687e064630d4f016448385f5742ddb409fe jnavarr5 Fri Apr 2 15:19:31 2021 -0700 Adding a FAQ about single or more than 3 alleles for a SNP, refs #27313 diff --git src/hg/htdocs/FAQ/FAQdownloads.html src/hg/htdocs/FAQ/FAQdownloads.html index 6f886dd..c2e92cd 100755 --- src/hg/htdocs/FAQ/FAQdownloads.html +++ src/hg/htdocs/FAQ/FAQdownloads.html @@ -34,30 +34,31 @@ <li><a href="#download28">Converting genome coordinates between assemblies</a></li> <li><a href="#download33">Linking gene name with accession number</a></li> <li><a href="#download31">Obtaining a list of Known Genes</a></li> <li><a href="#download16">Repeat-masking data</a></li> <li><a href="#download17">Availability of repeat-masked data</a></li> <li><a href="#download24">RepeatMasker version differences - UCSC vs. Repeatmasker website</a></li> <li><a href="#download18">Obtaining promoter sequence</a></li> <li><a href="#download19">Data from Evolutionary Conservation Score tracks</a></li> <li><a href="#download20">Minus strand coordinates - axtNet files</a></li> <li><a href="#download21">Mapping UCSC STS marker IDS to those of other groups</a></li> <li><a href="#download22">deCODE map data</a></li> <li><a href="#download29">Direct MariaDB (MySQL) access to data</a></li> <li><a href="#download34">Name of fourth column in BED output</a></li> <li><a href="#download36">Track data access</a></li> <li><a href="#snp">How do I download dbSNP data?</a></li> +<li><a href="#snpAlleles">Why doesn't this SNP have two alleles?</a></li> <li><a href="#download37">Known issues with Table Browser GTF output</a></li> <li><a href="#download38">Table Browser output file not ordered</a></li> <li><a href="#download39">'Permisssion denied' error when trying to use command-line utilities</a></li> <li><a href="#download40">Restricted Track Data</a></li> <li><a href="#downloadAnalysis">What is the genome analysis set?</a></li> </ul> <hr> <p> <a href="index.html">Return to FAQ Table of Contents</a></p> <a name="download1"></a> 
<h2>Downloading sequence and annotation data</h2> <h6>How do I obtain the sequence and/or annotation data for a release?</h6> <p> Sequence and annotation data downloads are usually made available within the first week of the @@ -1031,30 +1032,57 @@ that can be used to retrieve values from a particular chromosome range. A list of rs# IDs can also be pasted/uploaded in the <a href="/cgi-bin/hgVai" target=_blank>Variant Annotation Integrator</a> tool in order to find out which genes (if any) the variants are located in, as well as functional effect such as intron, coding-synonymous, missense, frameshift, etc. </p><p> See our searchable <A HREF="https://groups.google.com/a/soe.ucsc.edu/forum/?hl=en&fromgroups#!search/download+snps" target=_blank>mailing list archives</a> for more information and example queries. We also have information on <a href="http://genome.ucsc.edu/blog/">our blog</a> about <a href="http://genome.ucsc.edu/blog/?s=programmatic"> Accessing the Genome Browser Programmatically</a> to acquire data. </p> +<a name="snpAlleles"></a> +<h2>Why doesn't this SNP have two alleles?</h2> +<p> +When using the SNP tracks, some records may list a single allele, or three or more alleles, instead of +the usual two alleles for a SNP. The following information should explain how this is +possible.</p> +<dl> + <dt>One allele (i.e. reference only):</dt> + <dd> + The human genome reference has gone through many different assembly versions. The reference + genome has always been a mosaic of sequences from multiple individuals, so it contains some + rare or singleton mutations and is not entirely free of errors. 
Some SNPs were discovered on + previous assembly versions, and the latest assembly version has the corrected or common allele, + which turns out to be the only observed allele (so the SNP was an artifact of the reference + assembly having a rare mutation or error in the past, not a real SNP).</dd> + <dt>Three alleles:</dt> + <dd> + It's rare, but possible, for the same base to be mutated to different values in different + people.</dd> + <dt>Four alleles:</dt> + <dd> + This would be even rarer than three alleles. In the past, it has often been a symptom of strand + errors: for example, the same variant is reported separately as A/G on the forward strand and + C/T on the reverse strand, the strand information is then lost in processing, and the + reports are merged into A/C/G/T.</dd> +</dl> + <a name="download37"></a> <h2>Obtaining GTF (Gene Transfer Format)</h2> <h6>What is the best method for obtaining GTF output?</h6> <p> Currently, the <a href="../cgi-bin/hgTables">Table Browser</a> option return data in <a href="../FAQ/FAQformat.html#format4">GTF format</a> is limited as explained below. To convert custom GenePred format data into GTF, the best method is to use the command-line format conversion utility, <code>genePredToGtf</code>. This can optionally be set up to automatically connect to the UCSC public SQL database and return GTF files in a few minutes using <a href="http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format#Using_kent_commands_with_the_public_database_server"> this short guide</a>.</p> <p> For simplicity, GTF files have been generated using the <code>genePredToGtf</code> method described above and are available on our download server for the main gene transcript sets. These can be found at the following download server address: