c7ff827180db5c246eb48e0b52cec414c44370cb jnavarr5 Thu Oct 31 15:51:46 2024 -0700
Adding examples of how to extract positions using rsIDs or extract rsIDs using a list of positions. Max says it is a commonly asked question so updating our FAQ documentation, refs #34701

diff --git src/hg/htdocs/FAQ/FAQreleases.html src/hg/htdocs/FAQ/FAQreleases.html
index 3e2180c..947044a 100755
--- src/hg/htdocs/FAQ/FAQreleases.html
+++ src/hg/htdocs/FAQ/FAQreleases.html
@@ -10,30 +10,35 @@

Topics



List of UCSC genome releases

How do UCSC's release numbers correspond to those of other organizations, such as NCBI?

The first release of an assembly is given a name using the first three characters of the organism's genus and species classification in the format gggSss#, with subsequent assemblies incrementing
@@ -564,53 +569,88 @@
Browser using the dbSNP track for human assemblies (i.e. hg19 or hg38) or the EVA SNP track on mouse assemblies (i.e. mm10 or mm39) to perform the conversion.

To summarize the steps:

  1. Create a file of all rsIDs
  2. Use the Table Browser to map the file of rsIDs to the other assembly's coordinates
  3. Create another file containing any rsIDs that were not mapped by the Table Browser
  4. Using the file from the previous step, use the Table Browser to create a BED4 file for the rsIDs that were not mapped by the Table Browser
  5. Run LiftOver on the BED4 file to get the new coordinates in the other assembly
  6. Use the Data Integrator to map the LiftOver results to new rsIDs where possible
  7. Combine the Table Browser rsID-mapped BED4 with the LiftOver/Data Integrator-mapped BED4. Beware of duplicates, which will cause downstream problems. You will need to decide whether to remove duplicates as unreliable or to resolve them.
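Steps 3 and 7 are plain text manipulation, so they can be scripted. The following is a minimal POSIX shell sketch, not an official UCSC tool: the function names (find_unmapped, combine_bed4, find_duplicates) and the file names in the example comments are hypothetical, and it assumes the ID list has one rsID per line and that the BED4 files are tab-separated with the rsID in column 4.

```shell
#!/bin/sh
# Hypothetical helpers for steps 3 and 7 of the conversion workflow.

# Step 3: print rsIDs from an ID list (first argument, one per line) that do
# not appear in column 4 of the Table Browser BED output (second argument).
# Header lines starting with '#' and Windows carriage returns are ignored.
find_unmapped() {
    awk '{ sub(/\r$/, "") }
         FILENAME == ARGV[1] { if ($0 !~ /^#/ && NF >= 4) mapped[$4] = 1; next }
         NF > 0 && !($1 in mapped) { print $1 }' "$2" "$1"
}

# Step 7: merge BED4 files, keep only the first four columns, drop header
# lines and exact duplicate lines, then sort by chromosome and start.
combine_bed4() {
    awk 'BEGIN { OFS = "\t" } !/^#/ && NF >= 4 { print $1, $2, $3, $4 }' "$@" \
        | sort -u \
        | sort -k1,1 -k2,2n
}

# Step 7 check: list rsIDs that still map to more than one distinct position
# after merging; these must be removed or resolved by hand.
find_duplicates() {
    combine_bed4 "$@" | cut -f 4 | sort | uniq -d
}

# Example (file names are placeholders):
#   find_unmapped myIds.txt tableBrowser.bed > unmapped.txt
#   combine_bed4 tableBrowser.bed liftOver.bed > combined.bed
#   find_duplicates tableBrowser.bed liftOver.bed
```

Exact duplicate lines (same rsID at the same position in both files) are collapsed silently, because they do not conflict; only rsIDs left at two or more different positions are reported.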
How can I convert a large set of SNP annotations?

For bulk conversions, the Table Browser can be used to extract the coordinates for the rsIDs on the target assembly. More information about performing batch queries on the Table Browser can be found on the following Table Browser help page. An example of using the Table Browser to convert SNPs between assemblies can be found in a previously answered question in the mailing list archive.

If you are using dbSNP version 153 or later, the data are formatted as bigBed files instead of being stored in a MariaDB table. For very large queries, this may cause the Table Browser to time out before the query finishes, as dbSNP has grown to include over 700 million variants. If your Table Browser query times out for your list of rsIDs, you can use the bigBedNamedItems command-line tool to extract the rsID coordinates directly from the bigBed file instead of using the Table Browser.

More information and examples using the bigBedNamedItems utility can be found on the following FAQ entry. As a reminder, you can run any Kent command-line tool without arguments to get the usage statement.
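For the bigBedNamedItems route, a minimal shell sketch might first normalize the ID list and then pass it to bigBedNamedItems with its -nameFile option, which treats the name argument as a file of whitespace-delimited names. The clean_ids and extract_rsids function names and the file names are hypothetical; stripping carriage returns and duplicate IDs is a precaution for lists edited on other systems, not a documented requirement of bigBedNamedItems.

```shell
#!/bin/sh
# Hypothetical wrapper around bigBedNamedItems -nameFile for large rsID lists.

# Normalize an ID list: remove carriage returns, put each whitespace-separated
# ID on its own line (blank lines disappear), and remove duplicates.
clean_ids() {
    tr -d '\r' < "$1" | awk '{ for (i = 1; i <= NF; i++) print $i }' | sort -u
}

# Look up the cleaned IDs in a dbSNP bigBed (first argument, e.g. a local
# dbSnp155.bb) and write matching BED lines to the output file (third
# argument). Prints requested versus found rsID counts to stderr; an rsID
# mapped to several positions is counted once.
extract_rsids() {
    bigbed=$1
    ids=$2
    out=$3
    cleaned=$(mktemp) || return 1
    clean_ids "$ids" > "$cleaned"
    if ! bigBedNamedItems -nameFile "$bigbed" "$cleaned" "$out"; then
        rm -f "$cleaned"
        return 1
    fi
    wanted=$(wc -l < "$cleaned" | tr -d ' ')
    found=$(cut -f 4 "$out" | sort -u | wc -l | tr -d ' ')
    echo "found $found of $wanted requested rsIDs" >&2
    rm -f "$cleaned"
}

# Example (file names are placeholders):
#   extract_rsids dbSnp155.bb myIds.txt dbSnp155.myIds.bed
```

If the found count is lower than the requested count, the missing rsIDs may be merged, withdrawn, or absent from that dbSNP build.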

How can I extract a list of rsIDs using chrom:start-end or vice versa?

Several utilities for working with bigBed-formatted binary files can be downloaded here. Run a utility with no arguments to see a brief description of the utility and its options.

Examples:

  1. Retrieve all variants in the region chr1:200001-200400

    bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp155.bb -chrom=chr1 -start=200000 -end=200400 stdout

  2. Retrieve variant rs6657048

    bigBedNamedItems dbSnp155.bb rs6657048 stdout

  3. Retrieve all variants with rs# IDs in file myIds.txt

    bigBedNamedItems -nameFile dbSnp155.bb myIds.txt dbSnp155.myIds.bed
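To go from many positions to rsIDs, one possible sketch is to loop over a file of regions and call bigBedToBed once per region. It assumes a hypothetical file regions.txt with one chrom:start-end (or single chrom:position) per line in the 1-based coordinates shown in the Genome Browser, and converts each start to the 0-based BED start, as in the chr1:200001-200400 example (-start=200000 -end=200400). The DBSNP path is the hg38 dbSnp155.bb URL from that example; the function names are hypothetical. Column 4 of the dbSNP bigBed holds the rsID.

```shell
#!/bin/sh
# Hypothetical sketch: map 1-based chrom:start-end regions to dbSNP rsIDs (hg38).

DBSNP=http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp155.bb

# Read regions such as chr1:200,001-200,400 or chr2:100 on stdin (1-based,
# inclusive) and print "chrom start end" with a 0-based start, the form
# bigBedToBed's -chrom/-start/-end options take. Malformed lines are skipped.
regions_to_bed() {
    tr -d ', \t\r' | awk -F'[:-]' '
        NF == 3 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $2 >= 1 && $3 >= $2 {
            printf "%s %d %d\n", $1, $2 - 1, $3
        }
        NF == 2 && $2 ~ /^[0-9]+$/ && $2 >= 1 {
            printf "%s %d %d\n", $1, $2 - 1, $2
        }'
}

# For each region in a file, print the rsID (column 4) of every dbSNP
# variant that bigBedToBed returns for that range.
regions_to_rsids() {
    regions_to_bed < "$1" | while read -r chrom start end; do
        bigBedToBed "$DBSNP" -chrom="$chrom" -start="$start" -end="$end" stdout | cut -f 4
    done
}

# Example (file names are placeholders):
#   regions_to_rsids regions.txt | sort -u > rsids.txt
```

This makes one remote query per region, so it suits modest lists; drop the `| cut -f 4` to keep the full BED lines with positions, and use `sort -u` to collapse rsIDs returned by overlapping regions.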

Missing annotation tracks

Why is my favorite annotation track missing from your latest release?

The initial release of a new genome assembly typically contains a small subset of core annotation tracks. New tracks are added as they are generated. In many cases, our annotation tracks are contributed by scientists not affiliated with UCSC who must first obtain the sequence, repeat-masked data, etc. before they can produce their tracks. If you need an annotation that has not appeared on an assembly within a month or so of its release, feel free to send an inquiry to genome@soe.ucsc.edu. Messages sent to this address will be posted to the moderated genome mailing list, which is archived on a SEARCHABLE, PUBLIC Google Groups forum.