39b458cbf20d613b1acc53c085d2416423379fe1
jnavarr5
Thu May 16 16:28:32 2024 -0700
Creating a JASPAR FAQ for users who are trying to view only one transcription factor or download the data using the Table Browser, refs #33581

diff --git src/hg/htdocs/FAQ/FAQdownloads.html src/hg/htdocs/FAQ/FAQdownloads.html
index f86a9a9..6c68dc8 100755
--- src/hg/htdocs/FAQ/FAQdownloads.html
+++ src/hg/htdocs/FAQ/FAQdownloads.html
@@ -22,30 +22,31 @@
  • Selection of GenBank ESTs
  • EST strand direction
  • Missing RefSeq ID
  • Finished vs. draft segments
  • chr_alt Chromosome
  • chr_fix Chromosome
  • chrN_random tables
  • Chromosome Un
  • Chromosome M
  • N characters at beginning of human chr22
  • Erroneous duplicated chrY_random region on Mouse Build 34 (mm6)
  • Mapping chimp chromosome numbers to human chromosome numbers
  • Converting genome coordinates between assemblies
  • Linking gene name with accession number
  • Obtaining a list of Known Genes
  • + Filtering for a transcription factor in the JASPAR database
  • Repeat-masking data
  • Availability of repeat-masked data
  • RepeatMasker version differences - UCSC vs. Repeatmasker website
  • Obtaining promoter sequence
  • Data from Evolutionary Conservation Score tracks
  • Minus strand coordinates - axtNet files
  • Mapping UCSC STS marker IDs to those of other groups
  • deCODE map data
  • Direct MariaDB (MySQL) access to data
  • Name of fourth column in BED output
  • Track data access
  • How do I download dbSNP data?
  • Why doesn't this SNP have two alleles?
  • Known issues with Table Browser GTF output
  • Table Browser output file not ordered
@@ -711,30 +712,95 @@

    Obtaining a list of Known Genes

    How can I obtain a complete list of all the genes in the UCSC Known Genes table for a particular organism?

    To obtain a complete copy of the entire Known Genes data set for an organism, open the Genome Browser Downloads page, jump to the section specific to the organism, click the Annotation database link in that section, then click the link for the knownGene.txt.gz table.

    Data for a specific region or chromosome may be obtained from the Table Browser by selecting the "Genes and Gene Prediction Tracks" group, the "UCSC Genes" track and the "knownGene" table. Set the position to the region of interest, then click the "get output" button.
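
    As a rough illustration of working with a downloaded copy of the table, the sketch below lists transcripts that overlap a region in a local knownGene.txt.gz file. It assumes the first five tab-separated columns are name, chrom, strand, txStart and txEnd with 0-based, half-open coordinates; check this against the table schema for your assembly. The function name `genes_in_region` and the file path are hypothetical.

```python
import gzip


def genes_in_region(path, chrom, start, end):
    """Yield (name, txStart, txEnd) for transcripts overlapping a region.

    Assumption: the first five tab-separated columns are
    name, chrom, strand, txStart, txEnd (0-based, half-open).
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip blank or malformed rows
            name, row_chrom, _strand, tx_start, tx_end = fields[:5]
            tx_start, tx_end = int(tx_start), int(tx_end)
            # Half-open intervals overlap when each starts before the other ends.
            if row_chrom == chrom and tx_start < end and tx_end > start:
                yield name, tx_start, tx_end


if __name__ == "__main__":
    # Hypothetical local path and region, for illustration only.
    for name, tx_start, tx_end in genes_in_region(
        "knownGene.txt.gz", "chr1", 1_000_000, 1_100_000
    ):
        print(name, tx_start, tx_end)
```

    Reading the file line by line keeps memory use low, which matters because the full table can be large.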

    + Filtering for a transcription factor in the JASPAR database

    + How do I display only one transcription factor?

    + On the track settings page for the JASPAR Transcription Factors track, you can filter for a transcription factor using the "Filter by Transcription factor name" setting.

    + For example, to display only the TEAD1 transcription factor, select TEAD1 under "Filter by Transcription factor name", then click .

    + [Image: JASPAR Setting]
    + How can I download the data for only one transcription factor using the Table Browser?

    + In the Table Browser, you can create a filter to limit the output to a set of transcription factors. A summary of the steps is as follows:

    + 1. Select a human or mouse genome with the JASPAR track, e.g. hg38 or mm10.
    + 2. Select the JASPAR Transcription Factors track.
    + 3. Define the region with the position field. This track is unavailable for genome-wide download because the connection will time out due to the billions of items in the track.
    + 4. Filter for the transcription factors by clicking
    +    • On the resulting page, create a filter similar to an SQL query: TFName does match [transcription_factor], e.g. TFName does match TEAD1
    +    • Click
    + 5. Select the output format and then click
    + [Image: JASPAR Table Browser]
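
    The same kind of filter can also be applied locally to output that has already been downloaded for a region. The sketch below is a minimal example, assuming tab-separated text whose first line is a header beginning with "#" and containing a TFName column, as the filter above suggests. The function name `filter_by_tf` and the sample rows are made up for illustration, and the comparison is a plain equality test rather than the Table Browser's own matching rules.

```python
import csv
import io


def filter_by_tf(table_text, tf_name, column="TFName"):
    """Return rows (as dicts) whose `column` equals `tf_name`.

    Assumption: tab-separated text whose first line is a header
    starting with '#', e.g. an "all fields" style table dump.
    """
    lines = table_text.lstrip().splitlines()
    if not lines:
        return []
    header = lines[0].lstrip("#").split("\t")
    if column not in header:
        raise KeyError(f"column {column!r} not found in header {header}")
    reader = csv.DictReader(
        io.StringIO("\n".join(lines[1:])), fieldnames=header, delimiter="\t"
    )
    # Exact string equality; wildcard matching is not implemented here.
    return [row for row in reader if row[column] == tf_name]


# Made-up rows for illustration only; not real JASPAR data.
SAMPLE = (
    "#chrom\tchromStart\tchromEnd\tname\tscore\tstrand\tTFName\n"
    "chr1\t100\t110\tMA0000.1\t300\t+\tTEAD1\n"
    "chr1\t150\t162\tMA0000.2\t250\t-\tCTCF\n"
    "chr1\t200\t210\tMA0000.1\t410\t+\tTEAD1\n"
)

if __name__ == "__main__":
    for row in filter_by_tf(SAMPLE, "TEAD1"):
        print(row["chrom"], row["chromStart"], row["chromEnd"], row["TFName"])
```

    Looking the column up by name in the header, rather than by position, avoids assuming a particular column order in the output.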

    Repeat-masking data

    What version of RepeatMasker do you use on your data? Which flags do you use?

    UCSC uses the latest versions of RepeatMasker and repeat libraries available on the date when the assembly data is processed. RepeatMasker version information can usually be found in the README text for the assembly's bigZips downloads directory.

    Masking is done using the RepeatMasker -s flag. For mouse repeats, we also use -m. In addition to RepeatMasker, we use the Tandem Repeat Finder (trf) program, masking out repeats of period 12 or less. The repeats are just "soft" masked. Alignments are allowed to extend through repeats, but not initiate in them.
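
    Soft-masked sequence keeps the repeat bases in place but writes them in lowercase, so masked regions can be recovered from the sequence text itself. The following is a minimal sketch of that idea, not UCSC's own tooling; the function names are hypothetical.

```python
def soft_masked_fraction(sequence):
    """Return the fraction of bases written in lowercase (soft-masked)."""
    bases = [b for b in sequence if b.isalpha()]
    if not bases:
        return 0.0
    masked = sum(1 for b in bases if b.islower())
    return masked / len(bases)


def masked_intervals(sequence):
    """Return 0-based, half-open (start, end) runs of lowercase bases."""
    intervals = []
    start = None
    for i, base in enumerate(sequence):
        if base.islower():
            if start is None:
                start = i  # a masked run begins here
        elif start is not None:
            intervals.append((start, i))  # run ended at the previous base
            start = None
    if start is not None:
        intervals.append((start, len(sequence)))
    return intervals


if __name__ == "__main__":
    seq = "ACGTacgtacGGTTnnAC"  # made-up sequence for illustration
    print(masked_intervals(seq))
    print(round(soft_masked_fraction(seq), 3))
```

    Because the bases are preserved, a downstream tool can choose to ignore the case entirely or to treat lowercase runs specially, for example by not starting alignments inside them.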