fff086f0ce7a549b800ec5ce3409a48287bd6732 dschmelt Wed Apr 3 12:14:13 2019 -0700 Updating the MANE track html description #23113 diff --git src/hg/htdocs/goldenPath/help/bigGenePred.html src/hg/htdocs/goldenPath/help/bigGenePred.html index d564189..2a76c29 100755 --- src/hg/htdocs/goldenPath/help/bigGenePred.html +++ src/hg/htdocs/goldenPath/help/bigGenePred.html @@ -1,36 +1,36 @@

bigGenePred Track Format

-The bigGenePred format stores positional annotations items for collections of exons, much as -BED files indexed as bigBeds do. However, the -bigGenePred format includes 8 additional fields that contain details about coding frames, -annotation status, and other gene-specific information. This is most commonly used in the Browser -to display individual codons, highlighting start, stop, and amino acid translations.

-bigGenePred files that have not been compressed can be described as bed12+8 files. They can be -created using the program bedToBigBed, run with the -as option to pull -in a special autoSql -(.as) file that defines the extra fields of the bigGenePred.

-Much like bigBed, bigGenePred files are in an indexed binary format. The main advantage of binary +The bigGenePred format stores positional annotation items for collections of exons in a compressed +format, similar to how BED files can be compressed +into bigBeds. The bigGenePred format includes 8 additional fields that contain details about coding +frames, annotation status, and other gene-specific information. This is commonly used in the Browser +to display start codons, stop codons, and amino acid translations.

+Before compression, bigGenePred files can be described as bed12+8 files. bigGenePred +files can be created using the program bedToBigBed, run with the -as +option to pull in a special +autoSql (.as) file that defines the extra fields of the bigGenePred.

+Much like bigBed, bigGenePred files are in an indexed binary format. The advantage of using a binary format is that only the portions of the file needed to display a particular region are read by the Genome Browser server. Because of this, indexed binary files have much faster display performance than regular BED format files when working with large data sets. The bigGenePred file remains on the user's web-accessible server (http, https or ftp) and only the portion needed to display the current genome position is cached as a "sparse file". If you want more information on finding a web-accessible server or need hosting space for your bigGenePred files, please see the Hosting section of the Track Hub Help documentation.

bigGenePred file definition

The following autoSql definition specifies bigGenePred gene prediction files. This definition, contained in the file bigGenePred.as, is pulled in when the bedToBigBed utility is run with the @@ -62,33 +62,34 @@

The following bed12+8 is an example of a bigGenePred input file.

Creating a bigGenePred track

To create a bigGenePred track, follow these steps:

Step 1. Format your bigGenePred file. The first 12 fields of the bigGenePred bed12+8 format are described by the basic BED file format. You can also read the genePred format. Your bigGenePred file must also contain the 8 extra fields described in the autoSql file definition shown above: name2, cdsStartStat, cdsEndStat, exonFrames, type, geneName, geneName2, -geneType. Your bigGenePred file must be sorted first on the chrom field, and -secondarily on the chromStart field. You can use the UNIX sort command to -do this:

+geneType. For reference, you can use this example bed12+8 input file, +bigGenePred.txt. Your bigGenePred file must be sorted +first on the chrom field, and secondarily on the chromStart field. You +can use the UNIX sort command to do this:

sort -k1,1 -k2,2n unsorted.bed > input.bed
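To confirm the ordering before running bedToBigBed, a check along these lines may help. It is a minimal sketch, assuming a POSIX-style sort and byte-wise (LC_ALL=C) collation of chromosome names. The sample records and the check_sorted helper are illustrative and are not part of the UCSC utilities.

```shell
# Hypothetical helper: succeeds only if the file is ordered by chrom, then chromStart.
check_sorted() {
    LC_ALL=C sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}

# Small illustrative input, deliberately out of order.
printf 'chr2\t500\t600\nchr1\t300\t400\nchr1\t100\t200\n' > unsorted.bed

# Same sort keys as above, with a fixed locale (an assumption, see lead-in).
LC_ALL=C sort -k1,1 -k2,2n unsorted.bed > input.bed

check_sorted input.bed && echo "input.bed is sorted"
```

The -c option only checks the order and writes no output file, so it can be run on large inputs without creating a second copy.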

Step 2. Download the bedToBigBed program from the binary utilities directory.

Step 3. Download the chrom.sizes file for your genome assembly using the fetchChromSizes script from the same directory. Alternatively, you can download the chrom.sizes file for any assembly hosted at UCSC from our downloads page (click on "Full data set" for any assembly). For example, the hg38.chrom.sizes file for the hg38 database is located at http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes.

@@ -196,30 +197,66 @@ computer.

Run the bedToBigBed utility to create the bigGenePred output file (Step 4, above):

bedToBigBed -type=bed12+8 -tab -as=bigGenePred.as bigGenePred.txt hg38.chrom.sizes bigGenePred.bb

Place the newly created bigGenePred file (bigGenePred.bb) on a web-accessible server (Step 5, above).

Construct a track line that points to the bigGenePred file (Step 6, above).

Create the custom track on the human assembly hg38 (Dec. 2013), and view it in the Genome Browser (Step 7, above).

Example #4

In this example, you will convert a genePred file to bigGenePred using command line utilities. +You can download utilities from the +utilities directory.

+ Obtain a genePred extended file. Here we are downloading the Comprehensive Gencode V28 gene data. +
```
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/wgEncodeGencodeCompV28.txt.gz
```
+ Uncompress the file. +
```
gunzip wgEncodeGencodeCompV28.txt.gz 
```
+Keep columns 2 through the end, which removes the leading bin column, and save the result as wgCompV28Cut.txt. +
```
cut -f 2- wgEncodeGencodeCompV28.txt > wgCompV28Cut.txt 
```
+ Convert the genePred extended file to a bigGenePred text file, reordering and adding columns. +
```
genePredToBigGenePred wgCompV28Cut.txt wgEncodeGencodeCompV28BigGP.txt
```
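A quick field count can catch a skipped step before the binary conversion. The sketch below assumes tab-separated input. The field_counts helper and the one-line example.gp record are illustrative only. A genePred extended record without the bin column should have 15 fields, and the bed12+8 text produced by genePredToBigGenePred should have 20.

```shell
# Hypothetical helper: print each distinct tab-separated field count in a file.
field_counts() {
    awk -F'\t' '{ print NF }' "$1" | sort -n | uniq
}

# Illustrative genePred extended record with the bin column already removed.
printf 'tx1\tchr1\t+\t100\t900\t150\t850\t2\t100,500,\t300,900,\t0\tgene1\tcmpl\tcmpl\t0,0,\n' > example.gp

# Prints the field count of the example record.
field_counts example.gp
```

Under these assumptions, running field_counts on wgCompV28Cut.txt should print 15 and running it on wgEncodeGencodeCompV28BigGP.txt should print 20. More than one number in the output would suggest malformed lines.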

+ Obtain input files for the binary conversion. +

fetchChromSizes hg38 > hg38.chrom.sizes
+wget https://hgwdev.gi.ucsc.edu/goldenPath/help/examples/bigGenePred.as

+ Convert your text bigGenePred to a binary indexed format. +

bedToBigBed -type=bed12+8 -tab -as=bigGenePred.as wgEncodeGencodeCompV28BigGP.txt hg38.chrom.sizes wgEncodeGencodeCompV28.bgp

+ Put your binary indexed file in a web-accessible location. See the hosting section for more information.

+ View your dataset in the Browser by placing your data URL in the bigDataUrl setting of a Browser URL such as this one. +

http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hgct_customText=track%20type=bigGenePred%20bigDataUrl=https://hgwdev.gi.ucsc.edu/~dschmelt/wgEncodeGencodeCompV28.bgp
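If you build such links often, the URL can be assembled from its two variable parts. This is a small sketch that follows the pattern of the URL above. The bgp_browser_url helper name is hypothetical, and the file location passed to it is the example location from this page, which you would replace with your own.

```shell
# Hypothetical helper: build an hgTracks URL that loads a bigGenePred file.
# $1 = assembly name (for example hg38), $2 = web-accessible URL of the binary file.
bgp_browser_url() {
    # %%20 emits a literal %20, the URL-encoded space between track line settings.
    printf 'http://genome.ucsc.edu/cgi-bin/hgTracks?db=%s&hgct_customText=track%%20type=bigGenePred%%20bigDataUrl=%s\n' "$1" "$2"
}

bgp_browser_url hg38 "https://hgwdev.gi.ucsc.edu/~dschmelt/wgEncodeGencodeCompV28.bgp"
```

The helper does no URL encoding of its arguments, so a data URL containing spaces or ampersands would need to be encoded first.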

+You can also add your data in the custom track management +page. This allows you to set position, configuration options, and write a more complete +description. If you want to see codons, you will have to right-click to configure codon view or +set this option using the baseColorDefault=genomicCodons setting, as is done below. +

browser position chr10:67,884,600-67,884,900 
+track type=bigGenePred baseColorDefault=genomicCodons name="bigGenePred Example Four" description="BGP Made from genePred" visibility=pack bigDataUrl=https://hgwdev.gi.ucsc.edu/~dschmelt/wgEncodeGencodeCompV28.bgp

Sharing your data with others

If you would like to share your bigGenePred data track with a colleague, learn how to create a URL link to your data by looking at Example #6.

Extracting data from bigBed format

Because the bigGenePred files are an extension of bigBed files, which are indexed binary files, it can be difficult to extract data from them. UCSC has developed the following programs to assist in working with bigBed formats, available from the binary utilities directory.

bigBedToBed — converts a bigBed file to ASCII BED format.