d4ecfb99f3d657c23434e078924e18010ebfd5da ccpowell Thu Jul 18 15:48:58 2019 -0700
Removing refSeqCompositeHuman files because refSeqComposite is globally inherited, refs #23818
diff --git src/hg/makeDb/trackDb/human/refSeqCompositeHuman.html src/hg/makeDb/trackDb/human/refSeqCompositeHuman.html
deleted file mode 100644
index 4a56d12..0000000
--- src/hg/makeDb/trackDb/human/refSeqCompositeHuman.html
+++ /dev/null
@@ -1,278 +0,0 @@
-

Description

-

-The NCBI RefSeq Genes composite track shows $organism protein-coding and non-protein-coding -genes taken from the NCBI RNA reference sequences collection (RefSeq). All subtracks use -coordinates provided by RefSeq, except for the UCSC RefSeq track, which UCSC produces by -realigning the RefSeq RNAs to the genome. This realignment may result in occasional differences -between the annotation coordinates provided by UCSC and NCBI. See the -Methods section for more details about how the different tracks were -created.

-

-Please visit NCBI's Feedback for Gene and Reference Sequences (RefSeq) page to make suggestions, -submit additions and corrections, or ask for help concerning RefSeq records.

- -

-For more information on the different gene tracks, see our Genes FAQ.

- -

Display Conventions and Configuration

-

-This track is a multi-view composite track that contains differing data set views. -Instructions for configuring multi-view tracks are -here. -To show only a selected set of subtracks, uncheck the boxes next to the tracks that you wish to -hide.

- -The views available for this track include: -
-
RefSeq annotations and alignments
- -
- -
-
UCSC annotations
- -
- -

-The RefSeq All, RefSeq Curated, RefSeq Predicted, RefSeq Clinical -and UCSC RefSeq tracks follow the display conventions for -gene prediction tracks. -The color shading indicates the level of review the RefSeq record has undergone: -predicted (light), provisional (medium), or reviewed (dark), as defined by RefSeq.

- -

- - - - - - - - - - - - - - - - - - - -
Color | Level of review
dark | Reviewed: the RefSeq record has been reviewed by NCBI staff or by a collaborator. The NCBI review process includes assessing available sequence data and the literature. Some RefSeq records may incorporate expanded sequence and annotation information.
medium | Provisional: the RefSeq record has not yet been subject to individual review. The initial sequence-to-gene association has been established by outside collaborators or NCBI staff.
light | Predicted: the RefSeq record has not yet been subject to individual review, and some aspect of the RefSeq record is predicted.
-

- -The RefSeq Alignments track follows the display conventions for -PSL tracks.

-

-The item labels and codon display properties for features within this track can be configured -through the controls at the top of the track description page. Click the view name -(NCBI RefSeq or UCSC RefSeq) to globally modify the settings for all subtracks in -the view. To adjust the settings for an individual subtrack, click the wrench icon next to the -track name in the subtrack list (available only for views containing more than one track).

- - -

The RefSeq Diffs track contains five different types of inconsistency between the -reference genome sequence and the RefSeq transcript sequences. The five types of differences are -as follows: -

- -HGVS Terminology (Human Genome Variation Society): - -g. = genomic sequence; c. = coding DNA sequence; n. = non-coding RNA reference sequence. -

- -

-When reporting HGVS with RefSeq sequences, to make sure that results from -research articles can be mapped to the genome unambiguously, -please specify the RefSeq annotation release displayed on the transcript's -Genome Browser details page and also the RefSeq transcript ID with version -(e.g. NM_012309.4 not NM_012309). -

- - - -

Methods

-

-Tracks contained in the RefSeq annotation and RefSeq RNA alignment views were created at UCSC using -data from the NCBI RefSeq project. Data files were downloaded from RefSeq in GFF file format and -converted to the genePred and PSL table formats for display in the Genome Browser. Information about -the NCBI annotation pipeline can be found -here.

- -

The RefSeq Diffs track is generated by UCSC using NCBI's RefSeq RNA alignments.

-

-The UCSC RefSeq Genes track is constructed using the same methods as previous RefSeq Genes tracks. -RefSeq RNAs were aligned against the $organism genome using BLAT. Those with an alignment of -less than 15% were discarded. When a single RNA aligned in multiple places, the alignment -having the highest base identity was identified. Only alignments having a base identity -level within 0.1% of the best and at least 96% base identity with the genomic sequence were -kept.
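
The filtering rules in the paragraph above can be sketched in Python. This is a minimal illustration, not UCSC's pipeline code: the `Alignment` record, the function name, and the parameter names are hypothetical, "an alignment of less than 15%" is assumed to mean the fraction of the RNA that aligned, and "within 0.1% of the best" is assumed to mean 0.1 percentage points of base identity.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one BLAT alignment of a RefSeq RNA."""
    rna_id: str
    coverage: float  # assumed: fraction of the RNA that aligned, 0..1
    identity: float  # base identity with the genome, 0..1


def filter_alignments(alignments, min_coverage=0.15,
                      near_best=0.001, min_identity=0.96):
    """Apply the three rules from the Methods text, in order."""
    # Rule 1: discard alignments below the 15% threshold.
    by_rna = defaultdict(list)
    for aln in alignments:
        if aln.coverage >= min_coverage:
            by_rna[aln.rna_id].append(aln)

    kept = []
    for group in by_rna.values():
        # Rule 2: find the best base identity among this RNA's alignments.
        best = max(aln.identity for aln in group)
        # Rule 3: keep alignments near the best that also reach 96% identity.
        kept.extend(
            aln for aln in group
            if aln.identity >= best - near_best
            and aln.identity >= min_identity
        )
    return kept
```

Under these assumptions, an RNA whose best alignment falls below 96% identity contributes nothing to the track, and an RNA with several near-identical placements keeps all of them.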

- -

Data Access

-

-The raw data for these tracks can be accessed in multiple ways. It can be explored interactively -using the Table Browser or -Data Integrator. The tables can also be accessed programmatically through our -public MySQL server or downloaded from our -downloads server for local processing.

-

-The data in the RefSeq Other and RefSeq Diffs tracks are organized in -bigBed file format; more -information about accessing the information in this bigBed file can be found -below. The other subtracks are associated with database tables as follows:

-
-
genePred format:
- -
PSL format:
- -
-

-The first column of each of these tables is "bin". This column is designed -to speed up access for display in the Genome Browser, but can be safely ignored in downstream -analysis. You can read more about the bin indexing system -here.
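
For readers who want to see what the "bin" value encodes, here is a minimal Python sketch of the standard five-level binning scheme as commonly documented for the Genome Browser (smallest bins of 128 kb, each higher level eight times larger). The constants and function names are written from that general description and are assumptions here, not copied from this file; the sketch covers only coordinates below 512 Mb. The `drop_bin` helper is hypothetical and simply shows how to ignore the column.

```python
# Offsets of the first bin at each level, from finest (128 kb) to coarsest.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # 2**17 = 128 kb, the size of the finest bins
BIN_NEXT_SHIFT = 3    # each level up is 2**3 = 8 times larger


def bin_from_range(start, end):
    """Return the smallest bin that fully contains [start, end) (0-based)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is outside the basic bin scheme")


def drop_bin(row):
    """Hypothetical helper: discard the leading bin column of a table row."""
    return row[1:]
```

A feature that fits inside one 128 kb window gets a fine-level bin, while a feature spanning several windows is pushed up to a coarser level, which is why the column helps the Browser fetch a region quickly but carries no biological meaning.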

-

-The annotations in the RefSeq Other and RefSeq Diffs tracks are stored in bigBed -files, which can be obtained from our downloads server: -ncbiRefSeqOther.bb and -ncbiRefSeqDiffs.bb. -Individual regions or the whole set of genome-wide annotations can be obtained using our tool -bigBedToBed, which can be compiled from the source code or downloaded as a precompiled -binary for your system from the utilities directory linked below. For example, to extract only -annotations in a given region, you could use the following command:

-

-bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/$db/ncbiRefSeq/ncbiRefSeqOther.bb
--chrom=chr16 -start=34990190 -end=36727467 stdout
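
The same region query can be scripted. The sketch below assumes `bigBedToBed` is installed and on your PATH; the Python function names are hypothetical, and the assembly name passed as `db` (for example `hg38`) is an assumed value standing in for the `$db` placeholder in the command above.

```python
import shutil
import subprocess


def bigbed_region_command(db, chrom, start, end,
                          track_file="ncbiRefSeqOther.bb"):
    """Build the bigBedToBed argument list for one genomic region."""
    url = f"http://hgdownload.soe.ucsc.edu/gbdb/{db}/ncbiRefSeq/{track_file}"
    return [
        "bigBedToBed", url,
        f"-chrom={chrom}", f"-start={start}", f"-end={end}",
        "stdout",
    ]


def fetch_region(db, chrom, start, end):
    """Run bigBedToBed and return the BED rows as lists of fields."""
    cmd = bigbed_region_command(db, chrom, start, end)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("bigBedToBed is not on PATH")
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return [line.split("\t") for line in result.stdout.splitlines()]
```

Calling `fetch_region("hg38", "chr16", 34990190, 36727467)` would then return one list per annotation in the region, provided the binary and network access are available.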

-

-The genePred format tracks can also be downloaded in GTF format using the -genePredToGtf utility, available from the -utilities directory on the UCSC downloads -server. The utility can be run from the command line like so:

-genePredToGtf $db ncbiRefSeqPredicted ncbiRefSeqPredicted.gtf -

-Note that using genePredToGtf in this manner accesses our public MySQL server, and you therefore -must set up your hg.conf as described on the MySQL page linked near the beginning of the Data Access -section.

-

-A file containing the RNA sequences in FASTA format for all items in the RefSeq All, RefSeq Curated, -and RefSeq Predicted tracks can be found on our downloads server -here.

-

-Please refer to our mailing list archives for questions.

- -

Credits

-

-This track was produced at UCSC from data generated by scientists worldwide and curated by the -NCBI RefSeq project.

- -

References

-

-Kent WJ. -BLAT - the BLAST-like -alignment tool. Genome Res. 2002 Apr;12(4):656-64. -PMID: 11932250; PMC: PMC187518

-

-Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, -Landrum MJ, McGarvey KM et al. -RefSeq: an update on mammalian reference sequences. -Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. -PMID: 24259432; PMC: -PMC3965018

-

-Pruitt KD, Tatusova T, Maglott DR. - -NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts -and proteins. -Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. -PMID: 15608248; PMC: PMC539979