2f106a9cd51707e6772b96064b2fcfc30bca95b0 ccpowell Thu Jul 18 14:54:51 2019 -0700
Switching YP with NP mention in all organism execept human, refs #23818
diff --git src/hg/makeDb/trackDb/human/refSeqCompositeHuman.html src/hg/makeDb/trackDb/human/refSeqCompositeHuman.html
new file mode 100644
index 0000000..4a56d12
--- /dev/null
+++ src/hg/makeDb/trackDb/human/refSeqCompositeHuman.html
@@ -0,0 +1,278 @@
+

Description

+

+The NCBI RefSeq Genes composite track shows $organism protein-coding and non-protein-coding +genes taken from the NCBI RNA reference sequences collection (RefSeq). All subtracks use +coordinates provided by RefSeq, except for the UCSC RefSeq track, which UCSC produces by +realigning the RefSeq RNAs to the genome. This realignment may result in occasional differences +between the annotation coordinates provided by UCSC and NCBI. See the +Methods section for more details about how the different tracks were +created.

+

+Please visit NCBI's Feedback for Gene and Reference Sequences (RefSeq) page to make suggestions, +submit additions and corrections, or ask for help concerning RefSeq records.

+ +

+For more information on the different gene tracks, see our Genes FAQ.

+ +

Display Conventions and Configuration

+

+This track is a multi-view composite track that contains differing data set views. +Instructions for configuring multi-view tracks are +here. +To show only a selected set of subtracks, uncheck the boxes next to the tracks that you wish to +hide.

+ +The views available for this track include: +
+
RefSeq annotations and alignments
+ +
+ +
+
UCSC annotations
+ +
+ +

+The RefSeq All, RefSeq Curated, RefSeq Predicted, RefSeq Clinical +and UCSC RefSeq tracks follow the display conventions for +gene prediction tracks. +The color shading indicates the level of review the RefSeq record has undergone: +predicted (light), provisional (medium), or reviewed (dark), as defined by RefSeq.

+ +

+ + + + + + + + + + + + + + + + + + + +
Color | Level of review
Dark | Reviewed: the RefSeq record has been reviewed by NCBI staff or by a collaborator. The NCBI review process includes assessing available sequence data and the literature. Some RefSeq records may incorporate expanded sequence and annotation information.
Medium | Provisional: the RefSeq record has not yet been subject to individual review. The initial sequence-to-gene association has been established by outside collaborators or NCBI staff.
Light | Predicted: the RefSeq record has not yet been subject to individual review, and some aspect of the RefSeq record is predicted.
+

+ +The RefSeq Alignments track follows the display conventions for +PSL tracks.

+

+The item labels and codon display properties for features within this track can be configured +through the controls at the top of the track description page. Click the view name +(NCBI RefSeq or UCSC RefSeq) to globally modify the settings for all subtracks in +the view. To adjust the settings for an individual subtrack, click the wrench icon next to the +track name in the subtrack list (available only for views containing more than one track).

+ + +

The RefSeq Diffs track contains five different types of inconsistency between the +reference genome sequence and the RefSeq transcript sequences. The five types of differences are +as follows: +

+ +HGVS Terminology (Human Genome Variation Society): + +g. = genomic sequence; c. = coding DNA sequence; n. = non-coding RNA reference sequence. +

+ +

+When reporting HGVS with RefSeq sequences, to make sure that results from +research articles can be mapped to the genome unambiguously, +please specify the RefSeq annotation release displayed on the transcript's +Genome Browser details page and also the RefSeq transcript ID with version +(e.g. NM_012309.4, not NM_012309). +
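
As a small illustration of the versioning advice above, the following Python sketch checks whether a transcript ID carries a version suffix. The helper name has_version and the regular expression are assumptions made for this example (two uppercase letters, an underscore, digits, and an optional version); this is not a UCSC or NCBI utility.

```python
import re

# Assumed pattern for illustration: two-letter prefix (e.g. NM, NR, XM),
# underscore, digits, and an optional ".<version>" suffix.
_ACCESSION = re.compile(r"^([A-Z]{2}_\d+)(?:\.(\d+))?$")


def has_version(accession: str) -> bool:
    """Return True if the accession includes a version, e.g. NM_012309.4."""
    match = _ACCESSION.match(accession)
    if match is None:
        raise ValueError(f"not a recognized accession: {accession!r}")
    # Group 2 holds the version digits and is None when the suffix is absent.
    return match.group(2) is not None
```

A check like this could be run over a list of transcript IDs before they go into a manuscript or a variant report.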

+ + + +

Methods

+

+Tracks contained in the RefSeq annotation and RefSeq RNA alignment views were created at UCSC using +data from the NCBI RefSeq project. Data files were downloaded from RefSeq in GFF file format and +converted to the genePred and PSL table formats for display in the Genome Browser. Information about +the NCBI annotation pipeline can be found +here.

+ +

The RefSeq Diffs track is generated by UCSC using NCBI's RefSeq RNA alignments.

+

+The UCSC RefSeq Genes track is constructed using the same methods as previous RefSeq Genes tracks. +RefSeq RNAs were aligned against the $organism genome using BLAT. Those with an alignment of +less than 15% were discarded. When a single RNA aligned in multiple places, the alignment +having the highest base identity was identified. Only alignments having a base identity +level within 0.1% of the best and at least 96% base identity with the genomic sequence were +kept.
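
The filtering rules in the paragraph above can be sketched in Python as follows. This is only an approximation under stated assumptions, not the code UCSC runs: the Alignment record and filter_alignments function are hypothetical, the 15% threshold is assumed to mean the fraction of the RNA that aligned, and "within 0.1% of the best" is assumed to be an absolute difference in base identity.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one BLAT alignment of a RefSeq RNA."""
    rna: str         # RefSeq RNA accession
    coverage: float  # fraction of the RNA aligned, 0..1 (assumed meaning of "15%")
    identity: float  # base identity with the genomic sequence, 0..1


def filter_alignments(alignments, min_coverage=0.15, min_identity=0.96,
                      near_best=0.001):
    """Keep alignments that pass the coverage, identity, and near-best rules."""
    by_rna = defaultdict(list)
    for aln in alignments:
        # Discard alignments covering less than 15% of the RNA.
        if aln.coverage >= min_coverage:
            by_rna[aln.rna].append(aln)

    kept = []
    for alns in by_rna.values():
        # Find the highest base identity among this RNA's alignments.
        best = max(a.identity for a in alns)
        # Keep those within 0.1% of the best and at least 96% identical.
        kept.extend(a for a in alns
                    if a.identity >= min_identity
                    and best - a.identity <= near_best)
    return kept
```

Grouping by RNA first means the near-best rule is applied per transcript, so an RNA that aligns in several places can keep more than one alignment.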

+ +

Data Access

+

+The raw data for these tracks can be accessed in multiple ways. It can be explored interactively +using the Table Browser or +Data Integrator. The tables can also be accessed programmatically through our +public MySQL server or downloaded from our +downloads server for local processing.

+

+The data in the RefSeq Other and RefSeq Diffs tracks are organized in +bigBed file format; more +information about accessing these bigBed files can be found +below. The other subtracks are associated with database tables as follows:

+
+
genePred format:
+ +
PSL format:
+ +
+

+The first column of each of these tables is "bin". This column is designed +to speed up access for display in the Genome Browser, but can be safely ignored in downstream +analysis. You can read more about the bin indexing system +here.
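
For downstream analysis of a tab-separated table dump, ignoring the bin column can be as simple as dropping the first field of each row. The sketch below assumes a tab-separated input; the function name drop_bin_column is hypothetical.

```python
def drop_bin_column(lines):
    """Yield each tab-separated row with the leading "bin" field removed."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # The first field is the bin index; everything after it is annotation data.
        yield fields[1:]
```

The function accepts any iterable of lines, so an open file handle can be passed in directly.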

+

+The annotations in the RefSeq Other and RefSeq Diffs tracks are stored in bigBed +files, which can be obtained from our downloads server here: +ncbiRefSeqOther.bb and +ncbiRefSeqDiffs.bb. +Individual regions or the whole set of genome-wide annotations can be obtained using our tool +bigBedToBed, which can be compiled from the source code or downloaded as a precompiled +binary for your system from the utilities directory linked below. For example, to extract only +annotations in a given region, you could use the following command:

+

+bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/$db/ncbiRefSeq/ncbiRefSeqOther.bb +-chrom=chr16 -start=34990190 -end=36727467 stdout
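
If you prefer to script the command above, a minimal Python wrapper might look like the sketch below. The function names are hypothetical, the assembly name passed as db (for example hg38) is an assumption standing in for $db, and fetch_region requires the bigBedToBed binary to be on your PATH. Only the URL layout and the -chrom, -start and -end options shown in the command above are used.

```python
import subprocess

# URL layout taken from the command above; {db} stands in for $db.
URL_TEMPLATE = "http://hgdownload.soe.ucsc.edu/gbdb/{db}/ncbiRefSeq/{name}.bb"


def bigbed_region_command(db, name, chrom, start, end, out="stdout"):
    """Build the bigBedToBed argument list for one genomic region."""
    url = URL_TEMPLATE.format(db=db, name=name)
    return ["bigBedToBed", url,
            f"-chrom={chrom}", f"-start={start}", f"-end={end}", out]


def fetch_region(db, name, chrom, start, end):
    """Run bigBedToBed and return its BED output as a list of lines."""
    cmd = bigbed_region_command(db, name, chrom, start, end)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.splitlines()
```

Keeping the command construction separate from the subprocess call makes it easy to print or log the exact command before running it.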

+

+The genePred format tracks can also be downloaded in GTF format using the +genePredToGtf utility, available from the +utilities directory on the UCSC downloads +server. The utility can be run from the command line like so:

+genePredToGtf $db ncbiRefSeqPredicted ncbiRefSeqPredicted.gtf +

+Note that using genePredToGtf in this manner accesses our public MySQL server, and you therefore +must set up your hg.conf as described on the MySQL page linked near the beginning of the Data Access +section.

+

+A file containing the RNA sequences in FASTA format for all items in the RefSeq All, RefSeq Curated, +and RefSeq Predicted tracks can be found on our downloads server +here.

+

+Please refer to our mailing list archives for questions.

+ +

Credits

+

+This track was produced at UCSC from data generated by scientists worldwide and curated by the +NCBI RefSeq project.

+ +

References

+

+Kent WJ. +BLAT - the BLAST-like +alignment tool. Genome Res. 2002 Apr;12(4):656-64. +PMID: 11932250; PMC: PMC187518

+

+Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, +Landrum MJ, McGarvey KM et al. +RefSeq: an update on mammalian reference sequences. +Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. +PMID: 24259432; PMC: +PMC3965018

+

+Pruitt KD, Tatusova T, Maglott DR. + +NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts +and proteins. +Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. +PMID: 15608248; PMC: PMC539979