2d342c85706fa108decc142314c97c0a0564280b angie Thu Mar 23 20:10:22 2023 -0700 Fixing a few typos I spotted in code review. refs #30826
diff --git src/hg/makeDb/trackDb/human/hg38/problematic.html src/hg/makeDb/trackDb/human/hg38/problematic.html
index 1e5f015..d115f39 100644
--- src/hg/makeDb/trackDb/human/hg38/problematic.html
+++ src/hg/makeDb/trackDb/human/hg38/problematic.html
@@ -1,95 +1,95 @@
<h2>Description</h2> <p> This container track calls out regions of the genome that often cause problems or confusion when working with the genome. It currently contains two subtracks: Anshul Kundaje's <a href="https://github.com/Boyle-Lab/Blacklist/blob/master/lists/hg19-blacklist-README.pdf" target=_blank>ENCODE Blacklist</a> and the UCSC Unusual Regions track.</p> <p>The hg19 genome has a track with the same name, but with many more subtracks, as the GeT-RM and Genome-in-a-Bottle artefact variants do not exist yet for hg38, to our knowledge. If you are missing a track here that you know from hg19 and have an idea how to add it to hg38, do not hesitate to contact us.</p> <p> The <b>UCSC Unusual Regions</b> subtrack contains annotations collected at UCSC,
-UCSC, put together from other tracks, our experiences and support email list
+put together from other tracks, our experiences and support email list
requests over the years. For example, it contains the most well-known gene clusters (IGH, IGL, PAR1/2, TCRA, TCRB, etc.) and annotations for the GRC <a href="../cgi-bin/hgTracks?db=hg19&chromInfoPage=">fixed sequences, alternate haplotypes, unplaced contigs, pseudo-autosomal regions, and mitochondria</a>. These loci can yield alignments with low-quality mapping scores and discordant read pairs, especially for short-read sequencing data.
This data set was manually curated based on the <a href="../cgi-bin/hgGateway">Genome Browser's assembly</a> description, the <a href="https://genome.ucsc.edu/FAQ/FAQdownloads.html">FAQs</a> about assemblies, and the <a href="../cgi-bin/hgTrackUi?db=hg19&g=refSeqComposite">NCBI RefSeq "other" annotations</a> track data. </p> <p> The <b>ENCODE Blacklist</b> subtrack contains a comprehensive set of regions that are troublesome for high-throughput Next-Generation Sequencing (NGS) aligners. These regions tend to have a very high ratio of multi-mapping to uniquely mapping reads and high variance in mappability due to repetitive elements such as satellite, centromeric, and telomeric repeats. </p> <h2>Display Conventions and Configuration</h2> <p> Each subtrack contains a set of regions of varying length and has no special configuration options. The <em>UCSC Unusual Regions</em> subtrack has a mouse-over description; all other subtracks have at most a name field, which can be shown in pack mode. The subtracks are usually kept in dense mode. </p> <p> The <em>Hide empty subtracks</em> control hides subtracks with no data in the browser window. Zooming or scrolling changes the browser window and may therefore change which subtracks are displayed. </p> <h2>Data Access</h2> <p> The raw data can be explored interactively with the <a href="../cgi-bin/hgTables">Table Browser</a> or the <a href="../cgi-bin/hgIntegrator">Data Integrator</a>.</p> <p> For automated download and analysis, the genome annotation is stored in bigBed files that can be downloaded from <a href="http://hgdownload.soe.ucsc.edu/gbdb/$db/bbi/problematic/" target="_blank">our download server</a>. Individual regions or the whole genome annotation can be obtained using our tool <tt>bigBedToBed</tt>, which can be compiled from the source code or downloaded as a precompiled binary for your system.
Instructions for downloading the source code and binaries can be found on our <a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">utilities download page</a>. The tool can also be used to obtain only the features within a given range, e.g., <tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/problematic/comments.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt></p> <h2>Methods</h2> <p> Files were downloaded from the respective databases and converted to bigBed format. The procedure is documented in our <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/problematic.txt" target="_blank">hg38 makeDoc file</a>. </p> <h2>Credits</h2> <p> Thanks to Anna Benet-Pages, Max Haeussler, Angie Hinrichs, and Daniel Schmelter at the UCSC Genome Browser for planning, building, and testing these tracks. The underlying data comes from the <a href="https://github.com/Boyle-Lab/Blacklist/blob/master/lists/hg19-blacklist-README.pdf" target=_blank>ENCODE Blacklist</a>, and some parts were copied manually from the HGNC and NCBI RefSeq tracks. </p> <h2>References</h2> <p> Amemiya HM, Kundaje A, Boyle AP. <a href="https://www.nature.com/articles/s41598-019-45839-z" target="_blank"> The ENCODE Blacklist: Identification of Problematic Regions of the Genome</a>. <em>Sci Rep</em>. 2019 Jun 27;9(1):9354. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/31249361" target="_blank">31249361</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6597582/" target="_blank">PMC6597582</a> </p>
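The <tt>bigBedToBed</tt> range query described under Data Access can also be scripted. The Python sketch below is illustrative only and makes several assumptions: the <tt>bigBedToBed</tt> binary is installed and on your PATH, the helper names (<tt>fetch_bed_text</tt>, <tt>parse_bed</tt>, <tt>overlapping</tt>, <tt>Region</tt>) are hypothetical and not part of any UCSC tool, and column 4 of each record, if present, is treated as a name field. Overlap tests follow the BED convention of 0-based, half-open coordinates.

```python
import subprocess
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    """One BED record; coordinates are 0-based, half-open (BED convention)."""
    chrom: str
    start: int
    end: int
    name: Optional[str]  # assumption: column 4, if present, is a name field


def fetch_bed_text(url: str, chrom: str, start: int, end: int) -> str:
    """Run bigBedToBed for one range and return its BED output as text.

    Assumes the bigBedToBed binary is on PATH; the argument order mirrors
    the command-line example in the Data Access section.
    """
    cmd = [
        "bigBedToBed",
        url,
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        "stdout",
    ]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def parse_bed(text: str) -> List[Region]:
    """Parse tab-separated BED lines, skipping blank, comment and header lines."""
    regions: List[Region] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track ", "browser ")):
            continue
        fields = line.split("\t")
        name = fields[3] if len(fields) > 3 else None
        regions.append(Region(fields[0], int(fields[1]), int(fields[2]), name))
    return regions


def overlapping(regions: List[Region], chrom: str, start: int, end: int) -> List[Region]:
    """Return regions that overlap the half-open query interval [start, end)."""
    return [
        r for r in regions
        if r.chrom == chrom and r.start < end and start < r.end
    ]
```

For example, passing the <tt>comments.bb</tt> URL from the command above to <tt>fetch_bed_text(url, "chr21", 0, 100000000)</tt>, parsing the result with <tt>parse_bed</tt>, and calling <tt>overlapping(regions, "chr21", 5000000, 6000000)</tt> would list the annotated regions touching chr21:5,000,001-6,000,000 in 1-based browser coordinates. Keeping the download and the overlap test separate lets the parser be reused on a locally saved BED file.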