2d342c85706fa108decc142314c97c0a0564280b
angie
Thu Mar 23 20:10:22 2023 -0700
Fixing a few typos I spotted in code review. refs #30826

diff --git src/hg/makeDb/trackDb/human/hg38/problematic.html src/hg/makeDb/trackDb/human/hg38/problematic.html
index 1e5f015..d115f39 100644
--- src/hg/makeDb/trackDb/human/hg38/problematic.html
+++ src/hg/makeDb/trackDb/human/hg38/problematic.html
@@ -1,95 +1,95 @@

Description

This container track helps call out sections of the genome that often cause problems or confusion when working with the genome. There are two subtracks for now, Anshul Kundaje's ENCODE Blacklist and the UCSC Unusual Regions track.

The hg19 genome has a track with the same name, but with many more subtracks, as the GeT-RM and Genome-in-a-Bottle artefact variants do not exist yet for hg38, to our knowledge. If you are missing a track here that you know from hg19 and have an idea how to add it to hg38, do not hesitate to contact us.

The UCSC Unusual Regions subtrack contains annotations collected at UCSC,
-UCSC, put together from other tracks, our experiences and support email list
+put together from other tracks, our experiences and support email list
requests over the years.

For example, it contains the most well-known gene clusters (IGH, IGL, PAR1/2, TCRA, TCRB, etc.) and annotations for the GRC fixed sequences, alternate haplotypes, unplaced contigs, pseudo-autosomal regions, and mitochondria. These loci can yield alignments with low-quality mapping scores and discordant read pairs, especially for short-read sequencing data. This data set was manually curated, based on the Genome Browser's assembly description, the FAQs about assembly, and the NCBI RefSeq "other" annotations track data.

The ENCODE Blacklist subtrack contains a comprehensive set of regions which are troublesome for high-throughput Next-Generation Sequencing (NGS) aligners. These regions tend to have a very high ratio of multi-mapping to unique mapping reads and high variance in mappability due to repetitive elements such as satellite, centromeric and telomeric repeats.
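
A common use of such regions is to flag features (reads, peaks, variants) that overlap them. The following is a minimal sketch, assuming the regions have already been exported as BED-style (chrom, start, end) tuples with 0-based, half-open coordinates. The function names and the example coordinates in the usage note are made up for illustration and are not taken from the track.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(regions):
    """Group (chrom, start, end) regions by chromosome for fast lookup.

    For each chromosome we keep the sorted start positions and, alongside
    them, the running maximum of the end positions seen so far.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_ends = []
        running = 0
        for _, e in intervals:
            running = max(running, e)
            max_ends.append(running)
        index[chrom] = (starts, max_ends)
    return index


def overlaps(index, chrom, start, end):
    """Return True if the half-open interval [start, end) hits any region."""
    if chrom not in index:
        return False
    starts, max_ends = index[chrom]
    # Number of regions whose start lies before the query end.
    i = bisect_right(starts, end - 1)
    # One of those regions overlaps iff the largest end among them
    # extends past the query start.
    return i > 0 and max_ends[i - 1] > start
```

With a hypothetical index built from `[("chr1", 100, 200), ("chr1", 500, 600)]`, a feature at chr1:150-160 would be flagged, while one at chr1:200-500 would not, because half-open intervals that merely touch do not overlap.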

Display Conventions and Configuration

Each track contains a set of regions of varying length with no special configuration options. The UCSC Unusual Regions track has a mouse-over description; all other tracks have at most a name field, which can be shown in pack mode. The tracks are usually kept in dense mode.

The Hide empty subtracks control hides subtracks with no data in the browser window. Changing the browser window by zooming or scrolling may result in the display of a different selection of tracks.

Data access

The raw data can be explored interactively with the Table Browser or the Data Integrator.

For automated download and analysis, the genome annotation is stored in bigBed files that can be downloaded from our download server. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g.

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/problematic/comments.bb -chrom=chr21 -start=0 -end=100000000 stdout
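
For scripted use, the same call can be wrapped in a few lines of Python. This is only a sketch under stated assumptions: it presumes that the bigBedToBed binary is on your PATH and that the first four output columns follow the usual BED order (chrom, start, end, name). The helper names build_command, parse_bed and fetch_regions are hypothetical and not part of any UCSC tool.

```python
import subprocess

# URL taken from the example command above.
COMMENTS_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/problematic/comments.bb"


def build_command(url, chrom, start, end, tool="bigBedToBed"):
    """Assemble the argument list for a range-restricted bigBedToBed call."""
    return [tool, url, f"-chrom={chrom}", f"-start={start}", f"-end={end}", "stdout"]


def parse_bed(text):
    """Parse tab-separated BED text into (chrom, start, end, name) tuples.

    Assumes standard BED column order; columns beyond the fourth are ignored.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        name = fields[3] if len(fields) > 3 else ""
        rows.append((fields[0], int(fields[1]), int(fields[2]), name))
    return rows


def fetch_regions(url, chrom, start, end):
    """Run bigBedToBed (assumed to be installed) and return parsed rows."""
    result = subprocess.run(
        build_command(url, chrom, start, end),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool exits non-zero
    )
    return parse_bed(result.stdout)
```

Calling `fetch_regions(COMMENTS_URL, "chr21", 0, 100000000)` would then return the chr21 features from the same file as the command line above, provided the binary and the network are available.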

Methods

Files were downloaded from the respective databases and converted to bigBed format. The procedure is documented in our hg38 makeDoc file.

Credits

Thanks to Anna Benet-Pages, Max Haeussler, Angie Hinrichs, and Daniel Schmelter at the UCSC Genome Browser for planning, building, and testing these tracks. The underlying data comes from the ENCODE Blacklist and some parts were copied manually from the HGNC and NCBI RefSeq tracks.

References

Amemiya HM, Kundaje A, Boyle AP. The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Sci Rep. 2019 Jun 27;9(1):9354. PMID: 31249361; PMC: PMC6597582