198c9b8daecc44fbda6a6494c566c723920f030a lrnassar Wed Mar 11 18:25:21 2026 -0700
Fixing a few hundred clear typos with the help of Claude. Some are less important in code comments, but majority of them are in user-facing places. I manually approved 60%+ of the changes and didn't see any that were an incorrect suggestion, at worst it was potentially uncessesary, like a code comment having cant instead of can't. No RM.
diff --git src/hg/htdocs/FAQ/FAQformat.html src/hg/htdocs/FAQ/FAQformat.html
index 5197a05cb95..58a0bf6e26e 100755
--- src/hg/htdocs/FAQ/FAQformat.html
+++ src/hg/htdocs/FAQ/FAQformat.html
@@ -162,31 +162,31 @@
834-944 ≥ 945

strand - Defines the strand. Either "." (=no strand) or "+" or "-".

thickStart - The starting position at which the feature is drawn thickly (for example, the start codon in gene displays). When there is no thick part, thickStart and thickEnd are usually set to the chromStart position.

thickEnd - The ending position at which the feature is drawn thickly (for example the stop codon in gene displays).

itemRgb - An RGB value of the form R,G,B (e.g. 255,0,0). If the track line
- itemRgb attribute is set to "On", this RBG value will determine the display
+ itemRgb attribute is set to "On", this RGB value will determine the display
color of the data contained in this BED line. NOTE: It is recommended that a simple color scheme (eight colors or less) be used with this attribute to avoid overwhelming the color resources of the Genome Browser and your Internet browser.

blockCount - The number of blocks (exons) in the BED line.

blockSizes - A comma-separated list of the block sizes. The number of items in this list should correspond to blockCount.

blockStarts - A comma-separated list of block starts. All of the blockStart positions should be calculated relative to chromStart. The number of items in this list should correspond to blockCount.
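The relationship between chromStart, blockCount, blockSizes, and blockStarts can be sketched in a few lines of Python. This is a minimal illustration, not the Genome Browser's own code: the function name `bed_blocks` and the sample values are hypothetical, and tolerating a trailing comma in the lists is an assumption about how such files are commonly written.

```python
def bed_blocks(chrom_start, block_count, block_sizes, block_starts):
    """Return absolute (start, end) pairs for each block of a BED line.

    block_sizes and block_starts are the comma-separated strings from the
    BED line; blockStarts values are relative to chromStart.
    """
    # Assumption: a trailing comma may be present, so empty items are dropped.
    sizes = [int(s) for s in block_sizes.split(",") if s]
    starts = [int(s) for s in block_starts.split(",") if s]

    # Both lists should contain exactly blockCount items.
    if len(sizes) != block_count or len(starts) != block_count:
        raise ValueError("blockSizes and blockStarts must each have blockCount items")

    return [(chrom_start + rel, chrom_start + rel + size)
            for rel, size in zip(starts, sizes)]


# Hypothetical feature starting at position 1000 with two blocks.
blocks = bed_blocks(1000, 2, "567,488,", "0,3512")
print(blocks)  # [(1000, 1567), (4512, 5000)]
```

Converting to absolute coordinates up front keeps later interval arithmetic simple, at the cost of no longer matching the on-disk representation.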

In BED files with block definitions, the first blockStart value must be 0, so that the first
@@ -436,55 +436,55 @@
Note that there is also a GFF3 specification that is not currently supported by the Browser. All GFF tracks must be formatted according to Sanger's GFF2 specification.

If you would like to obtain browser data in GFF (GTF) format, please refer to Genes in gtf or gff format on the Wiki.

Here is a brief description of the GFF fields:

seqname - The name of the sequence. Must be a chromosome or scaffold.
source - The program that generated this feature.
feature - The name of this type of feature. Some examples of standard feature
- types are "CDS" "start_codon" "stop_codon" and "exon"li>
+ types are "CDS" "start_codon" "stop_codon" and "exon".
start - The starting position of the feature in the sequence. The first base is numbered 1.
end - The ending position of the feature (inclusive).
score - A score between 0 and 1000. If the track line useScore attribute is set to 1 for this annotation data set, the score value will determine the level of gray in which this feature is displayed (higher numbers = darker gray). If there is no score
- value, enter ".".
+ value, enter ".".
strand - Valid entries include "+", "-", or "." (for don't know/don't care).
frame - If the feature is a coding exon, frame should be a number between 0-2 that represents the reading frame of the first base. If the feature is not a coding exon, the value should be ".".
group - All lines with the same group are linked together into a single item.

Example:
-Here's an example of a GFF-based track. This data format require tabs and some operating systems convert tabs to spaces. If pasting doesn't work, this example's contents or the url itself can be pasted into the custom track text box.

browser position chr22:10000000-10025000
 browser hide all
 track name=regulatory description="TeleGene(tm) Regulatory Regions" visibility=2
 chr22	TeleGene	enhancer	10000000	10001000	500	+	.	touch1
 chr22	TeleGene	promoter	10010000	10010100	900	+	.	touch1
 chr22	TeleGene	promoter	10020000	10025000	800	-	.	touch2

Click here to display this track in the Genome Browser.
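As a rough sketch of how the nine GFF fields described above map onto one tab-separated data line, the following Python parses the first data line of the example track. The names `GffRecord` and `parse_gff_line` are hypothetical, and the sketch assumes every data line has exactly the nine fields listed; it is not a full GFF2 parser.

```python
from typing import NamedTuple, Optional


class GffRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int               # first base is numbered 1
    end: int                 # inclusive
    score: Optional[float]   # None when the file has "."
    strand: str              # "+", "-", or "."
    frame: Optional[int]     # None when the file has "."
    group: str


def parse_gff_line(line):
    """Parse one tab-separated GFF data line into a GffRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        # Tabs converted to spaces by an editor would end up here.
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, group = fields
    if strand not in ("+", "-", "."):
        raise ValueError(f"invalid strand: {strand!r}")
    return GffRecord(
        seqname, source, feature, int(start), int(end),
        None if score == "." else float(score),
        strand,
        None if frame == "." else int(frame),
        group,
    )


rec = parse_gff_line("chr22\tTeleGene\tenhancer\t10000000\t10001000\t500\t+\t.\ttouch1")
# Both ends are inclusive, so the feature length is end - start + 1.
print(rec.feature, rec.end - rec.start + 1)  # enhancer 1001
```

Splitting strictly on tabs makes the tab requirement mentioned above visible: a line whose tabs were converted to spaces fails with a clear error instead of being silently misparsed.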

GTF format

@@ -529,41 +529,41 @@
Also, review the enhanced interact format for information on how to visualize pairwise interactions as arcs in the browser.

MAF format

The multiple alignment format stores a series of multiple alignments in a format that is easy to parse and relatively easy to read. This format stores multiple alignments at the DNA level between entire genomes. Previously used formats are suitable for multiple alignments of single proteins or regions of DNA without rearrangements, but would require considerable extension to cope with genomic issues such as forward and reverse strand directions, multiple pieces to the alignment, and so forth.

General Structure
-The .maf format is line-oriented. Each multiple alignment beigns with the reference genome
+The .maf format is line-oriented. Each multiple alignment begins with the reference genome
line and ends with a blank line. Each sequence in an alignment is on a single line, which can get quite long, but there is no length limit. Words in a line are delimited by any white space. Lines starting with # are considered to be comments. Lines starting with ## can be ignored by most programs, but contain meta-data of one form or another.

The file is divided into paragraphs that terminate in a blank line. Within a paragraph, the first word of a line indicates its type. Each multiple alignment is in a separate paragraph that begins with an "a" line and contains an "s" line for each sequence in the multiple
-alignment. The first sequence must be the reference genome on which the rest of the sequenes map.
+alignment. The first sequence must be the reference genome on which the rest of the sequences map.
Some MAF files may contain other optional line types:

an "i" line containing information about what is in the aligned species DNA before and after the immediately preceding "s" line
an "e" line containing information about the size of the gap between the alignments that span the current block
a "q" line indicating the quality of each aligned base for the species

Parsers may ignore any other types of paragraphs and other types of lines within an alignment paragraph.
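The paragraph structure described above can be sketched as a small Python generator. This is an illustrative reading of the rules in this section, not a reference parser: the function name `maf_blocks` is hypothetical, and skipping anything outside an "a" paragraph (such as a custom track line) is a simplifying assumption.

```python
def maf_blocks(lines):
    """Yield one dict per alignment paragraph, mapping line type to its lines."""
    block = None
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line terminates the current paragraph.
            if block is not None:
                yield block
                block = None
            continue
        if line.startswith("#"):
            # Comment ("#") and meta-data ("##") lines are skipped here.
            continue
        kind = line.split(None, 1)[0]  # the first word gives the line type
        if kind == "a":
            if block is not None:
                yield block  # tolerate a missing blank line between blocks
            block = {"a": [line]}
        elif block is not None:
            # "s", "i", "e", "q", and unknown types are grouped by first word.
            block.setdefault(kind, []).append(line)
        # Lines outside an alignment paragraph (e.g. a track line) are ignored.
    if block is not None:
        yield block


sample = """\
track name=euArc visibility=pack
##maf version=1 scoring=tba.v8

a score=23262.0
s hg18.chr7    27578828 38 + 158545518 AAA-GGGAATGTTAACCAAATGA---ATTGTCTCTTACGGTG
s panTro1.chr6 28741140 38 + 161576975 AAA-GGGAATGTTAACCAAATGA---ATTGTCTCTTACGGTG

a score=5062.0
"""

blocks = list(maf_blocks(sample.splitlines()))
print([len(b.get("s", [])) for b in blocks])  # [2, 0]
```

Grouping lines by their first word means a caller interested only in "s" lines can ignore the other keys, in line with the note that parsers may ignore other line types.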

@@ -646,31 +646,31 @@

Lines starting with "s" -- a sequence within an alignment block

 s hg16.chr7    27707221 13 + 158545518 gcagctgaaaaca
  s panTro1.chr6 28869787 13 + 161576975 gcagctgaaaaca
  s baboon         249182 13 +   4622798 gcagctgaaaaca
  s mm4.chr6     53310102 13 + 151104725 ACAGCTGAAAATA

The "s" lines together with the "a" lines define a multiple alignment. The first "s" line must be the reference genome, hg16 in the above example. The "s" lines have the following fields which are defined by position.

src -- The name of one of the source sequences for the alignment. For sequences that are resident in a browser assembly, the form 'database.chromosome' allows automatic creation
- of links to other assemblies. Non-browser sequences are typically reference by the species name
+ of links to other assemblies. Non-browser sequences are typically referenced by the species name
alone.
start -- The start of the aligning region in the source sequence. This is a zero-based number. If the strand field is "-" then this is the start relative to the reverse-complemented source sequence (see Coordinate Transforms).
size -- The size of the aligning region in the source sequence. This number is equal to the number of non-dash characters in the alignment text field below.
strand -- Either "+" or "-". If "-", then the alignment is to the reverse-complemented source.
srcSize -- The size of the entire source sequence, not just the parts involved in
@@ -821,31 +821,31 @@
0 98 Manually assigned F 99 Finished
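The "s" line fields described earlier (src, start, size, strand, srcSize, and the alignment text) can be illustrated with a short Python sketch. The function name `parse_s_line` and the minus-strand sample line are hypothetical. The forward-strand conversion used here, srcSize - start - size, is an assumption that follows from the start being relative to the reverse-complemented source; it is not quoted from the FAQ.

```python
def parse_s_line(line):
    """Parse a MAF "s" line and return forward-strand, zero-based coordinates."""
    kind, src, start, size, strand, src_size, text = line.split()
    if kind != "s":
        raise ValueError('not an "s" line')
    start, size, src_size = int(start), int(size), int(src_size)

    # size should equal the number of non-dash characters in the text field.
    if size != len(text) - text.count("-"):
        raise ValueError("size does not match the non-dash characters in the text")

    if strand == "-":
        # Assumption: start counts from the end of the reverse-complemented
        # source, so mirror it back onto the forward strand.
        fwd_start = src_size - start - size
    elif strand == "+":
        fwd_start = start
    else:
        raise ValueError(f"invalid strand: {strand!r}")

    return {"src": src, "start": fwd_start, "end": fwd_start + size, "strand": strand}


plus = parse_s_line("s hg16.chr7    27707221 13 + 158545518 gcagctgaaaaca")
print(plus["start"], plus["end"])    # 27707221 27707234

# Hypothetical minus-strand line on a 100-base source.
minus = parse_s_line("s demo.chr1 10 5 - 100 AC-GTA")
print(minus["start"], minus["end"])  # 85 90
```

Checking size against the text field catches lines that were truncated or re-wrapped, which matters because a single sequence line can be very long.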

A Simple Example

-Here is a simple example of a three alignment blocks derived from five starting sequences. The
+Here is a simple example of three alignment blocks derived from five starting sequences. The
first track line is necessary for custom tracks, but should be removed otherwise. Repeats are shown as lowercase, and each block may have a subset of the input sequences. All sequence columns and rows must contain at least one nucleotide (no columns or rows that contain only insertions).

track name=euArc visibility=pack
 ##maf version=1 scoring=tba.v8 
 # tba.v8 (((human chimp) baboon) (mouse rat)) 
                    
 a score=23262.0     
 s hg18.chr7    27578828 38 + 158545518 AAA-GGGAATGTTAACCAAATGA---ATTGTCTCTTACGGTG
 s panTro1.chr6 28741140 38 + 161576975 AAA-GGGAATGTTAACCAAATGA---ATTGTCTCTTACGGTG
 s baboon         116834 38 +   4622798 AAA-GGGAATGTTAACCAAATGA---GTTGTCTCTTATGGTG
 s mm4.chr6     53215344 38 + 151104725 -AATGGGAATGTTAAGCAAACGA---ATTGTCTCTCAGTGTG
 s rn3.chr4     81344243 40 + 187371129 -AA-GGGGATGCTAAGCCAATGAGTTGTTGTCTCTCAATGTG
                    
 a score=5062.0