6e38471f9e03b95d1a26a10a4e71616b7f2e6837
markd
  Mon Apr 11 17:31:15 2022 -0700
renamed ugly chm13 directory to keep Hiram sane

diff --git src/hg/makeDb/doc/chm13v2.0userData/build.txt src/hg/makeDb/doc/chm13v2.0userData/build.txt
new file mode 100644
index 0000000..349b2da
--- /dev/null
+++ src/hg/makeDb/doc/chm13v2.0userData/build.txt
@@ -0,0 +1,326 @@
+================================================================
+Notes:
+
+  dataDir = /hive/data/genomes/asmHubs/genbankBuild/GCA/009/914/755/GCA_009914755.4_T2T-CHM13v2.0
+  stagingDir = /hive/data/genomes/asmHubs/GCA/009/914/755/GCA_009914755.4
+
+T2T CHM13 track spreadsheet
+
+   https://docs.google.com/spreadsheets/d/13BXuEFB904aje6zWXyZ0znZnXvQiu1qxKADA2uV2JU4/edit#gid=1966247802
+
+staging URLs:
+   https://hgdownload-test.gi.ucsc.edu/hubs/GCA/009/914/755/GCA_009914755.4/hub.txt
+   https://hgwdev.gi.ucsc.edu/cgi-bin/hgTracks?hubUrl=https://hgdownload-test.gi.ucsc.edu/hubs/GCA/009/914/755/GCA_009914755.4/hub.txt&genome=hub_24696_GCA_009914755.4
+
+   https://hgwdev.gi.ucsc.edu/h/GCA_009914755.4 [doesn't work]
+
+Public links:
+   https://genome.ucsc.edu/h/GCA_009914755.4
+   https://hgdownload-test.gi.ucsc.edu/hubs/GCA/009/914/755/GCA_009914755.4/hub.txt
+
+
+
+# tmp
+   kentDir=${HOME}/compbio/t2t/projs/chm13-v2.0/kent
+
+================================================================
+ucscChromNames (2022-02-22 markd)
+----------------------------------------------------------------
+Sequence and chromosome sizes files with UCSC chromosome names, used for
+building other tracks; not a track in the browser.
+    t2t-chm13-v2.0.2bit
+    t2t-chm13-v2.0.fa.gz
+    t2t-chm13-v2.0.sizes
+    t2t-chm13-v2.0.sizes3
+
+================================================================
+proseq (2022-02-21 markd)
+----------------------------------------------------------------
+Supplied by Savannah Hoyt <savannah.klein@uconn.edu>
+from T2T Globus /team-epigenetics/PROseq-RNAseq_chm13v1.1/MappedToCHM13v1.1/PROseq_Bowtie2/
+trackData/proseq
+
+rename files to shorter names:
+    CHM13-AB_proseq_cutadapt-q20-m20_bt2-vs-dM_bt2-chm13v1.1_neg.bigwig                              -> PROseq_default_neg.bw
+    CHM13-AB_proseq_cutadapt-q20-m20_bt2-vs-dM_bt2-chm13v1.1_pos.bigwig                              -> PROseq_default_pos.bw
+    CHM13-AB_proseq_cutadapt-q20-m20_bt2-vs-dM_bt2-k100-chm13v1.1_meryl-21mer-chm13v1.1_neg.bigwig   -> PROseq_k100_21mer_neg.bw
+    CHM13-AB_proseq_cutadapt-q20-m20_bt2-vs-dM_bt2-k100-chm13v1.1_meryl-21mer-chm13v1.1_pos.bigwig   -> PROseq_k100_21mer_pos.bw
+    CHM13-AB_proseq_cutadapt-q20-m20_bt2-vs-dM_bt2-k100-chm13v1.1_neg.bigwig                         -> PROseq_k100_neg.bw
+    CHM13-AB_proseq_cutadapt-q20-m20_bt2-vs-dM_bt2-k100-chm13v1.1_pos.bigwig                         -> PROseq_k100_pos.bw
+    PROseq_k100_AB.markersandlength_meryl-21mer-chm13v1.1_neg.bigwig                                 -> PROseq_k100_dual_21mer_neg.bw
+    PROseq_k100_AB.markersandlength_meryl-21mer-chm13v1.1_pos.bigwig                                 -> PROseq_k100_dual_21mer_pos.bw
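+A minimal sketch of scripting the renames above, assuming it is run from
+trackData/proseq. renameFromMap and proseqRenames are hypothetical names,
+not part of the original build.
```shell
# Hypothetical helper: read "oldName newName" pairs from stdin and rename each
# file that exists in the current directory; missing files are reported and skipped.
renameFromMap() {
    while read -r old new; do
        [ -n "$old" ] || continue          # skip blank lines
        if [ -e "$old" ]; then
            mv -- "$old" "$new"
        else
            echo "skipping missing file: $old" >&2
        fi
    done
}

# PRO-seq mapping, copied from the list above (original name, short name).
proseqRenames='
CHM13-AB_proseq_cutadapt-q20-m20_bt2-vs-dM_bt2-chm13v1.1_neg.bigwig PROseq_default_neg.bw
CHM13-AB_proseq_cutadapt-q20-m20_bt2-vs-dM_bt2-chm13v1.1_pos.bigwig PROseq_default_pos.bw
CHM13-AB_proseq_cutadapt-q20-m20_bt2-vs-dM_bt2-k100-chm13v1.1_meryl-21mer-chm13v1.1_neg.bigwig PROseq_k100_21mer_neg.bw
CHM13-AB_proseq_cutadapt-q20-m20_bt2-vs-dM_bt2-k100-chm13v1.1_meryl-21mer-chm13v1.1_pos.bigwig PROseq_k100_21mer_pos.bw
CHM13-AB_proseq_cutadapt-q20-m20_bt2-vs-dM_bt2-k100-chm13v1.1_neg.bigwig PROseq_k100_neg.bw
CHM13-AB_proseq_cutadapt-q20-m20_bt2-vs-dM_bt2-k100-chm13v1.1_pos.bigwig PROseq_k100_pos.bw
PROseq_k100_AB.markersandlength_meryl-21mer-chm13v1.1_neg.bigwig PROseq_k100_dual_21mer_neg.bw
PROseq_k100_AB.markersandlength_meryl-21mer-chm13v1.1_pos.bigwig PROseq_k100_dual_21mer_pos.bw
'

# Run from trackData/proseq.
printf '%s\n' "$proseqRenames" | renameFromMap
```
+The same helper could be reused for the RNA-seq renames below with its own
+mapping list.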
+
+================================================================
+rnaseq (2022-03-02 markd)
+----------------------------------------------------------------
+Supplied by Savannah Hoyt <savannah.klein@uconn.edu>
+from T2T Globus /team-epigenetics/PROseq-RNAseq_chm13v1.1/MappedToCHM13v1.1/RNAseq_Bowtie2/
+
+rename files to shorter names:
+    CHM13_S182-183_rnaseq_cutadapt-q20-m100_bt2-chm13v1.1_F1548.bigwig                               -> RNAseq_default.bw
+    CHM13_S182-183_rnaseq_cutadapt-q20-m100_bt2-k100-chm13v1.1_F1548.bigwig                          -> RNAseq_k100.bw
+    CHM13_S182-S183_rnaseq_cutadapt-q20-m100_bt2-k100-chm13v1.1-F1548_meryl-21mer-chm13v1.1.bigwig   -> RNAseq_k100_21mer.bw
+    RNAseq_k100_AB.markersandlength_meryl-21mer-chm13v1.1.bigwig                                     -> RNAseq_k100_dual_21mer.bw
+
+================================================================
+cytoBandsMapped (2022-02-22 markd)
+----------------------------------------------------------------
+cytoBand tracks from T2T project mapped from GRCh38
+Supplied by Nick Altemose <nickaltemose@gmail.com>
+Delivered via Slack
+trackData/cytoBandMapped
+
+bedToBigBed -type=bed4+1 -as=${HOME}/kent/src/hg/lib/cytoBand.as chm13v2.0_cytobands_allchrs.bed ../chromAlias/ucsc.sizes.txt cytoBandMapped.bb
+
+================================================================
+sedefSegDups (2022-02-24 markd)
+----------------------------------------------------------------
+Supplied by Mitchell Robert Vollger <mvollger@uw.edu>
+team-segdups/Assembly_analysis/SEDEF/T2T-CHM13v2.SDs.bed
+
+    bedToBigBed -as=${kentDir}/src/hg/makeDb/doc/GCA_009914755.4_T2T-CHM13v2.0/schema/sedefSegDups.as -type=bed9+ T2T-CHM13v2.SDs.bed ../chromAlias/ucsc.sizes.txt sedefSegDups.bb
+
+
+================================================================
+rdnaModel (2022-03-02 markd)
+----------------------------------------------------------------
+from Adam Phillippy
+https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/annotation/chm13v1.1.rdna_model.bed
+
+    bedToBigBed -type=bed4 chm13v1.1.rdna_model.bed ../chromAlias/ucsc.sizes.txt rdnaModel.bb
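+bedToBigBed requires its input sorted by chromosome and then by start
+position. The commands above pass the supplied BED files through directly. If
+a supplied file were not already sorted, a step like this hypothetical
+sortBedFile helper would be needed first. It uses the commonly recommended
+sort -k1,1 -k2,2n, run in the C locale so chromosome names compare byte-wise.
```shell
# Hypothetical helper: sort a BED file by chromosome name (byte order), then
# numerically by start, writing the result to a new file.
sortBedFile() {  # usage: sortBedFile IN.bed OUT.bed
    LC_ALL=C sort -k1,1 -k2,2n "$1" > "$2"
}

# e.g. sortBedFile chm13v1.1.rdna_model.bed rdna_model.sorted.bed
```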
+
+================================================================
+catLiftOffGenesV1 (2022-03-15 markd)
+----------------------------------------------------------------
+from Marina Haukness <mhauknes@ucsc.edu>
+
+http://courtyard.gi.ucsc.edu/~mhauknes/T2T/t2t_Y/annotation_set/CHM13.v2.0.bb
+http://courtyard.gi.ucsc.edu/~mhauknes/T2T/t2t_Y/annotation_set/CHM13.v2.0.gff3
+
+rename to
+  catLiftOffGenesV1.bb
+  catLiftOffGenesV1.gff3.gz
+
+# create GTF
+  zcat catLiftOffGenesV1.gff3.gz | gffread /dev/stdin -T -o catLiftOffGenesV1.gtf
+  pigz catLiftOffGenesV1.gtf 
+
+================================================================
+* hgLiftOver (2022-03-26 markd)
+----------------------------------------------------------------
+GRCh38 & GRCh37 chains from Nae-Chyun Chen <naechyun.chen@gmail.com>
+
+# 2022-04-09 it was noted that chrM was left out of the original alignments, so obtained new chains that include chrM and repeated the steps below
+
+globus: /team-liftover/v1_nflo/with_chrM/
+    chm13v2-grch38.chain
+    grch38-chm13v2.chain
+    chm13v2-hg19_chrM.chain
+    chm13v2-hg19_chrMT.chain
+    hg19_chrM-chm13v2.chain
+    hg19_chrMT-chm13v2.chain
+
+   cd trackData/hgLiftOver
+
+# rename to match UCSC conventions
+    mv chm13v2-grch38.chain chm13v2-hg38.over.chain      
+    mv grch38-chm13v2.chain hg38-chm13v2.over.chain
+    mv chm13v2-hg19_chrM.chain chm13v2-hg19_chrM.over.chain
+    mv chm13v2-hg19_chrMT.chain chm13v2-hg19_chrMT.over.chain
+    mv hg19_chrM-chm13v2.chain hg19_chrM-chm13v2.over.chain
+    mv hg19_chrMT-chm13v2.chain  hg19_chrMT-chm13v2.over.chain
+
+# create hg19 chains that combine chrM and chrMT for use in the browser.
+   cp chm13v2-hg19_chrM.over.chain chm13v2-hg19.over.chain
+   chainFilter -q=chrMT chm13v2-hg19_chrMT.over.chain >>chm13v2-hg19.over.chain
+   cp hg19_chrM-chm13v2.over.chain hg19-chm13v2.over.chain
+   chainFilter -t=chrMT  hg19_chrMT-chm13v2.over.chain >>hg19-chm13v2.over.chain
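+chainFilter -q=NAME keeps only chains whose query sequence is NAME, and -t=NAME
+does the same for the target. The two appends above therefore add just the
+chrMT chains to the chrM-based files. The following hypothetical awk stand-in
+covers only the single-name case; chainFilter itself accepts comma-separated
+lists and many other filters. It makes the selection explicit:
```shell
# Hypothetical awk stand-in for `chainFilter -t=NAME` / `chainFilter -q=NAME`
# with a single name. In a chain header line
#   chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
# tName is field 3 and qName is field 8. A chain's data lines and its trailing
# blank line are printed only if its header matched.
filterChains() {  # usage: filterChains t|q NAME CHAINFILE
    case "$1" in
        t) col=3 ;;
        q) col=8 ;;
        *) echo "usage: filterChains t|q NAME CHAINFILE" >&2; return 1 ;;
    esac
    awk -v col="$col" -v want="$2" '
        $1 == "chain" { keep = ($col == want) }
        keep { print }
    ' "$3"
}

# e.g. the selection made by the chainFilter -q=chrMT step above:
#   filterChains q chrMT chm13v2-hg19_chrMT.over.chain >>chm13v2-hg19.over.chain
```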
+
+   pigz *.chain
+
+# build tracks
+    hgLoadChain -noBin -test none bigChain chm13v2-hg38.over.chain.gz 
+    sed 's/\.000000//' chain.tab | awk 'BEGIN {OFS="\t"} {print $2, $4, $5, $11, 1000, $8, $3, $6, $7, $9, $10, $1}' > bigChainIn.tab
+    bedToBigBed -type=bed6+6 -as=${HOME}/kent/src/hg/lib/bigChain.as -tab bigChainIn.tab ../chromAlias/ucsc.sizes.txt chm13v2-hg38.over.chain.bb
+    tawk '{print $1, $2, $3, $5, $4}' link.tab | csort -k1,1 -k2,2n --parallel=64 > bigLinkIn.tab
+    bedToBigBed -type=bed4+1 -as=${HOME}/kent/src/hg/lib/bigLink.as -tab bigLinkIn.tab ../chromAlias/ucsc.sizes.txt chm13v2-hg38.over.link.bb
+
+    hgLoadChain -noBin -test none bigChain chm13v2-hg19.over.chain.gz 
+    sed 's/\.000000//' chain.tab | awk 'BEGIN {OFS="\t"} {print $2, $4, $5, $11, 1000, $8, $3, $6, $7, $9, $10, $1}' > bigChainIn.tab
+    bedToBigBed -type=bed6+6 -as=${HOME}/kent/src/hg/lib/bigChain.as -tab bigChainIn.tab ../chromAlias/ucsc.sizes.txt chm13v2-hg19.over.chain.bb
+    tawk '{print $1, $2, $3, $5, $4}' link.tab | csort -k1,1 -k2,2n --parallel=64 > bigLinkIn.tab
+    bedToBigBed -type=bed4+1 -as=${HOME}/kent/src/hg/lib/bigLink.as -tab bigLinkIn.tab ../chromAlias/ucsc.sizes.txt chm13v2-hg19.over.link.bb
+
+    rm *.tab
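+The sed/awk and tawk steps above only reorder hgLoadChain's chain.tab and
+link.tab columns into the field order of bigChain.as and bigLink.as. The toy
+run here shows that mapping on made-up values. The chain.tab/link.tab column
+order is assumed to be the UCSC chain and chainLink table layout without the
+bin column. tawk and csort are assumed to be awk with tab separators and
+LC_ALL=C sort.
```shell
# Toy one-line inputs (made-up values) in a scratch directory, so real
# chain.tab/link.tab files in the build directory are not touched.
demo=$(mktemp -d)
# Assumed hgLoadChain -noBin -test column order:
#   chain.tab: score tName tSize tStart tEnd qName qSize qStrand qStart qEnd id
#   link.tab:  tName tStart tEnd qStart chainId
printf '1234.000000\tchr1\t248387328\t0\t1000\tchr1\t248956422\t+\t10\t1010\t7\n' > "$demo/chain.tab"
printf 'chr1\t0\t1000\t10\t7\n' > "$demo/link.tab"

# Same reordering as in the build notes. bigChain.as field order:
#   chrom chromStart chromEnd name score strand tSize qName qSize qStart qEnd chainScore
# sed drops the ".000000" from the floating-point chain score.
sed 's/\.000000//' "$demo/chain.tab" \
    | awk 'BEGIN {OFS="\t"} {print $2, $4, $5, $11, 1000, $8, $3, $6, $7, $9, $10, $1}' \
    > "$demo/bigChainIn.tab"

# bigLink.as field order: chrom chromStart chromEnd name qStart (name = chain id).
# tawk/csort written out as plain awk/sort (assumed equivalents).
awk -F'\t' -v OFS='\t' '{print $1, $2, $3, $5, $4}' "$demo/link.tab" \
    | LC_ALL=C sort -k1,1 -k2,2n > "$demo/bigLinkIn.tab"

cat "$demo/bigChainIn.tab" "$demo/bigLinkIn.tab"
```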
+
+   pigz *.chain
+   # make available in the liftOver directory as well
+   ln -f *.chain.gz ../../liftOver/
+
+# GRCh38 mask used in liftover. This is based on:
+#  https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/references/GRCh38/GCA_000001405.15_GRCh38_GRC_exclusions_T2Tv2.bed
+#  plus UCSC hg38 centromeres track
+
+   GRCh38: /team-liftover/grch38_masked_fasta/grch38-centromere_and_falsedup.bed (edited)
+   rename to hg38.liftover-mask.bed
+   ln -f hg38.liftover-mask.bed ../../liftOver/
+
+
+================================================================
+* hgCactus (2022-03-28 markd)
+----------------------------------------------------------------
+# HAL from Marina Haukness <mhauknes@ucsc.edu>
+
+   http://courtyard.gi.ucsc.edu/~mhauknes/T2T/t2t_Y/t2tChm13.v2.0.hal
+
+# rename genomes to match browser names; put the following (tab-separated) in renameFile.tab:
+GRCh38	hg38
+CHM13	GCA_009914755.4
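+Tab characters are easy to lose when copying these notes. One way to create
+renameFile.tab with explicit tabs, assuming the two-column old-name/new-name
+layout shown above, is:
```shell
# Write renameFile.tab with explicit tab separators (old genome name, new name),
# matching the two lines shown above.
printf 'GRCh38\thg38\nCHM13\tGCA_009914755.4\n' > renameFile.tab
```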
+
+    halRenameGenomes t2tChm13.v2.0.hal renameFile.tab 
+
+# NOTE: disabled because snake tracks do not use chromAlias
+
+================================================================
+* hgUnique (2022-03-30 markd)
+----------------------------------------------------------------
+regions not in hg38 or hg19:
+globus: /team-liftover/v1_nflo/T2T-CHM13v2.0_new_and_non_syntenic_regions.bed
+computed from the liftOver chains below:
+         chm13v2-unique_to_hg19.bed
+         chm13v2-unique_to_hg38.bed
+#
+chainToPslBasic ../hgLiftOver/chm13v2-hg38.over.chain.gz stdout \
+  | pslToBed stdin stdout \
+  | bedtools sort -i - -g ../ucscChromNames/t2t-chm13-v2.0.sizes \
+  | bedtools merge \
+  | bedtools complement -i - -g ../ucscChromNames/t2t-chm13-v2.0.sizes \
+  | bedtools merge \
+  | sort -k1,1 -k2,2n \
+  > chm13v2-unique_to_hg38.bed
+
+chainToPslBasic ../hgLiftOver/chm13v2-hg19.over.chain.gz stdout \
+  | pslToBed stdin stdout \
+  | bedtools sort -i - -g ../ucscChromNames/t2t-chm13-v2.0.sizes \
+  | bedtools merge \
+  | bedtools complement -i - -g ../ucscChromNames/t2t-chm13-v2.0.sizes \
+  | bedtools merge \
+  | sort -k1,1 -k2,2n \
+  > chm13v2-unique_to_hg19.bed
+
+
+bedToBigBed -type=bed3 -tab chm13v2-unique_to_hg38.bed ../chromAlias/ucsc.sizes.txt hgUnique.hg38.bb
+bedToBigBed -type=bed3 -tab chm13v2-unique_to_hg19.bed ../chromAlias/ucsc.sizes.txt hgUnique.hg19.bb
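+bedtools complement reports the parts of each chromosome in the sizes file
+that no input interval covers. The hypothetical awk helper here illustrates
+that step only and is not bedtools itself. It assumes its BED input is sorted
+by start within each chromosome and already merged, as it is after the first
+bedtools merge in the pipelines above.
```shell
# Hypothetical awk stand-in for `bedtools complement -i BED -g SIZES`.
# Assumes BED intervals are sorted by start within each chromosome and do not
# overlap (e.g. after bedtools merge). Output follows the SIZES file order;
# chromosomes with no intervals are reported whole.
complementBed() {  # usage: complementBed BED SIZES
    awk -F'\t' -v OFS='\t' '
        # First file (SIZES): remember chromosome order and lengths.
        FNR == NR { order[++n] = $1; size[$1] = $2 + 0; next }
        # Second file (BED): collect interval starts and ends per chromosome.
        { k = ++cnt[$1]; s[$1, k] = $2 + 0; e[$1, k] = $3 + 0 }
        END {
            for (i = 1; i <= n; i++) {
                c = order[i]
                pos = 0
                for (j = 1; j <= cnt[c] + 0; j++) {
                    if (s[c, j] > pos) print c, pos, s[c, j]   # gap before interval
                    if (e[c, j] > pos) pos = e[c, j]
                }
                if (pos < size[c]) print c, pos, size[c]      # uncovered tail
            }
        }
    ' "$2" "$1"
}
```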
+
+
+================================================================
+* censat (2022-03-29 markd)
+----------------------------------------------------------------
+from Nick Altemose <nickaltemose@gmail.com> via Slack:
+   t2t_censat_CHM13v2.0_trackv2.0.10col.bed
+   t2t_censat_CHM13v2.0_trackv2.0_description.html
+
+   cd censat/
+
+   # drop track header
+   tawk 'NR>1' t2t_censat_CHM13v2.0_trackv2.0.10col.bed | csort -k1,1 -k2,2n  >tmp.bed
+   bedToBigBed -type=bed9+1 -as=${HOME}/compbio/t2t/projs/chm13-v2.0/makeDir/schema/cenSat.as -tab tmp.bed ../chromAlias/ucsc.sizes.txt censat.bb
+
+================================================================
+* dbSNP155 (2022-03-29 markd)
+----------------------------------------------------------------
+
+# dbSNP Variants	Lifted+Recovered	TBD	Dylan Taylor
+https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/annotation/liftover/chm13v2.0_dbSNPv155.vcf.gz
+dbSNP_lifted-recovered.html
+
+
+# need to use NCBI names until supported by chromAlias
+   zcat chm13v2.0_dbSNPv155.vcf.gz  | chromToUcsc --chromAlias=../chromAlias/GCA_009914755.4_T2T-CHM13v2.0.chromAlias.txt /dev/stdin | bgzip -c >chm13v2.0_dbSNPv155.ncbi-names.vcf.gz
+
+tabix -p vcf chm13v2.0_dbSNPv155.vcf.gz &
+tabix -p vcf chm13v2.0_dbSNPv155.ncbi-names.vcf.gz &
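+chromToUcsc performs the renaming above. This hypothetical helper is a rough
+illustration of what a consistent chromosome rename in a VCF involves, not
+chromToUcsc itself. It rewrites both the CHROM column and the ##contig header
+IDs, and assumes a simple two-column, tab-separated mapping file rather than
+the multi-column chromAlias file used above.
```shell
# Hypothetical sketch: rename chromosomes in a VCF read from stdin using a
# two-column, tab-separated MAPFILE (current name, new name). Rewrites the
# CHROM column of data lines and the ID of ##contig header lines; names not
# in the map are left unchanged.
renameVcfChroms() {  # usage: zcat in.vcf.gz | renameVcfChroms MAPFILE > out.vcf
    awk -F'\t' -v OFS='\t' '
        NR == FNR { map[$1] = $2; next }
        /^##contig=<ID=/ {
            if (match($0, /ID=[^,>]+/)) {
                id = substr($0, RSTART + 3, RLENGTH - 3)
                if (id in map)
                    $0 = substr($0, 1, RSTART + 2) map[id] substr($0, RSTART + RLENGTH)
            }
            print
            next
        }
        /^#/ { print; next }
        {
            if ($1 in map) $1 = map[$1]
            print
        }
    ' "$1" -
}
```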
+
+
+================================================================
+* clinVar20220313 (2022-03-29 markd)
+----------------------------------------------------------------
+ClinVar	Lifted+Recovered	TBD	Dylan Taylor
+https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/annotation/liftover/chm13v2.0_ClinVar20220313.vcf.gz
+
+   zcat chm13v2.0_ClinVar20220313.vcf.gz | chromToUcsc --chromAlias=../chromAlias/GCA_009914755.4_T2T-CHM13v2.0.chromAlias.txt /dev/stdin | bgzip -c >chm13v2.0_ClinVar20220313.ncbi-names.vcf.gz
+
+tabix -p vcf chm13v2.0_ClinVar20220313.vcf.gz &
+tabix -p vcf chm13v2.0_ClinVar20220313.ncbi-names.vcf.gz &
+
+
+================================================================
+* gwasSNPs2022-03-08 (2022-03-29 markd)
+----------------------------------------------------------------
+GWAS SNPs	Lifted+Recovered	TBD	Dylan Taylor
+
+https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/annotation/liftover/chm13v2.0_GWASv1.0rsids_e100_r2022-03-08.vcf.gz
+gwas_catalog_lifted-recovered.html
+
+# need to use NCBI names until supported by chromAlias
+   zcat chm13v2.0_GWASv1.0rsids_e100_r2022-03-08.vcf.gz  | chromToUcsc --chromAlias=../chromAlias/GCA_009914755.4_T2T-CHM13v2.0.chromAlias.txt /dev/stdin | bgzip -c >chm13v2.0_GWASv1.0rsids_e100_r2022-03-08.ncbi-names.vcf.gz
+
+tabix -p vcf chm13v2.0_GWASv1.0rsids_e100_r2022-03-08.ncbi-names.vcf.gz &
+tabix -p vcf chm13v2.0_GWASv1.0rsids_e100_r2022-03-08.vcf.gz &
+
+================================================================
+pending:
+
+- ensembl:
+  http://ftp.ebi.ac.uk/pub/databases/ensembl/hprc/y1_freeze/ contains all Y1 assemblies;
+  http://ftp.ebi.ac.uk/pub/databases/ensembl/hprc/y1_freeze/GCA_009914755.4/ is CHM13v2
+
+- isoseq BAMs
+  http://courtyard.gi.ucsc.edu/~mhauknes/T2T/t2t_Y/out-t2t-chrY-augPB/assemblyHub/CHM13/
+  @PG   ID:minimap2   PN:minimap2   VN:2.22-r1105-dirty   CL:minimap2 -ax splice -f 1000 --sam-hit-only --secondary=no --eqx -K 100M -t 8 --cap-sw-mem=3g chm13v2.0.chrY.fasta HG002-NA24385-LCL-polished_isoforms_hq.fasta
+  globus /HG002-IsoSeq
+
+- isoseq
+    from Fritz Sedlazeck:
+    STUDY: PRJNA754107
+     SAMPLE: GM27730 (SAMN20741798)
+      EXPERIMENT: PCD_NISTRM.NA27730-1_1sA-40 (SRX14226556)
+        RUN: m64139_220130_061226 (SRR18074967)
+    STUDY: PRJNA754107
+     SAMPLE: GM26105 (SAMN20741797)
+      EXPERIMENT: PCD_NISTRM.NA26105-1_1sA-40 (SRX14226558)
+        RUN: m64139_220131_122551 (SRR18074969)
+    STUDY: PRJNA200694
+     SAMPLE: NIST HG002 NA24385 (SAMN03283347)
+      EXPERIMENT: PCD_NISTRM.NA24385-1_1sA-40 (SRX14226557)
+        RUN: m64139_220127_180020 (SRR18074968)
+
+* unique kmers
+  Min unique k-mer (+)	Present in v1.0 and v2.0	Michael Sauria	/team-epigenetics/032522_chm13v2.0_kmers/mu/chm13v2.0.mul.bw	H	min_unique_kmer.html
+  Min unique k-mer (-)	Present in v1.0 and v2.0	Michael Sauria	/team-epigenetics/032522_chm13v2.0_kmers/mu/chm13v2.0.mur.bw	H
+
+* RepeatMasker
+  Savannah Hoyt/Jessica Storer	https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/annotation/chm13v2.0_RepeatMasker_4.1.2p1.out	H
+
+* ENCODE
+  ENCODE pileups	Present in v1.0 and v2.0	Michael Sauria	/team-epigenetics/032522_chm13v2.0_encode/coverage/*.bw	H
+  ENCODE macs2 peaks	Present in v1.0 and v2.0	Michael Sauria	/team-epigenetics/032522_chm13v2.0_encode/peaks/*.bb	H
+  ENCODE macs2 LO peaks	Present in v1.0	Michael Sauria		H
+
+* GRCh38
+  Unresolved in GRCh	GRCh38	TBD	Sergey Koren	browser/tracks/chm13v2.0_unmapped_byHG38.bed	H	chm13_uncovered_byGRCh38.html
+  GRCh37		Sergey Koren	browser/tracks/chm13v2.0_unmapped_byHG19.bed	H
+
+
+
+* GRCh38 variants
+  TBD	Nancy Hansen	team-liftover/chain_variants/vcffiles/v1_nflo/chm13v2-grch38.sort.vcf.gz	L	grch_allele_differences.html
+  GRCh37 variants	TBD	Nancy Hansen	team-liftover/chain_variants/vcffiles/v1_nflo/chm13v2-hg19.sort.vcf.gz	L
+
+* Gene GFF3/GTF downloads
+  http://courtyard.gi.ucsc.edu/~mhauknes/T2T/t2t_Y/annotation_set/CHM13.v2.0.gff3
+  
+================================================================
+Problems:
+- hub groups don't include phenDis, so ClinVar and GWAS were put in varRep
+================================================================