b1c4de4679118861216d329d842b6ed33bcccd78 markd Fri Feb 17 16:20:15 2023 -0800 started tabula-sapiens intron called, but stuck on help from CZI diff --git src/hg/makeDb/doc/hg38/tabulaSapiens.txt src/hg/makeDb/doc/hg38/tabulaSapiens.txt new file mode 100644 index 0000000..8ee393b --- /dev/null +++ src/hg/makeDb/doc/hg38/tabulaSapiens.txt @@ -0,0 +1,42 @@ +This describes building tabula-sapiens related tracks + + +############################################################################# +# bam-dec2022 download (Max) +############################################################################# + + +Download to: + /hive/data/inside/cells/datasets/tabula-sapiens/bam-dec2022/ + +some doc in https://github.com/czbiohub/tabula-sapiens + + +############################################################################# +# setup for intronProspector runs (markd) +############################################################################# + +Install htslib-1.16 and intronProspector-1.0.3 in /cluster/software/. Obtain +from: + + https://github.com/samtools/htslib/releases/download/1.16/htslib-1.16.tar.bz2 + https://github.com/diekhans/intronProspector/archive/refs/tags/v1.0.3.tar.gz + +Need to get genome sequence matching the BAMs. + +STAR/homo.gencode.v30.annotation.ERCC92 which is not in bucket + + +something is weird with samrtseq2 directory BAMs + cd /hive/data/inside/cells/datasets/tabula-sapiens/bam-dec2022/Pilot1/alignment-gencode/ + + % samtools view -H ./smartseq2/B107813_G5_S31.homo.covid19.Aligned.out.sorted.bam| head + [E::sam_hrecs_refs_from_targets_array] Duplicate entry "NC_040671" in target list + samtools view: failed to add PG line to the header + + % picard ValidateSamFile I=./smartseq2/B107813_G5_S31.homo.covid19.Aligned.out.sorted.bam + ERROR 2023-02-08 21:44:44 ValidateSamFile Cannot add sequence that already exists in SAMSequenceDictionary: NC_040671 + + +CZI contacted about problems +