USMortality

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR109/081/SRR10971381/SRR10971381_{1,2}.fastq.gz

henjin

Apr 3, 2023 (edited)

Maybe the 30,473-base sequence of MN908947.1 can't be reproduced because human reads were removed from the files uploaded to the SRA. Or maybe those reads that consist of only N letters were originally the human reads.
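A rough way to check how many reads consist only of N, assuming plain four-line FASTQ records with no wrapped sequence lines (the function name `count_all_n_reads` is made up for illustration):

```shell
# Count FASTQ reads whose sequence line consists only of N.
# Assumes 4-line FASTQ records on stdin (line 2 of each record is the sequence).
count_all_n_reads() {
  awk 'NR % 4 == 2 { total++; if ($0 ~ /^N+$/) n++ }
       END { printf "%d of %d reads are all-N\n", n, total }'
}
```

It could be run as `gzip -dc SRR10971381_1.fastq.gz | count_all_n_reads`. This only counts such reads; it can't say whether they were originally human reads.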

The publication date of the raw reads is listed as 2020-01-27 at the SRA, but at that time MN908947.2 had already been published so they had probably noticed the error they made in MN908947.1 (https://www.ncbi.nlm.nih.gov/sra/SRR10971381).

When I tried aligning the paired reads against MN908947.1, there was no read whose starting position was higher than 29,830:

curl -so MN908947.1.fa 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&rettype=fasta&id=MN908947.1'

bwa index MN908947.1.fa;bwa mem -t 4 MN908947.1.fa SRR10971381_{1,2}.fastq.gz|awk '/^[^@]/{if($4>x)x=$4}END{print x}'
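The awk part skips the SAM header lines (the ones starting with `@`) and keeps the largest value seen in column 4, which is the 1-based leftmost mapping position. Here is the same idea spelled out as a small function; the name `max_start_pos` is just for illustration:

```shell
# Print the highest start position (SAM column 4) among the alignments on stdin.
# Header lines start with "@"; unmapped reads have position 0, so they never win.
max_start_pos() {
  awk -F'\t' '!/^@/ && $4 > max { max = $4 } END { print max + 0 }'
}
```

Usage would be `bwa mem -t 4 MN908947.1.fa SRR10971381_{1,2}.fastq.gz | max_start_pos`. It prints 0 if nothing mapped.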

Paul & Mongol, can we move this to a Slack or Telegram chat group?

We should discuss all this in a better format; these comments are kind of getting out of hand ;)

Please let me know your preference, and I'll set it up.

henjin

Apr 5, 2023 (edited)

I was thinking earlier that maybe someone could start a Discord for COVID conspiracytard coders, or people who are writing scripts about COVID statistics or bioinformatics.

I've been posting some of my scripts on two SARS Discords but I don't think anyone there has even tried to run my code.

Discord would be my preference, or oldschool forums like XenForo or Discourse. Discord is pretty good for posting code and images, and even if you have code that's longer than 2000 characters, you can post it as a text file that gets displayed with an expandable preview block, like in this example: https://i.ibb.co/M8gjF7F/20230405122308.png.

US Mortality

Apr 6, 2023

Hi Mongol, would you like to join: https://discord.gg/9Weas6Ev

Thanks, would be great to get you in there as well :)

That looks pretty good. Even if we look at the same data and come to different conclusions, it's all good. You've cut a boatload of my messing around with seqkit, and I struggled with awk, so I would love to pick your brains on this.

US Mortality

Apr 5, 2023 (edited)

I've created a Discord, please join: https://discord.gg/9Weas6Ev

Invite expired, unfortunately!

US Mortality

https://discord.gg/9Weas6Ev

Hi, is it possible to get an invite to this Discord?

I don't know if I have much to say, but I'd like to understand these things better :)

Yes, agreed. On Substack it's hard to paste charts and commands and have them mean anything to the reader. Telegram would be my preference.

Apr 4, 2023

You might think I am pulling your leg here: I had 12 SRA files that I went through that were 98%+. Then I installed lxd for clustering and it wiped my main PC. I tried testdisk to recover the deleted files but haven't found them yet, so I will run through similar searches again. The first one I tried was https://www.ncbi.nlm.nih.gov/sra/SRR3647349, which came out with a 2% mismatch overall.

here is the end including the first error (N) Ngagtgtacagtgaacaatgctagggagagctgcctatatggaagagccctaatgtgtaaaattaattttagtagtgctatccccatgtgattttaatagcttcttaggagaatgacaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Most of the errors are in the first 10 bases; bases with quality lower than 20 are set to N (seqtk).
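To show what that masking step amounts to, here is a minimal awk sketch. It assumes Phred+33 qualities and four-line FASTQ records, and the function name `mask_low_quality` is made up; it is not how seqtk is implemented. I believe the seqtk equivalent is `seqtk seq -q 20 -n N`, but check the `seqtk seq` usage text to be sure.

```shell
# Replace bases whose Phred+33 quality is below a threshold (default 20) with N.
# Assumes 4-line FASTQ records on stdin; writes masked FASTQ to stdout.
mask_low_quality() {
  awk -v min_q="${1:-20}" '
    # Lookup table: printable ASCII character -> Phred score (code minus 33).
    BEGIN { for (i = 33; i <= 126; i++) phred[sprintf("%c", i)] = i - 33 }
    NR % 4 == 1 { print; next }        # header line, passed through
    NR % 4 == 2 { seq = $0; next }     # hold the sequence until quality arrives
    NR % 4 == 3 { plus = $0; next }    # separator line
    {
      masked = ""
      for (i = 1; i <= length(seq); i++) {
        base = substr(seq, i, 1)
        masked = masked ((phred[substr($0, i, 1)] < min_q) ? "N" : base)
      }
      print masked; print plus; print $0
    }'
}
```

For example, `gzip -dc reads.fastq.gz | mask_low_quality 20` (with `reads.fastq.gz` standing in for any FASTQ file) keeps bases with quality 20 or higher and turns the rest into N.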

Mapped with bbmap, no reformatting; consensus sequence generated with bwa. I will keep looking, there were better ones. Reads are chopped into pieces from as small as 15 bp up to 127, which is why I don't think much of alignment.
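A quick way to check the read length range in a FASTQ file, again assuming four-line records (the function name `read_length_range` is only illustrative):

```shell
# Report the shortest and longest read length in a FASTQ stream on stdin.
# Assumes 4-line FASTQ records (line 2 of each record is the sequence).
read_length_range() {
  awk 'NR % 4 == 2 {
         len = length($0)
         if (min == "" || len < min) min = len
         if (len > max) max = len
       }
       END { print "min=" (min + 0), "max=" (max + 0) }'
}
```

Run it as `gzip -dc reads.fastq.gz | read_length_range`, with `reads.fastq.gz` standing in for whichever run is being checked.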
