While some software packages blissfully ignore unusual characters or whitespace, others will complain. Today I ran two pipelines over the same file, and they reported different sequence lengths. So I ran this handy command:
cat sequence.fa | fold -w1 | sort | uniq -c
The beauty of the fold command is that it wraps any long continuous sequence at a fixed number of characters, even after every single one. That puts each character on its own line, where sort and uniq -c can tally it. And voila, here are the results:
And yay, the two pipelines indeed disagreed by 375 characters. I must have somehow introduced spaces during the concatenation step, when I was building a single sequence out of many smaller ones.
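To make the idea concrete, here is a minimal sketch on a made-up toy file. The file name toy.fa, its contents, and the stray spaces are all invented for illustration. It also assumes a FASTA-style file whose header lines start with ">", and skips those so they don't pollute the counts.

```shell
# Hypothetical toy file: one header line plus two sequence lines with stray spaces.
printf '>toy\nACGT AC\nGG TA\n' > toy.fa

# Skip header lines, put every character on its own line, then tally each one.
grep -v '^>' toy.fa | fold -w1 | sort | uniq -c

# Count only the whitespace (spaces and tabs) hiding in the sequence lines.
grep -v '^>' toy.fa | tr -cd ' \t' | wc -c
```

On this toy input the last command should report 2, one for each stray space. The same two commands can be pointed at a real file to see whether whitespace explains a length mismatch.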