08-21-2012
Need to remove invariant characters
Hello,
I have a nexus alignment file that looks like this:
bar101_min2covg_binarynex 11001-100111
bar102_min2covg_binarynex 110010010011
bar103_min2covg_binarynex 11101010--11
etc.
There are 41 rows of 28014 characters each, and every character is 0, 1, or - (missing data). Probably 80% of the sites are invariant, and I would like to remove them from the alignment. So I'm looking for a way to scan through the alignment file and remove every site where all rows have the same value, or where only 1 row differs. Missing data should be ignored when making this determination (i.e. if several rows have missing data at a site but all the others match, the site gets chopped).
A slight complication is that the data come in pairs, so I need to evaluate sites 1/2, 3/4, 5/6, 7/8, etc. as pairs and eliminate a pair only if both of its sites are invariant across all rows.
I'm stumped on how to approach this, and I'm fairly new to this kind of data manipulation. Does anyone have suggestions?
The ideal output from the example would be:
bar101_min2covg_binarynex 001001
bar102_min2covg_binarynex 000100
bar103_min2covg_binarynex 1010--
Thanks for the help!
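One possible way to approach this, as a minimal Python sketch rather than a definitive solution. It assumes each data row is a name followed by whitespace and the sequence, that the NEXUS header and footer lines have already been stripped, and that all sequences share one even length. The function name `drop_invariant_pairs` and the `max_minor` parameter are made up for this illustration. With `max_minor=0` a site counts as invariant only when every non-missing value matches, which is what the three-row example output reflects; `max_minor=1` would also treat sites where a single row differs as invariant, which looks like what is wanted for the full 41-row file.

```python
def drop_invariant_pairs(rows, missing="-", max_minor=0):
    """Remove site pairs (1/2, 3/4, ...) in which both sites are invariant.

    rows      -- list of (name, sequence) tuples
    missing   -- character marking missing data, ignored when judging a site
    max_minor -- how many rows may differ from the most common state while
                 the site is still treated as invariant (hypothetical knob)
    """
    seqs = [seq for _, seq in rows]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs) or length % 2:
        raise ValueError("sequences must all share one even length")

    def is_invariant(col):
        # Collect the states at this site, skipping missing data.
        states = [s[col] for s in seqs if s[col] != missing]
        counts = sorted((states.count(c) for c in set(states)), reverse=True)
        # counts[1:] covers the rows that do not carry the most common state.
        return sum(counts[1:]) <= max_minor

    keep = []
    for i in range(0, length, 2):
        # Keep the pair unless both of its sites are invariant.
        if not (is_invariant(i) and is_invariant(i + 1)):
            keep.extend((i, i + 1))

    return [(name, "".join(seq[k] for k in keep)) for name, seq in rows]


example = [
    ("bar101_min2covg_binarynex", "11001-100111"),
    ("bar102_min2covg_binarynex", "110010010011"),
    ("bar103_min2covg_binarynex", "11101010--11"),
]

for name, seq in drop_invariant_pairs(example):
    print(name, seq)
```

On the three example rows this keeps pairs 3/4, 7/8 and 9/10 and reproduces the ideal output above. Note that with only three rows, every variable site differs in exactly one row, so `max_minor=1` would strip the whole example; that setting only makes sense on the full alignment.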