Shell Programming and Scripting: Post 302689575 by ljk on Tuesday 21st of August 2012 04:47:09 PM
need to remove invariant characters

Hello,
I have a NEXUS alignment file that looks like this:


bar101_min2covg_binarynex 11001-100111
bar102_min2covg_binarynex 110010010011
bar103_min2covg_binarynex 11101010--11

etc.

There are 41 rows of 28014 characters each, and every character is one of three values: 0, 1, or missing data (-). Probably 80% of the sites (columns) are invariant, and I would like to remove them from the alignment.

So I'm looking for a way to scan through the alignment file and remove every site where all rows' values match, or where only 1 row differs. Missing data should be ignored when making this determination (i.e. if several rows have missing data at a site but all the others match, the site gets chopped).

A slight complication is that the data come in pairs, so I need to evaluate sites 1/2, 3/4, 5/6, 7/8, etc. as pairs and eliminate a pair only if both of its sites are invariant across all rows.

I'm stumped on how to approach this, and I'm fairly new to this kind of data manipulation. Does anyone have suggestions?

The ideal output from the example would be:

bar101_min2covg_binarynex 001001
bar102_min2covg_binarynex 000100
bar103_min2covg_binarynex 1010--


Thanks for the help!
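
One possible approach, sketched in Python rather than shell: test each column for invariance while ignoring "-", then walk the columns two at a time and keep a pair unless both of its sites are invariant. This is a minimal sketch under stated assumptions, not a definitive implementation. The function names and the max_minority parameter are hypothetical, and the input is assumed to be only "name sequence" pairs with equal, even-length sequences (no NEXUS header lines). With max_minority=0 a site counts as invariant only when all non-missing values match, which is the rule that reproduces the ideal output in the post. Setting max_minority=1 adds the "only 1 row differs" rule; with just three rows of binary data that rule marks every site invariant, so it only becomes meaningful on the full 41-row file.

```python
from collections import Counter


def is_invariant(column, max_minority=0, missing="-"):
    """True if at most `max_minority` non-missing values differ from the majority."""
    counts = Counter(c for c in column if c != missing)
    if not counts:
        return True  # assumption: a site with only missing data counts as invariant
    return sum(counts.values()) - max(counts.values()) <= max_minority


def drop_invariant_pairs(rows, max_minority=0, missing="-"):
    """rows: list of (name, sequence). Removes site pairs where both sites are invariant."""
    seqs = [seq for _, seq in rows]
    length = len(seqs[0])
    if length % 2 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must share one even length")

    keep = []  # 0-based column indices to retain
    for i in range(0, length, 2):
        both_invariant = all(
            is_invariant([s[j] for s in seqs], max_minority, missing)
            for j in (i, i + 1)
        )
        if not both_invariant:
            keep.extend((i, i + 1))

    return [(name, "".join(seq[j] for j in keep)) for name, seq in rows]


# The three example rows from the post.
example = [
    ("bar101_min2covg_binarynex", "11001-100111"),
    ("bar102_min2covg_binarynex", "110010010011"),
    ("bar103_min2covg_binarynex", "11101010--11"),
]

for name, seq in drop_invariant_pairs(example):
    print(name, seq)
```

To run this on a real file, each data line could be split on whitespace into a (name, sequence) tuple before calling drop_invariant_pairs; any header or footer lines of the NEXUS format would need to be set aside first and written back unchanged.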
 
