Removing columns with dashes


 
# 1  
Old 07-07-2010

My files look like this
Quote:
>GHL8OVD01BNNCA Freq 4
TAGATGTGCCCGTGGGTTTCCCGTCAACACCGGATAGT-GCAGCA-TA
>GHL8OVD01CMQVT Freq 15
TTGATGTCGTGGGTTTCCCGTCAACACCGGCAAATAGT-GCAGCA-TA
>GHL8OVD01CMQVT Freq 50
TTGATGTGCCAGTTTCCCGTCTAGCAGCACTACCAGGACCTTCGC-TA
>GHL8OVD01CMQVW Freq 700
TTGATGTGTCCCGTCGACACCGGCAAATAGCAGCAGCA-TACCAG-AC
>GHL8OVD01A45V3 Freq 9
TTGATTCCCGTCGACACCGGCAAATAGCAGCAGCACTA-AGGACCTTC
>GHL8OVD01AV2U9 Freq 17
TTGATGTGCCAGCTTTCGCGTCGACACCGGCAAATAGT-GCAGCG-TA
I need to remove the columns where dashes are the majority. If any of the sequences has a character (rather than a dash) in that position, that character should be removed too. The IDs and Freqs should be kept intact. Thus, the resulting file should look like this:
Quote:
>GHL8OVD01BNNCA Freq 4
TAGATGTGCCCGTGGGTTTCCCGTCAACACCGGATAGTGCAGCATA
>GHL8OVD01CMQVT Freq 15
TTGATGTCGTGGGTTTCCCGTCAACACCGGCAAATAGTGCAGCATA
>GHL8OVD01CMQVT Freq 50
TTGATGTGCCAGTTTCCCGTCTAGCAGCACTACCAGGACTTCGCTA
>GHL8OVD01CMQVW Freq 700
TTGATGTGTCCCGTCGACACCGGCAAATAGCAGCAGCATACCAGAC
>GHL8OVD01A45V3 Freq 9
TTGATTCCCGTCGACACCGGCAAATAGCAGCAGCACTAAGGACCTC
>GHL8OVD01AV2U9 Freq 17
TTGATGTGCCAGCTTTCGCGTCGACACCGGCAAATAGTGCAGCGTA
Thanks in advance
# 2  
Old 07-07-2010
tr command?

What about using
Code:
tr -d "-" <file1 >file2

to remove the dash characters?
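
Note that tr -d deletes only the dash characters themselves. A small sketch with two made-up aligned sequences (toy data, not taken from the files above) suggests why that may not be enough if whole columns are meant: the sequence without a dash keeps its character, so the lines end up with different lengths.

```shell
# Two toy aligned sequences (hypothetical data); only the first has a dash
# in column 3.
aligned=$(printf 'AC-T\nACGT\n')

# Delete every dash character, wherever it occurs.
stripped=$(printf '%s\n' "$aligned" | tr -d '-')

# The first line shrinks to three characters, the second keeps all four,
# so the columns no longer line up.
printf '%s\n' "$stripped"
```

It would also delete any dash that happens to appear in an ID line.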
# 3  
Old 07-07-2010
What have you tried so far?
You've had over 70 posts with multiple solutions given to you in sed/awk/perl for similarly formatted files. You should be able to come up with at least an initial approach.
# 4  
Old 07-07-2010
vgersh99

I have initiated 16 threads for different 'actions' that happen to be used for the same type of data (DNA). When I decide to start a thread, it is because I do not know how to go about the problem; otherwise, I would not post it. I have initiated 23 threads in total, most of them about shell scripting but not exclusively (Red Hat, Windows & DOS, etc.). Is it against the rules to post questions that deal with the same type of data? I could always change the format and get a solution that would work, but I do not see a reason to do so.
Please advise
# 5  
Old 07-07-2010
OK, now I think I may understand a little better

If there is a - in position 32 for any record, then position 32 should be deleted for all records?

If so, then we need to:
determine which positions hold - characters
delete those columns from every record

Correct?
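
The two steps above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested solution for the real files: the function name remove_gap_columns is made up for illustration, each sequence is assumed to sit on a single line, header lines are assumed to start with >, and "majority" is taken to mean dashes in more than half of the sequences. The file is read twice, once to count dashes per position and once to print.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration only).
# Usage: remove_gap_columns FILE
remove_gap_columns() {
    # The same file is passed twice so awk can make two passes over it.
    awk '
        # Pass 1: count, for every position, how many sequences have a dash.
        NR == FNR {
            if ($0 !~ /^>/) {
                nseq++
                len = length($0)
                for (i = 1; i <= len; i++)
                    if (substr($0, i, 1) == "-") dash[i]++
            }
            next
        }
        # Pass 2: header lines are printed unchanged.
        /^>/ { print; next }
        # Pass 2: keep a position unless dashes are the majority there.
        {
            out = ""
            len = length($0)
            for (i = 1; i <= len; i++)
                if (dash[i] * 2 <= nseq) out = out substr($0, i, 1)
            print out
        }
    ' "$1" "$1"
}
```

It would be called as remove_gap_columns Input.txt > Output.txt. Using substr() rather than an empty field separator is a deliberate choice here, since splitting on FS="" is not guaranteed by every awk.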
# 6  
Old 07-07-2010
Try this (it is one big command):
Code:
perl -nla -F"" -e 'if (!/^>/){$n++;for ($i=0;$i<=$#F;$i++){$a{$i}{$F[$i]}++}}END{for ($i=0;$i<=$#F;$i++){if ($a{$i}{"-"}/$n>0.5)\
{print $i}}}' file | awk -vFS="" -vOFS="" 'NR==FNR{a[$0+1]++}{for (i=1;i<=NF;i++) if (i in a) $i=""}1' - file

# 7  
Old 07-07-2010
bartus11

Thank you very much. I owe you many at this stage :)
I tried your code
Code:
$ perl -nla -F"" -e 'if (!/^>/){$n++;for ($i=0;$i<=$#F;$i++){$a{$i}{$F[$i]}++}}END{for ($i=0;$i<=$#F;$i++){if ($a{$i}{"-"}/$n>0.5)\
> {print $i}}}' Input.txt | awk -vFS="" -vOFS="" 'NR==FNR{a[$0+1]++}{for (i=1;i<=NF;i++) if (i in a) $i=""}1' - Input.txt > Output.txt

and this is what I got
Code:
Backslash found where operator expected at -e line 1, near ")\"
(Missing operator before \?)
syntax error at -e line 1, near ")\"
syntax error at -e line 2, near ";}"
Execution of -e aborted due to compilation errors.
awk: cmd. line:1: fatal: cannot open file `Input.txt' for reading (No such file or directory)

Am I missing something?

---------- Post updated at 04:29 PM ---------- Previous update was at 04:25 PM ----------

Joeyg,
Exactly! Not only should the dashes be gone, but also the character that any other record has in that particular position. In other words, the entire column should be removed (the IDs and Freqs should be kept intact).
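
On the errors reported in post #7, two separate things appear to be going on. The Perl messages point at the backslash at the end of the first line: the Perl program is wrapped in single quotes, and inside single quotes the shell does not treat a backslash followed by a newline as a line continuation, so Perl receives a stray \ in its code. Dropping that backslash (a plain line break inside the quotes is fine) or joining the two lines should avoid the syntax error. The awk message only says that no file named Input.txt was found in the current directory. The quoting behaviour can be checked with a small sketch; the variable names are arbitrary:

```shell
# Inside single quotes the shell keeps a backslash followed by a newline
# literally, so both characters become part of the string.
single='a\
b'

# Inside double quotes the same pair is a line continuation and disappears.
double="a\
b"

# The single-quoted string holds four characters: a, backslash, newline, b.
printf 'single-quoted length: %s\n' "${#single}"
# The double-quoted string is just the two letters joined together.
printf 'double-quoted value:  %s\n' "$double"
```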