delete duplicated characters in each line


 
# 1  
12-21-2009

I'm a biologist trying to analyse some data, and I'd appreciate some help with the following problem. I have a column of character strings, and for each line I'd like to delete the duplicated characters and report only the unique ones, in the order they first appear, each followed by "/". No sorting should be done. For example:

The original data:

GTG
CTC
CTC
CTC
GCGAGC
GCGAGC
GCGAGC
GCGAGC
GATGTG
GATGTG
GATGTG
GATGTG
A
A
C

And I'm hoping to get:

G/T/
C/T/
C/T/
C/T/
G/C/A/
G/C/A/
G/C/A/
G/C/A/
G/A/T/
G/A/T/
G/A/T/
G/A/T/
A/
A/
C/

I've tried using tr and awk with if conditions, but I'm getting nowhere.

Thank you.
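To pin down exactly what is being asked, here is a minimal Python sketch (the name `dedupe_line` is hypothetical, not from the thread). It walks each line once, keeps a character only the first time it appears, and appends "/" after each kept character, so the original order is preserved and nothing is sorted.

```python
def dedupe_line(line: str) -> str:
    """Keep the first occurrence of each character, in original order,
    and put "/" after each kept character (no sorting)."""
    seen = set()
    kept = []
    for ch in line:
        if ch not in seen:
            seen.add(ch)
            kept.append(ch + "/")
    return "".join(kept)


sample = ["GTG", "CTC", "GCGAGC", "GATGTG", "A", "C"]
for row in sample:
    print(dedupe_line(row))  # G/T/, C/T/, G/C/A/, G/A/T/, A/, C/ on separate lines
```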
# 2  
12-21-2009
Code:
perl -e '
while (<>) {
    chomp;
    my %seen;                                  # characters already printed on this line
    print "$_/" for grep { !$seen{$_}++ } split //;
    print "\n";
}' abc.txt

HTH,
PL
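The Perl answers filter characters with `grep` and a post-incremented hash lookup (`!$h{$_}++`, whatever the hash is called). This works because the post-increment returns the count from before the increment, which is 0 (false) the first time a character is seen, so the `!` makes the test true exactly once per character. The sketch below transliterates that counting idiom into Python, with a `defaultdict(int)` standing in for the Perl hash; the name `first_occurrences` is hypothetical, and this illustrates the logic rather than replacing the one-liner.

```python
from collections import defaultdict


def first_occurrences(chars):
    """Mimic Perl's grep { !$seen{$_}++ } idiom: keep a character only when
    its count was still zero before being incremented."""
    seen = defaultdict(int)  # a missing key reads as 0, like an unset Perl hash value
    kept = []
    for ch in chars:
        before = seen[ch]    # value *before* the post-increment
        seen[ch] += 1
        if not before:       # true only the first time ch appears
            kept.append(ch)
    return kept


print("".join(c + "/" for c in first_occurrences("GCGAGC")))  # G/C/A/
```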
# 3  
12-21-2009
I have tried something in Python. ghostdog74, please suggest improvements.
Code:
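def unique_chars(s):
    # keep the first occurrence of each character, in the original order
    seen = set()
    result = []
    for c in s:
        if c not in seen:
            seen.add(c)
            result.append(c)
    return result

with open("input.txt") as f:
    for line in f:
        chars = unique_chars(line.rstrip("\n"))  # do not count the newline as a character
        print("".join(c + "/" for c in chars))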

Code:
$ python mt.py
G/T/
C/T/
C/T/
C/T/
G/C/A/
G/C/A/
G/C/A/
G/C/A/
G/A/T/
G/A/T/
G/A/T/
G/A/T/
A/
A/
C/
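In Python 3.7 and later, dictionaries are guaranteed to keep insertion order, so `dict.fromkeys` gives the same order-preserving deduplication in a single call. A plain `set` would not work here, because its iteration order is not tied to the order of the input. A minimal sketch, with the hypothetical name `dedupe_fromkeys`:

```python
def dedupe_fromkeys(line: str) -> str:
    # dict keys keep insertion order (guaranteed since Python 3.7), so
    # dict.fromkeys drops repeats while keeping first-seen order.
    return "".join(ch + "/" for ch in dict.fromkeys(line))


for row in ["GTG", "CTC", "GCGAGC", "GATGTG", "A", "C"]:
    print(dedupe_fromkeys(row))
```

When reading from a file, apply it to `line.rstrip("\n")` so the newline is not treated as one more character.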

# 4  
12-22-2009
Code:
awk -F "" '{ for (i = 1; i <= NF; i++) if (!seen[$i]++) printf "%s/", $i; printf "\n"; split("", seen) }' urfile
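The awk answer resets its counter array after every line; without that step, characters seen on earlier lines would also be suppressed on later ones. The Python sketch below (a hypothetical `dedupe_lines` with a `reset_per_line` switch, not part of the thread) makes the difference visible on three of the sample lines.

```python
def dedupe_lines(lines, reset_per_line=True):
    """Apply first-occurrence filtering to each line.

    With reset_per_line=False the 'seen' set carries over between lines,
    which is what happens if the awk array is never reset.
    """
    seen = set()
    out = []
    for line in lines:
        if reset_per_line:
            seen.clear()  # start every line with nothing seen
        kept = []
        for ch in line:
            if ch not in seen:
                seen.add(ch)
                kept.append(ch + "/")
        out.append("".join(kept))
    return out


rows = ["CTC", "CTC", "GCGAGC"]
print(dedupe_lines(rows))                        # ['C/T/', 'C/T/', 'G/C/A/']
print(dedupe_lines(rows, reset_per_line=False))  # ['C/T/', '', 'G/A/']
```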

# 5  
12-22-2009
Thanks everyone for your help!
# 6  
12-22-2009
Code:
perl -nle '
  my %seen;                                     # -n runs this once per line, so the hash starts empty each time
  print join("/", grep { !$seen{$_}++ } split //), "/";
  ' infile
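Both Perl answers loop over their input with `<>` (explicitly, or implicitly via `-n`), which reads the files named on the command line, or standard input if there are none. A rough Python counterpart uses the standard `fileinput` module. This is a sketch under assumptions: the names `process` and `main` are hypothetical, and the entry point deliberately runs only when file names are given, e.g. `python3 uniq_chars.py infile` (the script name is also hypothetical).

```python
import fileinput
import sys
from typing import Iterable, Iterator, Sequence


def process(lines: Iterable[str]) -> Iterator[str]:
    """Yield one result per input line: strip the newline (as perl -l does),
    keep each character's first occurrence, and put "/" after each one."""
    for line in lines:
        unique = dict.fromkeys(line.rstrip("\n"))  # first-seen order, no repeats
        yield "".join(ch + "/" for ch in unique)


def main(files: Sequence[str]) -> None:
    # fileinput.input chains the named files together, much like Perl's <>.
    for result in process(fileinput.input(files)):
        print(result)


if __name__ == "__main__" and sys.argv[1:]:
    # Only run when file names are given on the command line.
    main(sys.argv[1:])
```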
