delete repeated strings (tags) in a line and concatenate corresponding words


 
# 1  
Old 11-08-2010

Hello friends!

Each line of my input file has this format:
word<TAB>tag1<blankspace>lemma<TAB>tag2<blankspace>lemma ... <TAB>tag3<blankspace>lemma

In this file I need to remove all the repeated tags (for the same word) within a line, as in the example below, while keeping all the lemmata related to each tag by concatenating them with a “|” separator.

My INPUT (sample):
abecedaria ADJ abecedarius ADJ:abl abecedarius N:abl abecedaria N:abl abecedarium N:acc abecedaria N:acc abecedarium N:nom abecedaria N:nom abecedarium N:voc abecedaria N:voc abecedarium
abecedariabus N:abl abecedaria N:dat abecedaria
abhorruerimus V:IND abhorreo V:IND abhorresco V:SUB abhorreo V:SUB abhorresco
abhorrueritis V:IND abhorreo V:IND abhorresco V:SUB abhorreo V:SUB abhorresco
abhorruero V:IND abhorreo V:IND abhorresco

Desired OUTPUT:
abecedaria ADJ abecedarius ADJ:abl abecedarius N:abl abecedaria|abecedarium N:acc abecedaria|abecedarium N:nom abecedaria|abecedarium N:voc abecedaria|abecedarium
abecedariabus N:abl abecedaria N:dat abecedaria
abhorruerimus V:IND abhorreo|abhorresco V:SUB abhorreo|abhorresco
abhorrueritis V:IND abhorreo|abhorresco V:SUB abhorreo|abhorresco
abhorruero V:IND abhorreo|abhorresco

Very grateful to anyone who can help me!
mjomba from Tanzania
# 2  
Old 11-08-2010
Hi mjomba, try this:
Code:
sed 's/\( [A-Z]:[[:alnum:]]* \)\([[:alnum:]]*\)\1/\1\2|/g' infile

# 3  
Old 11-08-2010
Hi,

Scrutinizer was faster, but here is another one using 'sed':
Code:
sed 's/\( \+[A-Za-z]\+:[A-Za-z]\+ \+\)\(.*\)\(\1\)/\1\2|/g' infile

Regards,
Birei
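
For comparison, here is a minimal awk sketch of the same task. It is not taken from either reply. It assumes the fields really are TAB-separated, with exactly one blank between tag and lemma, as in the format line of the first post. The function name merge_tags is hypothetical. It may help when a tag occurs more than twice in a line, or when the repeats are not adjacent.
Code:
```shell
# merge_tags: hypothetical helper name, not from the thread.
# Assumes input lines of the form: word<TAB>tag lemma<TAB>tag lemma ...
# Joins the lemmata of repeated tags with "|", keeping first-seen tag order.
merge_tags() {
  awk -F'\t' '
  {
    n = 0                      # number of distinct tags on this line
    split("", lemmata)         # portable way to empty the arrays
    split("", order)
    for (i = 2; i <= NF; i++) {
      sp = index($i, " ")      # position of the blank between tag and lemma
      if (sp == 0) continue    # skip fields that do not match the format
      tag = substr($i, 1, sp - 1)
      lemma = substr($i, sp + 1)
      if (tag in lemmata)
        lemmata[tag] = lemmata[tag] "|" lemma
      else {
        order[++n] = tag       # remember the order of first appearance
        lemmata[tag] = lemma
      }
    }
    line = $1
    for (j = 1; j <= n; j++)
      line = line "\t" order[j] " " lemmata[order[j]]
    print line
  }' "$@"
}
```
Usage would be something like `merge_tags infile > outfile`. Note that a repeated tag is merged at the position of its first occurrence, so if the repeats are not adjacent the field order of the output differs from the input.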