delete repeated strings (tags) in a line and concatenate corresponding words Post: 302469741

Post 302469741 by mjomba on Monday 8th of November 2010 03:48:09 AM

Hello friends!

Each line of my input file has this format:
word<TAB>tag1<blankspace>lemma<TAB>tag2<blankspace>lemma ... <TAB>tag3<blankspace>lemma

In this file I need to eliminate all the repeated tags (of the same word) in a line, as in the example below, while keeping all the lemmata related to that tag by concatenating them with a "|" separator.

My INPUT (sample):
abecedaria ADJ abecedarius ADJ:abl abecedarius N:abl abecedaria N:abl abecedarium N:acc abecedaria N:acc abecedarium N:nom abecedaria N:nom abecedarium N:voc abecedaria N:voc abecedarium
abecedariabus N:abl abecedaria N:dat abecedaria
abhorruerimus V:IND abhorreo V:IND abhorresco V:SUB abhorreo V:SUB abhorresco
abhorrueritis V:IND abhorreo V:IND abhorresco V:SUB abhorreo V:SUB abhorresco
abhorruero V:IND abhorreo V:IND abhorresco

Desired OUTPUT:
abecedaria ADJ abecedarius ADJ:abl abecedarius N:abl abecedaria|abecedarium N:acc abecedaria|abecedarium N:nom abecedaria|abecedarium N:voc abecedaria|abecedarium
abecedariabus N:abl abecedaria N:dat abecedaria
abhorruerimus V:IND abhorreo|abhorresco V:SUB abhorreo|abhorresco
abhorrueritis V:IND abhorreo|abhorresco V:SUB abhorreo|abhorresco
abhorruero V:IND abhorreo|abhorresco

Very grateful to anyone who can help me!
mjomba from Tanzania
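
One possible approach, sketched here in Python rather than awk or sed. It assumes the format described above: fields separated by TABs, with the tag and its lemma separated by the first blank space inside each field. The function name merge_tags is only illustrative, and dropping exact duplicate tag/lemma pairs is an assumption that the sample data does not exercise. Tags are kept in order of first appearance.

```python
def merge_tags(line):
    """Merge repeated tags in one line, joining their lemmata with '|'."""
    # First TAB-separated field is the word; the rest are "tag lemma" pairs.
    word, *fields = line.rstrip("\n").split("\t")
    lemmata_by_tag = {}  # dicts keep insertion order (Python 3.7+)
    for field in fields:
        # Split at the first space only: tag on the left, lemma on the right.
        tag, _, lemma = field.partition(" ")
        lemmata = lemmata_by_tag.setdefault(tag, [])
        if lemma not in lemmata:  # assumption: skip exact duplicates
            lemmata.append(lemma)
    merged = [f"{tag} {'|'.join(lemmata)}" for tag, lemmata in lemmata_by_tag.items()]
    return "\t".join([word] + merged)


sample = "abhorruero\tV:IND abhorreo\tV:IND abhorresco"
print(merge_tags(sample))  # abhorruero<TAB>V:IND abhorreo|abhorresco
```

To process a whole file, the same function could be applied to each non-empty line read from standard input and the result printed.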

HTML::Diff(3pm) 					User Contributed Perl Documentation					   HTML::Diff(3pm)

NAME

       HTML::Diff - compare two strings of HTML

       This module compares two strings of HTML and returns a list of chunks that indicate the differences between the two input strings, where
       changes in formatting are considered changes.

       HTML::Diff does not strictly parse the HTML. Instead, it uses regular expressions to make a decent effort at understanding the given HTML.
       As a result, there are many valid HTML documents for which it will not produce the correct answer. But there may be some invalid HTML
       documents for which it gives you the answer you're looking for. Your mileage may vary; test it on lots of inputs from your domain before
       relying on it.

SYNOPSIS

	   $result = html_word_diff($left_text, $right_text);

DESCRIPTION

       Returns a reference to a list of triples [<flag>, <left>, <right>].  Each triple represents a chunk of the input texts. The flag tells you
       whether it represents a deletion, an insertion, a modification, or an unchanged chunk.

       Every character of each input text is accounted for by some triple in the output. Specifically, concatenating all the <left> members from
       the return value should produce $left_text, and likewise concatenating the <right> members should produce $right_text.

       The <flag> is one of 'u', '+', '-', or 'c': 'u' means the two chunks are the same, '+' means the $right_text contained this chunk and the
       $left_text didn't, '-' means the reverse, and 'c' means the two chunks are simply different. This follows the usage of Algorithm::Diff.

       The difference is computed on a word-by-word basis, "breaking" on visible words in the HTML text. If a tag only is changed, it will not be
       returned as an independent chunk but will be shown as a change to one of the neighboring words. For balanced tags, such as <b> </b>, it is
       intended that a change to the tag will be treated as a change to all words in between.
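
       The triple format and the concatenation property described above can be illustrated with a rough Python sketch. This is not
       HTML::Diff's implementation: it uses Python's difflib on whitespace-delimited tokens and does no HTML-aware handling of tags.
       The function name word_diff_triples is hypothetical, and the chunk boundaries it produces may differ from those of the Perl module.

```python
import difflib
import re

# Map difflib opcode names onto the flags used in the description above.
FLAGS = {"equal": "u", "insert": "+", "delete": "-", "replace": "c"}


def word_diff_triples(left_text, right_text):
    """Return (flag, left, right) triples from a plain word-level diff."""
    # Tokenize into runs of whitespace and runs of non-whitespace,
    # so that every character of the input ends up in some token.
    left = re.findall(r"\s+|\S+", left_text)
    right = re.findall(r"\s+|\S+", right_text)
    matcher = difflib.SequenceMatcher(a=left, b=right, autojunk=False)
    triples = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        triples.append((FLAGS[op], "".join(left[i1:i2]), "".join(right[j1:j2])))
    return triples


left_text = "the <b>quick</b> fox"
right_text = "the <b>slow</b> fox jumps"
triples = word_diff_triples(left_text, right_text)

# Concatenating each side of the triples reproduces the corresponding input.
assert "".join(l for _, l, _ in triples) == left_text
assert "".join(r for _, _, r in triples) == right_text
```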

AUTHOR

       Whipped up by Ezra elias kilty Cooper, <ezra@ezrakilty.net>.

       Patch contributed by Adam <asjo@koldfront.dk>.

SEE ALSO

       Algorithm::Diff

perl v5.14.2							    2012-01-01							   HTML::Diff(3pm)
