Post 302469741 by mjomba on Monday 8th of November 2010 03:48:09 AM
delete repeated strings (tags) in a line and concatenate corresponding words

Hello friends!

Each line of my input file has this format:
word<TAB>tag1<blankspace>lemma<TAB>tag2<blankspace>lemma ... <TAB>tag3<blankspace>lemma

In this file I need to remove all repeated tags (for the same word) within a line, as in the example below, while keeping all the lemmata related to each tag by concatenating them with a "|" separator.

My INPUT (sample):
abecedaria ADJ abecedarius ADJ:abl abecedarius N:abl abecedaria N:abl abecedarium N:acc abecedaria N:acc abecedarium N:nom abecedaria N:nom abecedarium N:voc abecedaria N:voc abecedarium
abecedariabus N:abl abecedaria N:dat abecedaria
abhorruerimus V:IND abhorreo V:IND abhorresco V:SUB abhorreo V:SUB abhorresco
abhorrueritis V:IND abhorreo V:IND abhorresco V:SUB abhorreo V:SUB abhorresco
abhorruero V:IND abhorreo V:IND abhorresco

Desired OUTPUT:
abecedaria ADJ abecedarius ADJ:abl abecedarius N:abl abecedaria|abecedarium N:acc abecedaria|abecedarium N:nom abecedaria|abecedarium N:voc abecedaria|abecedarium
abecedariabus N:abl abecedaria N:dat abecedaria
abhorruerimus V:IND abhorreo|abhorresco V:SUB abhorreo|abhorresco
abhorrueritis V:IND abhorreo|abhorresco V:SUB abhorreo|abhorresco
abhorruero V:IND abhorreo|abhorresco

Very grateful to anyone who can help me!
mjomba from Tanzania
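
One possible approach, offered as a sketch rather than a definitive solution: read each line with awk, treat the first field as the word and every following pair of fields as a tag and its lemma, and collect the lemmata per tag in the order the tags first appear. The function name `merge_tags` is made up for this example. It assumes that tags and lemmata never contain whitespace, so awk's default field splitting works whether the pairs are separated by tabs or by spaces. The output is written as word<TAB>tag<blankspace>lemmata, and a lemma that occurs twice under the same tag is not de-duplicated.

```shell
# merge_tags: hypothetical helper, reads lines of the form
#   word  tag lemma  tag lemma ...
# and merges the lemmata of repeated tags with a "|" separator.
merge_tags() {
  awk '
  {
    n = 0                    # number of distinct tags on this line
    split("", lemmas)        # portable way to empty an array
    split("", order)

    # Field 1 is the word; fields 2,3 / 4,5 / ... are tag,lemma pairs.
    for (i = 2; i < NF; i += 2) {
      tag = $i
      lemma = $(i + 1)
      if (tag in lemmas) {
        lemmas[tag] = lemmas[tag] "|" lemma   # repeated tag: append lemma
      } else {
        order[++n] = tag                      # remember first-seen order
        lemmas[tag] = lemma
      }
    }

    # Rebuild the line: word, then each tag once with its joined lemmata.
    out = $1
    for (j = 1; j <= n; j++)
      out = out "\t" order[j] " " lemmas[order[j]]
    print out
  }' "$@"
}
```

It could be called as `merge_tags input.txt > output.txt`, where both file names are placeholders. Keeping a separate `order` array is what preserves the original tag order, since awk does not guarantee any particular order when looping over an array with `for (tag in lemmas)`.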
 
