11-08-2010
delete repeated strings (tags) in a line and concatenate corresponding words
Hello friends!
Each line of my input file has this format:
word<TAB>tag1<blankspace>lemma<TAB>tag2<blankspace>lemma ... <TAB>tag3<blankspace>lemma
From this file I need to remove all repeated tags (for the same word) within a line, as in the example below, while keeping all the lemmata related to that tag by concatenating them with a “|” separator.
My INPUT (sample):
abecedaria ADJ abecedarius ADJ:abl abecedarius N:abl abecedaria N:abl abecedarium N:acc abecedaria N:acc abecedarium N:nom abecedaria N:nom abecedarium N:voc abecedaria N:voc abecedarium
abecedariabus N:abl abecedaria N:dat abecedaria
abhorruerimus V:IND abhorreo V:IND abhorresco V:SUB abhorreo V:SUB abhorresco
abhorrueritis V:IND abhorreo V:IND abhorresco V:SUB abhorreo V:SUB abhorresco
abhorruero V:IND abhorreo V:IND abhorresco
Desired OUTPUT:
abecedaria ADJ abecedarius ADJ:abl abecedarius N:abl abecedaria|abecedarium N:acc abecedaria|abecedarium N:nom abecedaria|abecedarium N:voc abecedaria|abecedarium
abecedariabus N:abl abecedaria N:dat abecedaria
abhorruerimus V:IND abhorreo|abhorresco V:SUB abhorreo|abhorresco
abhorrueritis V:IND abhorreo|abhorresco V:SUB abhorreo|abhorresco
abhorruero V:IND abhorreo|abhorresco
Very grateful to anyone who can help me!
mjomba from Tanzania
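One possible approach, as a minimal sketch in Python rather than a definitive solution: split each line on TAB, split every field after the first into tag and lemma at the first blank, and collect the lemmata per tag in order of first appearance. The function name `merge_tags` is made up for illustration. The sketch assumes the fields really are TAB-separated as described in the post, and it also assumes that a lemma repeated under the same tag should be listed only once (the post does not say either way).

```python
def merge_tags(line: str) -> str:
    """Merge repeated tags on one line, joining their lemmata with '|'.

    Assumes the format word<TAB>tag<space>lemma<TAB>tag<space>lemma ...
    """
    word, *pairs = line.rstrip("\n").split("\t")
    merged = {}  # tag -> list of lemmata; dicts keep first-seen order
    for pair in pairs:
        # Split at the first blank only: left part is the tag, right part the lemma.
        tag, _, lemma = pair.partition(" ")
        lemmata = merged.setdefault(tag, [])
        if lemma not in lemmata:  # assumption: skip a lemma already listed for this tag
            lemmata.append(lemma)
    fields = [word]
    for tag, lemmata in merged.items():
        fields.append(tag + " " + "|".join(lemmata))
    return "\t".join(fields)


# Demo on one line from the sample input (with TABs restored).
sample = "abhorruero\tV:IND abhorreo\tV:IND abhorresco"
print(merge_tags(sample))
```

To process a whole file, the same function could be applied to each line in turn, for example while iterating over an open file object and printing each result.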