There wasn't any duplicate text in your sample input file. It also isn't clear whether you want to remove only the text that is duplicated, or the text together with its opening and closing XML <Text> tags when both the text and the tags are duplicated.
Furthermore, although gawk on Linux systems uses the entire string assigned to the RS variable as the input record separator, the standards say that the behavior is unspecified if RS is more than one character. The version of awk I'm using only uses the first character of RS, so the structure of my code is slightly different from that suggested by cgkmal.
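If you are not sure which behavior your awk has, a quick probe along the following lines may help. This is only a sketch: the function name rs_uses_whole_string is made up for illustration, and the test string is arbitrary. With RS set to "XY", an awk that uses the whole string sees two records in the input aXYbXc, while an awk that uses only the first character (X) sees three.

```shell
# Hypothetical helper: succeeds (exit status 0) when the awk found on PATH
# treats the whole string assigned to RS as the record separator.
rs_uses_whole_string() {
    # Whole-string RS="XY":  records "a" and "bXc"      -> NR is 2
    # First-char-only RS="X": records "a", "Yb" and "c" -> NR is 3
    n=$(printf 'aXYbXc' | awk 'BEGIN { RS = "XY" } END { print NR }')
    [ "$n" -eq 2 ]
}
```

You could then run something like: if rs_uses_whole_string; then echo "multi-character RS"; else echo "single-character RS"; fi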
Your sample input included the line:
Code:
<Text Text_ID="10154713369385165_10154714426085165" From="415855878601070" Created="2015-10-30T23:27:48+0000" use_count="1">This is the fourth text........</Text>
but that line does not appear in the output that you said should be produced. Why shouldn't this line be included in the output?
Assuming that you just want to consider the text between the tags (and not the tags themselves) when looking for duplicates, you could try something like:
If the sample input you showed us in post #1 is contained in a file named file, it produces the output:
Code:
<Text Text_ID="10155645315850165_10155645333075165" From="460350337463650" Created="2014-10-16T17:05:37+0000" use_count="536">This is the first text</Text>
<Text Text_ID="10155645315850165_10155645317025165" From="1626711840908498" Created="2014-10-16T17:01:02+0000" use_count="408">This is the second text</Text>
<Text Text_ID="10155645315850165_10155645320000165" From="1481727095388591" Created="2014-10-16T17:02:04+0000" use_count="1064">This is the third text
If counted
GOT IT...</Text>
<Text Text_ID="10154713369385165_10154714450825165" From="464236763734179" Created="2015-10-30T23:34:47+0000" use_count="1">This is is just a sample text......</Text>
<Text Text_ID="10154713369385165_10154714444345165" From="642181809247720" Created="2015-10-30T23:31:48+0000" use_count="1">This is just another sample text.......</Text>
<Text Text_ID="10154713369385165_10154714426085165" From="415855878601070" Created="2015-10-30T23:27:48+0000" use_count="1">This is the fourth text........</Text>
<Text Text_ID="10154713369385165_10154714406055165" From="10202898434142187" Created="2015-10-30T23:23:34+0000" use_count="1">Jor se Bharat Mata ki jai</Text>
(including the line containing "This is the fourth text........", which was not included in your desired output).
Note that this code assumes that there is no whitespace between the last word in your text and the closing </Text> tag, that there are no > characters in the text in your file, and that the only tags in your XML file are opening and closing text tags (<Text ...> and </Text>, respectively). The code cgkmal provided makes these same assumptions and additionally assumes that there is no whitespace after the end of the opening text tag before the first word of the text, that there is nothing other than a <newline> character after a closing text tag, and that there are always five words in an opening text tag.
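To make those assumptions concrete, here is a minimal sketch of how a script with a single-character RS could be structured. It is an illustration only and is not necessarily the same as the code discussed above; the function name dedup_text is hypothetical. It sets RS to ">", so odd-numbered records hold an opening tag and even-numbered records hold the text followed by "</Text", which is why a ">" inside the text or any other kind of tag would break it.

```shell
# Hypothetical helper: print each <Text ...>...</Text> element whose text
# (the part between the tags) has not been seen before.
# Assumes the only tags are <Text ...> and </Text>, that the text contains
# no ">" characters, and that nothing separates the text from </Text>.
dedup_text() {
    awk '
    BEGIN { RS = ">" }          # a single character, so portable across awks
    # Odd-numbered records: an opening tag without its closing ">".
    NR % 2 { tag = $0; next }
    # Even-numbered records: the text followed by "</Text".
    {
        text = $0
        sub(/<\/Text$/, "", text)           # keep only the text itself
        if (!seen[text]++) {                # first time this text appears
            sub(/^[ \t\n]+/, "", tag)       # drop the newline left before the tag
            printf "%s>%s>\n", tag, $0      # put back the ">" that RS consumed
        }
    }' "$@"
}
```

It could be invoked as: dedup_text file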
This User Gave Thanks to Don Cragun For This Post: