I have a grep written to pull out values; below (in the code snippet) is an example of the output.
What I'm struggling to do, and am looking for assistance on, is identifying the lines that contain a duplicated string.
For example, 74859915K74859915K below is 74859915K repeated twice, but 32575310100014 is not a whole value repeated, so I don't want to see it.
In my head (and what I'm unable to do) I want to do something like count its length, split it in half, and confirm the first half matches the second half... I'm open to suggestions, as there may be a better way to do it.
Background: these values sit in multiple files within an XML tag <foo></foo>. My grep extracts them and sed strips the XML tags, leaving just the output below... it's the next step where I want to keep only the true dupes.
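The extraction itself is along these lines (a sketch; the *.xml glob and the exact sed expression are assumptions, only the <foo> tag is as described):

Code:
  # Pull the tagged values from all files, then strip the tags,
  # leaving one bare value per line as in the output above
  grep -h '<foo>' *.xml | sed 's/<[^>]*>//g'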
Thanks for the reply, Robin; you're on the same page as me.
Not being one to sit back and expect it to be written for me, I've had a go with the pointer you gave me, but I'm getting some strange results from it.
I thought I'd start simple and build up. I was half expecting (excuse the pun) that the below would output half the length of the string and store it in $half; then, using that combined with an awk substring, I would be able to output just the first half of the string. The idea being that I could store that in a variable, do the same for the second half by getting the awk substr to start at $half for $half, and compare the two. If they match, output the line; if they don't, bin them.
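The attempt looked something like this (a reconstruction of the idea as described, not the exact snippet):

Code:
  while read -r line
  do
      # Half the length of the value; note that awk prints a
      # fractional number (e.g. 9.5) for odd-length lines, which
      # can make the substr below behave oddly
      half=$(echo "$line" | awk '{ print length($0) / 2 }')
      # First half of the value
      echo "$line" | awk -v h="$half" '{ print substr($0, 1, h) }'
  done < "$TEMP_1"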
That doesn't give me the output I was expecting. $TEMP_1 is a file containing the values, one per line, as per my previous post.
While you are on the right track, it is best not to call an external program inside a loop, because that will make it very slow. You could do it all in shell inside the loop, or use a single utility instead of a shell loop.
--
Another option would be to use a back-reference in a regex:
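Presumably something along these lines (a sketch; the post's exact command isn't shown):

Code:
  # Keep only the lines made of a non-empty string followed
  # immediately by an exact repeat of itself
  grep '^\(..*\)\1$' "$TEMP_1"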
The anchors ^ and $ make sure the two identical patterns glued together form the whole line...
Well, good for you. We all learn better by trying rather than being spoon-fed. With a nice pun like that, are you British?
You might need $1 in your awk rather than $0; it should still work though. This will give you the first half of each line, so you'd need to catch that and compare it to the original, something like:-
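A minimal sketch, assuming the values are read one per line from $TEMP_1 as before:

Code:
  while read -r line
  do
      half=$(( ${#line} / 2 ))
      first=$(echo "$line" | awk -v h="$half" '{ print substr($1, 1, h) }')
      second=$(echo "$line" | awk -v h="$half" '{ print substr($1, h + 1) }')
      # Keep the line only when the two halves are identical
      [ "$first" = "$second" ] && echo "$line"
  done < "$TEMP_1"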
Personally, I'd replace the awk with a substitution, so you are not calling awk over and again, something like this:-
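A sketch assuming bash or ksh93 for the ${var:offset:length} substring expansion:

Code:
  while read -r line
  do
      half=$(( ${#line} / 2 ))
      # Compare the two halves with shell substring expansion,
      # so no external command is spawned for each line
      [ -n "$line" ] && [ "${line:0:$half}" = "${line:$half}" ] && echo "$line"
  done < "$TEMP_1"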
Does that suit? Does it work, even...?
Spot on, Scrutinizer, that does exactly what I need it to do; both as a pipe on the end of my original grep and in the loop whilst reading each line.
I'm not going to pretend I know what it's doing. Can you recommend some reading on this? Is it known as back-referencing within normal regex?
Robin - Thank you. Whilst Scrutinizer has answered it, I'm still going to read and digest your reply so that I understand how what I was trying to achieve should work. All good learning.
Thank you both.