Multiple line duplicates


 
# 1  
Old 03-22-2012
Multiple line duplicates

I'm trying to find the text that occurs with both numbers, not just one. I'm sure there's an easy way to do this, but I can't see it. Can someone point me in the right direction? Please don't just give me an answer without an explanation; I want to learn.

myText file:
xyc 1
xyd 1
xye 1
xyf 1
xyf 1
xyf 1
xyf 2
xyg 1
xyh 1

I want to be able to essentially go "xyf has both 1 and 2 so print xyf", but I am being thrown by it being across multiple lines. I would normally try to just remove everything except what I want, but this doesn't work in this case.

I have worded this incredibly badly, but if you still understand what I mean, can you please point me in the right direction how to do this?
I've tried fiddling with awk, grep and sed to do this, all to no avail.
# 2  
Old 03-22-2012
Hi maximus73,

So you want to learn, that's good. Here is an awk script that does what you want, if I understood your question correctly (the code is commented):
Code:
$ cat infile
xyc 1
xyd 1
xye 1
xyf 1
xyf 1
xyf 1
xyf 2
xyg 1
xyh 1
leg 2
$ cat script.awk
## For every line...
{
        ## Concatenate each number in a hash. The key will
        ## be the first field.
        data[ $1 ] = data[ $1 ] $2
}

## After processing file...
END {
        ## Flag that is set when a key does not have both numbers.
        bad = 0

        ## Go through the hash.
        for ( idx in data ) {

                ## Check if both numbers are found, when one of them fails, set
                ## 'bad' variable and break the loop.
                for ( i = 1; i <= 2; i++) {
                        if ( ! index( data[idx], i ) ) {
                                bad = 1
                                break
                        }
                }

                ## Check 'bad' variable. If unset, both numbers were found, so
                ## print the key of the hash and reset variable for next loop.
                if ( ! bad ) {
                        printf "%s\n", idx
                }
                bad = 0
        }
}
$ awk -f script.awk infile
xyf
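
If the second field can hold values other than 1 and 2, the same idea can be generalized by counting distinct values per key instead of testing for the two specific numbers. The following is only a sketch under that assumption; the function name keys_with_multiple_values and the array names are made up for illustration and are not part of the script above.

```shell
# Print every key (field 1) that occurs with at least two distinct
# values (field 2). Reads the files given as arguments, or standard input.
# keys_with_multiple_values is a hypothetical helper name.
keys_with_multiple_values() {
    awk '
        # Count a (key, value) pair only the first time it is seen.
        NF >= 2 && !seen[$1, $2]++ { distinct[$1]++ }

        # After the whole input is read, report keys with 2+ distinct values.
        END {
            for (key in distinct)
                if (distinct[key] > 1)
                    print key
        }
    ' "$@"
}

# Sample data taken from the thread; prints: xyf
printf '%s\n' 'xyc 1' 'xyf 1' 'xyf 1' 'xyf 2' 'leg 2' | keys_with_multiple_values
```

Because the pair is stored as a composite key, repeated lines such as "xyf 1" are counted once, and a value like 12 is not mistaken for "1 and 2". The order of the output is whatever order awk walks the array in, so pipe it through sort if the order matters.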

# 3  
Old 03-22-2012
Could there be other values (than 1 and 2)? Can you have more than one line with 2?

Because if not, only one 2 shows up here, so with a numeric sort it would be the last line. Using tail -1 you would get that line, load it into two variables, and use the first one to see whether there is more than one occurrence...

I suppose I am as clear as you...

Last edited by vbe; 03-22-2012 at 01:31 PM.. Reason: typos
# 4  
Old 03-22-2012
Code:
perl -lane '
# %x remembers every (1st field, 2nd field) pair already seen.
# %y counts the distinct 2nd-field values for each 1st field, so a pair
# is counted only the first time it appears.
$y{$F[0]}++ unless $x{$F[0]}{$F[1]}++;
END {
    for (sort keys %y) { # Loop through the keys of %y
        ($y{$_} > 1) && print $_; # If the 1st field has more than one distinct value, print it
    }
}' myTextFile

# 5  
Old 03-22-2012
I tried hard to resist, but I couldn't any longer.
Translated into commands, my explanation would give:
Code:
n12:/home/vbe/wks/z $ if [ $(sort -k 2 infile|tail -1|read VALUE NUMVAL;grep $VALUE infile|uniq|wc -l) -gt 1 ];then echo VALUE=$VALUE;fi 
VALUE=xyf

But birei's or balajesuri's code is the way to go...
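
For comparison, the sort-based idea can also be written as a plain pipeline that does not depend on there being only one line with 2. This is a rough sketch, assuming whitespace-separated "key value" lines as in the sample file; the function name keys_with_multiple_values_sorted is invented for the example.

```shell
# Print, in sorted order, every key (field 1) that occurs with at least
# two distinct values (field 2).
# keys_with_multiple_values_sorted is a hypothetical helper name.
keys_with_multiple_values_sorted() {
    awk 'NF >= 2 { print $1, $2 }' "$@" |  # normalize to "key value"
        sort -u |                          # drop repeated key/value pairs
        cut -d ' ' -f 1 |                  # keep only the key
        sort |                             # group identical keys together
        uniq -d                            # keep keys that appear more than once
}

# Sample data taken from the thread; prints: xyf
printf '%s\n' 'xyc 1' 'xyf 1' 'xyf 1' 'xyf 2' 'leg 2' | keys_with_multiple_values_sorted
```

After sort -u each key is left with one line per distinct value, so a key that still appears on two or more lines must have at least two different values, which is exactly what uniq -d reports.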
 