How to remove matched rows from my file?


 
# 1  
Old 04-03-2017

Hello,

I am a beginner and recently got help on this forum. The command was:
Code:
awk  '($1, $2) in x {
    print x[$1, $2]
    print
    delete x[$1, $2]
    next
}
{    x[$2, $1] = $0
}' results>myfile

I got an output file, myfile, from the original file, results. The question is how to get all the rows that are not in the output file; in other words, I want to do a negative selection. I think there is a way to do that, but I have spent hours on it and still have no idea. My previous question was here:
How to select rows that have opposite values (A vs B, or B vs A) on first two columns?



Code:
Glyma.10G051100 Glyma.02G036000 89.91 228 23 0 1 228 1 228 1e-78 294
Glyma.10G051100 Glyma.09G023700 87.28 228 29 0 1 228 1 228 1e-68 261
Glyma.10G285200 Glyma.20G103800 96.33 1663 55 4 1 1657 1 1663 0.0 2728
Glyma.10G285200 Glyma.05G093700 95.02 321 16 0 406 726 1 321 8e-142 505
Glyma.10G212900 Glyma.17G186600 90.36 1338 129 0 1 1338 1 1338 0.0 1757
Glyma.10G212900 Glyma.05G089000 90.21 1338 131 0 1 1338 1 1338 0.0 1746
Glyma.10G212900 Glyma.16G068000 88.67 1341 146 5 1 1338 1 1338 0.0 1629
Glyma.10G212900 Glyma.19G052400 88.83 1325 148 0 1 1325 1 1325 0.0 1628
Glyma.10G212900 Glyma.05G114900 88.25 1328 156 0 1 1328 1 1328 0.0 1589
Glyma.10G212900 Glyma.19G078900 89.31 262 27 1 1074 1335 202 462 2e-88 327
Glyma.10G212900 Glyma.19G078900 89.71 204 21 0 790 993 1 204 2e-68 261
Glyma.10G296300 Glyma.20G246900 95.11 470 23 0 1 470 1 470 0.0 741
Glyma.10G296300 Glyma.20G246900 92.26 168 7 2 744 911 834 995 2e-60 233
Glyma.10G001700 Glyma.10G179600 83.45 701 113 1 44 741 50 750 0.0 649
Glyma.10G179600 Glyma.10G001700 83.45 701 113 1 50 750 44 741 0.0 649
Glyma.10G056500 Glyma.10G056300 89.27 261 24 2 41 300 61 318 4e-88 324
Glyma.10G056300 Glyma.10G056500 89.27 261 24 2 61 318 41 300 5e-88 324
Glyma.10G088600 Glyma.10G085100 97.13 522 15 0 1 522 1 522 0.0 881
Glyma.10G085100 Glyma.10G088600 97.13 522 15 0 1 522 1 522 0.0 881


Last edited by nengcheng; 04-04-2017 at 12:39 PM..
# 2  
Old 04-03-2017
Try this slight modification to your previous script. It produces two output files. The matched pairs of input lines will be written to the file named matched and the remaining input lines will be written to the file named unmatched:
Code:
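awk '
# Columns 1 and 2 are the reverse of a stored line: write both lines.
($1, $2) in x {
	print x[$1, $2] > "matched"
	print > "matched"
	delete x[$1, $2]
	next
}
# Otherwise store the line under its reversed key (column 2, column 1).
{	x[$2, $1] = $0
}
# Lines still stored at the end never found a partner.
END {	for(key in x)
		print x[key] > "unmatched"
}' results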

As with the code before, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
# 3  
Old 04-04-2017
Thank you for the information. The problem is that the numbers of matched and unmatched lines do not add up to the total number of lines, and I don't know what is wrong. I think the cause is that many A-B, B-A patterns recur several times with different values in the other columns (3rd, 4th, etc.), so the number of unmatched lines is significantly lower than the total minus the matched lines.

Last edited by nengcheng; 04-04-2017 at 11:54 AM..
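One way to test that suspicion is to count how often each ordered pair of the first two columns occurs. The script in post #2 keeps only one stored line per key in x, so a repeated pair can overwrite an earlier line before it is ever written out. The following is a minimal sketch, not part of the original replies; the file names sample.txt and repeated_pairs and the three sample rows are made up for illustration.

```shell
# Hypothetical three-row sample; substitute your own file for sample.txt.
cat > sample.txt <<'EOF'
A B 1
A B 2
B A 3
EOF

# Count each ordered (column 1, column 2) pair and
# report only the pairs that occur more than once.
awk '{ n[$1 " " $2]++ }
END { for (k in n) if (n[k] > 1) print n[k], k }' sample.txt > repeated_pairs

cat repeated_pairs
```

If repeated_pairs is empty, repeated pairs are not the cause of the missing lines.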
# 4  
Old 04-04-2017
We don't either, unless you post sample input and output data and the code used (unless it's the same as posted above).
# 5  
Old 04-04-2017
How can I upload my sample? It's a large dataset, more than 10 MB.
# 6  
Old 04-04-2017
So, that's a catch-22, isn't it? How about posting the smallest possible set of test data that shows the problem?
# 7  
Old 04-04-2017
I have uploaded a very small sample, 19 lines in total. The command gives 11 and 6 lines, respectively. I don't know what's wrong. Maybe something is wrong with my format?


Oh, I realized that I didn't remove the A-B, A-B pattern, that is, lines that have the same values in the first two columns.

Last edited by nengcheng; 04-04-2017 at 12:49 PM..
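If the repeated A-B rows should be kept rather than removed beforehand, one possible approach is to read the file twice, so that no line is ever overwritten. This is a sketch of a different technique from the single-pass script in post #2, under two assumptions that the thread does not confirm: a row counts as matched whenever any row with the first two columns swapped exists anywhere in the file, and a row whose two columns are identical counts as unmatched. The file name sample.txt and its rows are made up for illustration.

```shell
# Hypothetical five-row sample; substitute your own file for sample.txt.
cat > sample.txt <<'EOF'
A B 1
A B 2
B A 3
C D 4
C D 5
EOF

# Pass 1 (NR == FNR): remember every ordered (column 1, column 2) pair.
# Pass 2: a row goes to "matched" if the swapped pair occurs in the file
# (assumption: rows with identical columns 1 and 2 go to "unmatched").
awk '
NR == FNR { seen[$1, $2]; next }
($2, $1) in seen && $1 != $2 { print > "matched"; next }
{ print > "unmatched" }
' sample.txt sample.txt

wc -l sample.txt matched unmatched
```

Every input row is written to exactly one of the two files, so the two line counts add up to the total. Unlike the script in post #2, the rows in matched stay in input order instead of being grouped as pairs. As noted earlier in the thread, on Solaris/SunOS use /usr/xpg4/bin/awk or nawk.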