comm -12 based on 1 column
# 1  
Old 11-30-2010

I'd like to eliminate the rows in two files that do not share a common value in the first column. Here's my tortured logic, which is far too inefficient to use as is, but it might show what I'm trying to do (assume the files have been sorted):
Code:
cut -f1 -d '|' file1 > file1.dat
cut -f1 -d '|' file2 > file2.dat
 
comm -12 file1.dat file2.dat > same.dat
 
grep -f same.dat file1 > file1_finished.dat
grep -f same.dat file2 > file2_finished.dat

Any thoughts on how to do this more efficiently? Thanks in advance!
Al

# 2  
Old 11-30-2010
Code:
man join

By the way, even though it is not exactly the same problem, you may find some inspiration in this thread:
https://www.unix.com/shell-programmin...ines-file.html
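
To make the join suggestion a bit more concrete, here is a minimal sketch. The sample data is made up, and it assumes both files are already sorted on the first field (which join requires) and use '|' as the delimiter. Each file is joined against the other file's key column, so only rows whose key appears in both files survive:

```shell
# Hypothetical sample data; both files are sorted on the first field.
demo=$(mktemp -d) && cd "$demo" || exit 1
printf '%s\n' 'a|1' 'b|2' 'c|3' > file1
printf '%s\n' 'b|x' 'c|y' 'd|z' > file2

# Join each file against the other file's key column (read from stdin as "-").
# uniq drops repeated keys so that matching rows are not emitted more than once.
cut -d'|' -f1 file2 | uniq | join -t'|' file1 - > file1_finished.dat
cut -d'|' -f1 file1 | uniq | join -t'|' - file2 > file2_finished.dat
```

With this sample input, file1_finished.dat keeps the b and c rows of file1, and file2_finished.dat keeps the b and c rows of file2.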
# 3  
Old 11-30-2010
Code:
awk -F \| 'NR==FNR{a[$1]++;next} a[$1]' file2 file1 > file1_finished.dat


awk -F \| 'NR==FNR{a[$1]++;next} a[$1]' file1 file2 > file2_finished.dat
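
For anyone unfamiliar with the NR==FNR idiom used above, here is a commented sketch on made-up sample data. The array name seen and the sample rows are my own; the file names follow the thread. It stores the keys of the first file named in an array, then prints the rows of the second file whose key is in that array, so the input does not need to be sorted:

```shell
# Hypothetical sample data; file names follow the thread.
demo=$(mktemp -d) && cd "$demo" || exit 1
printf '%s\n' 'a|1' 'b|2' 'c|3' > file1
printf '%s\n' 'b|x' 'c|y' 'd|z' > file2

# NR==FNR holds only while awk reads the first file on the command line:
#   remember each key from that file, then skip to the next line.
# For the second file, print the row if its key was remembered.
awk -F'|' 'NR==FNR { seen[$1]; next } $1 in seen' file2 file1 > file1_finished.dat
awk -F'|' 'NR==FNR { seen[$1]; next } $1 in seen' file1 file2 > file2_finished.dat
```

Each awk call reads both files once and does one array lookup per row, which is why it tends to scale better than matching every line against a long pattern list.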

# 4  
Old 11-30-2010
Code:
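# Build anchored patterns (^key|) for keys seen more than once across f1 and f2; assumes keys are unique within each file and contain no regex metacharacters.
awk -F'|' '{print "^" $1 FS}' f1 f2 | sort | uniq -d | grep -f - f1 > f1.done
awk -F'|' '{print "^" $1 FS}' f1 f2 | sort | uniq -d | grep -f - f2 > f2.done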


Last edited by ctsgnb; 12-03-2010 at 11:07 AM..
# 5  
Old 12-03-2010
Thanks folks, what I eventually ended up with was:
Code:
awk -F'|' 'NR==FNR{++a[$1];next} $1 in a' file1 file2 > first.dat
awk -F'|' 'NR==FNR{++a[$1];next} $1 in a' file2 file1 > second.dat
 
comm -13 second.dat first.dat > final.dat

I should add that the various options involving grep -f were too time-consuming given the size of the files, which I should have mentioned at the outset.

Thanks again.
