How to compare two columns and retrieve data


 
# 1  
07-25-2011

I am a newbie to Unix and slowly learning it. I have a large data set with 8 columns. I want to compare two of the columns and retrieve the data when both columns contain the same number.

I have attached the example, and a sample is shown below. There are two key columns, S-Contig and N-Contig. I want to retrieve the data for every number that appears in both of these columns. For example, 3205 appears in both columns (in different rows), and I want to write the data from all the columns for 3205 into a new file.

Code:
S-Contig  S_D  S_P  S_PD     N-Contig  N_D  N_P  N_PD
3205      1    1    1.00000  3196      2    1    0.50000
3254      51   42   0.82353  3205      1    2    2.00000
3259      28   12   0.42857  3216      4    5    1.25000

I have been trying to figure this out with awk for a couple of hours now, and I am confused. I would really appreciate it if someone could help me with this.

Thank you

Last edited by bjorngill; 07-25-2011 at 12:36 PM. Reason: Need to edit the attachment
# 2  
07-25-2011
Please post a sample of the text file here, wrapped in code tags.
Many people here do not want to download external files.
Posted by joeyg
# 3  
07-25-2011
Your file doesn't have Unix newlines. Convert them and try this:
Code:
awk '{ a[$1] = $1 } $5 == a[$5]' problem.txt
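As a sketch of what this one-liner does, here it is with comments, run against the three sample rows from the first post (written to a hypothetical sample.txt so your real file is not overwritten). It flags a row only when its N-Contig value has already been seen as an S-Contig on the same or an earlier line, and it prints only that N-side row, not the matching S-Contig row.

```shell
# Sample rows from the first post; sample.txt is a hypothetical file name.
cat > sample.txt <<'EOF'
S-Contig S_D S_P S_PD N-Contig N_D N_P N_PD
3205 1 1 1.00000 3196 2 1 0.50000
3254 51 42 0.82353 3205 1 2 2.00000
3259 28 12 0.42857 3216 4 5 1.25000
EOF

# For every line, record column 1 (S-Contig) as a key of array a.
# Then print the line if column 5 (N-Contig) equals a[$5], i.e. if that
# N-Contig value already appeared in column 1 on this or an earlier line.
# Referencing a[$5] for an unseen key creates it with an empty value.
awk '{ a[$1] = $1 } $5 == a[$5]' sample.txt > first_try.txt

cat first_try.txt
```

Because the test depends on row order, a pair where the S-Contig row comes after the N-Contig row would be missed.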

# 4  
07-25-2011
Thank you for your reply. I converted the file to Unix newlines and tried the awk command, but I am having trouble getting any output.
$ file problem.txt
problem.txt: ASCII text, with CR line terminators
$ sed -e 's/\r$//' problem.txt > problem2.txt
$ file problem2.txt
problem2.txt: ASCII text, with CR, LF line terminators
$ awk '{ a[$1] = $1 } $5 == a[$5]' problem2.txt > ans.txt

This gives me an empty ans.txt file. I would like to write all the data for the matching values to the output.

Thank you
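The `file` output "with CR line terminators" usually means each line ends in a lone carriage return (the classic Mac OS convention) with no LF at all. sed reads its input one LF-terminated line at a time, so it sees such a file as a single line and `s/\r$//` can remove at most the last CR. A minimal sketch using `tr` instead, which replaces every CR byte with an LF (cr_sample.txt and lf_sample.txt are hypothetical names; the two rows come from the first post):

```shell
# Build a small file with CR-only line endings for illustration.
printf '3205 1 1 1.00000\r3254 51 42 0.82353\r' > cr_sample.txt

# tr translates every CR character into an LF, so each record
# becomes its own Unix line that awk can process.
tr '\r' '\n' < cr_sample.txt > lf_sample.txt

# Count the resulting lines.
wc -l < lf_sample.txt
```

For the file in this thread that would be `tr '\r' '\n' < problem.txt > problem2.txt`. If a file has CRLF endings instead, this would turn each CRLF into two newlines (blank lines), and `tr -d '\r'` is the usual choice there.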
# 5  
07-26-2011
Code:
% cat >problem.txt
S-Contig    S_D    S_P    S_PD        N-Contig    N_D    N_P    N_PD
3205          1        1      1.00000        3196         2        1            0.50000
3254          51      42     0.82353        3205        1        2           2.00000
3259          28      12     0.42857        3216        4        5           1.25000
jazu@uf3 ~/tmp 
% awk '{ a[$1] = $1 } $5 == a[$5]' problem.txt
3254          51      42     0.82353        3205        1        2           2.00000

You need a file with LF line terminators only. With GNU sed, try:
Code:
sed 's/\r/\n/g' problem.txt

But I'm afraid my solution was wrong. Try this one:

Code:
awk '{a[$1]=$1; b[$1]=$0} $5==a[$5] {print b[$5]; print}'
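To show why this version prints rows more than once, here it is annotated and run on four rows taken from this thread: the first post's sample plus the 3301 row from the output posted later (sample.txt and second_try.txt are hypothetical names). Every match prints two complete lines, the stored S-Contig row and the current row, so a row such as 3254 can appear twice: once as the N-side of the 3205 match and once as the S-side of the 3254 match.

```shell
# Four sample rows from this thread; sample.txt is a hypothetical name.
cat > sample.txt <<'EOF'
S-Contig S_D S_P S_PD N-Contig N_D N_P N_PD
3205 1 1 1.00000 3196 2 1 0.50000
3254 51 42 0.82353 3205 1 2 2.00000
3259 28 12 0.42857 3216 4 5 1.25000
3301 7 7 1.00000 3254 82 130 1.58537
EOF

# a[$1] remembers each S-Contig seen so far; b[$1] remembers its whole row.
# When this row's N-Contig ($5) was already seen as an S-Contig, print two
# lines: the stored S-Contig row (b[$5]) and then the current row.
awk '{ a[$1] = $1; b[$1] = $0 } $5 == a[$5] { print b[$5]; print }' sample.txt > second_try.txt

cat second_try.txt
```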


Last edited by yazu; 07-26-2011 at 12:45 AM.
# 6  
07-26-2011
Thank you for your help, Yazu. I tried your code, and it still doesn't give me the correct output.
Here is the code I tried.

Code:
$ file problem.txt
problem.txt: ASCII text, with CR, LF line terminators
$ sed 's/\r/\n/g' problem.txt
3401ntig54      46_D    0.8000003368D    1345ntig170     1.00000 0.64286 NBH_PD
$ sed -e 's/\r$//' problem.txt > problem2.txt
$ file problem2.txt
problem2.txt: ASCII text, with CR, LF line terminators
$ perl -pe 's/\r\n|\n|\r/\n/g' problem.txt > problem2.txt
$ file problem2.txt
problem2.txt: ASCII text
$ awk '{a[$1]=$1; b[$1]=$0} $5==a[$5] {print b[$5]; print}' problem2.txt

3205    1    1    1.00000    3196    2    1    0.50000
3254    51    42    0.82353    3205    1    2    2.00000
3254    51    42    0.82353    3205    1    2    2.00000
3301    7    7    1.00000    3254    82    130    1.58537
3259    28    12    0.42857    3216    4    5    1.25000
3305    28    47    1.67857    3259    14    33    2.35714
3277    2    8    4.00000    3225    1    1    1.00000
3311    44    46    1.04545    3277    5    3    0.60000
3301    7    7    1.00000    3254    82    130    1.58537
3323    29    31    1.06897    3301    3    14    4.66667
3305    28    47    1.67857    3259    14    33    2.35714
3335    5    6    1.20000    3305    19    34    1.78947
3311    44    46    1.04545    3277    5    3    0.60000
3340    6    5    0.83333    3311    56    50    0.89286
3322    32    36    1.12500    3300    3    4    1.33333
3342    7    24    3.42857    3322    18    29    1.61111
3323    29    31    1.06897    3301    3    14    4.66667
3345    17    18    1.05882    3323    57    58    1.01754
3330    10    17    1.70000    3303    2    1    0.50000
3346    23    17    0.73913    3330    4    7    1.75000
3335    5    6    1.20000    3305    19    34    1.78947
3363    33    43    1.30303    3335    5    4    0.80000
3338    10    10    1.00000    3309    10    19    1.90000
3378    1    4    4.00000    3338    8    5    0.62500
3340    6    5    0.83333    3311    56    50    0.89286
3379    3    12    4.00000    3340    2    4    2.00000
3342    7    24    3.42857    3322    18    29    1.61111
3384    3    1    0.33333    3342    9    33    3.66667
3345    17    18    1.05882    3323    57    58    1.01754
3390    2    26    13.00000    3345    14    9    0.64286
3346    23    17    0.73913    3330    4    7    1.75000
3391    34    26    0.76471    3346    4    6    1.50000
3363    33    43    1.30303    3335    5    4    0.80000
3399    5    8    1.60000    3363    23    67    2.91304

I am not sure why some of the rows are repeating. I am looking for output like this:

Code:
3205    1    1    1.00000    3205    1    2    2.00000
3254    51    42    0.82353    3254    82    130    1.58537

Thank you for your help! I really appreciate it.
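The output requested here pairs the S-side columns (1-4) of one row with the N-side columns (5-8) of a different row that carries the same contig number, so it is a join of the file with itself rather than a same-line comparison. A minimal awk sketch, assuming one header line, whitespace-separated columns, LF line endings, and unique S-Contig values (if an S-Contig repeats, only its last row is kept), reads the file twice: the first pass stores the S-side of every row, the second pass prints a combined line for each row whose N-Contig was stored. sample.txt and matches.txt are hypothetical names.

```shell
# Sample rows from this thread; sample.txt is a hypothetical name.
cat > sample.txt <<'EOF'
S-Contig S_D S_P S_PD N-Contig N_D N_P N_PD
3205 1 1 1.00000 3196 2 1 0.50000
3254 51 42 0.82353 3205 1 2 2.00000
3259 28 12 0.42857 3216 4 5 1.25000
3301 7 7 1.00000 3254 82 130 1.58537
EOF

# The same file is given twice on the command line.
# Pass 1 (NR == FNR): skip the header, then store columns 1-4 of each row,
#   keyed by S-Contig ($1).
# Pass 2: skip the header; if this row's N-Contig ($5) was stored as an
#   S-Contig, print the stored S-side followed by this row's columns 5-8.
awk 'BEGIN { OFS = "\t" }
     NR == FNR { if (FNR > 1) s[$1] = $1 OFS $2 OFS $3 OFS $4; next }
     FNR > 1 && ($5 in s) { print s[$5], $5, $6, $7, $8 }' sample.txt sample.txt > matches.txt

cat matches.txt
```

On your own data, you would skip the `cat > sample.txt` step and pass your LF-converted file (problem2.txt in this thread) in both places. Reading the file twice means the result does not depend on whether the S-Contig row comes before or after the matching N-Contig row.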
# 7  
07-27-2011
???
Code:
awk '$1 == $5' problem.txt

Quote:
Code:
3205 1 1 1.00000 3205 1 2 2.00000
3254 51 42 0.82353 3254 82 130 1.58537
But your input file doesn't contain such lines. What kind of processing exactly do you need?
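`awk '$1 == $5'` only compares the two columns within the same line, and in this data the matching numbers sit in different rows (3205 is the S-Contig of one row and the N-Contig of the next), so it prints nothing. One way to express the cross-row match without awk is the standard `join` utility, which pairs lines from two files that are sorted on their join fields. A minimal sketch, assuming one header line and whitespace-separated columns; sample.txt, s_side.txt, n_side.txt and joined.txt are hypothetical names:

```shell
# Sample rows from this thread; sample.txt is a hypothetical name.
cat > sample.txt <<'EOF'
S-Contig S_D S_P S_PD N-Contig N_D N_P N_PD
3205 1 1 1.00000 3196 2 1 0.50000
3254 51 42 0.82353 3205 1 2 2.00000
3259 28 12 0.42857 3216 4 5 1.25000
3301 7 7 1.00000 3254 82 130 1.58537
EOF

# join needs both inputs sorted on the join field with the same collation,
# so use the C locale for sort and join alike.
export LC_ALL=C

# Drop the header, then sort one copy on column 1 and one copy on column 5.
tail -n +2 sample.txt | sort -b -k1,1 > s_side.txt
tail -n +2 sample.txt | sort -b -k5,5 > n_side.txt

# Pair lines where column 1 of s_side.txt equals column 5 of n_side.txt,
# printing S-side columns 1-4 and then N-side columns 5-8.
join -1 1 -2 5 -o 1.1,1.2,1.3,1.4,2.5,2.6,2.7,2.8 s_side.txt n_side.txt > joined.txt

cat joined.txt
```

The output comes out in sorted key order and uses single spaces between fields; unlike an awk lookup table, `join` also emits every combination when a contig number repeats on either side.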
 