Compare two files and extract info Post: 302935620

Shell Programming and Scripting: Post 302935620 by nans on Wednesday 18th of February 2015 07:08:31 AM

Hello,
I have two files that look like this.
File 1:

Code:

Name    test1    status    P
Gene1    0.00236753    1    1.00E-01
Gene2    0.134187    2    2.00E-01
Gene3    0.000608716    2    3.00E-01
Gene4    0.0016234    1    4.00E-01
Gene5    0.000665868    2    5.00E-01

and file 2:

Code:

No    Pos    rsid    a1    a2    geneid    categ    wgt    P
1    100    SNP1    a1    a2    Gene1    HIGH    -0.67249    6.91E-01
2    200    SNP2    a1    a2    Gene1    HIGH    -0.719    8.49E-01
3    300    SNP3    a1    a2    Gene1    MEDIUM    2.09    1.70E-01
4    400    SNP4    a1    a2    Gene1    HIGH    -0.122172    6.91E-01
5    500    SNP5    a1    a2    Gene1    HIGH    -0.906466    8.49E-01
6    600    SNP6    a1    a2    Gene1    HIGH    -0.02618    9.88E-01
7    700    SNP7    a1    a2    Gene2    HIGH    -0.999206    6.34E-01
8    800    SNP8    a1    a2    Gene2    HIGH    -0.998448    8.67E-01
9    900    SNP9    a1    a2    Gene3    HIGH    -0.059699    2.94E-01
10    1000    SNP10    a1    a2    Gene4    MEDIUM    2.19    4.79E-01
11    2000    SNP11    a1    a2    Gene4    VERY HIGH    2.3    7.19E-02
12    3000    SNP12    a1    a2    Gene4    HIGH    -0.992672    1.55E-01
13    4000    SNP13    a1    a2    Gene4    HIGH    -0.791565    3.50E-01
14    5000    SNP14    a1    a2    Gene5    LOW    0.860334608    6.67E-02
15    6000    SNP15    a1    a2    Gene5    LOW    0.805402062    2.09E-02
16    7000    SNP16    a1    a2    Gene5    VERY HIGH    0.430167304    6.67E-02
17    8000    SNP17    a1    a2    Gene5    VERY HIGH    0.727742605    7.53E-01
18    9000    SNP18    a1    a2    Gene5    HIGH    -0.999286    5.41E-01

For each "Name" in file 1, I would like to count the SNPs (column "rsid") in file 2 whose "geneid" matches that name. I would also like to output the lowest "P" value among those SNPs, together with the corresponding categ and rsid from file 2. For the example above, I need output that looks like this:

Code:

Name    test1    status    P    no of SNPs    Top rs ID    Top categ    Top P
Gene1    0.00236753    1    1.00E-01    6    SNP3    MEDIUM    1.70E-01
Gene2    0.134187    2    2.00E-01    2    SNP7    HIGH    6.34E-01
Gene3    0.000608716    2    3.00E-01    1    SNP9    HIGH    2.94E-01
Gene4    0.0016234    1    4.00E-01    4    SNP11    VERY HIGH    7.19E-02
Gene5    0.000665868    2    5.00E-01    5    SNP15    LOW    2.09E-02

Is it possible to do this with a shell script? Any help would be appreciated.

Many thanks

nans
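
One possible approach, as a minimal sketch in awk rather than a definitive answer: read file 2 first, count the rows per geneid and remember the row with the lowest P, then read file 1 and append those values to each line. It assumes both files are tab-delimited (the "VERY HIGH" category contains a space, so splitting on whitespace would misalign the columns) and that each file starts with one header line. The function name summarise_snps and the "NA" placeholder for genes without SNPs are made up for illustration. If two SNPs tie for the lowest P, the first one encountered is kept.

```shell
# Hypothetical helper; the function name and the file names are illustrative only.
# Usage: summarise_snps file1.txt file2.txt
summarise_snps() {
    awk -F'\t' -v OFS='\t' '
        # First pass (file 2): count SNPs per gene and remember the row with the lowest P.
        NR == FNR {
            if (FNR == 1) next                  # skip the header line of file 2
            gene = $6
            count[gene]++
            if (!(gene in best_p) || $9 + 0 < best_p[gene] + 0) {
                best_p[gene]   = $9             # keep the original text, e.g. 1.70E-01
                best_rs[gene]  = $3
                best_cat[gene] = $7
            }
            next
        }
        # Second pass (file 1): extend the header, then append the summary to each row.
        FNR == 1 { print $0, "no of SNPs", "Top rs ID", "Top categ", "Top P"; next }
        {
            if ($1 in count)
                print $0, count[$1], best_rs[$1], best_cat[$1], best_p[$1]
            else
                print $0, 0, "NA", "NA", "NA"   # assumed placeholder: no SNPs in file 2
        }
    ' "$2" "$1"
}
```

Called as summarise_snps file1.txt file2.txt (file names assumed), this should produce the requested columns, tab-separated, on standard output. File 2 is passed to awk first so that the per-gene summary is complete before any line of file 1 is printed.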
