Post 302935620 by nans on Wednesday 18th of February 2015 07:08:31 AM
Compare two files and extract info

Hello,
I have two files that look like this.
File 1:
Code:
Name    test1    status    P
Gene1    0.00236753    1    1.00E-01
Gene2    0.134187    2    2.00E-01
Gene3    0.000608716    2    3.00E-01
Gene4    0.0016234    1    4.00E-01
Gene5    0.000665868    2    5.00E-01

File 2:
Code:
No    Pos    rsid    a1    a2    geneid    categ    wgt    P
1    100    SNP1    a1    a2    Gene1    HIGH    -0.67249    6.91E-01
2    200    SNP2    a1    a2    Gene1    HIGH    -0.719    8.49E-01
3    300    SNP3    a1    a2    Gene1    MEDIUM    2.09    1.70E-01
4    400    SNP4    a1    a2    Gene1    HIGH    -0.122172    6.91E-01
5    500    SNP5    a1    a2    Gene1    HIGH    -0.906466    8.49E-01
6    600    SNP6    a1    a2    Gene1    HIGH    -0.02618    9.88E-01
7    700    SNP7    a1    a2    Gene2    HIGH    -0.999206    6.34E-01
8    800    SNP8    a1    a2    Gene2    HIGH    -0.998448    8.67E-01
9    900    SNP9    a1    a2    Gene3    HIGH    -0.059699    2.94E-01
10    1000    SNP10    a1    a2    Gene4    MEDIUM    2.19    4.79E-01
11    2000    SNP11    a1    a2    Gene4    VERY HIGH    2.3    7.19E-02
12    3000    SNP12    a1    a2    Gene4    HIGH    -0.992672    1.55E-01
13    4000    SNP13    a1    a2    Gene4    HIGH    -0.791565    3.50E-01
14    5000    SNP14    a1    a2    Gene5    LOW    0.860334608    6.67E-02
15    6000    SNP15    a1    a2    Gene5    LOW    0.805402062    2.09E-02
16    7000    SNP16    a1    a2    Gene5    VERY HIGH    0.430167304    6.67E-02
17    8000    SNP17    a1    a2    Gene5    VERY HIGH    0.727742605    7.53E-01
18    9000    SNP18    a1    a2    Gene5    HIGH    -0.999286    5.41E-01

For each "Name" in file 1, I would like to count the SNPs (column "rsid") in file 2 whose "geneid" matches that name. I would also like to output the lowest "P" value among those SNPs, together with the corresponding categ and rsid from file 2. For the example above, I need output that looks like this:

Code:
Name    test1    status    P    no of SNPs    Top rs ID    Top categ    Top P
Gene1    0.00236753    1    1.00E-01    6    SNP3    MEDIUM    1.70E-01
Gene2    0.134187    2    2.00E-01    2    SNP7    HIGH    6.34E-01
Gene3    0.000608716    2    3.00E-01    1    SNP9    HIGH    2.94E-01
Gene4    0.0016234    1    4.00E-01    4    SNP11    VERY HIGH    7.19E-02
Gene5    0.000665868    2    5.00E-01    5    SNP15    LOW    2.09E-02

Is it possible to do this with a shell script? Any help would be appreciated.

Many thanks
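
One possible approach is a two-file awk pass wrapped in a small shell function. This is only a sketch under some assumptions that the post does not confirm: both files are tab-delimited (the "VERY HIGH" category contains a space, so splitting on whitespace would shift the columns), each file has exactly one header line, and file 2 has at least one line. The function name `summarize_snps` and the `0 / NA` fallback for genes with no SNPs are made up for illustration. Ties on the lowest P keep the first SNP seen.

```shell
#!/bin/sh
# Sketch only: assumes tab-delimited input with a single header line per file.
# summarize_snps is a hypothetical helper name, not something from the post.
# Usage: summarize_snps file1 file2
summarize_snps() {
    # awk reads the SNP file (file 2) first, then the gene file (file 1).
    awk -F'\t' -v OFS='\t' '
        # Rules for the first file on the command line (the SNP file).
        NR == FNR {
            if (FNR == 1) next              # skip the header line
            gene = $6                       # geneid column
            count[gene]++                   # SNPs seen for this gene
            # Keep the row with the numerically smallest P (column 9).
            # Adding 0 forces a numeric comparison of values like 6.91E-01.
            if (!(gene in minp) || $9 + 0 < minp[gene] + 0) {
                minp[gene]  = $9
                rsid[gene]  = $3
                categ[gene] = $7
            }
            next
        }
        # Rules for the second file (the gene file).
        FNR == 1 {
            print $0, "no of SNPs", "Top rs ID", "Top categ", "Top P"
            next
        }
        {
            if ($1 in count)
                print $0, count[$1], rsid[$1], categ[$1], minp[$1]
            else
                print $0, 0, "NA", "NA", "NA"   # assumed fallback for genes without SNPs
        }
    ' "$2" "$1"
}
```

It could be called as `summarize_snps file1 file2 > output.txt`. The original P string is stored and printed unchanged, so the output keeps the `1.70E-01` notation instead of being reformatted by awk.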
 
