extract data from 2 files


 
# 1  
Old 02-10-2012

file 1
Code:
WASH7P        17232,18267,18500,20564            17368,18362,18554,21139

file 2


Code:
chr1	14969	15038	Exon	WASH7P
chr1	17232	17368	Exon	WASH7P
chr1	17258	17368	Exon	WASH7P
chr1	17605	17742	Exon	WASH7P
chr1	18267	18362	Exon	WASH7P
chr1	18267	18366	Exon	WASH7P
chr1	18267	18369	Exon	WASH7P
chr1	18267	18379	Exon	WASH7P
chr1	18496	18554	Exon	WASH7P
chr1	18500	18554	Exon	WASH7P
chr1	18912	19139	Exon	WASH7P

For each gene, I need to check whether column 2 of file 2 is present in column 2 of file 1 and column 3 of file 2 is present in column 3 of file 1. If so, the two values should be appended to file 1 as columns 4 and 5. This is just one example: I have 15000 entries with different genes (column 5 of file 2), so the solution needs to loop over all of them.

Expected output:

Code:
WASH7P        17232,18267,18500,20564            17368,18362,18554,21139  17232,18267,18500   17368,18362,18554

Thanks,

Last edited by Franklin52; 02-11-2012 at 08:59 AM.. Reason: Adding code tags
# 2  
Old 02-15-2012
Try this, and please let us know how it performs:
Code:
#! /bin/sh

awk '
# only process file1 and store all the mappings
FNR==NR {
        # store mapping of column 2 in file1
        n=split($2,a,",")
        for(i=1;i<=n;++i) {
                map2[$1,a[i]]=1
        }

        # store mapping of column 3 in file1
        n=split($3,a,",")
        for(i=1;i<=n;++i) {
                map3[$1,a[i]]=1
        }

        # store original file
        file1[$1]=$0

        next
}

# start to process file2
# if column2 in file2 exists in file1, concatenate the index to string str2
# if column3 in file2 exists in file1, concatenate the index to string str3
{
        if ( map2[$5,$2]==1 && match2[$5,$2]!=1 ) {
                str2[$5]=sprintf("%s,%s",str2[$5],$2)
                match2[$5,$2]=1
        }
        if ( map3[$5,$3]==1 && match3[$5,$3]!=1 ) {
                str3[$5]=sprintf("%s,%s",str3[$5],$3)
                match3[$5,$3]=1
        }
}
END {
        for ( i in file1 ) {
                print file1[i], substr(str2[i],2), substr(str3[i],2)
        }
}' file1 file2


Last edited by chihung; 02-15-2012 at 10:25 PM..
# 3  
Old 02-16-2012
Works perfectly!
# 4  
Old 02-16-2012
Hi,

Firstly, thank you for the code.

I checked the code. It works for the example file I posted, but it is not correct on the real data. What I mean is that a record from file 2 should be added to file 1 only if both its column 2 and its column 3 match columns 2 and 3 of file 1. With the code as written, a value is pulled in even if only one of the two matches.
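To make the difference concrete, here is a small sketch with made-up data (the gene `G1` and its coordinates are hypothetical, not taken from the real files). No row of file 2 has both its start and its end listed for `G1`, yet looking the two columns up independently still reports a start and an end, while a paired lookup reports nothing:

```shell
#!/bin/sh
# Hypothetical data: gene G1 with two start/end pairs, (100,150) and (200,250).
tmp=$(mktemp -d)
printf 'G1\t100,200\t150,250\n' > "$tmp/file1"
# Neither file2 row matches a complete pair: the first matches only a start,
# the second only an end.
printf 'chr1\t100\t999\tExon\tG1\nchr1\t777\t250\tExon\tG1\n' > "$tmp/file2"

# Independent check: start and end are looked up separately.
independent=$(awk '
FNR==NR { n=split($2,s,","); for(i=1;i<=n;i++) S[$1,s[i]]=1
          n=split($3,e,","); for(i=1;i<=n;i++) E[$1,e[i]]=1; next }
(($5,$2) in S) { starts = starts (starts=="" ? "" : ",") $2 }
(($5,$3) in E) { ends   = ends   (ends==""   ? "" : ",") $3 }
END { print starts, ends }' "$tmp/file1" "$tmp/file2")

# Paired check: start and end must sit at the same list position in file1.
paired=$(awk '
FNR==NR { n=split($2,s,","); split($3,e,",")
          for(i=1;i<=n;i++) P[$1,s[i],e[i]]=1; next }
(($5,$2,$3) in P) { hits++ }
END { print hits+0 }' "$tmp/file1" "$tmp/file2")

echo "independent: $independent"
echo "paired hits: $paired"
rm -rf "$tmp"
```

On this data the independent check prints `100 250`, while the paired check finds 0 matching rows.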

Thanks,
# 5  
Old 02-16-2012
The other way around:
Code:
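awk 'NR==FNR{A[$2,$3]=1;next}             # file2: remember every start/end pair
     { m=split($2,S,/,/);split($3,T,/,/);  # file1: split both lists on commas
       for(i=1;i<=m;i++)
         if(A[S[i],T[i]]){                 # i-th start and i-th end occur together in file2
            $4=$4 ($4?",":x) S[i]          # append start to column 4 (x is unset, i.e. empty)
            $5=$5 ($5?",":x) T[i]          # append end to column 5
         }
     }1' file2 file1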
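
The command above keys the lookup on the coordinates alone, so a pair listed for one gene could in principle be satisfied by an exon that belongs to a different gene. A possible gene-aware variant is sketched below. It assumes both files are whitespace-separated as in post #1 and that tab-separated output is acceptable (assigning to `$4` makes awk rebuild the line using `OFS`). The temporary paths and the array name `seen` are only for illustration:

```shell
#!/bin/sh
# Sample data taken from post #1 (a subset of the file2 rows is enough here).
tmp=$(mktemp -d)
printf 'WASH7P\t17232,18267,18500,20564\t17368,18362,18554,21139\n' > "$tmp/file1"
# printf reuses the format for each start/end pair of arguments.
printf 'chr1\t%s\t%s\tExon\tWASH7P\n' \
    14969 15038  17232 17368  17258 17368  18267 18362 \
    18267 18366  18496 18554  18500 18554  18912 19139 > "$tmp/file2"

result=$(awk 'BEGIN { OFS = "\t" }
NR == FNR { seen[$5, $2, $3] = 1; next }   # file2: remember gene/start/end
{
    n = split($2, S, ","); split($3, T, ",")
    for (i = 1; i <= n; i++)
        if (($1, S[i], T[i]) in seen) {    # same gene, same start, same end
            $4 = $4 ($4 == "" ? "" : ",") S[i]
            $5 = $5 ($5 == "" ? "" : ",") T[i]
        }
    print
}' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Using `in` for the membership test also avoids creating an empty array element for every pair that is looked up but not found, which may matter with many genes.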

# 6  
Old 02-16-2012
Thanks a lot. It worked.