Match ids and print original file | Unix Linux Forums | Shell Programming and Scripting

Tags
all entries, matching data, print

    #1  
03-08-2013
nans
Registered User
 
Join Date: Mar 2013
Last Activity: 24 March 2014, 12:50 PM EDT
Posts: 9
Thanks: 3
Thanked 0 Times in 0 Posts
Match ids and print original file

Hello,

I have two files

Original (5000 entries):
Chr Position
chr1 879108
chr1 881918
chr1 896874 ...

and a file with allele freq (2000 entries):
Chr Position MAF
chr1 881918 0.007
chr1 979748 0.007
chr1 1120377 0.007
chr1 1178925 0.036

I would like to match the original file against the allele freq file and print an output file with all 5000 entries, using NULL where there is no MAF:
Chr Position MAF
chr1 879108 NULL
chr1 881918 0.007
chr1 896874 NULL
...

Any help is appreciated. Thank you.

Last edited by nans; 03-08-2013 at 03:59 AM.
    #2  
03-08-2013
busyboy
Registered User
 
Join Date: Jan 2010
Last Activity: 19 March 2014, 3:51 AM EDT
Posts: 191
Thanks: 2
Thanked 9 Times in 9 Posts
What is the matching key between the two files? Your post doesn't make the requirement clear. Can you please describe exactly what is needed?
    #3  
03-08-2013
nans
The column common to both files is "Position", which is the second column.
    #4  
03-08-2013
busyboy
If you are looking for the lines that match between both files based on the 2nd column:


Code:
awk 'FNR==NR { if (FNR > 1) a[$2]; next }  # 1st file: store each Position, skipping the header line
     $2 in a' original allelefreq           # 2nd file: print lines whose Position was stored
chr1 881918 0.007
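The one-liner above relies on a common two-file awk idiom: NR counts records across all input files, while FNR restarts at 1 in each file. So NR==FNR holds only while the first file is being read (provided that file is not empty), and FNR==1 is each file's header line. A minimal sketch that makes the two counters visible; show_counters is a hypothetical helper name used only for illustration:

```shell
# Hypothetical demo helper: prints, for each input line, the file it came
# from, NR (count across all files), FNR (count within the current file)
# and whether NR == FNR, i.e. whether awk is still reading the first file.
show_counters() {
    awk '{ print FILENAME, NR, FNR, (NR == FNR ? "first" : "later") }' "$@"
}
```

For example, with a two-line file a and a three-line file b, show_counters a b prints "a 1 1 first", "a 2 2 first", "b 3 1 later", "b 4 2 later" and "b 5 3 later".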

    #5  
03-08-2013
nans
Thank you, but that only prints the positions that match the original file.
The desired output is all 5000 entries from the original file, whether or not they have a 3rd value (MAF).

E.g.:
chr1 12345 0.07
chr1 6789 NULL
chr1 13456 0.78
.....
chr22 465546 0.12
chr22 6757657 NULL
    #6  
Old 03-08-2013
busyboy busyboy is offline
Registered User
 
Join Date: Jan 2010
Last Activity: 19 March 2014, 3:51 AM EDT
Posts: 191
Thanks: 2
Thanked 9 Times in 9 Posts
Then reverse the order of the file names:


Code:
awk 'FNR==NR { if (FNR > 1) a[$2]; next }  # 1st file: store each Position, skipping the header line
     $2 in a' allelefreq original           # 2nd file: print lines whose Position was stored

and let me know if this is what you wanted.
    #7  
03-08-2013
nans
Well, this gives me exactly the entries common to the original and allele freq files, but without the MAF values:
chr1 979748
chr1 1120377
chr1 1178925
chr1 1222958
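Both scripts only record which positions exist, so there is no MAF value left to print. A minimal sketch of the requested output (every line of the original file, followed by its MAF or NULL) instead stores the 3rd column of the allele freq file as the array value and looks it up for each original line. Two assumptions: maf_join is a hypothetical helper name, and the lookup key is Chr and Position together, since the same position number can occur on different chromosomes (the example above mixes chr1 and chr22). To key on Position alone, as described in post #3, replace both "$1 SUBSEP $2" expressions with "$2".

```shell
# Hypothetical helper: maf_join ORIGINAL ALLELEFREQ
# Prints the header "Chr Position MAF", then every data line of ORIGINAL
# with its MAF from ALLELEFREQ appended, or NULL when there is no match.
# Assumption: matching uses Chr and Position together, not Position alone.
maf_join() {
    # The allele freq file must be read first so the lookup table is ready,
    # hence "$2" (ALLELEFREQ) comes before "$1" (ORIGINAL) on the awk line.
    awk '
        NR == FNR {                      # first file read: ALLELEFREQ
            if (FNR > 1)                 # skip its header line
                maf[$1 SUBSEP $2] = $3   # remember MAF per Chr+Position
            next
        }
        FNR == 1 {                       # header line of ORIGINAL
            print "Chr", "Position", "MAF"
            next
        }
        {                                # data lines of ORIGINAL, in order
            key = $1 SUBSEP $2
            print $1, $2, ((key in maf) ? maf[key] : "NULL")
        }
    ' "$2" "$1"
}
```

Usage: maf_join original allelefreq > output. The output keeps the original file's order and line count (all 5000 entries plus the header), and only the 2000-entry allele freq file is held in memory.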
All times are GMT -4.