Column comparison between two files: moved from another post


 
# 15  
Old 10-04-2010
Thanks, everyone: Danmero, Bartus11 and Pravin27 (the awk folks), Kurumi (Ruby wrangler) and ygemici (sed lover)!

@Danmero, although my problem is solved, I am intrigued that your code doesn't work for my sample files (the files I posted first, i.e. Freq.txt and Pval.txt). Just so I understand better, could you please explain the code? I had made my sample files quickly in Windows Notepad. I then remade them using the vi editor and still had the same problems running your code. I tried to debug it myself but never figured it out. I do realize it must be something very mundane. Your code works fine for my actual files though!

@kurumi, I have some problems running your Ruby script, but I will get back to you after I do some debugging myself.

@ygemici, your sed script also gave me some problems, but again, I will get back to you after I try some tweaking myself.



csn

---------- Post updated 10-04-10 at 10:52 AM ---------- Previous update was 10-03-10 at 05:03 PM ----------

Quote:
Originally Posted by cs_novice
@Danmero, although my problem is solved, I am intrigued that your code doesn't work for my sample files (the files I posted first, i.e. Freq.txt and Pval.txt). Just so I understand better, could you please explain the code? I had made my sample files quickly in Windows Notepad. I then remade them using the vi editor and still had the same problems running your code. I tried to debug it myself but never figured it out. I do realize it must be something very mundane. Your code works fine for my actual files though!


csn
I think I figured out what the awk one-liner under discussion does:

Code:
awk 'NR==FNR{a[$4]=$5;next}a[$1]{print $0"\t"a[$1]}' file1 file2

We set NR = FNR, i.e., the current record count (ordinal number) of the first input file and of the second input file are the same. Then when we say
Quote:
{a[$4]=$5}a[$1]
we are capturing the values of field 5 of file1 ($5), provided field 4 of file1 ($4) matches the first field of file2 ($1), in an associative array a[$1] (the 'next' does not let the program do anything else with file1).

We then print all fields of file2 and, separated by a tab, the array entry that holds the $5 value from file1.

Quote:
{print $0"\t"a[$1]}
Since we set NR==FNR, the array only has as many lines as file2 (I think this statement of mine is wrong, but I am not sure).

Even as a biologist I am beginning to get interested in the nitty-gritty of programming. I think I like it. I hope to be able to do the same with the sed and Ruby scripts, but that is for another day. Even a simple awk script has squeezed the maximum out of me. So much power packed into one little statement.

Please feel free to correct my understanding of this.

Have a good day
csn

Last edited by cs_novice; 10-03-2010 at 07:15 PM..
# 16  
Old 10-04-2010
You got most of it right, except the "NR==FNR" part. It does not set either of these variables. Instead it compares them, and the comparison is true only while the 1st file is being processed (FNR is reset to 1 at the start of each file, while NR keeps counting). This allows us to build the associative array "a" from the contents of the 1st file, and then use it to look up values from the 2nd file.
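
To see the comparison at work, here is a small sketch that prints NR, FNR and the result of NR==FNR for every record. The file names f1.txt and f2.txt and their contents are made up for illustration and do not come from the thread.

```shell
# Hypothetical demo files (names and contents are assumptions).
tmpdir=$(mktemp -d)
printf 'a\nb\n'    > "$tmpdir/f1.txt"
printf 'c\nd\ne\n' > "$tmpdir/f2.txt"

# NR counts records across all files; FNR restarts at 1 for each file.
nr_fnr=$(awk '{ print NR, FNR, (NR == FNR ? "first" : "later"), $0 }' \
    "$tmpdir/f1.txt" "$tmpdir/f2.txt")

printf '%s\n' "$nr_fnr"
# 1 1 first a
# 2 2 first b
# 3 1 later c
# 4 2 later d
# 5 3 later e

rm -r "$tmpdir"
```

One caveat: if the first file is empty, NR==FNR is also true while the second file is being read, so the idiom assumes a non-empty first file.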
# 17  
Old 10-04-2010
Column comparison between two files

Code:
# For each line of y, scan x for a line whose first field matches.
while read -r a d _
do
    while read -r b c _
    do
        # String comparison, so non-numeric keys such as IDs also work.
        if [ "$a" = "$b" ]
        then
            echo "$a $c $d" >> mtch
            echo "matching"
        else
            echo "not matching"
        fi
    done < x
done < y


Last edited by Scott; 10-04-2010 at 02:52 PM.. Reason: Code tags
# 18  
Old 10-06-2010
Quote:
Originally Posted by bartus11
You got most of it right, except the "NR==FNR" part. It does not set either of these variables. Instead it compares them, and the comparison is true only while the 1st file is being processed (FNR is reset to 1 at the start of each file, while NR keeps counting). This allows us to build the associative array "a" from the contents of the 1st file, and then use it to look up values from the 2nd file.
Thanks, I now understand this a lot better! These are all logical tests: each one is either true or false.
I will keep updating as I get better at this.
csn
# 19  
Old 10-10-2010
A bug(?) in the awk one-liner for column comparison

Hello friends,
I have been using this awk one-liner
Code:
           awk 'NR==FNR{a[$4]=$5}a[$1]{print $0"\t"a[$1]}' Gene_Count.txt Pval.txt

to compare field 4 of Gene_Count.txt with field 1 of Pval.txt, extract field 5 of Gene_Count.txt, and print it alongside all columns of Pval.txt.

I know that I have already discussed the files quite a bit; however, for the sake of completeness I have included slightly modified files to illustrate the problem.
Gene_Count.txt
Code:
CHR    START    END    Transc_ID    READ_COUNT    BASES_COV
    
chr1      268430147    268436813    ID=GRMZM2G015073_T01      362   4027
chr1      16776238      16779559    ID=GRMZM2G445588_T01      0     0
chr1      109050562     109054042    ID=GRMZM2G356344_T01      85    123
chr1      243260011     243280610    ID=GRMZM2G044740_T01      77    1480
chr1      260039640     260047849    ID=GRMZM2G420436_T01      13    1447
chr1      15724186      15728999    ID=GRMZM2G119852_T01      1032    1906
chr1      19922021      19924137    ID=AC166636.1_FGT010      3    89

Pval.txt (note that this now also includes the ID that has a zero count in field 5 of Gene_Count.txt)
Code:
Transc_ID    DP    Pval.cross
ID=GRMZM2G015073_T01    23.6044288292005    0.0206790394438121
ID=GRMZM2G445588_T01    2.42080832941224    0.566356492613311
ID=GRMZM2G356344_T01    31.0575268969536    0.489032543538082
ID=GRMZM2G044740_T01    8.33858514064342    0.125869127182036
ID=GRMZM2G420436_T01    4.08274762082918    0.0214579269824967
ID=GRMZM2G119852_T01    59.7782287606723    0.0372160593886689
ID=AC166636.1_FGT010    1.18004103601881    0.0180008630009030
ID=GRMZM2G100242_T02    61.4167813736184    0.0142003131557532
ID=GRMZM2G180458_T01    19.7051930517752    0.0643166007561127

On using the awk command given above I get this output:

Pval_count
Code:
Transc_ID    DP    Pval.cross    READ_COUNT
ID=GRMZM2G015073_T01    23.6044288292005    0.0206790394438121    362
ID=GRMZM2G356344_T01    31.0575268969536    0.489032543538082    85
ID=GRMZM2G044740_T01    8.33858514064342    0.125869127182036    77
ID=GRMZM2G420436_T01    4.08274762082918    0.0214579269824967    13
ID=GRMZM2G119852_T01    59.7782287606723    0.0372160593886689    1032
ID=AC166636.1_FGT010    1.18004103601881    0.0180008630009030    3

This is great, until I noticed that row #2 of Gene_Count.txt, whose field 5 ($5) is '0', is thrown out:

Code:
chr1      16776238      16779559    ID=GRMZM2G445588_T01      0     0

I have noticed that all records where field 5 of Gene_Count.txt is '0' are thrown out, although this seems to defy logic (at least as far as I understand it). I need these records even if the field 5 value is '0'.

Could anyone please help me with this?

Thanks
CSN
# 20  
Old 10-10-2010
Try this one:
Code:
awk 'NR==FNR{a[$4]=$5_}a[$1]{print $0"\t"a[$1]}' Gene_Count.txt Pval.txt
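
Why this works, as far as I can tell: a pattern such as a[$1] is a truth test on the stored value, and a field containing 0 is treated as the number zero, which is false. Writing $5_ concatenates $5 with the unset variable _ (an empty string), so the array holds the string "0", and any non-empty string is true. A more explicit alternative, offered here as a sketch and not something posted in the thread, is to test for the key itself with ($1 in a) and to keep the next from the earlier version. The file names counts.txt and pvals.txt and their contents are made-up miniature stand-ins for Gene_Count.txt and Pval.txt.

```shell
# Hypothetical miniature inputs (names and values are assumptions).
tmpdir=$(mktemp -d)
printf 'chr1 10 20 ID=A 362 4027\nchr1 30 40 ID=B 0 0\n' > "$tmpdir/counts.txt"
printf 'ID=A 23.6 0.02\nID=B 2.4 0.56\nID=C 61.4 0.01\n' > "$tmpdir/pvals.txt"

# Truth test on the value: the stored 0 for ID=B counts as false.
by_value=$(awk 'NR==FNR{a[$4]=$5;next} a[$1]{print $0"\t"a[$1]}' \
    "$tmpdir/counts.txt" "$tmpdir/pvals.txt")

# Membership test on the key: true for every ID stored from the first file.
by_key=$(awk 'NR==FNR{a[$4]=$5;next} ($1 in a){print $0"\t"a[$1]}' \
    "$tmpdir/counts.txt" "$tmpdir/pvals.txt")

printf 'value test:\n%s\n' "$by_value"   # ID=A only
printf 'key test:\n%s\n' "$by_key"       # ID=A and ID=B

rm -r "$tmpdir"
```

Neither variant prints ID=C, which has no entry in the first file. The string trick would still drop a record whose field 5 is empty, whereas the key test keeps it.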
