Column comparison between two files: moved from another post


 
# 8  
Old 10-03-2010
Quote:
Originally Posted by danmero
Works for me :)
Please provide more info about your system.
Hi, I work on a Linux cluster (latest Red Hat version), which I log in to over Secure Shell from my PC at home. I will try this out on a Mac and let you know.

csn

---------- Post updated 10-03-10 at 12:41 AM ---------- Previous update was 10-02-10 at 05:14 PM ----------

Hi danmero,
I checked on a standalone MacBook and I still have the same problem with my sample files. I don't understand what is happening, especially since you say it works for you. I am sure I got the syntax right.

Just to make myself clear: I am a biologist and recently started working with sequence data. I have a big table containing P values of differential gene expression, much like my sample Pval.txt (only with 30000 rows and a few more columns). I also have several "frequency files" or "count files", one for each treatment (like the sample freq.txt). However, each of these files has about 130000 rows. I need to look up the gene IDs of the 30000 genes in Pval.txt in the frequency files, and I want a table that gives the respective gene frequency (count) side by side with the P value.

csn
# 9  
Old 10-03-2010
You should post some of the real rows from Pval.txt and from the frequency files.
# 10  
Old 10-03-2010
Quote:
Originally Posted by bartus11
You should post some of the real rows from Pval.txt and from frequency files.
Hi bartus,
I am posting the first 10 lines of the two files below:
Gene_Pval.txt
Code:
Transc_ID    DP    Pval.cross
ID=GRMZM2G015073_T01    23.6044288292005    0.0206790394438121
ID=GRMZM2G465579_T01    2.42080832941224    0.566356492613311
ID=GRMZM2G356344_T01    31.0575268969536    0.489032543538082
ID=GRMZM2G044740_T01    8.33858514064342    0.125869127182036
ID=GRMZM2G420436_T01    4.08274762082918    0.0214579269824967
ID=GRMZM2G119852_T01    59.7782287606723    0.0372160593886689
ID=AC166636.1_FGT010    1.18004103601881    0.0180008630009030
ID=GRMZM2G100242_T02    61.4167813736184    0.0142003131557532
ID=GRMZM2G180458_T01    19.7051930517752    0.0643166007561127

Gene_Count.txt
Code:
CHR    START    END    Transc_ID    READ_COUNT    BASES_COV
    
chr1      268430147      268436813      ID=GRMZM2G015073_T01      362      4027
chr1      16776238      16779559      ID=GRMZM2G445588_T01      0      0
chr1      92273742      92275613      ID=GRMZM2G465579_T01      11      251
chr1      109050562      109054042      ID=GRMZM2G356344_T01      85      123
chr1      243260011      243280610      ID=GRMZM2G044740_T01      77      1480
chr1      260039640      260047849      ID=GRMZM2G420436_T01      13      1447
chr1      15724186      15728999      ID=GRMZM2G119852_T01      1032      1906
chr1      19922021      19924137      ID=AC166636.1_FGT010      3      89

So I need to compare field 1 (Transc_ID) of Gene_Pval.txt to field 4 (Transc_ID) of Gene_Count.txt. When they match, I want to extract READ_COUNT (field 5 of Gene_Count.txt) and append it as a new column in Gene_Pval.txt.

thanks
csn
# 11  
Old 10-03-2010
Try this:

Code:
awk 'NR==FNR{a[$4]=$5;next} ($1 in a){print $0 "\t" a[$1]}' Gene_Count.txt Gene_Pval.txt

This User Gave Thanks to pravin27 For This Post:
# 12  
Old 10-03-2010
Code:
awk 'NR==FNR{a[$4]=$5;next}$1 in a{print $0" "a[$1]}' freq.txt Pval.txt

This User Gave Thanks to bartus11 For This Post:
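
For anyone new to the NR==FNR idiom used in the one-liners above, here is a commented sketch of the same lookup, written out as a shell function. The function name join_counts and the NA placeholder for IDs missing from the count file are my own assumptions, not something from the posts. It also assumes the count file is not empty, since NR==FNR would otherwise stay true for the second file.

```shell
#!/bin/sh
# Hypothetical helper: append READ_COUNT (field 5 of the count file) to each
# row of the P-value file, matching field 4 of the count file to field 1.
# Usage: join_counts Gene_Count.txt Gene_Pval.txt
join_counts() {
    awk '
        # NR (global line number) equals FNR (per-file line number)
        # only while the first file is being read.
        NR == FNR {
            if (NF >= 5) count[$4] = $5   # key: Transc_ID, value: READ_COUNT
            next                          # do not run the rule below on this file
        }
        # Second file: print each non-empty row plus its count.
        # "in" tests whether the key exists, so a count of 0 is still printed;
        # IDs absent from the count file get NA (an assumed placeholder).
        NF { print $0 "\t" (($1 in count) ? count[$1] : "NA") }
    ' "$1" "$2"
}
```

Because the header line of the count file is stored like any other row, the header of the P-value file picks up READ_COUNT as its new column name.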
# 13  
Old 10-03-2010
Quote:
Originally Posted by cs_novice
Hi bartus
I am posting the first 10 lines of the two files below:
Next time, do that from the beginning and the problem will be solved faster :)
Personally I'll opt for:
Code:
awk 'NR==FNR{a[$4]=$5;next}$1 in a{print $0"\t"a[$1]}' Gene_Count.txt Gene_Pval.txt

These 2 Users Gave Thanks to danmero For This Post:
# 14  
Old 10-03-2010
Code:
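#!/usr/bin/env ruby
# Map each transcript ID to the first number that follows it (READ_COUNT).
h = Hash[File.read("genecount").scan(/(ID=.[^ \t]*)[[:space:]]*(\d+)/)]
# Print every ID row of the P-value file followed by its count.
File.readlines("genepval").each do |line|
  print "#{line.chomp} #{h[line.split[0]]}\n" if line =~ /^ID=/
end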


Last edited by kurumi; 10-03-2010 at 01:22 PM..
These 2 Users Gave Thanks to kurumi For This Post:
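
Since cs_novice mentioned having several count files, one per treatment, the same idea might be extended to add one count column per file. This is only a sketch under assumptions: the function name join_all_counts and the NA placeholder are hypothetical, every count file is assumed to be non-empty and to use the Gene_Count.txt layout shown earlier, and the P-value file must be given last.

```shell
#!/bin/sh
# Hypothetical helper: append one READ_COUNT column per count file.
# Usage: join_all_counts count_file... Gene_Pval.txt   (P-value file last)
join_all_counts() {
    awk '
        FNR == 1 { nfile++ }          # a new (non-empty) input file starts

        # Every file except the last one is treated as a count file.
        nfile < ARGC - 1 {
            if (NF >= 5) count[nfile, $4] = $5
            next
        }

        # Last file: the P-value table. Append the count from each count
        # file in command-line order, or NA when the ID is missing there.
        NF {
            row = $0
            for (i = 1; i < nfile; i++)
                row = row "\t" (((i, $1) in count) ? count[i, $1] : "NA")
            print row
        }
    ' "$@"
}
```

All added header cells will read READ_COUNT, so the columns have to be told apart by the order in which the count files were passed.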