Hello all, I am not a programmer, but I need a little help with a project I am working on. I have read several posts, and it looks like awk or python may help me, though I know very little about using them. Here is my question: I have a first file with 6 columns.
Code:
CHR SNP A1 A2 MAF NCHROBS
0 SNP_A-8414268 A G 0.1522 5354
1 rs12565286 C G 0.04139 5340
1 rs2980319 A T 0.1503 5362
1 rs2980300 T C 0.1773 5346
1 rs6603781 A G 0.1149 5346
My second file is very similar to the first one, but it may or may not contain the same values in column 2 (SNP). I suspect that columns 3 and 4 (A1 & A2) may differ as well.
What I need is an output file with columns 1,2,3,4,5,6 from the first file and, wherever column 2 (SNP) of the FIRST file has a match in the SECOND file, columns 2,3,4,5,6 (SNP, A1, A2, MAF, NCHROBS) of the matching line from the SECOND file at positions 7,8,9,10,11. The output file will hence have 11 columns: the first 6 from file 1.txt and the matching last five from file 2.txt.
Code:
CHR SNP A1 A2 MAF NCHROBS
0 SNP_A-8414268 A G 0.1522 5354
1 rs12565286 C G 0.04139 5340
1 rs2980319 A T 0.1503 5362 rs2980319 A T 0.1503 4252
1 rs2980300 T C 0.1773 5346 rs2980300 T C 0.1273 4546
1 rs6603781 A G 0.1149 5346 rs6603781 G A 0.0249 4546
$ cat f1
CHR SNP A1 A2 MAF NCHROBS
0 SNP_A-8414268 A G 0.1522 5354
1 rs12565286 C G 0.04139 5340
1 rs2980319 A T 0.1503 5362
1 rs2980300 T C 0.1773 5346
1 rs6603781 A G 0.1149 5346
$ cat f2
CHR SNP A1 A2 MAF NCHROBS
1 rs2980319 A T 0.1503 4252
1 rs2980300 T C 0.1273 4546
1 rs6603781 G A 0.0249 4546
Try this:
Code:
$ awk 'NR==FNR{k[$2]=sprintf(" %s %s %s %s %s",$2,$3,$4,$5,$6);next}{print $1,$2,$3,$4,$5,$6 k[$2]}' f2 f1
CHR SNP A1 A2 MAF NCHROBS SNP A1 A2 MAF NCHROBS
0 SNP_A-8414268 A G 0.1522 5354
1 rs12565286 C G 0.04139 5340
1 rs2980319 A T 0.1503 5362 rs2980319 A T 0.1503 4252
1 rs2980300 T C 0.1773 5346 rs2980300 T C 0.1273 4546
1 rs6603781 A G 0.1149 5346 rs6603781 G A 0.0249 4546
$
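For anyone reading along who is new to awk, here is the same one-liner spread over several lines with comments. This is only a sketch of how I read it: the temporary directory and the merged.txt file name are my own choices for illustration, and plain string concatenation stands in for the sprintf call, which should build the same text.

```shell
# Recreate the two sample files from this thread in a scratch directory
# (workdir and merged.txt are made-up names, not part of the original command).
workdir=$(mktemp -d)
cat > "$workdir/f1" <<'EOF'
CHR SNP A1 A2 MAF NCHROBS
0 SNP_A-8414268 A G 0.1522 5354
1 rs12565286 C G 0.04139 5340
1 rs2980319 A T 0.1503 5362
1 rs2980300 T C 0.1773 5346
1 rs6603781 A G 0.1149 5346
EOF
cat > "$workdir/f2" <<'EOF'
CHR SNP A1 A2 MAF NCHROBS
1 rs2980319 A T 0.1503 4252
1 rs2980300 T C 0.1273 4546
1 rs6603781 G A 0.0249 4546
EOF

awk '
  # NR counts lines across all input files, FNR restarts at 1 for each file,
  # so NR == FNR is true only while the first argument (f2) is being read.
  NR == FNR {
      k[$2] = " " $2 " " $3 " " $4 " " $5 " " $6   # keep f2 columns 2-6, keyed by SNP
      next                                          # do not run the print rule on f2 lines
  }
  # Second argument (f1): print its six columns, then the stored f2 columns.
  # k[$2] is an empty string when the SNP does not occur in f2.
  { print $1, $2, $3, $4, $5, $6 k[$2] }
' "$workdir/f2" "$workdir/f1" > "$workdir/merged.txt"

cat "$workdir/merged.txt"
```

The order of the arguments matters: f2 is listed first so that its lines are stored before f1 is read. Because the header line of f2 is stored under the key SNP, the header of the output gets the five extra column names as well.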
$ cat file1
CHR SNP A1 A2 MAF NCHROBS
0 SNP_A-8414268 A G 0.1522 5354
1 rs12565286 C G 0.04139 5340
1 rs2980319 A T 0.1503 5362
1 rs2980300 T C 0.1773 5346
1 rs6603781 A G 0.1149 5346
$
$
$ cat file2
CHR SNP A1 A2 MAF NCHROBS
0 SMP_A-8414268 A G 0.1522 5354
1 rs12565286 C G 0.04139 5349
1 rs2980319 A T 0.1503 5362
1 rs2980300 T C 0.1773 5346
1 rs6603781 A G 0.1149 5346
$
$ ##
$ perl -lne 'chomp; if ($.>1) {if($ARGV eq "file1"){$x{substr($_,3)}=substr($_,3)}
> else {print $_,$x{substr($_,3)}}}' file1 file2
CHR SNP A1 A2 MAF NCHROBS
0 SMP_A-8414268 A G 0.1522 5354
1 rs12565286 C G 0.04139 5349
1 rs2980319 A T 0.1503 5362 rs2980319 A T 0.1503 5362
1 rs2980300 T C 0.1773 5346 rs2980300 T C 0.1773 5346
1 rs6603781 A G 0.1149 5346 rs6603781 A G 0.1149 5346
$
$
Thanks a lot Ripat and tyler_durden.
The awk command worked like a charm.
Can you help me with one more little detail?
Here is my sample file from the previous step:
Code:
1 rs4075116 G A 0.2857 546 rs4075116 C T 0.2646 2732
1 rs11260595 T G 0.02451 612 rs11260595 A C 0.02668 2774
1 rs6604968 C T 0.1672 616 rs6604968 G A 0.137 2810
1 rs11260554 A C 0.09547 618 rs11260554 T G 0.1153 2810
1 rs6603781 G A 0.1234 608 rs6603781 A G 0.1196 2810
I want the awk command to read col 3 and then look for that value in cols 8 and 9 of the same row. If the value is not found in col 8 or col 9, it should write the value of col 2 to the output file output.txt.
I am still trying to learn the NR==FNR idiom; until I grasp it, kindly help.
This is what I came up with, but I am not sure if it is correct!
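One possible way to do this, offered as a rough sketch that I have only checked against the five sample rows: no NR==FNR is needed here, because everything being compared sits on one line of one file. I am assuming the merged file is called merged.txt (a made-up name), that it is whitespace separated, and that rows which never got a match from the second file (fewer than 9 columns) should be skipped.

```shell
# Scratch copy of the sample rows posted above (workdir and merged.txt are assumed names).
workdir=$(mktemp -d)
cat > "$workdir/merged.txt" <<'EOF'
1 rs4075116 G A 0.2857 546 rs4075116 C T 0.2646 2732
1 rs11260595 T G 0.02451 612 rs11260595 A C 0.02668 2774
1 rs6604968 C T 0.1672 616 rs6604968 G A 0.137 2810
1 rs11260554 A C 0.09547 618 rs11260554 T G 0.1153 2810
1 rs6603781 G A 0.1234 608 rs6603781 A G 0.1196 2810
EOF

awk -v out="$workdir/output.txt" '
  # NF >= 9            : only test rows that carry the columns from the second file
  # $3 != $8 && $3 != $9: col 3 is found neither in col 8 nor in col 9
  NF >= 9 && $3 != $8 && $3 != $9 { print $2 > out }
' "$workdir/merged.txt"

cat "$workdir/output.txt"
```

With the five sample rows, output.txt should list rs4075116, rs11260595, rs6604968 and rs11260554; rs6603781 is left out because its G shows up in col 9.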
# cat f1
CHR SNP A1 A2 MAF NCHROBS
0 SNP_A-8414268 A G 0.1522 5354
1 rs12565286 C G 0.04139 5340
1 rs2980319 A T 0.1503 5362
1 rs2980300 T C 0.1773 5346
1 rs6603781 A G 0.1149 5346
# cat f2
CHR SNP A1 A2 MAF NCHROBS
1 rs2980319 A T 0.1503 4252
1 rs2980300 T C 0.1273 4546
1 rs6603781 G A 0.0249 4546
# awk 'NR==FNR{$1=$1;a[$2]=$0;b[$2]=$3;next}b[$2]==$3||b[$2]==$4{print $2 > "nonambig.txt"}$1!~"[A-Z]"{$1=a[$2];print}' f1 f2
1 rs2980319 A T 0.1503 5362 rs2980319 A T 0.1503 4252
1 rs2980300 T C 0.1773 5346 rs2980300 T C 0.1273 4546
1 rs6603781 A G 0.1149 5346 rs6603781 G A 0.0249 4546
# cat nonambig.txt
SNP
rs2980319
rs2980300
rs6603781
Could you kindly explain what each part of this code does? That would help me understand it, and I would appreciate it very much.
Code:
awk 'NR==FNR{$1=$1;a[$2]=$0;b[$2]=$3;next}b[$2]==$3||b[$2]==$4{print $2 > "nonambig.txt"}$1!~"[A-Z]"{$1=a[$2];print}' f1 f2
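Here is how I read that command, written out with one rule per line and a comment on each part. Treat it as my interpretation, not as the author's own explanation. The only changes are cosmetic: the sample files are created in a temporary directory, the joined lines go to a file called joined.txt, and the name of the list file is passed in through a variable called out (all my additions).

```shell
# Scratch copies of the sample files (workdir, joined.txt and out are assumed names).
workdir=$(mktemp -d)
cat > "$workdir/f1" <<'EOF'
CHR SNP A1 A2 MAF NCHROBS
0 SNP_A-8414268 A G 0.1522 5354
1 rs12565286 C G 0.04139 5340
1 rs2980319 A T 0.1503 5362
1 rs2980300 T C 0.1773 5346
1 rs6603781 A G 0.1149 5346
EOF
cat > "$workdir/f2" <<'EOF'
CHR SNP A1 A2 MAF NCHROBS
1 rs2980319 A T 0.1503 4252
1 rs2980300 T C 0.1273 4546
1 rs6603781 G A 0.0249 4546
EOF

awk -v out="$workdir/nonambig.txt" '
  # Rule 1: NR == FNR holds only for the first file (f1).
  NR == FNR {
      $1 = $1        # reassigning a field makes awk rebuild the line with single spaces
      a[$2] = $0     # whole f1 line, keyed by SNP
      b[$2] = $3     # A1 allele of f1, keyed by SNP
      next           # f1 lines never reach the rules below
  }
  # Rule 2 (f2 lines): A1 from f1 equals A1 or A2 of f2, so write the SNP to the list.
  b[$2] == $3 || b[$2] == $4 { print $2 > out }
  # Rule 3 (f2 lines): skip the header (first field has capitals), then replace
  # column 1 with the stored f1 line and print the combined line.
  $1 !~ "[A-Z]" { $1 = a[$2]; print }
' "$workdir/f1" "$workdir/f2" > "$workdir/joined.txt"

cat "$workdir/joined.txt"
cat "$workdir/nonambig.txt"
```

Two things to be aware of, as far as I can tell: only lines of f2 are printed, so rows of f1 without a partner in f2 are dropped, and the list contains the SNPs where the first-file A1 WAS found in the second file (plus the header word SNP). To list the ones where it was not found, the test would need to be negated, for example b[$2] != $3 && b[$2] != $4.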