(An N*6 table, where N is arbitrary, in this case 6. The 2nd column is the SNP name, and the 5th and 6th columns are genotype data, where 0 means missing information.)
There is another file called
file1.ped
This ped file is an M*(N*2+2) table, where M is the number of individuals and N is the number of SNPs.
The first two columns are ID numbers, and the two columns are identical.
The 3rd and 4th columns correspond to the first SNP (rs1) in file1.bim, the 5th and 6th columns correspond to the next SNP (rs3) in file1.bim, and so forth. Each pair of columns corresponds to one SNP, in the order the SNPs are listed in the bim file.
So the dimensions of the ped file will be (individuals)*(# of SNPs*2 + 2 ID columns).
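As a quick sanity check of the layout described above, something like the following could confirm that every ped row has 2*N + 2 columns. This is only a sketch: the function name check_ped_dims is made up for illustration, and it assumes both files are whitespace-separated with one SNP per line in the bim file.

```shell
# Hypothetical helper: report ped rows whose column count is not 2*N + 2,
# where N is the number of non-empty lines (SNPs) in the bim file.
check_ped_dims() {
    bim=$1
    ped=$2
    awk '
        NR == FNR { if (NF) n++; next }    # first file (bim): count SNPs
        NF != 2 * n + 2 {                  # second file (ped): check row width
            printf "line %d: %d columns, expected %d\n", FNR, NF, 2 * n + 2
            bad = 1
        }
        END { exit bad }                   # non-zero exit status if any row is off
    ' "$bim" "$ped"
}
```

Calling check_ped_dims file1.bim file1.ped prints nothing and exits with status 0 when every row has the expected width.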
So what I would like to do first is this.
Look at the pair of alleles given for each SNP (e.g. rs1) in the bim file. I want to treat the first allele as 0 and the second allele as 1. If the first and second alleles are the same, both will be 0. Any allele given as 0 will be recoded as NA.
For instance, for the first SNP, G A are recorded, so G will be recoded as 0 and A will be recoded as 1.
Then we apply this mapping to the ped file.
Keep in mind that the 3rd and 4th columns correspond to the first SNP in the bim file, the 5th and 6th columns to the second SNP, and so forth.
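The recoding rule above could be sketched in awk roughly as follows. The function name recode_ped is made up for illustration. The sketch assumes whitespace-separated files, and it also assumes that a ped allele matching neither bim allele should become NA, which the description does not actually specify.

```shell
# Hypothetical helper: recode ped alleles to 0/1/NA using the bim file.
recode_ped() {
    bim=$1
    ped=$2
    awk '
        # Map one ped allele x of SNP number i to 0, 1 or NA.
        function code(x, i) {
            if (x == "0") return "NA"      # missing allele
            if (x == a1[i]) return 0       # first bim allele (also covers a1 == a2)
            if (x == a2[i]) return 1       # second bim allele
            return "NA"                    # assumption: unknown alleles become NA
        }
        NR == FNR {                        # bim pass: store both alleles per SNP
            n++
            a1[n] = $5
            a2[n] = $6
            next
        }
        {                                  # ped pass: keep the IDs, recode each pair
            out = $1 OFS $2
            for (i = 1; i <= n; i++)
                out = out OFS code($(2 * i + 1), i) OFS code($(2 * i + 2), i)
            print out
        }
    ' "$bim" "$ped"
}
```

It would be called as recode_ped file1.bim file1.ped, with the recoded table written to standard output.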
For the first SNP, where G is expressed as 0, and A is 1,
Then we repeat this process for the rest of the SNPs, which gives
Then the next step is to add each pair of columns together.
The final output will be
an N*(M+2) table
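The pair-summing step could look something like this in awk. The name sum_pairs is made up, the input is assumed to be the recoded ped table on standard input, and treating a pair as NA whenever either allele is NA is an assumption on my part. The sketch stops at one row per individual; rearranging that into the final N*(M+2) layout is not covered here.

```shell
# Hypothetical helper: add each pair of recoded allele columns together.
sum_pairs() {
    awk '
        {
            out = $1 OFS $2                    # keep the two ID columns
            for (i = 3; i + 1 <= NF; i += 2) {
                if ($i == "NA" || $(i + 1) == "NA")
                    s = "NA"                   # assumption: NA in either allele gives NA
                else
                    s = $i + $(i + 1)          # 0, 1 or 2
                out = out OFS s
            }
            print out
        }
    '
}
```

Together with a recoding step this would be used in a pipeline, reading the recoded table from standard input and printing one summed row per individual.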
So the ultimate output that I want is
final.txt
I have written a script in R, but I am having trouble writing one for Unix.
I appreciate your help in advance!
Moderator's Comments:
Please use code tags when posting data and code samples!
Last edited by johnkim0806; 08-14-2012 at 12:02 PM..
Reason: once again - code tags, PLEASE!
Because the 5th SNP has the alleles 0 T. T is the second allele listed for that SNP, so T is coded as 1 in the ped file. Hence T T becomes 1 1 in the ped file.
On line 2, why does T T go to 1 1 for the second-to-last allele pair and 0 0 for the last allele pair?
[/QUOTE]
Hi!
I found the code below and then adapted it for my pipeline...
awk -F"," -vOFS="," '{printf "%0.2f %0.f\n",$2,$4}' xxx > yyy
I added -F"," -vOFS="," (for CSV input and output) and changed the columns and the number of decimals...
It works, but I also have some problems... here are my columns
... (7 Replies)
Hi again. Sorry for all the questions — I've tried to do all this myself but I'm just not good enough yet, and the help I've received so far from bartus11 has been absolutely invaluable. Hopefully this will be the last bit of file manipulation I need to do.
I have a file which is formatted as... (4 Replies)
Hi, I was wondering if someone would be able to help with extrapolating information from a file and filling an existing matrix with that information.
I have made a matrix like this (file 1):
A B C D
1
2
3
4
I have another file with data like this (file 2):
1 A
1 C
3 C
4 B... (1 Reply)
Hi friends,
I'm very new to Perl and have a requirement.
I have input numbers that are 17 characters long, like below:
-22500.0000000000
58750.00000000000
4944.000000000000
-900.000000000000
272.0000000000000
I need to convert these numbers from negative to positive and positive... (4 Replies)
Hello All,
I am having a problem finding the smallest number among the highest 90% of all the numbers in a file. I have a file with thousands of lines and hundreds of columns.
I am mainly familiar with bash, but I am open to any suggestion which will lead to a solution.
If I... (11 Replies)
hey,
I have a file with numbers in US notation (1,000,000.00) as well as European notation (1.000.000,00).
I want all the numbers to be in European notation.
The numbers are in a text file, so to keep the regex from also changing the commas in a sentence/text, I thought of:
sed 's/,/\./'... (2 Replies)
Hello all,
I have a data file that needs some serious work...I have no idea how to implement the changes that are needed!
The file is a genotypic file with >64,000 columns representing genetic markers, a header line, and >1100 rows that looks like this:
ID 1 2 3 4 ... (7 Replies)
Howdy experts,
We have some ranges of numbers which belong to a particular group, as below.
GroupNo StartRange EndRange
Group0125 935300 935399
Group2006 935400 935476
937430 937459
Group0324 935477 935549
... (6 Replies)
I have two files: one (the numbers file) contains numbers (approximately 30000), and the other (the record file) contains records (approximately 40000), which may or may not contain the numbers from the numbers file.
I want to separate the records which have field 1=(any of the number from numbers... (15 Replies)