Hi yifangt, you are welcome. Here is an explanation:
awk -F'[ \t:;,]*'
Use any run of zero or more of the characters in the square brackets (space, tab, colon, semicolon, comma) as the field separator
split($0,T)
Split the record $0 into array T, using FS as the field separator. This effectively creates a copy of $1 to $NF, so $1 to $NF can be overwritten for output while the original values are still read from T.
for(i=NF;i>=2;i--)
Read backwards, from the last field number down to the 2nd.
if (T[i]~/m[0-9]/)
if the array copy of field number "i" contains "m" followed by a digit,
{sub(/m/,x,T[i])
remove the letter m from that array element (x is an unset variable, so it acts as the empty string).
$(T[i]+1)=c
Store the value of variable c in the field whose number is T[i] + 1. For example, if T[i] now contains 4, then c is stored in $5.
else c=T[i]
If the array copy of field number "i" does not contain m followed by a digit, it must be a new value, which is stored in variable c.
NF=11
Cut off fields $12 through $NF, so that 11 fields remain.
1
A condition that is always true, so every record is printed (print is the default action).
OFS="\t"
Use a tab as the output field separator.
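For reference, here is a sketch of the pieces above assembled into one multi-line script. The input line is hypothetical (the real raw data was not posted): each nucleotide follows the comma-separated species indices m1..m10 that carry it, which fits reading backwards and NF=11 (SNP name plus 10 species). As assumptions of this sketch, "" is used instead of the unset x, OFS is set with -v, and the separator uses + (one or more) instead of *, so the regex can never match an empty string; how empty FS matches are handled may differ between awk implementations.

```shell
#!/bin/sh
# Hypothetical input: one SNP per line; each nucleotide comes after the
# species indices (m1..m10) that share it.
printf 'BKN000000001 m1,m2,m3,m4,m5:A;m6,m7,m8,m9,m10:G\n' > snps.txt

awk -F'[ \t:;,]+' -v OFS='\t' '
{
  split($0, T)                  # copy the fields, because $1..$NF are overwritten below
  for (i = NF; i >= 2; i--)     # walk backwards, so each nucleotide is seen before its indices
    if (T[i] ~ /m[0-9]/) {
      sub(/m/, "", T[i])        # "m7" becomes "7"
      $(T[i] + 1) = c           # species N goes to column N+1 (column 1 is the SNP name)
    } else
      c = T[i]                  # a nucleotide: remember it for the indices to its left
  NF = 11                       # keep the SNP name plus 10 species columns
}
1' snps.txt > snps.tsv

cat snps.tsv
```

With this input the script prints BKN000000001 followed by five A and five G columns, all tab-separated. For the real data, NF=11 would have to become the actual number of output columns.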
With your actual raw data, what is the required output?
S.
Thanks!
The output format is the same as in your first reply, i.e. the header is: SNP, chromosome, the 96 species variants, Locus, position, for 100 columns in total.
The body is:
column 1: SNP name = BKN00000xx,
column 2: chromosome = 1, 2, 3, 4 or 5
columns 3-98: single nucleotide: A/T/C/G/- under each species
column 99: locus, like At1g12300
column 100: position, like 127861
The awk script seems to be exactly what I need. Thanks again.
Yifangt
My reply got mixed up with my replies to Ludwig, who was trying a Perl script for this. Here is a copy of the reply I sent to him by mistake. Below is an example of the output, with most of the middle columns omitted.
Your awk script is very impressive.
SNP chromosome Ag-0 An-1 Bay-0 Bil-5 Zdr-1 Zdr-6 Locus location
BKN000000001 1 C C C C C C AT1G01280 112482
BKN000000002 1 G G G G - - AT1G01280 112561
BKN000000003 1 G A A A A A AT1G01280 112771
.
Thank you again!
Yifang
---------- Post updated at 11:18 PM ---------- Previous update was at 11:00 PM ----------
Quote:
Originally Posted by m.d.ludwig
In your data sample:
There are six columns of data but four column headers.
Are the first three data columns the "SNP-Name"?
And the last two the "value", the 'A', 'B', 'C', 'D' in your example?
---------- Post updated at 10:51 AM ---------- Previous update was at 10:35 AM ----------
My initial implementation to generate a CSV file:
Thanks!
Tried your code.
1) The output format is close: 97 columns, with the species as the header.
2) The cell contents are wrong: I want A, C, T, G or - in each column, matching the header.
3) Instead, every cell of your output contains the Locus and Location (repeated 96 times!), not A/T/C/G/-.
I'm not sure what the problem is. My output file should be fairly big: 100 columns x 12281 rows. I'm quite nervous about it.
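With a file that size, a quick sanity check is to count the columns on every row. The sketch below uses the abbreviated sample rows posted earlier in the thread (10 columns each), written tab-separated to a file named sample.tsv (a hypothetical name), plus one row deliberately missing a column. For the real output you would set want=100 and use the real file name.

```shell
#!/bin/sh
# Demo file: the abbreviated 10-column sample rows from this thread.
printf 'SNP\tchromosome\tAg-0\tAn-1\tBay-0\tBil-5\tZdr-1\tZdr-6\tLocus\tlocation\n' > sample.tsv
printf 'BKN000000001\t1\tC\tC\tC\tC\tC\tC\tAT1G01280\t112482\n' >> sample.tsv
printf 'BKN000000002\t1\tG\tG\tG\tG\t-\t-\tAT1G01280\t112561\n' >> sample.tsv
# Deliberately broken row: one species column is missing (9 columns).
printf 'BKN000000003\t1\tG\tA\tA\tA\tA\tAT1G01280\t112771\n' >> sample.tsv

# Report each line whose tab-separated column count differs from want,
# then print a summary line.
awk -F'\t' -v want=10 '
NF != want { printf "line %d: %d columns (expected %d)\n", NR, NF, want; bad++ }
END        { printf "%d lines checked, %d with the wrong column count\n", NR, bad + 0 }
' sample.tsv > check.txt

cat check.txt
```

Here it reports line 4 as having 9 columns, and the summary says 4 lines were checked with 1 bad one.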