awk or perl to parse file


 
# 1  
Old 03-10-2015
awk or perl to parse file

I have an input file (attached) that I am trying to parse into tab-delimited format.
The chromosomal variant column contains all the information.
Parse rules:
1. The digits between the NC_ and the ., with the leading zeros removed
2. The digits after the g., repeated twice and separated by a tab
3. The letter before the >
4. The letter after the >

Code:
Example input:
NC_000013.10:g.20763477C>G
NC_00001.10:g.20763477C>G

Desired output:
13     20763477     20763477     C     G 
1     20763477     20763477     C     G

I'm not sure if this is right or if there is a better way.
Code:
 awk 'NR > 1 { for (i = 6; i <= NF; i++) if ($i < 100) $i = "NA" }; 1' OFS="\t"  ${id}.txt > ${id}_parse.txt

Code:
 perl -lane 'map{$_="NA" if $_<100}@F[5..$#F] if $.>1; print join "\t", "@F"' ${id}.txt > ${id}_parse.txt

${id} = text file attached

Thank you.

Last edited by Don Cragun; 03-11-2015 at 01:28 AM.. Reason: Change HTML tags to CODE tags.
# 2  
Old 03-10-2015
Something along these lines: run awk -f cmc.awk OFS='\t' GJB-2.txt, where cmc.awk contains:
Code:
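# Skip the first (header) line of each input file.
FNR > 1{
   for(i=1;i<=NF;i++)
     if ($i ~ /^NC_0000/) {
       # NC_000013.10:g.20763477C>G splits into NC, 000013, 10, g, 20763477C, G
       n=split($i,a, "[.:>_]")
       # Adding 0 converts to a number: leading zeros and the trailing letter drop out.
       print a[2]+0, a[5]+0, a[5]+0, substr(a[5],length(a[5])), a[n]
     }
}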
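
To see why a[2], a[5] and a[n] are the pieces of interest, it can help to print everything split() returns for one of the sample variants. This is only an illustrative sketch; show_pieces is a hypothetical helper name and is not part of the original answer.

```shell
# show_pieces VARIANT: hypothetical helper that lists each element split() produces.
show_pieces() {
    printf '%s\n' "$1" | awk '{
        # Same separator set as cmc.awk: dot, colon, greater-than, underscore.
        n = split($0, a, "[.:>_]")
        for (i = 1; i <= n; i++) printf "a[%d]=%s\n", i, a[i]
    }'
}

show_pieces 'NC_000013.10:g.20763477C>G'
# a[1]=NC
# a[2]=000013
# a[3]=10
# a[4]=g
# a[5]=20763477C
# a[6]=G
```

Adding 0 to a[2] and a[5] then yields 13 and 20763477, and the last character of a[5] is the letter before the >.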

# 3  
Old 03-11-2015
I have put that in the bash menu function below, but the syntax must be wrong, because the menu opens and closes right away. Thank you.

Code:
 convert() {
    printf "\n\n"
    cd 'C:\Users\cmccabe\Desktop\annovar'
    awk -f FNR > 1{           for(i=1;i<=NF;i++)          if ($i ~ /^NC_0000/) {        n=split($i,a, "[.:>_]")        print a[2]+0, a[5]+0, a[5]+0substr(a[5],length(a[5])), a[n]      } }  OFS='\t' ${id}.txt
        *) convert ;;
    esac
}

${id}.txt is GJB-2.txt, but since the name of that file can change I used a variable to represent the file. Thanks.
# 4  
Old 03-11-2015
Does the script work in the stand-alone mode?
You need to revisit your shell/awk manuals - that's not the way to integrate awk into a shell script.
Thanks.
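
One common way to embed an awk program in a shell function is to pass it as a single-quoted string instead of placing it after -f, which expects the name of a file containing the program. The sketch below is an assumption about what the menu function could look like, not the poster's actual script: it reuses the name convert from post 3, takes the input file as its first argument, and assumes the file has one header line.

```shell
#!/bin/bash
# convert FILE: sketch of a menu action; prints the five tab-separated columns.
convert() {
    # The single quotes keep the shell from interpreting >, (, { and $i.
    awk -v OFS='\t' '
        FNR > 1 {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^NC_0000/) {
                    n = split($i, a, "[.:>_]")
                    print a[2]+0, a[5]+0, a[5]+0, substr(a[5], length(a[5])), a[n]
                }
        }
    ' "$1"
}

# Usage (file names here are assumptions):
#   convert "${id}.txt" > "${id}_parse.txt"
```

Quoting "$1" and "${id}.txt" keeps file names that contain spaces in one piece.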
# 5  
Old 03-11-2015
As a stand-alone I get this error:

Code:
 
awk -f FNR > 1{for(i=1;i<=NF;i++) if ($i ~ /^NC_0000/) {n=split($i,a, "[.:>_]") print a[2]+0,a[5]+0,a[5]+0,substr(a[5],length(a[5])), a[n]} } OFS='\t' GJB-2.txt > parse.txt
-bash: syntax error near unexpected token `('

Can I replace GJB-2.txt with ${id}.txt? The name of the file changes frequently, and ${id}.txt should pick up the new name each time. Thank you.
# 6  
Old 03-11-2015
This is not what I suggested.
Yes, you can make the input file whatever you want and however you want.
Thanks
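
For reference, the command suggested in post 2 keeps the awk program in its own file, and the input name can then come from a shell variable. The following is a minimal sketch under stated assumptions: parse_id is a hypothetical helper name, the _parse.txt output name follows post 1, and cmc.awk is written to the current directory.

```shell
#!/bin/bash
# Save the awk program from post 2 as cmc.awk. Quoting 'EOF' stops the shell
# from expanding $i inside the here-document.
cat > cmc.awk <<'EOF'
FNR > 1 {
    for (i = 1; i <= NF; i++)
        if ($i ~ /^NC_0000/) {
            n = split($i, a, "[.:>_]")
            print a[2]+0, a[5]+0, a[5]+0, substr(a[5], length(a[5])), a[n]
        }
}
EOF

# parse_id ID: hypothetical helper; reads ID.txt and writes ID_parse.txt.
parse_id() {
    awk -f cmc.awk OFS='\t' "${1}.txt" > "${1}_parse.txt"
}

# Usage (assumes GJB-2.txt is in the current directory):
#   id="GJB-2"
#   parse_id "$id"
```

Here -f is followed by a file name, which is what awk expects; the program text itself never appears on the command line.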
# 7  
Old 03-11-2015
I am just a scientist learning to program. In post 2 I replaced cmc.awk with the code itself and used the join-lines function in Notepad++.

Code:
 awk -f cmc.awk OFS='\t' GJB-2.txt

Did I read it wrong? Thank you.

Join lines in Notepad++:
Code:
 FNR > 1{
   for(i=1;i<=NF;i++)
     if ($i ~ /^NC_0000/) {
       n=split($i,a, "[.:>_]")
       print a[2]+0, a[5]+0, a[5]+0, substr(a[5],length(a[5])), a[n]
     }
}
