Sponsored Content
Top Forums Shell Programming and Scripting awk script to parse case with information in two fields of file Post 302940723 by cmccabe on Wednesday 8th of April 2015 01:16:13 PM
Old 04-08-2015
awk script to parse case with information in two fields of file

The below awk parser works for most data inputs, but I am having trouble with the last one. The problem is in the below rules steps 1 and 2 come from $2 (NC_000013.10:g.20763686_20763687delinsA) and steps 3 and 4 come from $1 (NM_004004.5:c.34_35delGGinsT).

Code:
Parse Rules:
The header is skipped  and
1. 4 zeros after the NC_  (not always the case) and the digits before the . 
2. g. ### (before underscore)  _### (# after the _)
3. letters after the "del" until the “ins”
4. letters after the "ins"

Desired output: 13     20763686     20763687     GG     T

Code as is so far:
Code:
 awk 'NR>1 {split($2,a,"[_.>]");b=substr(a[4],1,length(a[4]-1));print a[2]+0,b,b,substr(a[4],length(a[4])),a[5]}' OFS="\t" out_position.txt > out_parse.txt

Thank you Smilie.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

To parse through the file and print output using awk or sed script

suppose if u have a file like that Hen ABCCSGSGSGJJJJK 15 Cock ABCCSGGGSGIJJJL 15 * * * * * * : * * * . * * * : Hen CFCDFCSDFCDERTF 30 Cock CHCDFCSDHCDEGFI 30 * . * * * * * * * : * * :* : : . The output shud be where there is : and . It shud... (4 Replies)
Discussion started by: cdfd123
4 Replies

2. Shell Programming and Scripting

Trying to Parse Version Information from Text File

I have a file name version.properties with the following data: major.version=14 minor.version=234 I'm trying to write a grep expression to only put "14" to stdout. The following is not working. grep "major.version=(+)" version.properties What am I doing wrong? (6 Replies)
Discussion started by: obfunkhouser
6 Replies

3. Shell Programming and Scripting

awk script to (un)/concatenate fields in file

Hi everyone, I'm trying to use the "join" function for more than 1 field. Since it's not possible as it is, I want to take my input files and concatenate the joining fields as 1 field (separated by "|"). I wrote 2 awk script to do and undo it (see below). However I'm new to awk and I'm certain I... (5 Replies)
Discussion started by: anthony.cros
5 Replies

4. Shell Programming and Scripting

how to parse with awk (using different fields), then group by a field?

When parsing multiple fields in a file using AWK, how do you group by one of the fields and parse by delimiters? to clarify If a file had tom | 223-2222-4444 , randofield ivan | 123-2422-4444 , random filed ... | and , are the delimiters ... How would you group by the social security... (4 Replies)
Discussion started by: Josef_Stalin
4 Replies

5. Shell Programming and Scripting

Perl: Parse Hex file into fields

Hi, I want to split/parse certain bits of the hex data into another field. Example: Input data is Word1: 4f72abfd Output: Parse bits (5 to 0) into field word1data1=0x00cd=205 decimal Parse bits (7 to 6) into field word1data2=0x000c=12 decimal etc. Word2: efff3d02 Parse bits (13 to... (1 Reply)
Discussion started by: morrbie
1 Replies

6. Shell Programming and Scripting

awk script need to act on same file on matced case

Hello, I have a log file , i want to delete the lines of the log file which is match with 1st and 5th field with different patterns. Once it will meet with that condition it will delete that line from the log . I dont want to create any temp file over there. Successfully able to retrieve the... (1 Reply)
Discussion started by: posix
1 Replies

7. Shell Programming and Scripting

AWK script to parse a data in a file

Hi Unix gurus.. I have a file which has below data, It has several MQ Queue statistics; QueueName= 'TEST1' CreateDate= '2009-10-30' CreateTime= '13.45.40' QueueType= Predefined QueueDefinitionType= Local QMinDepth= 0 QMaxDepth= 0 QueueName= 'TEST2' CreateDate= '2009-10-30'... (6 Replies)
Discussion started by: dd_psg
6 Replies

8. Shell Programming and Scripting

awk special parse case

I have a special case that awk could be used but I do not have the skill. Trying to create a final output file (indel_parse.txt) that is created from using some information from each of the two files (attached). parse rules: The header is skipped FNR>1 1. 4 zeros after the NC_ (not... (2 Replies)
Discussion started by: cmccabe
2 Replies

9. UNIX for Advanced & Expert Users

Script to parse and compare information in two fields of file

Hello, I am working parsing a large input file1(field CFA) I have to compare the the file1 field(CFA byte 88-96) with the content of the file2(It contains only one field) and and insert rows equal in another file. Here is my code and sample input file: ... (7 Replies)
Discussion started by: GERMANOS
7 Replies

10. Shell Programming and Scripting

Parse file for fields and specific text

I have a file of ~500,000 entries in the following: file.txt chr1 11868 12227 ENSG00000223972.5 . + HAVANA exon . gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2"; gene_type "transcribed_unprocessed_pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type... (17 Replies)
Discussion started by: cmccabe
17 Replies
Locale::Codes::LangExt(3)				User Contributed Perl Documentation				 Locale::Codes::LangExt(3)

NAME
Locale::Codes::LangExt - standard codes for language extension identification SYNOPSIS
use Locale::Codes::LangExt; $lext = code2langext('acm'); # $lext gets 'Mesopotamian Arabic' $code = langext2code('Mesopotamian Arabic'); # $code gets 'acm' @codes = all_langext_codes(); @names = all_langext_names(); DESCRIPTION
The "Locale::Codes::LangExt" module provides access to standard codes used for identifying language extensions, such as those as defined in the IANA language registry. Most of the routines take an optional additional argument which specifies the code set to use. If not specified, the default IANA language registry codes will be used. SUPPORTED CODE SETS
There are several different code sets you can use for identifying language extensions. A code set may be specified using either a name, or a constant that is automatically exported by this module. For example, the two are equivalent: $lext = code2langext('acm','alpha'); $lext = code2langext('acm',LOCALE_LANGEXT_ALPHA); The codesets currently supported are: alpha This is the set of three-letter (lowercase) codes from the IANA language registry, such as 'acm' for Mesopotamian Arabic. This is the default code set. ROUTINES
code2langext ( CODE [,CODESET] ) langext2code ( NAME [,CODESET] ) langext_code2code ( CODE ,CODESET ,CODESET2 ) all_langext_codes ( [CODESET] ) all_langext_names ( [CODESET] ) Locale::Codes::LangExt::rename_langext ( CODE ,NEW_NAME [,CODESET] ) Locale::Codes::LangExt::add_langext ( CODE ,NAME [,CODESET] ) Locale::Codes::LangExt::delete_langext ( CODE [,CODESET] ) Locale::Codes::LangExt::add_langext_alias ( NAME ,NEW_NAME ) Locale::Codes::LangExt::delete_langext_alias ( NAME ) Locale::Codes::LangExt::rename_langext_code ( CODE ,NEW_CODE [,CODESET] ) Locale::Codes::LangExt::add_langext_code_alias ( CODE ,NEW_CODE [,CODESET] ) Locale::Codes::LangExt::delete_langext_code_alias ( CODE [,CODESET] ) These routines are all documented in the Locale::Codes::API man page. SEE ALSO
Locale::Codes The Locale-Codes distribution. Locale::Codes::API The list of functions supported by this module. http://www.iana.org/assignments/language-subtag-registry The IANA language subtag registry. AUTHOR
See Locale::Codes for full author history. Currently maintained by Sullivan Beck (sbeck@cpan.org). COPYRIGHT
Copyright (c) 2011-2013 Sullivan Beck This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. perl v5.16.3 2013-02-27 Locale::Codes::LangExt(3)
All times are GMT -4. The time now is 11:52 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy