awk to change value in field according to another Post: 303025705

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

change field content awk

I have a line like this: I want to move HTTP/1.1 200 OK to the next line and put a blank line between the two lines i.e. How can I do this using awk? Thanks in advance

2. Shell Programming and Scripting

dynamically change awk Field Separator FS

Hi All, I was wondering if anyone knows how to dynamically change the FS in awk to accept a variable containing a field separator. The current code is below and does not work when I introduce the dynamic FS change :-( validate_source_file() { source_file=$1 ...

3. Shell Programming and Scripting

awk,cut fields by change field format

Hi Everyone, # cat 1.txt 1321631,77770132976455,19,20091001011859,20091001011907 1321631,77770132976455,19,20091001011859,20091001011907 1321631,77770132976455,19,20091001011859,20091001011907 # cat 1.txt | awk -F, '{OFS=",";print $1,$3,$4,$5}' 1321631,19,20091001011859,20091001011907...

4. Shell Programming and Scripting

awk, comma as field separator and text inside double quotes as a field.

Hi, all I need to get fields in a line that are separated by commas, some of the fields are enclosed with double quotes, and they are supposed to be treated as a single field even if there are commas inside the quotes. sample input: for this line, 5 fields are supposed to be extracted, they...

5. Shell Programming and Scripting

AWK: Pattern match between 2 files, then compare a field in file1 as > or < field in file2

First, thanks for the help in previous posts... couldn't have gotten where I am now without it! So here is what I have, I use AWK to match $1 and $2 as 1 string in file1 to $1 and $2 as 1 string in file2. Now I'm wondering if I can extend this AWK command to incorporate the following: If $1...

6. Shell Programming and Scripting

awk or sed? change field conditional on key match

Hi. I'd appreciate if I can get some direction in this issue to get me going. Datafile1: -About 4000 records, I have to update field#4 in selected records based on a match in the key field (Field#1). -Field #1 is the key field (servername) . # of Fields may vary # comment server1 bbb ccc...

7. UNIX for Dummies Questions & Answers

change field separator only from nth field until NF

Hi ! input: 111|222|333|aaa|bbb|ccc 999|888|777|nnn|kkk 444|666|555|eee|ttt|ooo|ppp With awk, I am trying to change the FS "|" to "; " only from the 4th field until the end (the number of fields vary between records). In order to get: 111|222|333|aaa; bbb; ccc 999|888|777|nnn; kkk...

8. Shell Programming and Scripting

awk :how to change delimiter without giving all field name

Hi Experts, I need to change delimiter from tab to "," sample test file cat test A0000368 A29938511 072569352 5 Any 2 for £1.00 BUTCHERS|CAT FOOD|400G Sep 12 2012 12:00AM Jan 5 2014 11:59PM Sep 7 2012 12:00AM M 2.000 group 5 ...

9. Shell Programming and Scripting

awk to change value of field using multiple conditions

In the below awk in the first step I default Classification NF-1 to VUS. Next, I am trying to change the value of Classification (NF) to whatever CLINSIG (NF-1) is. If there is only one condition everything works great, but if there are two conditions it does not work. Is the syntax used...

10. Shell Programming and Scripting

awk to change contents of field based on condition in same file

In the awk below I am trying to copy the entire contents of $6 (there may be multiple values separated by a ;) to $8, if $8 is . (lines 1 and 3 are examples). If $8 is not . (line 2 is an example), then that line is skipped and printed as is. The awk does execute but prints the output...


sim4

SIM4(1) 						      General Commands Manual							   SIM4(1)

NAME

       sim4 - align an expressed DNA sequence with a genomic sequence

SYNOPSIS

       sim4 seqfile1 seqfile2 {[WXKCRDAPHNBS]=value}

DESCRIPTION

       sim4 is a similarity-based tool for aligning an expressed DNA sequence (EST, cDNA, mRNA) with a genomic sequence for the gene. It also
       detects end matches when the two input sequences overlap at one end (i.e., the start of one sequence overlaps the end of the other). If
       seqfile2 is a database of sequences, the sequence in seqfile1 will be aligned with each of the sequences in seqfile2.

       sim4 employs a blast-based technique to first determine the basic matching blocks representing the "exon cores". In this first stage, it
       detects all possible exact matches of W-mers (i.e., DNA words of size W) between the two sequences and extends them to maximal scoring
       gap-free segments. In the second stage, the exon cores are extended into the adjacent as-yet-unmatched fragments using greedy alignment
       algorithms, and heuristics are used to favor configurations that conform to the splice-site recognition signals (GT-AG, CT-AC). If
       necessary, the process is repeated with less stringent parameters on the unmatched fragments.
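
       The first stage described above can be pictured with a small Python sketch. It is an illustration only, not sim4's implementation:
       the function names are hypothetical, and the extension step is simplified to exact matches along the diagonal, whereas sim4 extends
       hits to maximal scoring gap-free segments (which may include mismatches).

```python
def find_wmer_hits(seq1, seq2, w=12):
    """Return all (i, j) with seq1[i:i+w] == seq2[j:j+w] (exact W-mer hits)."""
    # Index every W-mer of seq2 by its start position.
    index = {}
    for j in range(len(seq2) - w + 1):
        index.setdefault(seq2[j:j + w], []).append(j)
    # Look up every W-mer of seq1 in that index.
    hits = []
    for i in range(len(seq1) - w + 1):
        for j in index.get(seq1[i:i + w], []):
            hits.append((i, j))
    return hits


def extend_exact(seq1, seq2, i, j, w):
    """Grow an exact hit in both directions while the bases keep matching.

    Simplification (assumption): only exact matches are extended; no
    mismatch scoring or drop-off threshold is modelled here.
    Returns (start1, start2, length) of the gap-free segment.
    """
    start1, start2 = i, j
    while start1 > 0 and start2 > 0 and seq1[start1 - 1] == seq2[start2 - 1]:
        start1 -= 1
        start2 -= 1
    end1, end2 = i + w, j + w
    while end1 < len(seq1) and end2 < len(seq2) and seq1[end1] == seq2[end2]:
        end1 += 1
        end2 += 1
    return start1, start2, end1 - start1
```

       A small word size such as w=3 is convenient for experimenting with short test strings; the documented default for W is 12.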

       By  default,  sim4 searches both strands and reports the best match, measured by the number of matching nucleotides found in the alignment.
       The R command line option can be used to restrict the search to one orientation (strand) only.
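
       As a rough illustration of what searching one or both strands means, the Python sketch below builds the candidate orientations for a
       sequence. The function names are hypothetical and this is not sim4's code; the meaning of the values 0, 1 and 2 follows the R option
       as documented under OPTIONS.

```python
# Complement table for plain A/C/G/T; other characters (e.g. N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def candidate_strands(seq, r=2):
    """List the (strand, sequence) pairs to search for a given R value.

    r=0: "+" (direct) strand only, r=1: "-" (reverse complement) only,
    r=2: both strands.
    """
    if r not in (0, 1, 2):
        raise ValueError("R must be 0, 1 or 2")
    strands = []
    if r in (0, 2):
        strands.append(("+", seq))
    if r in (1, 2):
        strands.append(("-", reverse_complement(seq)))
    return strands
```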

       Currently, five major alignment display options are supported, controlled by the A option. By default (A=0), only the endpoints, overall
       similarity, and orientation of the introns are reported. An arrow sign (`->' or `<-') indicates the orientation of the intron (`+' or `-'
       strand), when the signals flanking the intron have three or more position matches with either the GT-AG or the CT-AC splice recognition
       signals. When the same number of matches is found for both orientations, the intron is reported as ambiguous, and represented by `--'. The
       sign `==' marks the absence from the alignment of a cDNA fragment starting at that position. Alternative formats (lav-block format, text,
       PipMaker-type `exons file', or certain combinations of these options) can be requested by specifying a different value for A.
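
       The orientation rule for introns can be sketched in Python as below. This is an illustrative reading of the paragraph above, not
       sim4's code. Two points are assumptions: that `->' corresponds to GT-AG and `<-' to CT-AC, and that an intron with fewer than three
       matches for both signals is also shown as `--' (the text does not say what is printed in that case).

```python
def intron_orientation(intron):
    """Classify an intron by its flanking dinucleotides.

    Compares the first two and last two bases of the intron against the
    GT..AG and CT..AC signals, position by position (four positions each).
    """
    if len(intron) < 4:
        raise ValueError("intron must be at least 4 bases long")
    signal = (intron[:2] + intron[-2:]).upper()
    fwd = sum(a == b for a, b in zip(signal, "GTAG"))
    rev = sum(a == b for a, b in zip(signal, "CTAC"))
    # Tie, or neither signal reaches three matches (assumption): ambiguous.
    if fwd == rev or max(fwd, rev) < 3:
        return "--"
    return "->" if fwd > rev else "<-"
```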

       If the P option is specified with a non-zero value, sim4 will remove any 3'-end poly-A tails that it detects in the alignment.

       Occasionally,  sim4  may miss an internal exon when surrounded by very large introns, typically longer than 100 Kb. When this is suspected,
       the H option can be used to reset the exons' weight to compensate for the intron gap penalty.

       Ambiguity codes are by default allowed in sequence data, but sim4 treats them non-differentially. If desired,  the  B  command  option  can
       restrict the set of acceptable characters to A,C,G,T,N and X only.
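
       A pre-check of input sequences against the two character sets might look like the Python sketch below. The function name is
       hypothetical, and treating lower-case letters like upper-case ones is an assumption; the ambiguity set is the one listed for the
       B option under OPTIONS.

```python
# Character sets as documented for the B option.
AMBIGUITY_SET = frozenset("ABCDGHKMNRSTVWXY")  # B=1 (default)
STRICT_SET = frozenset("ACGTNX")               # B=0


def invalid_characters(seq, b=1):
    """Return the sorted list of characters in seq not allowed for this B value."""
    allowed = AMBIGUITY_SET if b else STRICT_SET
    # Assumption: case is ignored when checking characters.
    return sorted(set(seq.upper()) - allowed)
```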

       sim4  compares the lengths of the input sequences to distinguish between the cDNA (`short') and the genomic (`long') components in the com-
       parison. When seqfile2 contains a collection of sequences, the first entry in the file will be used to determine the type of this  and  all
       subsequent comparisons.

       In the description below, the term MSP denotes a Maximal Segment Pair, that is, a pair of highly similar fragments in the two sequences,
       obtained during the blast-like procedure by extending a W-mer hit by matches and perhaps a few mismatches.

OPTIONS

       The algorithm parameters (included in the first two sections below) have already been tuned and do not normally require adjustment  by  the
       user.

       Parameters internal to the blast-like procedure:

       W      Sets  the  word size for blast hits in the first stage of the algorithm. The default value is 12, but it can be increased for a more
	      stringent search or decreased to find weaker matches.

       X      Controls the limits for terminating word extensions in the blast-like stage of the algorithm. The default value is 12.

       K      Sets the threshold for the MSP scores when determining the basic `exon cores', during the first stage of the algorithm. (If this
              option is not specified, the threshold is computed from the lengths of the sequences, using statistical criteria.) For example, a
              good value for genomic sequences in the range of a few hundred Kb is 16. To avoid spurious matches, however, a larger value may be
              needed for longer sequences.

       C      Sets  the  threshold  for  the MSP scores when aligning the as-yet-unmatched fragments, during the second stage of the algorithm. By
	      default, the smaller of the constant 12 and a statistics-based threshold is chosen.

       Additional algorithm parameters:

       D      Sets the bound for the "diagonal" distance within consecutive MSPs in an exon. The default value is 10.

       Context parameters:

       R      Specifies the direction of the search. If R=0, only the "+" (direct) strand is searched. If R=1, only the "-"  (reverse  complement)
	      matches  are  sought.  By  default  (R=2), sim4 searches both strands and reports the best match, measured by the number of matching
	      pairs in the alignment.

       A      Specifies the format of the output: exon endpoints only (A=0), exon endpoints and boundaries of  the  coding  region  (CDS)  in  the
	      genomic  sequence,  when specified for the input mRNA (A=5), alignment text (A=1), alignment in lav-block format (A=2), or both exon
	      endpoints and alignment text (A=3 or A=4). If a reverse complement match is found, A=0,1,2,3,5 will give its  position  in  the  "+"
	      strand  of the longer sequence and the "-" strand of the shorter sequence. A=4 will give its position in the "+" strand of the first
	      sequence (seqfile1) and the "-" strand of the second sequence (seqfile2), regardless of which sequence is longer. The A=5 option can
	      be used with the S command line option to specify the endpoints of the CDS in the mRNA, and produces output in the `exons file' for-
	      mat required by PipMaker.

       P      Specifies whether or not the program should report the fragment of the alignment containing the poly-A tail (if found).  By  default
	      (P=0)  the  alignment  is  displayed as computed, but specifying a non-zero value will request sim4 to remove the poly-A tails. When
	      this feature is enabled, all display options produce additional lav alignment headers.

       H      Resets the MSPs' weight to compensate for very large introns. The default value is H=500, but some introns larger than  100  Kb  may
	      require higher values, typically between 1000 and 2500. This option should be used cautiously, generally in cases where an unmatched
              internal portion of the cDNA may disguise a missed exon within a very large intron. It is not recommended for ESTs, where it may
              produce spurious exons.

       N      Requests an additional search for small marginal exons (N=1) guided by the splice-site recognition signals. This option can be used
              when a high accuracy match is expected. The default value is N=0, specifying no additional search.

       B      Controls the set of characters allowed in the input  sequences.  By  default  (B=1),  ambiguity  characters  (ABCDGHKMNRSTVWXY)  are
	      allowed. By specifying B=0, the set of acceptable characters is restricted to A,C,G,T,N and X only.

       S      Allows the user to specify the endpoints of the CDS in the input mRNA, with the syntax: S=n1..n2. This option is only available with
	      the A=5 flag, which produces output in the format required by PipMaker. Alternatively, the CDS coordinates could appear  in  a  con-
	      struct CDS=n1..n2 in the FastA header of the mRNA sequence. When the second file is an mRNA database, the command line specification
	      for the CDS will apply to the first sequence in the file only.
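
       The two ways of giving CDS endpoints (S=n1..n2 on the command line, or CDS=n1..n2 in the FastA header) can be parsed as in the
       Python sketch below. The function names are hypothetical and the parsing rules are an assumption based only on the syntax shown
       above; sim4's own parser may accept or reject other forms.

```python
import re


def parse_range(spec):
    """Parse 'n1..n2' (the value part of S=n1..n2) into a pair of ints."""
    match = re.fullmatch(r"(\d+)\.\.(\d+)", spec)
    if match is None:
        raise ValueError(f"not of the form n1..n2: {spec!r}")
    return int(match.group(1)), int(match.group(2))


def cds_from_header(header):
    """Return (n1, n2) from a 'CDS=n1..n2' construct in a FastA header, or None."""
    match = re.search(r"\bCDS=(\d+)\.\.(\d+)", header)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```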

EXAMPLES

       sim4 est genomic

       sim4 genomic estdb

       sim4 est genomic A=1 P=1

       sim4 est1 est2 R=1

       sim4 mRNA genomic A=5 S=123..1020

       sim4 mouse_cDNA human_genomic K=15 C=11 A=3 W=10
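
       When sim4 is driven from a script, command lines like the ones above can be assembled programmatically. The Python sketch below
       only builds the argument list; the helper name is hypothetical, and the checks are limited to what this page states (single-letter
       options, and S being available only together with A=5).

```python
# Option letters documented in the OPTIONS section of this page.
VALID_OPTIONS = "WXKCDRAPHNBS"


def sim4_command(seqfile1, seqfile2, **options):
    """Build an argument list such as ['sim4', 'est', 'genomic', 'A=1', 'P=1']."""
    argv = ["sim4", seqfile1, seqfile2]
    # The page states that S is only available with the A=5 flag.
    if "S" in options and str(options.get("A")) != "5":
        raise ValueError("S requires A=5")
    for name, value in options.items():
        if len(name) != 1 or name not in VALID_OPTIONS:
            raise ValueError(f"unknown option: {name}")
        argv.append(f"{name}={value}")
    return argv
```

       The resulting list can be passed to subprocess.run() on a system where sim4 is installed, for example
       sim4_command("mRNA", "genomic", A=5, S="123..1020").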

AUTHORS

       sim4 was written by Liliana Florea <florea@gwu.edu> and Scott Schwartz.

       This manual page was written by Nelson A. de Oliveira <naoliv@gmail.com>, based on the online documentation at
       http://globin.cse.psu.edu/html/docs/sim4.html, for the Debian project (but may be used by others).

							  Wed, 03 Aug 2005 18:40:58 -0300						   SIM4(1)