awk regular expression Post: 302785031

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Regular expression query in AWK

I have a variable (var1) in an AWK script that contains data in the following format. I need to extract the timestamp, priority and log message. I can extract these using the split function, but I don't want to use it, since I want to extract them in one go. I have some difficulties in doing it using...

2. Shell Programming and Scripting

awk and regular expression

I've got a file with words and also numbers. Bla BLA 10 10 11 29 12 89 13 35 And I need to change "10,29,89,25" and also remove anything that actually contains words...

3. UNIX for Dummies Questions & Answers

regular expression and awk

I can print a line with an expression using this: awk '/regex/' I can print the line immediately before an expression using this: awk '/regex/{print x};{x=$0}' How do I print the line immediately before and then the line with the expression?
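Combining the two awk idioms quoted in the question gives `awk '/regex/{print x; print} {x=$0}'`. On a matching line it prints the saved previous line and then the current one, and every line is saved afterwards. The sketch below shows the same one-line buffer in Python for comparison. `before_and_match` is a hypothetical helper name.

```python
import re
from typing import Iterable, List


def before_and_match(lines: Iterable[str], pattern: str) -> List[str]:
    """Collect the line before each match, followed by the matching line.

    Mirrors awk '/regex/{print x; print} {x=$0}': `prev` plays the role of
    awk's x, so a match on the very first line is preceded by "".
    """
    regex = re.compile(pattern)
    out: List[str] = []
    prev = ""  # awk variables start out as the empty string
    for line in lines:
        if regex.search(line):
            out.append(prev)
            out.append(line)
        prev = line  # remember this line for the next iteration
    return out


if __name__ == "__main__":
    sample = ["alpha", "beta", "gamma error", "delta"]
    for text in before_and_match(sample, r"error"):
        print(text)  # prints "beta" then "gamma error"
```

As with the awk one-liner, if two consecutive lines match, the first of them is printed twice: once as a match and once as the line before the next match.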

4. Shell Programming and Scripting

need help guys for Regular expression in awk

Hello Experts, Please help me to cope with the following problem. I have patterns like Input Noptx(5) // remain the same -*Nop(3); Nop(9); --Nop(8); // remain the same d3 **---Nop(7); //remain the same d3 **---Nop(7); *--Nop(6); --**Nop(5); -Nop(4); Nop(3); - represents a space...

5. Shell Programming and Scripting

Regular expression query in AWK

Hi, I have a string like this: "After Executing service For 10 Request". In this string I need to extract "10". The contents of the string vary, and "10" appears after "For" and before "Request", i.e., in the format "For x Request". I need to extract the value of x. How can I do this in AWK?...
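In POSIX awk, `match()` sets `RSTART` and `RLENGTH`. Trimming the 4 characters of "For " and the 8 of " Request" gives `awk 'match($0, /For [0-9]+ Request/) { print substr($0, RSTART + 4, RLENGTH - 12) }'`. A capture group does the same job more readably, as in this Python sketch. The sketch assumes x is a run of digits, and the names are hypothetical.

```python
import re
from typing import Optional

# Assumption: x is a run of digits; widen the group (e.g. \S+) if it can be any token.
FOR_X_REQUEST = re.compile(r"\bFor\s+(\d+)\s+Request\b")


def extract_request_count(text: str) -> Optional[str]:
    """Return the value between "For" and "Request", or None if the phrase is absent."""
    m = FOR_X_REQUEST.search(text)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(extract_request_count("After Executing service For 10 Request"))  # 10
```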

6. UNIX for Advanced & Expert Users

Regular Expression Error in AWK

I have a file "fwcsales_filenames.txt" which has a list of file names that are supposed to be copied to another directory. In addition to that, I am trying to extract the date part and write it to the log. I am getting a regular expression error when trying to strip the date part using the "ll"...

7. Shell Programming and Scripting

Regular expression in AWK

Hello world, I was wondering if there is a nicer way to write the following code (in AWK): awk ' FNR==NR&&$1~/^m$/{tok1=1} FNR==NR&&$1~/^m10$/{tok1=1} ' my_file In fact, it looks for m2, m4, m6, m8 and m10 and then returns a positive flag. The problem is how to define 10. Thanks...
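The usual way to cover the two-digit case is alternation. `[2468]` matches the single-digit names and `|10` adds the two-digit one, so a single awk test could read `$1 ~ /^m([2468]|10)$/`. The same pattern is shown below with Python's `re` module. `is_wanted_token` is a hypothetical name.

```python
import re

# m followed by exactly one of 2, 4, 6, 8 or the two-digit 10.
WANTED = re.compile(r"m(?:[2468]|10)")


def is_wanted_token(token: str) -> bool:
    """True only if the whole token is m2, m4, m6, m8 or m10."""
    # fullmatch anchors both ends, like ^...$ in the awk pattern
    return WANTED.fullmatch(token) is not None


if __name__ == "__main__":
    for tok in ["m2", "m10", "m1", "m12", "m100"]:
        print(tok, is_wanted_token(tok))
```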

8. Programming

Perl: How to read from a file, do regular expression and then replace the found regular expression

Hi all, How do I read a file, find the matching regular expression and overwrite the same file? open DESTINATION_FILE, "<tmptravl.dat" or die "tmptravl.dat"; open NEW_DESTINATION_FILE, ">new_tmptravl.dat" or die "new_tmptravl.dat"; while (<DESTINATION_FILE>) { # print...

9. Shell Programming and Scripting

Problem with Regular expression in awk

Hi, I have a file with two fields in it as shown below 14,30 28,30 16,30 22,30 21,30 3,30 Fields are separated by a comma ",". I've been trying to validate the file based on the condition "each field must be a numeric value". I am using HP-UX. I have tried the following awk...

10. Shell Programming and Scripting

awk regular expression search

Hi All, I would like to search for a regular expression by passing it as an input variable to AWK. For example: 162.111.101.209.9516 162.111.101.209.41891 162.111.101.209.9516 162.111.101.209.9517 162.111.101.209.41918 162.111.101.209.9517 162.111.101.209.41937 162.111.101.209.41951...

LEARN ABOUT DEBIAN

cdhit-est

CD-HIT-EST(1)							   User Commands						     CD-HIT-EST(1)

NAME

       cdhit-est - run CD-HIT algorithm on RNA/DNA sequences

SYNOPSIS

       cdhit-est [Options]

DESCRIPTION

	      ====== CD-HIT version 4.6 (built on Apr 26 2012) ======

       Options

       -i     input filename in fasta format, required

       -o     output filename, required

       -c     sequence identity threshold, default 0.9. This is cd-hit's default "global sequence identity", calculated as the number of identical
	      amino acids in the alignment divided by the full length of the shorter sequence

       -G     use global sequence identity, default 1. If set to 0, local sequence identity is used instead, calculated as the number of identical
	      amino acids in the alignment divided by the length of the alignment. NOTE: don't use -G 0 unless you also use the alignment coverage
	      controls (see options -aL, -AL, -aS, -AS)
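The two identity definitions for -c and -G differ only in their denominators. The minimal sketch below works from a hypothetical alignment summary, not from any cd-hit data structure. Treating identity >= -c as passing is an assumption.

```python
from dataclasses import dataclass


@dataclass
class AlignmentSummary:
    """Hypothetical summary of one pairwise alignment (not a cd-hit structure)."""
    identical: int         # identical residues in the alignment
    alignment_length: int  # columns in the alignment
    len_a: int             # full length of sequence A
    len_b: int             # full length of sequence B


def global_identity(s: AlignmentSummary) -> float:
    """-G 1 (default): identical residues / full length of the shorter sequence."""
    return s.identical / min(s.len_a, s.len_b)


def local_identity(s: AlignmentSummary) -> float:
    """-G 0: identical residues / length of the alignment."""
    return s.identical / s.alignment_length


def passes_identity_threshold(s: AlignmentSummary, c: float = 0.9,
                              use_global: bool = True) -> bool:
    """Compare the chosen identity against -c (>= is an assumption)."""
    ident = global_identity(s) if use_global else local_identity(s)
    return ident >= c


if __name__ == "__main__":
    # A short, near-perfect local hit between sequences of length 100 and 300.
    hit = AlignmentSummary(identical=45, alignment_length=50, len_a=100, len_b=300)
    print(global_identity(hit), local_identity(hit))  # 0.45 0.9
```

The example shows why -G 0 should be paired with coverage controls. The local identity passes -c 0.9, yet the alignment spans only half of the shorter sequence.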

       -b     band_width of alignment, default 20

       -M     memory limit (in MB) for the program, default 800; 0 for unlimited

       -T     number of threads, default 1; with 0, all CPUs will be used

       -n     word_length, default 10, see user's guide for choosing it

       -l     length of throw_away_sequences, default 10

       -d     length of description in .clstr file, default 20. If set to 0, it takes the FASTA defline and stops at the first space

       -s     length difference cutoff, default 0.0. If set to 0.9, the shorter sequences need to be at least 90% of the length of the
	      representative of the cluster

       -S     length difference cutoff in amino acids, default 999999. If set to 60, the length difference between the shorter sequences and the
	      representative of the cluster cannot be bigger than 60
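The -s and -S cutoffs are two independent length checks between a candidate and its cluster representative. The sketch below applies both. The parameter names are hypothetical stand-ins for the two options.

```python
def passes_length_cutoffs(shorter_len: int, rep_len: int,
                          min_len_ratio: float = 0.0,
                          max_len_diff: int = 999999) -> bool:
    """Length filters between a candidate and a cluster representative.

    min_len_ratio stands in for -s: the shorter sequence must be at least
    that fraction of the representative's length.
    max_len_diff stands in for -S: the length difference may not exceed it.
    """
    if shorter_len < min_len_ratio * rep_len:
        return False
    if rep_len - shorter_len > max_len_diff:
        return False
    return True


if __name__ == "__main__":
    # Representative of length 400, candidate of length 350.
    print(passes_length_cutoffs(350, 400, min_len_ratio=0.9))  # False: 350 < 360
    print(passes_length_cutoffs(350, 400, max_len_diff=60))    # True: 50 <= 60
```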

       -aL    alignment coverage for the longer sequence, default 0.0. If set to 0.9, the alignment must cover 90% of the sequence

       -AL    alignment coverage control for the longer sequence, default 99999999. If set to 60 and the length of the sequence is 400, then the
	      alignment must be >= 340 (400-60) residues

       -aS    alignment coverage for the shorter sequence, default 0.0. If set to 0.9, the alignment must cover 90% of the sequence

       -AS    alignment coverage control for the shorter sequence, default 99999999. If set to 60 and the length of the sequence is 400, then the
	      alignment must be >= 340 (400-60) residues
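The coverage options combine a fractional test (-aL, -aS) with an absolute one (-AL, -AS). The sketch below applies both to one sequence of the pair and reproduces the 400 - 60 = 340 arithmetic above. The function name is hypothetical. It also assumes that `aligned_len` counts this sequence's residues inside the alignment.

```python
def covers_enough(aligned_len: int, seq_len: int,
                  min_fraction: float = 0.0,
                  max_uncovered: int = 99999999) -> bool:
    """Coverage test for one sequence of an aligned pair.

    min_fraction stands in for -aL / -aS, max_uncovered for -AL / -AS.
    aligned_len is assumed to be the number of this sequence's residues
    that lie inside the alignment.
    """
    if aligned_len < min_fraction * seq_len:
        return False
    # -AL 60 on a 400-residue sequence: alignment must be >= 400 - 60 = 340
    if aligned_len < seq_len - max_uncovered:
        return False
    return True


if __name__ == "__main__":
    print(covers_enough(340, 400, max_uncovered=60))  # True
    print(covers_enough(339, 400, max_uncovered=60))  # False
```

To apply the longer-sequence and shorter-sequence controls together, call it once per sequence with the respective option values.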

       -A     minimal alignment coverage control for both sequences, default 0. The alignment must cover >= this value for both sequences

       -uL    maximum unmatched percentage for the longer sequence, default 1.0. If set to 0.1, the unmatched region (excluding leading and trailing
	      gaps) must not be more than 10% of the sequence

       -uS    maximum unmatched percentage for the shorter sequence, default 1.0. If set to 0.1, the unmatched region (excluding leading and
	      trailing gaps) must not be more than 10% of the sequence

       -U     maximum unmatched length, default 99999999. If set to 10, the unmatched region (excluding leading and trailing gaps) must not be more
	      than 10 bases

       -B     1 or 0, default 0. By default, sequences are stored in RAM; if set to 1, sequences are stored on the hard drive. It is recommended to
	      use -B 1 for huge databases

       -p     1 or 0, default 0. If set to 1, print alignment overlap in the .clstr file

       -g     1 or 0, default 0. By cd-hit's default algorithm, a sequence is clustered to the first cluster that meets the threshold (fast cluster).
	      If set to 1, the program will cluster it into the most similar cluster that meets the threshold (accurate but slow mode). Either
	      setting, 1 or 0, leaves the representatives of the final clusters unchanged
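The difference between -g 0 and -g 1 can be illustrated with a toy greedy incremental clustering. This is a sketch under stated assumptions, not cd-hit's implementation. Sequences are assumed to be processed longest first, and `toy_identity` is a hypothetical ungapped stand-in for cd-hit's alignment-based identity. A sequence only becomes a representative when no cluster qualifies, so both modes produce the same representatives and differ only in membership.

```python
from typing import Callable, List, Sequence


def toy_identity(a: str, b: str) -> float:
    """Ungapped, position-by-position identity over the shorter length.

    Illustration only: cd-hit computes identity from a real alignment.
    """
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n


def greedy_cluster(seqs: Sequence[str],
                   identity: Callable[[str, str], float],
                   threshold: float, accurate: bool = False) -> List[List[str]]:
    """Greedy incremental clustering; clusters[i][0] is the representative.

    accurate=False mimics -g 0 (join the first qualifying cluster);
    accurate=True mimics -g 1 (join the most similar qualifying cluster).
    """
    clusters: List[List[str]] = []
    # Assumption: longest sequences are processed first (stable for ties).
    for seq in sorted(seqs, key=len, reverse=True):
        best_idx, best_score = -1, -1.0
        for i, cluster in enumerate(clusters):
            score = identity(seq, cluster[0])
            if score >= threshold and score > best_score:
                best_idx, best_score = i, score
                if not accurate:
                    break  # fast mode stops at the first cluster that qualifies
        if best_idx < 0:
            clusters.append([seq])  # nothing qualifies: new representative
        else:
            clusters[best_idx].append(seq)
    return clusters


if __name__ == "__main__":
    seqs = ["AAAAAAAAAA", "AAAACCCCCC", "AAAACCCC"]
    for mode in (False, True):
        print(greedy_cluster(seqs, toy_identity, threshold=0.5, accurate=mode))
```

In fast mode, "AAAACCCC" joins the first representative (toy identity 0.5). In accurate mode, it joins the second one (toy identity 1.0).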

       -r     1 or 0, default 1. By default, both +/+ and +/- alignments are done; if set to 0, only +/+ strand alignment is done

       -mask  masking letters (e.g. -mask NX, to mask out both 'N' and 'X')

       -match matching score, default 2 (1 for T-U and N-N)

       -mismatch
	      mismatching score, default -2

       -gap gap opening score, default -6

       -gap-ext
	      gap extension score, default -1
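The scoring defaults above can be made concrete by scoring an already-gapped alignment. This is a generic sketch, not cd-hit's internal aligner. Two things are assumptions: that "1 for T-U and N-N" means a T aligned to a U, or an N aligned to an N, scores 1, and that the first column of a gap run costs the opening score while each further column costs the extension score.

```python
def score_alignment(a: str, b: str, match: int = 2, mismatch: int = -2,
                    gap_open: int = -6, gap_ext: int = -1) -> int:
    """Score two equal-length aligned strings, with '-' marking gaps."""
    if len(a) != len(b):
        raise ValueError("aligned strings must have equal length")
    score = 0
    gap_in_a = gap_in_b = False  # currently inside a gap run in a / in b?
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            raise ValueError("column with gaps in both sequences")
        if x == "-":
            score += gap_ext if gap_in_a else gap_open
            gap_in_a, gap_in_b = True, False
        elif y == "-":
            score += gap_ext if gap_in_b else gap_open
            gap_in_a, gap_in_b = False, True
        else:
            gap_in_a = gap_in_b = False
            if {x, y} == {"T", "U"} or (x == "N" and y == "N"):
                score += 1  # assumed reduced match score for T-U and N-N
            elif x == y:
                score += match
            else:
                score += mismatch
    return score


if __name__ == "__main__":
    print(score_alignment("ACGT-AC", "ACGTTAC"))  # 6 matches * 2, one gap opening -6
    print(score_alignment("AC--GT", "ACTTGU"))    # 2 + 2 - 6 - 1 + 2 + 1
```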

       -bak write backup cluster file (1 or 0, default 0)

       -h     print this help

	      For questions and bug reports, contact Limin Fu at l2fu@ucsd.edu or Weizhong Li at liwz@sdsc.edu. For updated versions and information,
	      please visit: http://cd-hit.org

	      cd-hit web server is also available from http://cd-hit.org

	      If you find cd-hit useful, please kindly cite:

	      "Clustering of highly homologous sequences to reduce the size of large protein database", Weizhong Li, Lukasz Jaroszewski & Adam
	      Godzik. Bioinformatics, (2001) 17:282-283

	      "Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences", Weizhong Li & Adam Godzik.
	      Bioinformatics, (2006) 22:1658-1659

cd-hit-est 4.6-2012-04-25					    April 2012							     CD-HIT-EST(1)
