AWK - number of specified characters in a string Post: 302488352

Post 302488352 by Olly on Sunday 16th of January 2011 11:54:49 PM

01-17-2011

Thank you once again m1xram for your detailed & insightful ideas.

Your code does a great job and actually outputs much more information than I had expected, but that's because of my superficial problem description. Nonetheless, it will be a valuable resource for me.

In the interim I did some tinkering with my crude code, and it seems to work too, though it outputs much less. What I was ultimately trying to achieve was to insert a new field that calculates the frequency of "," or "." in my string relative to "[a-zA-Z]", but only once all of those characters that have either $ or ^~ next to them had been removed (those extra characters act as indicators/modifiers of their adjacent character).

Despite what I'd thought, split does seem to search for groups of characters if you enclose them in parentheses.
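As a rough illustration of that point: the third argument of awk's split() is treated as a regular expression, so a separator can be a single character class or a parenthesised multi-character group. The sketch below assumes a POSIX awk and sh; `count_matches` is a hypothetical helper name, not something from this thread. It relies on split() returning the number of pieces, so pieces minus one is the number of separators found.

```shell
# count_matches STRING REGEX (hypothetical helper): print how many times
# REGEX occurs in STRING, using split() as the counter.
count_matches() {
    s="$1" re="$2" awk 'BEGIN {
        # ENVIRON passes the values through untouched (no escape processing).
        n = split(ENVIRON["s"], parts, ENVIRON["re"])
        # n pieces => n - 1 separators; an empty string gives n == 0.
        print (n > 0 ? n - 1 : 0)
    }'
}

count_matches 'A$A$A.,.' '[.,]'            # single characters: prints 3
count_matches '.$TT,..,' '(\.\$)|(,\$)'    # two-character groups: prints 1
```

Passing the regex through the environment rather than with `-v` sidesteps awk's backslash processing of `-v` values, so the pattern can be written with single backslashes.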

This is my input data:

Code:

153-0	29	A	M	85	85	60	6	CC..,.	gggggg 0.667 
153-0	37	A	W	83	83	60	6	TT..,.	geggdg 0.667 
153-0	85	G	R	80	80	60	6	AA..,.	aggggg 0.667 
153-0	98	G	R	129	129	60	6	A$A$A.,.	`geggg 0.500 
176-0	48	A	W	82	82	60	7	.$TT,..,	ggggegg 0.714

$8 is the string of interest. $7 is the number of "characters" in $8, where a character with a modifier like A$ is treated as one single character. $9 is the frequency of . or , in the $8 string.

The code I used was:

Code:

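{
    # split() returns the number of pieces, so (pieces - 1) is the number of separators found.
    consends = split($8, a, "((\\.\\$)|(\\,\\$)|(\\^~\\.)|(\\^~,))")  # "." or "," next to a modifier
    allends  = split($8, a, "[\\$]|[\\^~]")                           # every "$", "^" or "~"
    consall  = split($8, a, "[\\.,]")                                 # every "." or ","
    readnotend = $7 - (allends - 1)                                   # characters left after removal
    if (readnotend == 0)
        printf("%s %s %s\n", $0, "1.000", readnotend)
    else
        printf("%s %3.3f %s\n", $0, ((consall - 1) - (consends - 1)) / readnotend, readnotend)
}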

Which gave me this output:

Code:

153-0	29	A	M	85	85	60	6	CC..,.	gggggg 0.667 0.667 6
153-0	37	A	W	83	83	60	6	TT..,.	geggdg 0.667 0.667 6
153-0	85	G	R	80	80	60	6	AA..,.	aggggg 0.667 0.667 6
153-0	98	G	R	129	129	60	6	A$A$A.,.	`geggg 0.500 0.750 4
176-0	48	A	W	82	82	60	7	.$TT,..,	ggggegg 0.714 0.667 6

This adds the new frequencies, and the number of "non-modified" characters in the $8 string. You can see that where there are $ in the string the new & old frequencies differ as does the number of "characters" in the string, but otherwise they remain unchanged.
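For comparison, here is a minimal sketch of the same calculation done with gsub() instead of split(), since gsub() returns the number of substitutions it made. This is a swapped-in technique, not the code above. It assumes that a $ modifier follows its character and a ^~ modifier precedes it, as the split patterns above suggest. `new_freq` is a hypothetical helper that takes the string and its character count as arguments rather than reading fields.

```shell
# new_freq STRING COUNT (hypothetical helper): print the frequency of "." and ","
# among the unmodified characters, followed by the number of unmodified characters.
new_freq() {
    s="$1" n="$2" awk 'BEGIN {
        s = ENVIRON["s"]
        # Drop each modified character together with its modifier: "X$" or "^~X".
        modified = gsub(/.\$|\^~./, "", s)
        # Count the remaining "." and "," ("&" puts each match back unchanged).
        cons = gsub(/[.,]/, "&", s)
        left = ENVIRON["n"] - modified
        if (left == 0)
            print "1.000 0"
        else
            printf("%.3f %d\n", cons / left, left)
    }'
}

new_freq 'A$A$A.,.' 6    # prints 0.750 4
new_freq '.$TT,..,' 7    # prints 0.667 6
```

The two calls reproduce the last two columns of the fourth and fifth output lines above.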

Cheers,

Olly

Last edited by Franklin52; 01-17-2011 at 04:56 AM.. Reason: Please use code tags and indent your code


