AWK - number of specified characters in a string: Post 302487887 by m1xram on Friday 14th of January 2011 03:01:41 AM
instances of delimiters

There are some issues with the match strings you supplied. For one, some match strings are substrings of others. If you wanted to know the occurrences of ".$" and of "." on its own, you'd have to subtract the count of the longer string from the count of the shorter one to get the stand-alone short strings. Order is important.

Code:
BEGIN {
        m = "^~. .$ . A$ A";
        acnt = split(m, astr, " ");
}
{
        print $0;
        for (i = 1; i <= acnt; ++i) {
                mtch = astr[i];
                c = split($5, a, mtch);
                print "Match " i ": " astr[i] " = " c
        }
}

Output:

Code:
Match 1: ^~. = 1
Match 2: .$ = 2
Match 3: . = 8
Match 4: A$ = 1
Match 5: A = 3

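Matches that have a count of 1 essentially didn't do a split. If you change "c" to "c - 1" in the program it will show the number of matches that caused splits. You can see that matches 1 and 4 didn't do anything. That is because split() treats a separator longer than one character as a regular expression: in match 1 the "^" anchors the pattern to the start of the string, and in match 4 the "$" anchors it to the end, so neither is searched for as literal text. The same goes for match 2, where ".$" matches whatever single character ends the string.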

If you take match 3 - match 2 you'd have the unique matches for ".". The same is true for matches 4 and 5 relative to "A".
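
The subtraction only works if the strings are counted literally, and split() treats a separator longer than one character as a regular expression. Here is a minimal sketch of a literal count using index() and substr(). The function name count_literal is made up for illustration, and the test string is assumed to be the first sample input line shown further down.

```shell
# count_literal STRING TARGET  (hypothetical helper, not from the original post)
# Prints how many times TARGET occurs in STRING, compared literally.
# TARGET is assumed to be non-empty and to contain no backslashes.
count_literal() {
    awk -v s="$1" -v t="$2" 'BEGIN {
        n = 0
        # index() returns the 1-based position of t in s, or 0 if absent
        while ((p = index(s, t)) > 0) {
            ++n
            s = substr(s, p + length(t))    # keep scanning after this hit
        }
        print n
    }'
}

for d in '^~.' '.$' '.' 'A$' 'A'; do
    printf '%s = %s\n' "$d" "$(count_literal 'Aa$.c$A..,.$,.,$.^~.' "$d")"
done
```

For that line the literal counts come out as 1, 1, 7, 0 and 2. The stand-alone "." count is then 7 - 1 - 1 = 5, since both ".$" and "^~." contain a ".".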

OK, enough of that; let's talk about actual delimiters.

Suppose you wrote a regular expression (RE) to split $5. That RE could also be used in the AWK match() function, which returns the starting position of the match and sets RSTART and RLENGTH to its position and length. Those values could be used to pull substrings from $5 that include both the split values and the actual delimiters. From the delimiter strings you could build a frequency list per line and/or per file. Is that what you really had in mind to do?
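
As a small illustration of the idea, the sketch below runs match() once and prints RSTART, RLENGTH and the delimiter they point at. The function name first_delim, the shortened RE and the sample string are assumptions made for the example.

```shell
# first_delim STRING  (hypothetical helper for illustration)
# Prints the position, length and text of the first delimiter in STRING.
first_delim() {
    awk -v s="$1" 'BEGIN {
        re = "\\.\\$|[.$]"      # ".$" as one delimiter, else a lone "." or "$"
        if (match(s, re)) {
            # match() has set RSTART (position) and RLENGTH (length)
            print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
        } else {
            print "no match"
        }
    }'
}

first_delim 'two.$three'
```

This prints "4 2 .$": the longer alternative wins at position 4, so ".$" is reported as a single delimiter.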

For me it would be a bit easier to create it in Perl. Here it is in AWK.

Input:

Code:
F1 F2 F3 F4 Aa$.c$A..,.$,.,$.^~.
F1 F2 F3 F4 one^~.two.$threeA$four.five$six

Code:
BEGIN {
    # Longer delimiters come first; a lone "." or "$" is the fallback.
    myRE = "(\\^~\\.)|(A\\$)|(\\.\\$)|[\\.\\$]";
}
{
    print $0;
    # Pass 1: split() returns the pieces but discards the delimiters.
    c = split($5, a, myRE);
    for (i = 1; i <= c; ++i) {
        print "SPLIT " i ": " a[i]
    }
    # Pass 2: walk the string with match() to recover each delimiter.
    str = $5;
    p = match(str, myRE);    # sets RSTART and RLENGTH
    i = 1;
    while (p) {
        strsplit = substr(str, 1, RSTART - 1);      # text before the delimiter
        strdelim = substr(str, RSTART, RLENGTH);    # the delimiter itself
        ++freq[strdelim];                           # unset entries start at 0
        str = substr(str, RSTART + RLENGTH);        # remainder still to scan
        print "Match " i ": /" strsplit "/ /" strdelim "/ /" str "/";
        ++i;
        p = match(str, myRE);
    }
    print "Match " i ": /" str "/";
}
END {
    # Frequency of each delimiter over the whole file (order is unspecified).
    for (x in freq) {
        xstr = "/" x "/";
        printf("%10s %3d\n", xstr, freq[x]);
    }
}

Output:

Code:
F1 F2 F3 F4 Aa$.c$A..,.$,.,$.^~.
SPLIT 1: Aa
SPLIT 2: 
SPLIT 3: c
SPLIT 4: A
SPLIT 5: 
SPLIT 6: ,
SPLIT 7: ,
SPLIT 8: ,
SPLIT 9: 
SPLIT 10: 
SPLIT 11: 
Match 1: /Aa/ /$/ /.c$A..,.$,.,$.^~./
Match 2: // /./ /c$A..,.$,.,$.^~./
Match 3: /c/ /$/ /A..,.$,.,$.^~./
Match 4: /A/ /./ /.,.$,.,$.^~./
Match 5: // /./ /,.$,.,$.^~./
Match 6: /,/ /.$/ /,.,$.^~./
Match 7: /,/ /./ /,$.^~./
Match 8: /,/ /$/ /.^~./
Match 9: // /./ /^~./
Match 10: // /^~./ //
Match 11: //
F1 F2 F3 F4 one^~.two.$threeA$four.five$six
SPLIT 1: one
SPLIT 2: two
SPLIT 3: three
SPLIT 4: four
SPLIT 5: five
SPLIT 6: six
Match 1: /one/ /^~./ /two.$threeA$four.five$six/
Match 2: /two/ /.$/ /threeA$four.five$six/
Match 3: /three/ /A$/ /four.five$six/
Match 4: /four/ /./ /five$six/
Match 5: /five/ /$/ /six/
Match 6: /six/
      /.$/   2
     /^~./   2
       /./   6
      /A$/   1
       /$/   4

Well, that should do it. You could sort the frequency list, of course, but don't do it inside AWK: gawk's asort() replaces the indices with numbers and asorti() sorts the indices rather than the counts, neither of which you want here.
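
One way to sort outside AWK is to pipe the END output through the external sort command. This is only a sketch: the function name delim_freq_sorted is made up, the counting loop is a condensed version of the program above, and sorting by descending count is an assumption about the order you want.

```shell
# delim_freq_sorted  (hypothetical helper for illustration)
# Reads lines on stdin, counts the delimiters in field 5 and prints the
# frequency list ordered by descending count.
delim_freq_sorted() {
    awk '
    BEGIN { re = "(\\^~\\.)|(A\\$)|(\\.\\$)|[.$]" }
    {
        str = $5
        while (match(str, re)) {
            ++freq[substr(str, RSTART, RLENGTH)]    # count this delimiter
            str = substr(str, RSTART + RLENGTH)     # scan the remainder
        }
    }
    END {
        cmd = "sort -k2,2nr"        # numeric, descending, on the count column
        for (x in freq)
            printf("%10s %3d\n", "/" x "/", freq[x]) | cmd
        close(cmd)
    }'
}

printf '%s\n' \
    'F1 F2 F3 F4 Aa$.c$A..,.$,.,$.^~.' \
    'F1 F2 F3 F4 one^~.two.$threeA$four.five$six' | delim_freq_sorted
```

With the two sample input lines, "/./" with a count of 6 comes first and "/A$/" with a count of 1 comes last. The order of entries with equal counts is left to sort.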