input:
The string above has no separators (i.e. FS="").
For clarity's sake, one could rewrite the string with "|" as FS, as follows:
Here, I am only interested in patterns containing capital letters (the number of such patterns varies between records), i.e.:
2QW
5QWERT
A
B
Note that patterns with more than one capital letter are preceded by a digit equal to the length of the pattern.
The output I am trying to obtain is:
What I tried so far:
Are the numbers specifying the length of a pattern limited to a single digit?
Does the string 12XYCCCCCCCCCCD contain 2 patterns: 12XYCCCCCCCCCC and D (each occurring once)? Or does it contain the pattern 2XY occurring once, the pattern C occurring 10 times, and the pattern D occurring once?
What happens if a digit is followed by fewer uppercase letters than are specified by that digit?
In the string 3D@C, is there one pattern (3D@C) or two patterns (D and C)?
Your code explicitly counts occurrences of A and B separately from counting patterns that might contain them. Is that what you want?
Should the string 4AABB just report one occurrence of the pattern 4AABB? Or, should it report the pattern 4AABB occurring once, two occurrences of the pattern A, and two occurrences of the pattern B?
Is the digit 1 special?
Should the string 1XX be treated as one occurrence of the pattern 1X and one occurrence of the pattern X? Or, should it be treated as two occurrences of the pattern X?
Last edited by Don Cragun; 12-26-2015 at 10:45 PM.
Reason: Add 4th question.
Quote:
Are the numbers specifying the length of a pattern limited to a single digit?
1. Does the string 12XYCCCCCCCCCCD contain 2 patterns: 12XYCCCCCCCCCC and D (each occurring once)? Or does it contain the pattern 2XY occurring once, the pattern C occurring 10 times, and the pattern D occurring once?
No. The number specifying the length of the pattern is always ≥2.
In the example Don Cragun mentioned ('12XYCCCCCCCCCCD'), there are 2 patterns: '12XYCCCCCCCCCC' and 'D'.
'12' should be treated as the number '12', not as the single digits '1' and then '2'.
Quote:
2. What happens if a digit is followed by fewer uppercase letters than are specified by that digit?
In the string 3D@C, is there one pattern (3D@C) or two patterns (D and C)?
It cannot happen. A number is always followed only by letters, uppercase or lowercase (no symbols or any other characters). The run of letters that follows a number is always at least as long as that number.
Quote:
Your code explicitly counts occurrences of A and B separately from counting patterns that might contain them. Is that what you want?
Should the string 4AABB just report one occurrence of the pattern 4AABB? Or, should it report the pattern 4AABB occurring once, two occurrences of the pattern A, and two occurrences of the pattern B?
Correct. '4AABB' is one pattern only, 'AABB', as defined by the number '4'.
Quote:
Is the digit 1 special?
Should the string 1XX be treated as one occurrence of the pattern 1X and one occurrence of the pattern X? Or, should it be treated as two occurrences of the pattern X?
The number '1' never appears in the string. The only numbers present in the string are always ≥2 (e.g. 2, 34, 2000...).
Quote:
What happens if a digit is followed by a mix of uppercase and lowercase letters? E.g., in the string "3AbCD", is the pattern "3ACD" or something else?
If '3AbCD' occurs, then we have 2 patterns: 'AbC' and 'D'.
A number X is always followed by letters that form an X-long pattern. Case doesn't matter, as long as the characters are letters.
If a letter occurs directly after the X-long motif, it is a pattern by itself.
example 1:
There are 4 patterns ('AvDf', 'QW', 'E' and 'R')
example 2:
There are 5 patterns ('BHu', 'I', 'RtYU', 'vG' and 'P')
example 3:
There are 4 patterns ('ABcdEf', 'yg', 'L' and 'K')
Last edited by beca123456; 12-26-2015 at 11:35 PM.
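The rules settled on above (a run of digits N, always ≥2, gives the length of the pattern formed by the next N letters; case doesn't matter; any letter not consumed by a number is a one-letter pattern) can be checked with a left-to-right scan. This is a minimal sketch in Python, not the awk the thread is working toward. The function name `split_patterns` is hypothetical, and skipping characters that are neither digits nor letters is an assumption, since the full input string is not shown. Patterns are returned without their length prefix, as in the answers 'AbC' and 'D'.

```python
import re

# The whole run of digits is read as one number, e.g. "12", not "1" then "2".
DIGITS = re.compile(r"[0-9]+")


def split_patterns(s):
    """Split s into patterns following the rules in this thread (a sketch).

    A run of digits N introduces a pattern made of the next N letters
    (either case); any other letter is a one-letter pattern on its own.
    """
    patterns = []
    i = 0
    while i < len(s):
        m = DIGITS.match(s, i)  # try to match a number starting at position i
        if m:
            n = int(m.group())
            start = m.end()
            body = s[start:start + n]
            # Thread rule: a number is always followed by at least N letters.
            if len(body) < n or not body.isalpha():
                raise ValueError(f"expected {n} letters at position {start}")
            patterns.append(body)
            i = start + n
        elif s[i].isalpha():
            patterns.append(s[i])  # a lone letter is a pattern by itself
            i += 1
        else:
            i += 1  # assumption: any other character is skipped
    return patterns


print(split_patterns("3AbCD"))
print(split_patterns("4AvDf2QWER"))  # hypothetical string for example 1
```

For instance, `split_patterns("3AbCD")` returns `['AbC', 'D']`. The string `4AvDf2QWER` is hypothetical, built only to reproduce the patterns listed for example 1. Because the whole digit run is read in a single regex match, `12` is treated as one length rather than `1` followed by `2`.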
Now I have to work on the format of the output as mentioned in my original post, i.e. counting the number of occurrences of each pattern as follows (multiple-letter patterns in the same field, separated by "; ", and single-letter patterns in one field):
example:
We have:
What I am trying to get is:
The order of the multiple-letter patterns within the field doesn't matter.
The order of the single-letter patterns doesn't matter either.
But it would be useful to have the single-letter patterns before the multiple-letter patterns, as above.
For clarity I used "|" as FS, but I could change it to " " as in my original post.
Last edited by beca123456; 12-27-2015 at 01:41 AM.
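Once the patterns of a record are known, counting them and building the two fields comes down to grouping by pattern length. The target output is not reproduced above, so this Python sketch rests on several assumptions: the `pattern=count` notation, the use of "; " inside the single-letter field as well, and the hypothetical name `format_counts`. Sorting is used only to make the output deterministic, since the order within each field doesn't matter.

```python
from collections import Counter


def format_counts(patterns, fs="|"):
    """Count patterns and emit: single-letter field, FS, multiple-letter field.

    Assumed notation: each entry is written as pattern=count, and entries
    within a field are separated by "; ".
    """
    counts = Counter(patterns)
    items = sorted(counts.items())  # deterministic order; any order is acceptable
    singles = [f"{p}={n}" for p, n in items if len(p) == 1]
    multis = [f"{p}={n}" for p, n in items if len(p) > 1]
    # Single-letter patterns come first, as requested.
    return fs.join(["; ".join(singles), "; ".join(multis)])


print(format_counts(["AvDf", "QW", "E", "R", "QW"]))
```

Here `format_counts(["AvDf", "QW", "E", "R", "QW"])` returns `"E=1; R=1|AvDf=1; QW=2"`. Setting `fs=" "` would clash with the "; " separator, which itself contains a space, so a later awk step using the default field splitting would see extra fields.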