Give it a try with:-
Does this get you what you need?
I suppose you might have to consider another., another,, another!, etc. too.
If this is a worry, try:-
The plan with this one is to translate all punctuation to spaces, then translate all spaces to a new-line, then use grep to count the records (one word each by now) that contain the string from the first to the last character only.
It might be worth testing it on a small section first, and thinking of as many variations as you can.
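Read literally, the plan described above is a three-stage pipeline. Here is a minimal sketch of it, not the original command: the function name count_word and its two arguments (word, file) are made up for illustration, and only spaces and punctuation are treated as separators (tabs are not).

```shell
# Hypothetical helper: count whole-word occurrences of $1 in file $2.
count_word() {
    tr '[:punct:]' '[ *]' < "$2" |   # every punctuation character -> space
    tr -s ' ' '\n' |                 # runs of spaces -> a single newline
    grep -c -x -F -- "$1"            # count lines that are exactly the word
}
```

Called as count_word another myfile, it prints a single number. The -x option stands in for anchoring the pattern at both ends, and -F makes grep treat the word as a fixed string rather than a regular expression.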
Your use of split() is redundant. AWK has already split the line into fields, so the value that split() returns is just the current value of NF. You can iterate through the fields with a loop that runs while i<=NF, checking each $i.
Also, if there aren't any matches, count will be undefined and the END print statement will output an empty line. I would change its argument to count+0.
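To make those two suggestions concrete, here is a sketch under my own assumptions (the wrapper name count_word_awk and the awk variable word are invented for the example). It loops over the fields AWK has already split and prints count+0 so that zero matches print 0 rather than an empty line. Fields are split on whitespace only, so "another," with a trailing comma would not match.

```shell
# Hypothetical helper: count fields exactly equal to $1 in file $2.
count_word_awk() {
    awk -v word="$1" '
        # NF is the number of fields awk found on this line
        { for (i = 1; i <= NF; i++) if ($i == word) count++ }
        # count+0 forces a numeric 0 when nothing matched
        END { print count + 0 }
    ' "$2"
}
```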
Quote:
Originally Posted by RudiC
With awk, try also
Beware of matching substrings triggering false positives.
Quote:
Originally Posted by rbatte1
Give it a try with:-
...<snip>...
The plan with this one is to translate all punctuation to spaces, then translate all spaces to a new-line, then use grep to count the records (one word each by now) that contain the string from the first to the last character only.
That seems to me to be a reasonable approach, but neither of those pipelines actually implements it. As described, the approach would require a pipeline with two tr's. However, it can be done with one if you convert punctuation directly to newlines, which would be equivalent. In that case, you can modify your latter suggestion to:
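The exact command is not preserved here, so treat the following as a sketch of what a single-tr version could look like rather than the original. The function name count_word_1tr and its arguments are hypothetical.

```shell
# Hypothetical helper: count whole-word occurrences of $1 in file $2,
# using one tr that maps punctuation and the space straight to newlines.
count_word_1tr() {
    tr '[:punct:] ' '[\n*]' < "$2" |   # note the space after [:punct:]
    grep -c -x -F -- "$1"              # count lines that are exactly the word
}
```

Empty lines produced by adjacent separators are harmless, since they can never equal the word.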
Note the space after the punctuation character class. If one wanted to include any blank characters, the [:blank:] class could have been used instead.
Oftentimes it's easier and safer to define what to include than what to exclude. Based on your approach, if we define a word as a sequence of [:alpha:] characters, the following portable solution can be used:
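The command itself is missing at this point, so here is a sketch that matches the description (complement [:alpha:] with -c, and use a bracketed newline repetition as the replacement). The function name count_alpha_word and its arguments are assumptions of mine.

```shell
# Hypothetical helper: count occurrences of $1 in file $2, where a word
# is any run of [:alpha:] characters.
count_alpha_word() {
    tr -c '[:alpha:]' '[\n*]' < "$2" |   # every non-alpha character -> newline
    grep -c -x -F -- "$1"                # count lines that are exactly the word
}
```

With this definition digits also act as separators, so "another2" counts as an occurrence of "another". The match is case-sensitive, so "Another" does not.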
Two things. First, the -c option, which complements [:alpha:], so what is matched is everything that is not a member of [:alpha:].
Second, \n* must be bracketed, i.e. written as [\n*]. With the brackets, it represents as many newlines as it takes to match the length of the class in the previous argument. Without the brackets, it means a single newline followed by as many asterisks as it takes. So, your erroneous version would replace the first character in [:alpha:] with a newline and every subsequent character with an asterisk.
In my current locale, A is the first member of [:alpha:]. Note how in the first example 'A' is converted to a newline while 'a' becomes an asterisk:
... but there is the possibility of a portability issue. From POSIX tr:
Quote:
When string2 is shorter than string1, a difference results between historical System V and BSD systems. A BSD system pads string2 with the last character found in string2. Thus, it is possible to do the following:
tr 0123456789 d
which would translate all digits to the letter 'd'. Since this area is specifically unspecified in this volume of POSIX.1-2008, both the BSD and System V behaviors are allowed, but a conforming application cannot rely on the BSD behavior. It would have to code the example in the following way:
tr 0123456789 '[d*]'
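For anyone who wants to try the portable spelling, here is a small sketch using the [d*] repetition construct in string2, which pads with 'd' up to the length of string1 without relying on the BSD behavior. The function name digits_to_d is just for the example.

```shell
# Hypothetical helper: translate every digit on stdin to the letter d.
digits_to_d() {
    tr 0123456789 '[d*]'   # [d*] repeats d to match the ten digits in string1
}

printf 'a1b22\n' | digits_to_d   # prints: adbdd
```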
Hi All,
I am looking for a perl/awk/sed command to auto-increment the numbers line in file, P1.tcl:
run_build_model sparc_ifu_dec
run_drc
set_faults -model path_delay -atpg_effectiveness -fault_coverage
add_delay_paths P1
set_atpg -abort_limit 1000
run_atpg -ndetects 1000
I would like... (6 Replies)
I have a log file which looks like this:
<845185415165:STATUS:5/0:0:0:0:0|ghy59DI5zasldf87asdfamas8df9asd903tGUVSQx4GJVSQ==>
I have to extract DATE and number of times the keyword STATUS is shown on each date.
Input is : <1354625655744:STATUS:5/0:0:0:0:0|ghy59DI5ztGUVSQx4GJVSQ==>... (8 Replies)
Hi
I want to search for a specific pattern in file
Say
ABC;HELLO_UNIX_WORLD;PQR
ABC;HELLO_UNIX_WORLD_IS_NOT_ENOUGH;XYZ
ABC;HELLO_UNIX_FORUM;LMN
Pattern to search is : "HELLO_UNIX_*****" and not "HELLO_UNIX_***_***_"
I mean after "HELLO_UNIX" there can only be one word.In this case... (2 Replies)
Can anyone help us find the last word of each line in a text file and print it?
eg:
1st --> aaa bbbb cccc dddd eeee ffff ee
2nd --> aab ered er fdf ere ww ww f
The o/p should be as below.
ee
f (1 Reply)
Hi,
I have a text file with ~-separated columns. The file is very large,
and every row is required to have 20 ~-separated columns.
So how can I check that every row in the file has 20 columns?
Sample file:
APA+VU~10~~~~~03~101~101~~~APA.N O 20081017 120.00... (1 Reply)
Hi Folks !!!!!!!!!!!!!!!!!!!
My Requirement is.............
i have a input file:
501,501.chan
502,502.anand
503,503.biji
504,504.raja
505,505.chan
506,506.anand
507,507.chan
and my o/p should be
chan->3
i.e. the word which occurs maximum number of times in a file should be... (5 Replies)
I want to count the number of occurrences of a particular word in a text file.
Please tell me whether the "less" command works in ksh or not. If it does not, which command will work instead? :confused: (40 Replies)
Greetings.
I am struggling with a shell script to make my life simpler, with a number of practical ways in which it could be used. I want to take a standard text file, and pull the 'n'th word from each line such as the first word from a text file.
I'm struggling to see how each line can be... (5 Replies)