Conditional identification of suffixes moving from right to left: revisited
Dear all,
I have a large database of names which I have reverse-sorted (that is, sorted by the reversed string) with a Perl script. A sample is provided below.
My problem is that I wish to identify the suffixes (i.e. the longest identical strings, reading from right to left) which are attached to the names, and store these strings along with their frequency in a separate file, subject to the following conditions:
the suffix string should be between 3 and 5 characters in length
the suffix string should occur at least 10 times in the database.
Thus in the sample given above, the script would identify only the following suffixes along with their frequency
The suffix
will not be identified since it occurs fewer than 10 times.
I had posted the query earlier, but I have now refined it with conditional constraints so that, hopefully, only the most pertinent suffixes will be identified. There could be a few false positives, but I could weed them out.
I work in a Windows environment, and a Perl or AWK script would be helpful.
Many thanks and all good wishes for the New Year to all the folks who take their valuable time off to help people solve their problems.
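A minimal sketch of the counting described in this request, written in Python rather than the Perl or AWK asked for, and not the script supplied in the thread. The function name `count_suffixes`, its parameters, and the sample names in the usage note are all hypothetical. It simplifies the "longest string" condition: every trailing string of 3 to 5 characters is counted, so nested suffixes (for example a 3-character tail of a 5-character suffix) will both be reported and would need weeding out by hand.

```python
from collections import Counter


def count_suffixes(names, min_len=3, max_len=5, min_count=10):
    """Count trailing strings of min_len..max_len characters.

    Returns a dict mapping each suffix to its frequency, keeping only
    suffixes seen at least min_count times.
    """
    counts = Counter()
    for name in names:
        name = name.strip()
        for n in range(min_len, max_len + 1):
            # Assumption: the suffix must be shorter than the name, so a
            # whole name is never counted as a suffix of itself.
            if len(name) > n:
                counts[name[-n:]] += 1
    return {s: c for s, c in counts.items() if c >= min_count}
```

For example, with the invented list `["rajkumar", "ramkumar", "kumar"]` and `min_count=2`, the call reports `kumar` twice and both `umar` and `mar` three times. The length range and the threshold are ordinary keyword arguments, so they can be changed without touching the loop.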
Many thanks. It worked very well. When I posted the request, I knew that there are chances of false positives, but a list of suffixes is easier to handle than wading through thousands of lines.
I can also tweak the AWK script if I wish to change the length range.
Happy New Year and thanks once more
The earlier method worked, although I had to tweak a few suffixes manually. Since then I have been rethinking how to identify suffixed names, and after going through nearly 40 to 50 thousand names I have identified a pattern. In nearly 95% of the cases, the name that is suffixed is also a name by itself, as in the example below, and it comes first in my reverse sort, followed by the names to which it is suffixed.
Could it be possible to extract such suffixes given that the suffix is a stand-alone name as in the case of
with the proviso that the standalone name is suffixed at least three times to another name. This would remove the need for a blind search and would also avoid false positives. I know that this could miss a few suffixes, but from my analysis it should provide a more accurate solution.
Would it be possible to devise a Perl or AWK script to identify such cases?
Many thanks once again for all kind help.
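The standalone-name rule described above could be sketched as follows, again in Python rather than Perl or AWK and not as the script given in the thread. The name `standalone_suffixes` and its parameters are hypothetical. Two assumptions go beyond the request: duplicate names are collapsed, so each distinct name counts once, and standalone names shorter than `min_len` characters are ignored.

```python
from collections import Counter


def standalone_suffixes(names, min_uses=3, min_len=3):
    """Find names that also end at least min_uses other, longer names."""
    # Assumption: duplicates are collapsed, so each distinct name counts once.
    unique = {n.strip() for n in names if n.strip()}
    uses = Counter()
    for name in unique:
        # Try every proper tail of the name, longest first, down to
        # min_len characters, and check whether it is itself a name.
        for start in range(1, len(name) - min_len + 1):
            tail = name[start:]
            if tail in unique:
                uses[tail] += 1
    return {s: c for s, c in uses.items() if c >= min_uses}
```

Looking each tail up in a set keeps the work proportional to the total number of characters, which matters for a list of tens of thousands of names. With the invented input `["kumar", "rajkumar", "ramkumar", "shivkumar", "devi", "ramadevi"]`, only `kumar` is returned, with a count of 3; `devi` ends just one other name and is dropped.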
Thanks a lot. It works well. All I had to do was trim off the short words in the list, which were in no way suffixes, and I managed to get a pretty comprehensive list of suffixes.
I have been studying the syntax of the script and there is one part which perplexes me. The rest I could grasp.
Could you please explain what this really does?
Thanks once again and a Happy New Year.