Shell script to find longest phrase


 
# 15  
Old 02-26-2009

Code:
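# Print every suffix of the arguments that has two or more words,
# shortest first.
phrases()
{
  phrase=$*
  while :
  do
    case $phrase in
      # Recurse on everything after the first word. $phrase is global, so
      # the recursive call leaves it holding a single word, which ends
      # this loop on the next pass.
      *\ *) phrases ${phrase#* } ;;
      *) break;;
    esac
  done

  # Print the arguments only if they contain at least two words
  case $* in
      *\ *) printf "%s\n" "$*" ;;
  esac
}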

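FILE=whatever.txt

# For each input line, emit every phrase of two or more words:
# phrases prints the suffixes, then the last word is dropped and the
# shortened line is processed again. sort | uniq -c counts the phrases,
# and awk keeps the longest one that was seen at least twice.
while read -r line
do
  while :
  do
    phrases "$line"
    line=${line% *}
    case $line in *\ *);; *) break;; esac
  done
done < "$FILE" | sort | uniq -c |
 awk '$1 < 2 { next }
 length > max { max = length; phrase = $0; num = $1 }
END { print phrase } '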
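
To see what the loop above feeds into sort | uniq -c, here is a minimal sketch that runs the same enumeration on a single string. The phrases function is copied from the script above; the wrapper name subphrases is made up for illustration.

```shell
# phrases: copied from the script above; prints every suffix of its
# arguments that has two or more words.
phrases()
{
  phrase=$*
  while :
  do
    case $phrase in
      *\ *) phrases ${phrase#* } ;;
      *) break;;
    esac
  done

  case $* in
      *\ *) printf "%s\n" "$*" ;;
  esac
}

# subphrases (hypothetical name): prints every run of two or more
# consecutive words in the string passed as its first argument.
subphrases()
{
  line=$1
  while :
  do
    phrases "$line"
    line=${line% *}          # drop the last word and go again
    case $line in *\ *);; *) break;; esac
  done
}
```

Calling subphrases "a b c" prints the three phrases "b c", "a b c" and "a b", one per line. Single words are never printed, which is why the final result always has at least two words.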

# 16  
Old 02-26-2009
I think I misread the question and thought you were looking to find the longest repeated phrase. If all you want is to find the longest phrase (or sentence), that is simple.

Code:
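use strict;
use warnings;

my $data = do{local $/; <DATA>};   # slurp everything after __DATA__
$data =~ s/[\r\n]/ /g;             # join the lines into one string
my @phrases = split(/[.!?]\s+/, $data);   # split on . ! or ? followed by whitespace
@phrases = sort {length($b) <=> length($a)} @phrases;   # longest first
print "The longest phrase is:\n $phrases[0]\n";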

__DATA__
One idea that elite universities like Yale, sprawling public systems like Wisconsin and smaller private colleges like Lewis and Clark have shared for generations is that a traditional liberal arts education is, by definition, not intended to prepare students for a specific vocation. Rather, the critical thinking, civic and historical knowledge and ethical reasoning that the humanities develop have a different purpose: They are prerequisites for personal growth and participation in a free democracy, regardless of career choice.
But in this new era of lengthening unemployment lines and shrinking university endowments, questions about the importance of the humanities in a complex and technologically demanding world have taken on new urgency. Previous economic downturns have often led to decreased enrollment in the disciplines loosely grouped under the term “humanities” — which generally include languages, literature, the arts, history, cultural studies, philosophy and religion. Many in the field worry that in this current crisis those areas will be hit hardest.
Already scholars point to troubling signs. A December survey of 200 higher education institutions by The Chronicle of Higher Education and Moody’s Investors Services found that 5 percent have imposed a total hiring freeze, and an additional 43 percent have imposed a partial freeze.
In the last three months at least two dozen colleges have canceled or postponed faculty searches in religion and philosophy, according to a job postings page on Wikihost.org. The Modern Language Association’s end-of-the-year job listings in English, literature and foreign languages dropped 21 percent for 2008-09 from the previous year, the biggest decline in 34 years.
“Although people in humanities have always lamented the state of the field, they have never felt quite as much of a panic that their field is becoming irrelevant,” said Andrew Delbanco, the director of American studies at Columbia University.

The efficiency could possibly be increased by first building a list of cached sort keys (here, each phrase's length), commonly called a Schwartzian Transform. Hard to say without testing and benchmarking.
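
For anyone following along in shell rather than Perl, the same decorate-sort-undecorate idea can be sketched as a pipeline: compute each line's length once, sort on that number, then strip it again. This is only an illustration, not a benchmarked replacement for the Perl sort above, and the function name longest_first is made up.

```shell
# longest_first (hypothetical name): reads lines on stdin and prints them
# longest first, computing each length only once.
longest_first()
{
  # decorate: prefix every line with "<length><TAB>"
  awk '{ printf "%d\t%s\n", length($0), $0 }' |
  # sort on the numeric key, largest first
  sort -n -r |
  # undecorate: drop the length column again
  cut -f 2-
}
```

With one sentence per line on standard input, longest_first | head -n 1 gives the longest one.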

There could also be some unusual combinations of punctuation that break a sentence into a smaller one than intended. It might pay to remove all but end-of-sentence punctuation (. ! ?).

In your sample text, "Wikihost.org." would be broken in the middle if you blindly split sentences on every period. I split on a period, exclamation mark, or question mark followed by whitespace, but even that might not work in some unusual circumstances, like '...' used to convey a pause in thinking or speaking.
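
The splitting rule described above can also be tried out in shell. This is a rough sketch using awk's match(), with a made-up function name split_sentences. It has the same weakness with '...' followed by a space, so treat it as a starting point only.

```shell
# split_sentences (hypothetical name): reads text on stdin and prints one
# sentence per line, splitting after . ! or ? when followed by a space.
split_sentences()
{
  # join all input lines into one long line, then add a final newline
  { tr '\r\n' '  '; echo; } |
  awk '{
    text = $0
    sub(/ +$/, "", text)                       # trim trailing spaces
    while (match(text, /[.!?] +/)) {
      print substr(text, 1, RSTART)            # keep the end mark itself
      text = substr(text, RSTART + RLENGTH)    # continue after the spaces
    }
    if (text != "") print text                 # whatever is left over
  }'
}
```

A period inside a word, as in "Wikihost.org", is not followed by a space and so does not end a sentence here. The period after "org" does, which is what you want in that text.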
# 17  
Old 02-26-2009
Quote:
Originally Posted by KevinADC
I think I misread the question and thought you were looking to find the longest repeated phrase.
That's what I assumed this line meant:

Quote:
the longest phrase that appears at least twice
# 18  
Old 02-26-2009
No, your original understanding is correct: the OP is looking for the longest repeated phrase. CFAJ's solution gives this.
# 19  
Old 02-26-2009
Ahh... now I feel really dumb! Oh well, not the first time I have felt this way.
# 20  
Old 02-26-2009
cfajohnson, that was AWESOME. Thank you a million!

One question though. It seems to search only for phrases made up of four or more words. I don't see the loop in it being limited to that; do you have any idea why that is happening?

For example, in my test file (see below), it picks up the phrase 'this was a long test' but not the phrase 'yahooo temp lala'. Any thoughts?

yahooo temp lala
yahooo temp lala
yahooo temp lala
yahooo temp lala
yahooo temp lala

This was long test
to see if long test works
this was a long test
I repeat
this was a long test
this was a long test .. yahooo
this was a long test


Thanks again.
SG
# 21  
Old 02-26-2009
Quote:
Originally Posted by stargazerr
One question though. It seems to search only for phrases made up of four or more words. I don't see the loop in it being limited to that; do you have any idea why that is happening?

It will give the longest phrase of two or more words.
Quote:
For example, in my test file (see below), it picks up the phrase 'this was a long test' but not the phrase 'yahooo temp lala'. Any thoughts?

Is that one file?

If so, then the longest phrase is 'this was a long test', and that's what you get.
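
If the aim is to see the other repeated phrases as well, such as 'yahooo temp lala', one option is to list every repeated phrase, longest first, instead of printing only the maximum. The following is a minimal sketch of that idea. It assumes an awk loop for the enumeration in place of the recursive shell function, and the name repeated_phrases is made up for illustration.

```shell
# repeated_phrases (hypothetical name): reads text on stdin and prints
# "<count><TAB><phrase>" for every phrase of two or more words that
# occurs at least twice, longest phrase (in characters) first.
repeated_phrases()
{
  awk '{
    # emit every run of two or more consecutive words on the line
    for (i = 1; i < NF; i++) {
      p = $i
      for (j = i + 1; j <= NF; j++) {
        p = p " " $j
        print p
      }
    }
  }' |
  sort | uniq -c |
  awk '$1 >= 2 {
    count = $1
    sub(/^ *[0-9]+ /, "")     # strip the count that uniq -c put in front
    printf "%d\t%d\t%s\n", length($0), count, $0
  }' |
  # sort by phrase length, longest first, then drop the length column
  sort -n -r |
  cut -f 2-
}
```

Run as repeated_phrases < file, the first line is still the overall longest repeated phrase and the shorter ones follow it.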
Quote:
yahooo temp lala
yahooo temp lala
yahooo temp lala
yahooo temp lala
yahooo temp lala

This was long test
to see if long test works
this was a long test
I repeat
this was a long test
this was a long test .. yahooo
this was a long test

I have refactored the script a little, allowing one or more filenames to be specified on the command line, or to read standard input if no file is specified:

Code:
phrases()
{
  phrase=$*
  while :
  do
    case $phrase in
      *\ *) phrases ${phrase#* } ;;
      *) break;;
    esac
  done

  case $* in
      *\ *) printf "%s\n" "$*" ;;
  esac
}

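# Print every phrase of two or more consecutive words in the arguments
listphrases()
{
  string="$*"
  while :
  do
    phrases "$string"
    string=${string% *}    # drop the last word and go round again
    case $string in *\ *);; *) break;; esac
  done
}

# Count identical phrases and print the longest one that occurs at
# least twice, with its count in front
multimax()
{
  sort | uniq -c |
  awk '$1 < 2       { next }
       length > max { max = length; phrase = $0 }
                END { print phrase }'
}

# $line is left unquoted, so the shell splits it into words before
# listphrases joins them again with single spaces
cat "$@" | while read -r line
do
  listphrases $line
done | multimax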

I have measured length of phrase by number of characters rather than number of words. To change that, use this instead of length > max { ... }:

Code:
       NF > max { max = NF; phrase = $0 }

I'll leave it as an exercise for the reader to add a command-line option to switch between number of characters and number of words.
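
For anyone attempting that exercise, one possible shape is sketched below. The -w flag and the names phrase_mode and multimax_by are invented here and are not part of the script above. The sketch assumes the phrases arrive on standard input one per line, as they do for multimax.

```shell
# phrase_mode (hypothetical name): prints "words" if -w was given,
# "chars" otherwise.
phrase_mode()
{
  OPTIND=1                 # reset so the function can be called again
  mode=chars
  while getopts w opt
  do
    case $opt in
      w) mode=words ;;
      *) return 1 ;;
    esac
  done
  printf '%s\n' "$mode"
}

# multimax_by (hypothetical name): like multimax, but ranks by word
# count when called with "words" and by character count otherwise.
multimax_by()
{
  mode=${1:-chars}
  sort | uniq -c |
  awk -v mode="$mode" '
    $1 < 2     { next }    # keep repeated phrases only
               { size = (mode == "words") ? NF : length($0) }
    size > max { max = size; phrase = $0 }
    END        { print phrase }'
}
```

In the main script you would set the mode with mode=$(phrase_mode "$@"), skip the flag with shift when it was given, and end the pipeline with multimax_by "$mode" instead of multimax.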