Changing "word" to "token" and "sentence" to "set" doesn't clarify anything. Changing two undefined terms to two other undefined terms still leaves us with no defined terms. If you refuse to explain what you want your code to do, there is no reason for any of us to waste our time trying to guess at your requirements, nor to try to write code when we don't know what the code is supposed to do. Why did you explicitly say that you had a "word list A" if there is no word list? Your original requirement was:
Quote:
The output format is based on the availability of all the words from the words list A. If the sentence matches all the words from the word list, then the sentence will be extracted. The condition is that at least seventy percent of the words in the sentence must be matched from the words list A.
which can now be restated as:
Quote:
The output format is based on the availability of all of the <undefined term1>s from the <undefined term1>s in a non-existent list. If the <undefined term2> matches all of the <undefined term1>s from the non-existent list, then the <undefined term2> will be extracted. The condition is that at least 70% of the <undefined term1>s in the <undefined term2> must be matched from the <undefined term1>s in the non-existent list.
With requirements like this, it looks like a homework assignment that you want us to complete for you.
If you already have a way to do this and just want to write it in a different language, show us the C code that you have written that you now want to translate to shell code. Then we would be able to deduce your definitions from your C code and know what it is that we're trying to do. (But don't claim that you are converting from C to shell to make the code faster; for any particular task, well-crafted C will almost certainly be faster than a corresponding shell script. And there is absolutely no reason to claim that C code can't be used in a pipeline. Almost all of the standard utilities on UNIX and Linux systems are written in C, and many of them are perfectly capable of being used in a pipeline. Changing C code that can't be used as a filter to a shell script won't magically turn it into a filter.)
RudiC made a valiant effort to help you get a start on your problem, but his suggestion ignores the fact that you don't have a list, assumes that tokens (or words) include punctuation, assumes that <sentence> or <set> and <line in a text file> are synonymous, ignores the requirement to ignore uppercase <words> or <tokens> from your nonexistent list, and uses 80% instead of 70% as the threshold.
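To show how much depends on the missing definitions, here is a minimal sketch that simply picks one set of them. Every definition in it is an assumption made for illustration, not something stated in this thread: a <word> is a maximal run of ASCII letters compared case-insensitively, a <sentence> is one line of input, and the list is a file with one word per line. The function name extract_matching is hypothetical, and the sketch makes no attempt to guess what "ignore uppercase words" is supposed to mean.

```shell
#!/bin/sh
# Hypothetical sketch; all definitions below are assumptions:
#   word     = maximal run of ASCII letters, compared case-insensitively
#   sentence = one line of input
#   list     = file containing one word per line
# Usage: extract_matching LISTFILE [TEXTFILE...]
extract_matching() {
    list=$1
    shift
    # With no text files given, read the text from standard input.
    [ "$#" -eq 0 ] && set -- -
    awk -v threshold=70 '
        # First file: load the word list, lowercased.
        # (An empty list file is not handled by this sketch.)
        NR == FNR { known[tolower($1)]; next }
        {
            # Reduce the line to lowercase words separated by single spaces.
            line = tolower($0)
            gsub(/[^a-z]+/, " ", line)
            n = split(line, w, " ")
            hits = 0
            for (i = 1; i <= n; i++)
                if (w[i] in known)
                    hits++
            # Print the original line if at least threshold percent
            # of its words appear in the list.
            if (n > 0 && hits * 100 >= threshold * n)
                print
        }
    ' "$list" "$@"
}
```

Change any one of those assumptions (punctuation counted as part of a word, a sentence that spans several lines, a different source for the list) and the output changes, which is why the definitions have to come from you.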