Search a text and return the text from file


 
# 1  
Old 07-25-2015
Search a text and return the text from file

Hi

I have a set of input strings in a pattern as given below

string1 string2 string3 string4 string5


I need to search for this sequence of strings in a file so that the first two strings (string1 and string2) and the last two strings (string4 and string5) match the strings in the SECOND column of a three-column text file. Among the matching lines, the numbers in the FIRST and THIRD columns are then compared.

So the script searches a big text file called TEXTFILE.TXT for the lines whose SECOND column matches string1, string2, string4 and string5. It then returns the string with the biggest (negative) number in the FIRST column and the string with the biggest negative number in the THIRD column.


A sample of TEXTFILE.TXT is given below. The FIRST and SECOND columns are separated by a tab, and the SECOND and THIRD columns are separated by a tab and a space. The strings within the SECOND column are separated by a space. An entry in the SECOND column may contain anywhere from one to five strings.

For example, my input is

Code:
string1  string2  string3 string4  string5

hai        wafam   cherol   makha   palli

Four entries in the text file match this input, so the output will be these two string sets:


Code:
hai wafam cherolna makha palli
hai wafam cherolduna makha palli

File format of TEXTFILE.TXT

Code:
-1.391722       hai wafam cherolna makha palli     -0.6328273
-2.922845       hai wafam cherolduna makha palli -0.1190167
-2.915667       hai wafam cherolsina makha palli  -0.5702463
-2.927181       hai wafam paochena makha palli  -0.1963889
-2.925497       hai wafam khangnaduna   -0.6328273
-2.855543       hai wafam ngasigi 
-2.926619       hai wafam thamkharabani
-1.635051       hai wafam thamlamle    -0.4567362
-1.078001       hai wafam thamlamli    -0.8960688
-1.023442       adubu madu makhada yaakhidre haikhre -0.1234433
-1.432234       adubu madu ma yaakhidre haikhre  -0.5432345

I need help writing a script to perform the above task. Thanks in advance.
Moderator's Comments:
Mod Comment: Please use CODE tags (not HTML tags) for all sample input, output, and sample code segments.

Last edited by my_Perl; 09-15-2015 at 09:17 PM.. Reason: Change HTML tags to CODE tags and add CODE tags.
# 2  
Old 07-25-2015
What operating system and shell are you using?

What tools do you want to use? (Does your forum user name mean you only want to use perl?)

What have you tried? Please show us what you have tried (in CODE tags). If you had shown us what you have tried, some of the questions below would already have been answered.

You say that the 2nd and 3rd fields in your file are separated by a tab and a space, and that the 1st and 2nd fields are separated by a tab. But there aren't any tabs in your file, and even if you had converted your tabs to spaces when you pasted your sample TEXTFILE.TXT here, the 3rd field is not aligned as it would be if there had been a tab between fields.
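
A quick way to settle the tab question is to let awk count the tab-separated fields on each line. This is only a sketch: the temporary file and its two lines are made up for illustration, with the first line following the described layout and the second using spaces only.

```shell
# Hypothetical two-line sample: line 1 uses real tabs, line 2 only spaces.
sample=$(mktemp)
printf '%s\t%s\t %s\n' '-1.391722' 'hai wafam cherolna makha palli' '-0.6328273' > "$sample"
printf '%s\n' '-2.922845       hai wafam cherolduna makha palli -0.1190167' >> "$sample"

# Print "line number: number of tab-separated fields" for every line.
# A line that follows the described layout reports 3 fields.
field_counts=$(awk -F'\t' '{ print NR ": " NF }' "$sample")
printf '%s\n' "$field_counts"
rm -f "$sample"
```

A line that reports 1 field contains no tabs at all, so a script that splits on tabs would never see its second or third column.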

How are the five strings found by your code? Are they operands passed to your script? Are they in another file? If so, what is the format of that file? Can any of the strings contain spaces? Can any of the strings contain any characters that are special in an extended regular expression, or do the strings just consist of alphanumeric characters?

You say that string1 and string2 and string4 and string5 should match field 2 in your file. Do they need to be in sequence within field 2 in the file (as they are in your example), or does each of those four strings just have to appear somewhere within field 2? Is string 3 also supposed to match (as it does in your example), or is your code just supposed to ignore string 3? Can the matches overlap, or does each of the four (or five) strings need to match unique substrings in field 2 (as they do in your example)?

Quote:
I need to search this sequence of strings from a file in such a way that the first two strings (string1 and string2) and last two strings (string4 and string5) should match with the strings in the SECOND column of a text file (consisting of three columns) after the comparison of the numbers from the respective column.

So, the script will perform searching for the strings which matches string1, string2, string4 and string5 from a big text file called TEXTFILE.TXT . Then, it'll return the string which has the biggest number (-ve number) in FIRST column and the string which has the biggest -ve number in the third column.
Does the above quote mean that you want the two matching fields that match all four strings and then from all of the fields matching that criteria choose the single matching field that also has the most negative value in field 1 and choose the single matching field that also has the most negative value in field 3?

Or, are you looking for the field matching string 1 and string 2 that has the most negative value in field 1 and for the field matching string 4 and string 5 that has the most negative value in field 3?

Are lines with non-negative values in fields 1 and/or 3 supposed to be ignored when matching the associated strings?
This User Gave Thanks to Don Cragun For This Post:
# 3  
Old 07-26-2015
Why does
Code:
-2.927181       hai wafam paochena makha palli  -0.1963889

not show up in your sample output? It has the most negative value of the four matches in the first field.
In what order should the output lines appear?
This User Gave Thanks to RudiC For This Post:
# 4  
Old 07-30-2015
Clarification

Yes,


-1.391722 hai wafam cherolna makha palli -0.6328273
-2.927181 hai wafam paochena makha palli -0.1963889

We assume that the -ve number -1.391722 is bigger than -2.927181.


Thanks a lot.
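
In other words, "biggest" here is read as the numerically largest value, that is, the negative number closest to zero. A minimal awk check of that ordering, using the first-column values of the four matching lines from post #1 (the variable name `largest` is just illustrative):

```shell
# Feed the four first-column values to awk and keep the numerically largest one.
largest=$(printf '%s\n' -1.391722 -2.922845 -2.915667 -2.927181 |
    awk 'NR == 1 || $1 > max { max = $1 } END { print max }')
printf '%s\n' "$largest"
```

Under this reading the line containing -2.927181 loses to the line containing -1.391722, which is why it does not appear in the expected output.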
# 5  
Old 07-31-2015
Try
Code:
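awk -v S1="$string1" -v S2="$string2" -v S4="$string4" -v S5="$string5" -F"\t" '
BEGIN                                   {MX1 = MX3 = -1E100     # start below any value in the file
                                        }
# field 2 must start with "S1 S2" and end with "S4 S5"
$2 ~ "^" S1 " " S2 ".*" S4 " " S5 "$"   {if ($1 > MX1)  {MX1 = $1       # largest field 1 so far
                                                         T1  = $2
                                                        }
                                         if ($3 > MX3)  {MX3 = $3       # largest field 3 so far
                                                         T3  = $2
                                                        }
                                        }
END                                     {print T1
                                         print T3
                                        }
' file

which, for the sample file, prints
Code:
hai wafam cherolna makha palli
hai wafam cherolduna makha palli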
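
The pattern match above treats the search strings as regular expressions, which ties in with the earlier question about special characters. A possible variant, sketched here under assumptions, compares whole words literally instead. The function names `find_best` and `same`, the temporary file, and the choice to require at least four words in field 2 are illustrative and not part of the original script.

```shell
# Hypothetical sample data in the layout described in post #1:
# field 1 <TAB> field 2 <TAB><SPACE> field 3
data=$(mktemp)
printf '%s\t%s\t %s\n' \
    -1.391722 'hai wafam cherolna makha palli'   -0.6328273 \
    -2.922845 'hai wafam cherolduna makha palli' -0.1190167 \
    -2.915667 'hai wafam cherolsina makha palli' -0.5702463 \
    -2.927181 'hai wafam paochena makha palli'   -0.1963889 \
    -1.635051 'hai wafam thamlamle'              -0.4567362 > "$data"

# usage: find_best string1 string2 string4 string5 file
find_best() {
    awk -F'\t' -v S1="$1" -v S2="$2" -v S4="$3" -v S5="$4" '
    # Force a string comparison so that regex metacharacters and
    # number-like words are compared literally.
    function same(a, b) { return (a "") == (b "") }
    {
        n = split($2, w, " ")
        # First two and last two words of field 2 must equal the search strings.
        if (n < 4 || !same(w[1], S1) || !same(w[2], S2) ||
            !same(w[n-1], S4) || !same(w[n], S5))
            next
        # Keep the field-2 text with the largest field 1 and the largest field 3.
        if (!found || $1 + 0 > max1) { max1 = $1 + 0; t1 = $2 }
        if (!found || $3 + 0 > max3) { max3 = $3 + 0; t3 = $2 }
        found = 1
    }
    END { if (found) { print t1; print t3 } }
    ' "$5"
}

result=$(find_best hai wafam makha palli "$data")
printf '%s\n' "$result"
rm -f "$data"
```

Note that awk still processes backslash escapes in values passed with `-v`, so strings containing backslashes would need separate handling.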

# 6  
Old 09-05-2015
Hi

I am trying to change the script so that it reads the input sentences one after another from a text file called INPUT.txt and performs the above search operation on each of them.
The format of INPUT.txt is shown here:
Code:
hai wafam cherol makha palli adubu madu makha yaakhidre haikhre tamlakle.
hairiba waridu cherol makhada pallina adubu madu makha yaakhidre haikhre tamlaklenasu hairi.
adubu madui saruk amuk hanna khannanaba wafam thangaatkhre hairi.
.....
.....

I need help with this part: if no match is found at all in the first search operation, the search should continue from the second string (string2) to the sixth string (string6) of the sentence, still considering five strings at a time. If a match is found and retrieved, I want the script to repeat the same search operation for the next set of five strings, starting from the sixth string, since the first five input strings (string1, string2, string3, string4 and string5) are already done. The search operation continues in this way until the end of the sentence.

Thanks in advance.

Last edited by Don Cragun; 09-05-2015 at 05:50 PM.. Reason: Change ICODE tags to CODE tags.
# 7  
Old 09-05-2015
You lost me. Please explain in smaller steps. What I understand is instead of five strings
Code:
string1  string2  string3 string4  string5
hai        wafam   cherol   makha   palli

and dropping the third, you'll have "sentences" consisting of between 9 and 12 strings, of which none should be dropped for the comparison. You want to compare strings 1 to 5; if there is no match, then 2 to 6; and, if either matches, compare 6 to 10. What if there are only 9 strings in a sentence? If there are more, what should be done with string11 and string12?
And what should be printed for either match?
This User Gave Thanks to RudiC For This Post:
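
Until the questions in post #7 are answered, the requirement in post #6 stays ambiguous. The following is a minimal sketch of one possible reading, not a confirmed solution: slide a five-word window over each sentence, jump ahead by five words after a match, and shift by one word otherwise. The assumptions are that the trailing full stop of a sentence is dropped, that leftover words which cannot fill a five-word window are ignored, that both result lines are printed for every match, and that the two temporary files (built from a few lines of the samples in posts #1 and #6) stand in for TEXTFILE.TXT and INPUT.txt.

```shell
# Hypothetical stand-ins for TEXTFILE.TXT and INPUT.txt.
textfile=$(mktemp); input=$(mktemp)
printf '%s\t%s\t %s\n' \
    -1.391722 'hai wafam cherolna makha palli'       -0.6328273 \
    -2.922845 'hai wafam cherolduna makha palli'     -0.1190167 \
    -1.023442 'adubu madu makhada yaakhidre haikhre' -0.1234433 \
    -1.432234 'adubu madu ma yaakhidre haikhre'      -0.5432345 > "$textfile"
printf '%s\n' 'hai wafam cherol makha palli adubu madu makha yaakhidre haikhre tamlakle.' > "$input"

result=$(awk -F'\t' '
# First file: per (word1, word2, second-to-last, last) key, remember the
# field-2 text with the largest field 1 and the one with the largest field 3.
NR == FNR {
    n = split($2, w, " ")
    if (n < 4) next
    k = w[1] SUBSEP w[2] SUBSEP w[n-1] SUBSEP w[n]
    if (!(k in max1) || $1 + 0 > max1[k]) { max1[k] = $1 + 0; t1[k] = $2 }
    if (!(k in max3) || $3 + 0 > max3[k]) { max3[k] = $3 + 0; t3[k] = $2 }
    next
}
# Second file: slide a five-word window over each sentence.
{
    sub(/\.$/, "")                      # drop the trailing full stop (assumption)
    n = split($0, s, " ")
    i = 1
    while (i + 4 <= n) {
        # The third word of the window is ignored, as in the original search.
        k = s[i] SUBSEP s[i+1] SUBSEP s[i+3] SUBSEP s[i+4]
        if (k in max1) {
            print t1[k]
            print t3[k]
            i += 5                      # match: continue after this window
        } else {
            i += 1                      # no match: shift the window by one word
        }
    }
}' "$textfile" "$input")
printf '%s\n' "$result"
rm -f "$textfile" "$input"
```

Indexing the big file once by its first two and last two words keeps each window lookup cheap, at the cost of holding one entry per distinct key in memory.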