Find all lines in file such that each word on that line appears in at least n lines of the file


 
# 1  
Old 06-14-2017

I have a file where every line includes four expressions with a caret in the middle (plus some other "words" or fields, always separated by spaces). I would like to extract from this file all those lines in which each of the four expressions containing a caret appears on at least four different lines of the whole file. Could anyone help me?

Here is a section of my file:
Code:
5^4 + 32^1 = 6^3 + 21^2    (625, 32, 216, 441)
5^4 + 34^2 = 12^3 + 53^1    (625, 1156, 1728, 53)
5^4 + 40^2 = 13^3 + 28^1    (625, 1600, 2197, 28)
5^4 + 42^1 = 7^3 + 18^2    (625, 42, 343, 324)
5^4 + 53^2 = 15^3 + 59^1    (625, 2809, 3375, 59)
5^4 + 56^1 = 8^3 + 13^2    (625, 56, 512, 169)
5^4 + 66^2 = 17^3 + 68^1    (625, 4356, 4913, 68)
5^4 + 75^1 = 6^3 + 22^2    (625, 75, 216, 484)
5^5 + 6^4 = 65^1 + 66^2    (3125, 1296, 65, 4356)
5^5 + 7^1 = 6^3 + 54^2    (3125, 7, 216, 2916)
5^5 + 7^4 = 50^1 + 74^2    (3125, 2401, 50, 5476)
5^5 + 8^3 = 37^1 + 60^2    (3125, 512, 37, 3600)
5^5 + 9^3 = 10^1 + 62^2    (3125, 729, 10, 3844)
5^5 + 10^3 = 8^4 + 29^1    (3125, 1000, 4096, 29)
5^5 + 16^2 = 6^1 + 15^3    (3125, 256, 6, 3375)
5^5 + 17^2 = 15^3 + 39^1    (3125, 289, 3375, 39)
5^5 + 18^2 = 15^3 + 74^1    (3125, 324, 3375, 74)
5^5 + 19^1 = 14^3 + 20^2    (3125, 19, 2744, 400)
5^5 + 20^1 = 6^4 + 43^2    (3125, 20, 1296, 1849)
5^5 + 27^1 = 7^3 + 53^2    (3125, 27, 343, 2809)
5^5 + 32^2 = 8^4 + 53^1    (3125, 1024, 4096, 53)
5^5 + 32^2 = 16^3 + 53^1    (3125, 1024, 4096, 53)
5^5 + 33^1 = 13^3 + 31^2    (3125, 33, 2197, 961)
5^5 + 43^2 = 17^3 + 61^1    (3125, 1849, 4913, 61)
5^5 + 47^1 = 12^3 + 38^2    (3125, 47, 1728, 1444)
5^5 + 55^1 = 11^3 + 43^2    (3125, 55, 1331, 1849)
5^5 + 59^2 = 9^4 + 45^1    (3125, 3481, 6561, 45)
5^5 + 60^1 = 7^4 + 28^2    (3125, 60, 2401, 784)
5^5 + 60^1 = 14^3 + 21^2    (3125, 60, 2744, 441)
5^6 + 8^4 = 27^3 + 38^1    (15625, 4096, 19683, 38)
5^6 + 16^1 = 10^3 + 11^4    (15625, 16, 1000, 14641)
5^6 + 20^4 = 9^1 + 56^3    (15625, 160000, 9, 175616)
5^6 + 35^2 = 7^5 + 43^1    (15625, 1225, 16807, 43)
5^6 + 45^2 = 26^3 + 74^1    (15625, 2025, 17576, 74)

So in what I would like to extract from the file, the last line would only be included if each of "5^6", "45^2", "26^3" and "74^1" appears on at least four different lines of the entire file. Thanks for any help!
# 2  
Old 06-14-2017
Is this a homework assignment? Homework and coursework questions can only be posted in the Homework & Coursework Questions forum under special homework rules.

Please review the rules, which you agreed to when you registered, if you have not already done so.

If this post is not homework, please explain the company you work for and the nature of the problem you are working on. And, tell us what operating system and shell you're using, and show us what you have tried to do to solve this problem on your own.

If you did post homework in the main forums, please review the guidelines for posting homework and repost.

Last edited by Don Cragun; 06-15-2017 at 03:16 AM.. Reason: Fix typo: s/ this is not post/this post is not/
# 3  
Old 06-15-2017
Thanks for the friendly welcome Don. I haven't had any homework assignments for over 25 years. I'm a hobbyist working on a maths problem. I wrote a little C program to generate this data, and want to sort through it with shell tools as an intermediate step to solving the problem empirically (as a hint to myself, before I try to solve it mathematically). I am using Bash by default, since it is the default shell on my laptop running OS 10.6, but other shells are available. What I have done so far: stared at it and realised I don't know how to do this kind of multi-line search with the handful of shell commands I have taught myself over the last 30 years (and only used very infrequently, when such problems come up). I suppose I could also have tried to do this weeding out within my C program, but I can't see how to do it without having to hold everything in memory all at once (again, I write such programs very infrequently). So, it seems better to write it to a file then use some other tool in the shell to search that file. Hence my posting here. I'm sure there is a better way, but I break out my C and shell scripts about once every 6 months and at my age it's often easier to ask.

Is there anyone less suspicious who might be able to point me in a useful direction?
# 4  
Old 06-15-2017
No reason to become ironic. This forum has a strong reputation for NOT helping students and/or candidates cheat their way through classwork or exams, so questions of that kind are appropriate and accepted.

Still: welcome to the forum.

For your problem, try
Code:
awk '{CNT[$1]++; CNT[$3]++;CNT[$5]++; CNT[$7]++} END {for (c in CNT) if (CNT[c] > 3) print c, "occurs", CNT[c], "times."}' file
15^3 occurs 4 times.
5^4 occurs 8 times.
5^5 occurs 21 times.
5^6 occurs 5 times.

It doesn't check if terms occur twice in one line, but the chances of that happening are quite low, I believe.
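If repeated terms within one line ever did matter, a per-line "seen" array would make each term count at most once per line. The following is only a sketch under that assumption: the helper name count_terms_once_per_line and the three-line sample (whose first line repeats 2^2 on purpose) are made up for illustration and are not part of the original data.

```shell
#!/bin/sh
# count_terms_once_per_line MIN FILE   (hypothetical helper name)
# Prints every term found in fields 1, 3, 5, 7 that occurs on at least MIN lines.
count_terms_once_per_line() {
	awk -v min="$1" '
	{
		split("", seen)                  # portable way to empty the per-line array
		for (i = 1; i <= 7; i += 2)      # fields 1, 3, 5 and 7 hold the terms
			if (!seen[$i]++)             # true only the first time on this line
				CNT[$i]++
	}
	END {
		for (c in CNT)
			if (CNT[c] >= min)
				print c, "occurs on", CNT[c], "lines."
	}' "$2"
}

# Made-up sample data; the first line repeats 2^2 deliberately.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2^2 + 2^2 = 1^3 + 7^1    (4, 4, 1, 7)
2^2 + 3^1 = 1^3 + 6^1    (4, 3, 1, 6)
3^2 + 3^1 = 2^3 + 4^1    (9, 3, 8, 4)
EOF

# awk walks "for (c in CNT)" in an unspecified order, hence the sort.
count_terms_once_per_line 2 "$sample" | sort
rm -f "$sample"
```

On this sample, 2^2 should be reported as occurring on 2 lines rather than 3, because the duplicate in the first line is ignored.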
# 5  
Old 06-15-2017
Thank you Rudi. I should learn awk, shouldn't I? That is a good way to count the occurrences. Is there a way, having counted the occurrences, to echo an entire line if and only if the 1st, 3rd, 5th and 7th fields of that line all appear at least 4 times in the file? (For the smaller sample data I posted, it would find an answer if we searched for lines whose entries all appear at least twice, instead of four times.)

You are correct not to worry about repeats within a single line; this is ruled out by construction of the data.

P.s. apologies if I overreacted--I think what was irritating was not that someone would want to make sure my question wasn't homework (I agree that a forum can quickly become useless to experts if it is overrun by homework questions), but instead the order to "please explain the company you work for and the nature of the problem you are working on", not only because it is intrusive, but because it suggests that only people who work for a company with a work-related problem can legitimately ask for scripting assistance here. But: your forum, your rules, ok.
# 6  
Old 06-15-2017
If you don't mind reading the file twice, it is pretty simple with awk:
Code:
awk -v cnt=2 '
FNR == NR {
	c[$1]++
	c[$3]++
	c[$5]++
	c[$7]++
	next
}
c[$1] >= cnt && c[$3] >= cnt && c[$5] >= cnt && c[$7] >= cnt' file file

With cnt set to 4, you don't get any output with your posted sample data. With cnt set to 2, this produces the output:
Code:
5^5 + 18^2 = 15^3 + 74^1    (3125, 324, 3375, 74)
5^5 + 32^2 = 8^4 + 53^1    (3125, 1024, 4096, 53)
5^5 + 60^1 = 14^3 + 21^2    (3125, 60, 2744, 441)

You haven't told us what operating system you're using... If you're using a Solaris/SunOS system, you'll need to change awk in the above to /usr/xpg4/bin/awk or nawk.
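If the caret expressions were not always in fields 1, 3, 5 and 7, one possible variation is to test every field for a caret instead of hard-coding the positions. This is a sketch only: the helper name filter_by_caret_terms and the sample data are invented for illustration, and it assumes, as stated for the real data, that no term repeats within a line.

```shell
#!/bin/sh
# filter_by_caret_terms MIN FILE   (hypothetical helper name)
# Prints the lines of FILE whose caret terms each occur at least MIN times.
# FILE is read twice, so it must be a regular file, not a pipe.
filter_by_caret_terms() {
	awk -v cnt="$1" '
	FNR == NR {                          # first pass: count every caret term
		for (i = 1; i <= NF; i++)
			if (index($i, "^"))
				c[$i]++
		next
	}
	{                                    # second pass: check every caret term
		for (i = 1; i <= NF; i++)
			if (index($i, "^") && c[$i] < cnt)
				next                     # one rare term rejects the whole line
		print
	}' "$2" "$2"
}

# Made-up sample data, not taken from the original file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2^2 + 3^1 = 1^3 + 6^1    (4, 3, 1, 6)
2^2 + 6^1 = 1^3 + 3^2    (4, 6, 1, 9)
3^2 + 3^1 = 2^3 + 4^1    (9, 3, 8, 4)
EOF

filter_by_caret_terms 2 "$sample"
rm -f "$sample"
```

With a threshold of 2, this should print the first two sample lines and drop the third, because 2^3 and 4^1 each occur only once. Note that a line containing no caret term at all would also be printed, which may or may not be what is wanted.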

Last edited by Don Cragun; 06-15-2017 at 03:14 PM.. Reason: Fix typo: s/cnt[$5]/c[$5]/ in last line of awk script.
# 7  
Old 06-15-2017
I'm certain Don Cragun will accept the apology. The forum maintainers' concern is less that the forum might become useless (people in here REALLY like to help, even with minor problems) than to keep up the quality of IT education. If a student fills in the homework form including institution, course and professor, s/he will be helped to develop in the right direction and find a solution of his/her own; cf. https://www.unix.com/homework-and-coursework-questions/. By the way, a vague comment on a person's company like "chemical" or "administration" would have sufficed, or even you telling us you're a hobbyist.

Back to your problem. Outputting the entire line that satisfies a condition means either keeping ALL lines in memory (demanding for BIG files) or running through the input file twice: once for counting, once for printing. The latter is the approach here:
Code:
awk 'NR == FNR {CNT[$1]++; CNT[$3]++;CNT[$5]++; CNT[$7]++; next} CNT[$1] > 1 && CNT[$3] > 1 && CNT[$5] > 1 && CNT[$7] > 1 ' file file
5^5 + 18^2 = 15^3 + 74^1    (3125, 324, 3375, 74)
5^5 + 32^2 = 8^4 + 53^1    (3125, 1024, 4096, 53)
5^5 + 60^1 = 14^3 + 21^2    (3125, 60, 2744, 441)

To raise the threshold to four occurrences, change each 1 to 3 in the four comparisons in the second part.
And, yes, you're right: awk is a very powerful tool for text file analyses...
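For completeness, the other option mentioned above, keeping all lines in memory, could look roughly like the sketch below. It reads its input only once, so it also works at the end of a pipeline, at the cost of memory that grows with the input. The helper name filter_one_pass and the sample data are invented for illustration.

```shell
#!/bin/sh
# filter_one_pass MIN   (hypothetical helper name; reads standard input)
# Stores every line, counts the terms in fields 1, 3, 5, 7, and prints the
# qualifying lines at the end.
filter_one_pass() {
	awk -v cnt="$1" '
	{
		line[NR] = $0                    # memory use grows with the input size
		CNT[$1]++; CNT[$3]++; CNT[$5]++; CNT[$7]++
	}
	END {
		for (n = 1; n <= NR; n++) {
			split(line[n], f, " ")       # " " splits on runs of whitespace
			if (CNT[f[1]] >= cnt && CNT[f[3]] >= cnt &&
			    CNT[f[5]] >= cnt && CNT[f[7]] >= cnt)
				print line[n]
		}
	}'
}

# Made-up sample data, not taken from the original file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2^2 + 3^1 = 1^3 + 6^1    (4, 3, 1, 6)
2^2 + 6^1 = 1^3 + 3^2    (4, 6, 1, 9)
3^2 + 3^1 = 2^3 + 4^1    (9, 3, 8, 4)
EOF

filter_one_pass 2 < "$sample"
rm -f "$sample"
```

On this sample the first two lines should be printed. Because it reads standard input, the output of the generating program could be piped straight into filter_one_pass 4 without writing an intermediate file.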