Find all lines in file such that each word on that line appears in at least n lines of the file
Post 302999219 by RudiC on Thursday 15th of June 2017 10:53:02 AM
I'm certain Don Cragun will accept the apologies. The forum maintainers' aim is not to be unhelpful - people in here REALLY like to help, even with minor problems - but to keep up the quality of IT education. If a student fills in the homework form, including institution, course and professor, s/he will be helped to develop in the right direction and find a solution of his/her own; cf. https://www.unix.com/homework-and-coursework-questions/. By the way, a vague comment on your company like "chemical" or "administration" would have sufficed, or even just telling us you're a hobbyist.

Back to your problem. Printing entire lines that satisfy a condition which depends on the whole file means either keeping ALL lines in memory (demanding for BIG files) or running through the input file twice - once for counting, once for printing. The latter is the approach used here:
Code:
awk 'NR == FNR {CNT[$1]++; CNT[$3]++; CNT[$5]++; CNT[$7]++; next} CNT[$1] > 1 && CNT[$3] > 1 && CNT[$5] > 1 && CNT[$7] > 1' file file
5^5 + 18^2 = 15^3 + 74^1    (3125, 324, 3375, 74)
5^5 + 32^2 = 8^4 + 53^1    (3125, 1024, 4096, 53)
5^5 + 60^1 = 14^3 + 21^2    (3125, 60, 2744, 441)

To raise the count limit, change the 1 in each of the four comparisons in the second part to, e.g., 3; every word must then occur more than 3 times.
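Instead of editing four comparisons, the limit could also be handed to awk as a variable. The following is only a sketch under assumptions: the function name `at_least_n`, the variable `n` and the three-line sample input are made up for illustration and are not part of the original problem. It uses `>= n`, so `n=2` corresponds to the `> 1` test.

```shell
#!/bin/sh
# Hypothetical helper: $1 = minimum count n, $2 = input file (read twice by awk).
at_least_n() {
    awk -v n="$1" '
        # First pass: count the words in fields 1, 3, 5 and 7.
        NR == FNR { CNT[$1]++; CNT[$3]++; CNT[$5]++; CNT[$7]++; next }
        # Second pass: print lines whose four words all occur at least n times.
        CNT[$1] >= n && CNT[$3] >= n && CNT[$5] >= n && CNT[$7] >= n
    ' "$2" "$2"
}

# Made-up sample data, only to exercise the function.
sample=$(mktemp)
printf '%s\n' 'A + B = C + D' 'A + D = C + B' 'A + B = E + D' > "$sample"

two=$(at_least_n 2 "$sample")      # same condition as the "> 1" test
three=$(at_least_n 3 "$sample")    # stricter limit
printf '%s\n' "$two"
rm -f "$sample"
```

Note that, like the one-liner, this counts occurrences; that equals the number of lines only as long as no word shows up twice on the same line.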
And, yes, you're right: awk is a very powerful tool for text file analyses...
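For completeness, the other option mentioned earlier (keeping all lines in memory and reading the file only once) might look roughly like the sketch below. The function name `single_pass`, the array names and the sample input are assumptions made for illustration; memory use grows with the file size, so the two-pass version is preferable for BIG files.

```shell
#!/bin/sh
# Hypothetical helper: $1 = input file, read only once.
single_pass() {
    awk '
        {
            LINE[NR] = $0                               # buffer every line
            CNT[$1]++; CNT[$3]++; CNT[$5]++; CNT[$7]++  # count the four words
        }
        END {
            for (i = 1; i <= NR; i++) {
                split(LINE[i], F, " ")                  # re-split the stored line into fields
                if (CNT[F[1]] > 1 && CNT[F[3]] > 1 && CNT[F[5]] > 1 && CNT[F[7]] > 1)
                    print LINE[i]
            }
        }
    ' "$1"
}

# Made-up sample data, only to exercise the function.
sample=$(mktemp)
printf '%s\n' 'A + B = C + D' 'A + D = C + B' 'A + B = E + D' > "$sample"
result=$(single_pass "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```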

pstat(8)						      System Manager's Manual							  pstat(8)

Name
       pstat - print system facts

Syntax
       /etc/pstat -aixvptufTk [ system ] [ corefile ]

Description
       The pstat command interprets the contents of certain system tables. The contents of system tables can change while pstat is running, so
       the information it gives is a snapshot taken at a given time.  If you specify system, pstat gets the namelist from the named system's
       kernel. If you omit system, pstat uses the default system namelist. If you specify corefile, pstat uses the tables in the core file.
       Otherwise, pstat uses the tables of the running system. Use the -k option when you specify the system or corefile argument.

Options
       -a   When used with the -p option, displays all process slots, rather than just active ones.

       -f   Displays the open file table with the following headings:

	       LOC	 The core location of this table entry.

	       TYPE	 The type of object the file table entry points to.

	       FLG	 Miscellaneous state variables, encoded as follows:

			   R   Open for reading

			   W   Open for writing

			   A   Open for appending

			   S   Shared lock

			   X   Exclusive use

			   I   Asynchronous input and output notification

			   B   Block-if-in-use flag is set (shared line semaphore)

	       CNT	 Number of processes that know this open file.

	       GNO	 The location of the gnode table entry for this file.

	       OFFS/SOCK The file offset or the core address of the associated socket structure.  (See for information on file offsets.)

       -i   Displays the gnode table with the following headings:

	       LOC	 The core location of this table entry.

	       FLAGS	 Miscellaneous state variables, encoded as follows:

			   L   Locked.

			   U   Update time for the file system must be corrected. See the reference page for more information.

			   A   Access time must be corrected.

			   M   File system is mounted here.

			   W   Wanted by another process (L flag is on).

			   T   Contains a text file.

			   C   Changed time must be corrected.

			   S   Shared lock applied.

			   E   Exclusive lock applied.

			   Z   Someone waiting for an exclusive lock.

			   I   In-use flag is set (shared line semaphore).

	       CNT	 Number of open file table entries for this gnode.

	       DEV	 Major and minor device number of the file system in which this gnode resides.

	       RDC	 Reference count of shared locks on the gnode.

	       WRC	 Reference count of exclusive locks on the gnode. (This count can be >1 if, for example, a file  descriptor  is  inherited
			 across a fork.)

	       GNO	 I-number within the device.

	       MODE	 Mode bits. (See for information about mode bits.)

	       NLK	 Number of links to this gnode.

	       UID	 User identification (ID) of owner.

	       SIZ/DEV	 Number of bytes in an ordinary file or major and minor device of a special file.

       -k   Prevents the process that is created from becoming too large, which can cause performance problems.  Use -k when you specify the
	    system or corefile argument.

       -p   Displays the process table for active processes with these headings:

	       LOC	 The core location of this table entry.

	       S	 Run state, encoded as follows:

			   0   No process

			   1   Waiting for some event

			   3   Able to be run

			   4   Being created

			   5   Being terminated

			   6   Stopped under trace

	       F	 Miscellaneous state variables, combined with a Boolean OR operation (hexadecimal):

			   00000001 Process is resident in memory.

			   00000002 System process:  swapper, pager, idle (RISC only), trusted path daemon.

			   00000004 Process is being swapped out.

			   00000008 Process requested swapout for page table growth.

			   00000010 Traced.

			   00000020 Used in tracing.

			   00000040 Locked in by a call.

			   00000080 Waiting for page-in to complete.

			   00000100 Protected from swapout while transferring resources to another process.

			   00000200 Used by a call.

			   00000400 Exiting.

			   00000800 Protected from swapout while doing physical input and output.

			   00001000 Process resulted from a call, which is not yet complete.

			   00002000 Parent has received resources returned by a child created with the call.

			   00004000 Process has no virtual memory because it is a parent in the context of the call.

			   00008000 Process is demand-paging data pages from its text gnode.

			   00010000 Process has advised of sequential memory access.

			   00020000 Process has advised of random memory access.

			   00080000 Process has indicated intent to execute data or stack (RISC only).

			   00100000 POSIX environment: no SIGCLD generated when children stop.

			   00200000 Process is owed a profiling tick.

			   00400000 Used by a call

			   00800000 A login process.

			   04000000 System V file lock applied.

			   08000000 Repair of unaligned accesses has been attempted (RISC only).

			   10000000 Process has called the system routine.

			   20000000 The idle process (RISC only).

	       POIP	 Number of pages currently being pushed out from this process.

	       PRI	 Scheduling priority. (See for information on priorities.)

	       SIGNAL	 Signals received (signals 1-32 coded in bits 0-31).

	       UID	 Real user ID.

	       SLP	 Amount of time the process has been blocked.

	       TIM	 Time resident in seconds; values greater than 127 are coded as 127.

	       CPU	 Weighted integral of CPU time, for scheduler.

	       NI	 Nice level. (See for information about nice levels.)

	       PGRP	 Process number of the root of the process group (the opener of the controlling terminal).

	       PID	 The process ID number.

	       PPID	 The process ID of the parent process.

	       ADDR	 If the process is in memory, identifies the user area page frame number of the page table entries.   If  the  process	is
			 swapped out, identifies the position in the swap area measured in multiples of 512 bytes.

	       RSS	 Resident set size: the number of physical page frames allocated to this process.

	       SRSS	 RSS at last swap (0 if never swapped).

	       SIZE	 Virtual size of process image (data plus stack) in multiples of 512 bytes.

	       WCHAN	 Wait channel number of a waiting process.

	       LINK	 Link pointer in list of processes that can be run.

	       TEXTP	 If text is pure, pointer to location of text table entry.

	       CLKT	 Countdown for real interval timer, measured in clock ticks (10 milliseconds). (See the reference page for information
			 about the real interval timer.)

	       TTYP	 Address of the controlling terminal.

	       DMAP	 Address of data segment dmap structure.

	       SMAP	 Address of stack segment dmap structure.

       -s   Displays the following information about the pages used for swap space:

	       o    The number of pages reserved, but not necessarily allocated, by the system for currently executing processes.

	       o    The number of pages used (physically allocated), including the number used for text images.

	       o    The number of pages free, wasted, or missing. Free pages are pages that have not been allocated.  Missing  pages  are  usually
		    allocated to argdev.  Wasted pages indicate the amount of space lost because the swap space is fragmented.

	       o    The number of pages available, which indicates the amount of space available for swapping.

       -t   Displays the table for terminals with the following headings:

	       RAW	 Number of characters in the raw input queue.

	       CAN	 Number of characters in the canonic input queue.

	       OUT	 Number of characters in the output queue.

	       MODE	 Terminal mode, as described in

	       ADDR	 Physical device address.

	       DEL	 Number of delimiters (newlines) in the canonic input queue.

	       COL	 Calculated column position of the terminal.

	       STATE	 Miscellaneous state variables, encoded as follows:

			   T   Line is timed out.

			   W   Waiting for open to complete.

			   O   Open.

			   C   Carrier is on.

			   B   Busy doing output.

			   A   Process is awaiting output.

			   X   Open for exclusive use.

			   H   Hangup on close.

			   S   Output is stopped (ttstop).

			   I   In-use flag is set (shared line semaphore).

			   D   Open nodelay.

			   G   Ignore carrier.

			   N   Nonblocking input and output.

			   Z   Asynchronous input and output notification.

			   L   Terminal line is in the process of closing.

			   Q   Output suspended for flow control.

	       PGRP	 Process group for which this is the controlling terminal.

	       DISC	 Line discipline; blank is old tty OTTYDISC, ntty for NTTYDISC, or termio for TERMIODISC.

       -T   Displays  the  number  of  used and free slots in the system tables.  This option is useful for determining how full the system tables
	    have become if the system is under a heavy load.

       -upid
	    Displays information about the specified user process. The pid argument is the process ID number as displayed  by  the  command.   The
	    process  must be in main memory, unless you specify the corefile argument on the command line. If you specify a core file, pid must be
	    0.

       -v   Displays a listing of all vector processes on the system. This option is valid only for processors that have the VAX vector  hardware.
	    The following list describes the headings in the display:

	       LOC	 The core location of this table entry

	       PPGRP	 The process number of the root of the process group (the opener of the controlling terminal)

	       PID	 The process ID number

	       PPID	 The process ID of the parent process

	       VSTAT	 One of the following vector process statuses:

			   WAIT   New vector process, which is waiting for a vector processor to be allocated to it.

			   LOAD   Process context is present in both vector and scalar processors.

			   SAVED  Process vector context is saved in memory.

			   LIMBO  A  vector  processor	has  been allocated to the process, but the vector context of the process has not yet been
				  loaded.

	       VERRS  Number of vector processor errors incurred by this process.

	       REFS   Number of times this process was refused scheduling into a vector processor.

	       CHPCXT Number of times the scalar context has been saved and restored, while the vector context remains resident in the vector
		      processor.

	       EPXCXT Number of times both the scalar and vector contexts have been saved and restored.

       -x   Displays the text table with the following headings:

	       LOC	 The core location of this table entry.

	       FLAGS	 Miscellaneous state variables encoded as follows:

			   T   A process called the system call.

			   W   Text has not yet been written on the swap device.

			   L   Loading is in progress.

			   K   Locked.

			   w   Wanted. (L flag is on.)

			   F   Text structure is on the freelist.

			   P   Resulted from demand-page-from-gnode execution format.  For further information, see

			   l   Locked from being paged or swapped.  For further information, see

			   B   All attached processes are being killed due to server write of an file.

	       DADDR	 Address of the text dmap structure in core.

	       CADDR	 Head of a linked list of loaded processes using this text segment.

	       SIZE	 Size of the text segment, measured in multiples of 512 bytes.

	       IPTR	 Core location of the corresponding gnode.

	       CNT	 Number of processes using this text segment.

	       CCNT	 Number of processes in core using this text segment.

	       LCNT	 Number of processes locking this text segment.

	       POIP	 Number of pages currently being pushed out in this text segment.

	       CMAP	 The address of the last CMAP entry freed.

Files
       User process information

       Kernel memory

       System namelist

See Also
       ps(1), chmod(2), execve(2), getitimer(2), getpriority(2), lseek(2), plock(2), ptrace(2), stat(2), fs(5)
