Match multiple patterns sequentially in order - grep or awk


 
# 8  
Old 07-19-2015
Quote:
Originally Posted by DSommers
Don, understood. Thank you for your help. Take care.
Cheers.

[SOLVED]
I'm glad you found a solution to problems 2 & 3. Would you please post your solution so other people reading this thread can learn from your experience?
# 9  
Old 07-20-2015
Some relevant feedback, Don...
Now I remember why I stayed away from this forum.
I never understand people who try to guess what someone is thinking or intending. I approached this forum in a professional manner with a legitimate question. Just because an answer is given doesn't mean it will be immediately understood. Non-professionals cannot express and articulate things the way you can, or absorb your wisdom on the first try, and there is always more than one way to approach a solution. If you created your own private forum for experts only and required all members to pass your test, it would save people who are not professional programmers from sanctimonious, condescending implications. There were no expectations in my post, just a person asking for help who was stuck after many hours of searching the net without resolution. I didn't say I found a solution. I moved on to a more friendly forum with people who are less aggressive and more helpful, and who do not micro-focus on a typo or an imperfectly formed question. Do what you do. I already marked the thread [Solved]. If you feel a burning desire to have the last word again, feel free to tell it to someone else; I won't be back. Out.
# 10  
Old 07-20-2015
I'm sorry that you feel that way.

In this forum, we try to help people learn how to use the common tools provided on their system to do what they're trying to do; not just provide complete scripts.

When I make a suggestion on how to do something, I try to provide a script that will take sample input provided by the submitter (but there was none in this case) that will produce output that exactly matches the desired output specified by the submitter. While I could make up sample input data and write a script that would produce the sample output you provided, it would not match the description you provided for what you said you wanted to do. And the specification of what you wanted for output for missing fields (with no example output for that case) was ambiguous.

If you had answered any of my questions or had shown that you had tried to fix RudiC's suggested code to more exactly meet your requirements, I would have suggested that you try something like:
Code:
awk '
BEGIN {	# Set search pattern:
	pat = "^From |^From: |^Subject: |^Message-Id: |^Date: |^To: "
	# Extract mail message headers from search pattern...
	nh = split(pat, h, "|")
	for(i = 1; i <= nh; i++) {
		# Remove "^" and " " from the headers:
		gsub(/[ ^]/, "", h[i])
		# Set field to be printed for missing headers:
		b[i] = sprintf("%s \"Blank\"", h[i])
	}
}
function dump() {
	# Function to print headers from a mail message...
	# fname is used instead of FILENAME because when this is called from
	# the FNR == 1 rule, FILENAME already names the next file.
	printf("File: \"%s\" message #%d\n", fname, ++msgcnt)
	for(i = 1; i <= nh; i++)
		if(h[i] in d) {
			printf("%s%s", d[h[i]], i == nh ? "\n" : "\t")
			delete d[h[i]]
		} else	printf("%s%s", b[i], i == nh ? "\n" : "\t")
}
FNR == 1 {
	# 1st line of new file found, print final results from previous file
	# and reset counters for this file.
	if(found)
		# Print headers from last mail message in previous file...
		dump()
	found = msgcnt = 0
	# Remember the name of the file now being read for use in dump().
	fname = FILENAME
}
/^From / && found++ {
	# Print headers from previous mail message...
	dump()
}
$0 ~ pat {
	# Gather data from current mail message...
	d[$1] = $0
}
END {	# Print headers from last mail message...
	if(found)
		dump()
}' inbox1 inbox2 inbox3...

Which produces output that I believe matches what you described in post #1 (except that it also outputs a line showing the file from which each message came and the sequence number within that file in case you want to process more than one file at a time) and makes a guess at the output you wanted for missing fields. I could also produce a 1-liner version of it, just to show that it can be done, but it wouldn't help people trying to learn how to write code to give them something that looks like it was intended to be an obfuscated code contest entry:
Code:
awk 'BEGIN{e="^From |^From: |^Subject: |^Message-Id: |^Date: |^To: ";n=split(e,h,"|");for(i=1;i<=n;i++){gsub(/[ ^]/,"",h[i]);b[i]=sprintf("%s \"Blank\"",h[i])}}function p(){printf("File: \"%s\" message #%d\n",F,++c);for(i=1;i<=n;i++)if(h[i] in d){printf("%s%s",d[h[i]],i==n?"\n":"\t");delete d[h[i]]}else printf("%s%s",b[i],i==n?"\n":"\t")}FNR==1{if(f)p();f=c=0;F=FILENAME}/^From /&&f++{p()}$0~e{d[$1]=$0}END{if(f)p()}' inbox1 inbox2 inbox3...

As I said before, if someone showed me code like the above 1-liner and asked me to help them fix it, I would tell them to find someone else to clean up their mess.

As always, with either of these scripts, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
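
For anyone who wants to try the idea without a real mailbox, here is a rough sketch. It is not the script above, just a trimmed variant of the same approach (single file, no "File:" label line), run against a made-up two-message mailbox. The addresses, dates, the AWK, mbox and out shell variables, and the temporary file name are all assumptions for the demo; the awk selection simply follows the Solaris/SunOS note above.

```shell
#!/bin/sh
# Pick the awk to use: the POSIX one on Solaris/SunOS, plain awk elsewhere.
case "$(uname -s)" in
SunOS)	AWK=/usr/xpg4/bin/awk ;;
*)	AWK=awk ;;
esac

# Made-up sample mailbox (hypothetical addresses and dates).
# The second message has no Subject: and no Message-Id: header.
mbox="${TMPDIR:-/tmp}/demo.$$.mbox"
cat > "$mbox" <<'EOF'
From alice@example.com Mon Jul 20 10:00:00 2015
From: Alice <alice@example.com>
To: bob@example.com
Subject: Hello
Message-Id: <1@example.com>
Date: Mon, 20 Jul 2015 10:00:00 +0000

First message body.

From carol@example.com Mon Jul 20 11:00:00 2015
From: Carol <carol@example.com>
To: bob@example.com
Date: Mon, 20 Jul 2015 11:00:00 +0000

Second message body.
EOF

# Trimmed variant of the approach: one tab-separated line per message,
# with a "Blank" placeholder for each header that was not seen.
out=$("$AWK" '
BEGIN {	nh = split("From From: Subject: Message-Id: Date: To:", h, " ") }
function dump(   i, v) {
	for(i = 1; i <= nh; i++) {
		v = (h[i] in d) ? d[h[i]] : h[i] " \"Blank\""
		printf("%s%s", v, i == nh ? "\n" : "\t")
		delete d[h[i]]
	}
}
# A "From " line after the first one ends the previous message.
/^From / && found++ { dump() }
# Remember each wanted header line, keyed by its first field.
/^(From |From: |Subject: |Message-Id: |Date: |To: )/ { d[$1] = $0 }
END {	if(found) dump() }
' "$mbox")
rm -f "$mbox"
printf '%s\n' "$out"
```

This should print two tab-separated lines, one per message; the second one carries Subject: "Blank" and Message-Id: "Blank" in place of the missing headers.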
# 11  
Old 07-20-2015
Adapted RudiC solution - seems to be faster:
Code:
#!/bin/sh
awk '
function prtit() {
  # print the VALS[ ], separated by \t characters
  sep=""; for (i=1; i<=MX; i++) { printf "%s%s", sep, VALS[KEYS[i]]; sep="\t" }
  # print a newline
  printf "\n"
}
function clearit() {
  # clear the VALS[ ] with "Blank"
  for (i=1; i<=MX; i++) { VALS[KEYS[i]]="Blank" }
}
BEGIN {
  # initialize array KEYS[1..MX] from a string
  MX=split ("From From: To: Subject: Message-ID: Date:", KEYS)
  # initialize the hashed array VALS["From","From:",...]
  clearit()
  # now we can lookup a key with ("key" in VALS)
}
# main loop; this runs on every line
# if there is no leading space and the first word $1 is in VALS[ ]
/^[^[:space:]]/ && ($1 in VALS) {
  # "From" that is KEYS[1] starts a new chapter(mail)
  if ($1==KEYS[1]) {
    # print a previous chapter (and clear the VALS[ ] in this chapter)
    if (cfound) { prtit(); clearit() } else { cfound=1 }
  }
  VALS[$1]=$0
}
END {
  # at the end (after the last input line)
  if (cfound) prtit()
}
' "$@"

If you want a one-command invocation, save this as an executable script. (The "$@" passes the arguments to awk.)
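
A small sketch of what the "$@" does, in case it is not obvious. The wrapper name firstline.sh, the scratch directory and the two input files are made up for the demo, and the awk program inside is just a trivial stand-in with the same shape as the script above:

```shell
#!/bin/sh
# Work in a scratch directory (all names below are made up for the demo).
dir="${TMPDIR:-/tmp}/wrapdemo.$$"
mkdir -p "$dir"

# A stand-in wrapper: same shape as the script above, trivial awk program.
cat > "$dir/firstline.sh" <<'EOF'
#!/bin/sh
awk 'FNR == 1 { print FILENAME ": " $0 }' "$@"
EOF
chmod +x "$dir/firstline.sh"

# Two input files, one with a space in its name.
printf 'From alice\n' > "$dir/in box1"
printf 'From carol\n' > "$dir/inbox2"

# "$@" hands each file name to awk as one argument, spaces included.
result=$(cd "$dir" && ./firstline.sh "in box1" inbox2)
rm -rf "$dir"
printf '%s\n' "$result"
```

Each file name reaches awk intact, so this prints "in box1: From alice" and then "inbox2: From carol". With an unquoted $@ or $* the first name would be split into "in" and "box1".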