Turning to SED to select specific records

06-23-2012

Registered User


Hi Peasant,

Thanks for weighing into this thread.

Your suggestion is the closest to what I am looking for so far. Let's have a look at how I have applied it to the same example:

Code:

 
$  awk 'BEGIN { RS="Male|Female" } { if ($2 ~ /Smith/) { print $1, $2, $3 } }' employee.txt
Jessica Smith 16/09/2000

This is great, so there is no need to use sed at all. However, is it possible to include the record separator as part of the record as well, i.e. Jessica Smith 16/09/2000 Female?

To make the sample data a little closer to the actual data, which I am not able to share for confidentiality reasons: the real data also includes the pipe "|" character as part of the record separator. For instance:

Code:

 
L|Employee.firstname|Jessica
L|Employee.surname|Smith
L|Employee.dob|16/09/2000
L|Employee.gender|Female

Given this, is it still possible to identify the last line of each record using RS="L\|Employee.gender\|Male|L\|Employee.gender\|Female", or in some other way that searches only the second field ($2) of the record-separator line instead of the whole line, which is made up of pipe-separated fields?
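One way to sidestep the record separator altogether, sketched below under the assumption that every employee block contains the four `Employee.*` keys shown above and always ends with the `Employee.gender` line: with `-F'|'` the key is `$2` and the value is `$3`, so awk can remember each value by its key and print the complete record, gender included, when the gender line arrives. The function name `select_employees` and the awk variable `want` are hypothetical, and only POSIX awk features are used, so gawk should not be required.

```shell
# Recreate the sample employee1.txt from this thread
# (printf reuses its format for each group of four arguments).
printf 'L|Employee.firstname|%s\nL|Employee.surname|%s\nL|Employee.dob|%s\nL|Employee.gender|%s\n' \
  John Barry 21/04/1988 Male \
  Jessica Smith 16/09/2000 Female \
  Joyce Brown 05/12/1985 Female \
  Kyle Jones 02/10/1945 Male > employee1.txt

# Hypothetical helper: $1 is an extended regex matched against the surname.
select_employees() {
  awk -F'|' -v want="$1" '
    $2 == "Employee.firstname" { first = $3 }   # remember each value by its key ($2)
    $2 == "Employee.surname"   { last  = $3 }
    $2 == "Employee.dob"       { dob   = $3 }
    $2 == "Employee.gender"    { if (last ~ want) print first, last, dob, $3 }  # gender line closes the block
  ' employee1.txt
}

select_employees '^Smith$'
```

With the sample data this should print `Jessica Smith 16/09/2000 Female`; passing `'^(Smith|Jones)$'` instead selects both surnames, because awk regular expressions support alternation.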

Apologies for introducing a new record format, which may annoy some moderators, but we are very close to wrapping up this thread.

Thanks a lot,

George

---------- Post updated 06-24-12 at 12:41 AM ---------- Previous update was 06-23-12 at 08:31 PM ----------

Hi All,

Thank you all so much for the many suggestions, including some that must have taken you quite some time to prepare.

Code:

 
Yes, I am using gawk on Solaris 10:
 $ ls -lt /usr/local/bin/awk
lrwxrwxrwx   1 root     root           4 Jun  5  2008 /usr/local/bin/awk -> gawk

Code:

 
Below is a brief set of the final sample input data - employee1.txt:
 
$ more employee1.txt
L|Employee.firstname|John
L|Employee.surname|Barry
L|Employee.dob|21/04/1988
L|Employee.gender|Male
L|Employee.firstname|Jessica
L|Employee.surname|Smith
L|Employee.dob|16/09/2000
L|Employee.gender|Female
L|Employee.firstname|Joyce
L|Employee.surname|Brown
L|Employee.dob|05/12/1985
L|Employee.gender|Female
L|Employee.firstname|Kyle
L|Employee.surname|Jones
L|Employee.dob|02/10/1945
L|Employee.gender|Male
 
 
( i ) 
$ awk 'BEGIN { RS="L\|Employee.gender\|Male|L\|Employee.gender\|Female" } { if ($2 ~ /Smith/) { print $1, $2, $3 } }' employee1.txt
awk: warning: escape sequence `\|' treated as plain `|'
 
( ii ) $ awk 'BEGIN { RS="^*Employee.gender*$" } { if ($2 ~ /Smith/) { print $1, $2, $3 } }' employee.txt
$ 
 
 
( iii ) $ sed -n 'N;N;N;/\nL|Employee.surname|Smith\n.*\n.*$/p' employee1.txt | awk -F"|" '{ print $NF }'
Jessica
Smith
16/09/2000
Female
 
( iv ) $ paste - - - - < employee1.txt | awk '{ if ($2 ~ /Smith/ && $4  ~ /Employee.gender/) { print $1 "\n" $2 "\n" $3 "\n" $4 } }' | awk -F"|" '{ print $NF }'
Jessica
Smith
16/09/2000
Female

In short, attempts (i) and (ii), which use a regex in RS with gawk, failed. On the other hand, (iii) and (iv) work but could perhaps be simplified further.
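A likely explanation for (i): RS was given as a string, and awk processes backslash escapes in strings before the text is used as a regular expression. `\|` is not a recognised string escape, so gawk warns and keeps a plain `|`, which turns the intended pattern into an alternation in which a lone `L` (or `Male`, or `Female`) already counts as a separator. Writing the pipe as the bracket expression `[|]` keeps it literal in any awk. The small check below only illustrates that point and is not a full fix for (i), since a multi-character regular-expression RS is itself a gawk (and mawk) extension. In gawk, the text that actually matched RS is also available in the variable RT, which is one way to print the gender as part of each record.

```shell
# Backslash escapes in an awk string are handled before the string is used
# as a regex, so the bracket expression [|] is a safe way to write a literal pipe.
awk 'BEGIN {
  gender_line = "L|Employee.gender|Female"
  first_line  = "L|Employee.firstname|John"

  strict = "^L[|]Employee[.]gender[|](Male|Female)$"   # pipes are literal here
  loose  = "^L|Employee[.]gender|Female"               # bare | means alternation

  if (gender_line ~ strict) print "strict pattern matches the gender line"
  if (first_line !~ strict) print "strict pattern skips the firstname line"
  if (first_line ~ loose)   print "loose pattern wrongly matches the firstname line"
}'
```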

We are essentially there, but you may like to improve on the existing solutions. This would not have been possible without your help.

Many thanks again,

George

gjackson123


06-23-2012

Moderator


Try this gawk one-liner. It's not too elegant, but I'm out of ideas.

Code:

awk -F"|" ' $0=gensub(/Male|Female/,"&\n","g") { printf (" %s", $NF) } ' inputfile
 John Barry 21/04/1988 Male
 Jessica Smith 16/09/2000 Female
 Joyce Brown 05/12/1985 Female
 Kyle Jones 02/10/1945 Male
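For comparison, a sketch of the same one-line-per-employee layout without gensub(), which is specific to gawk. It assumes, as the gensub version does, that the gender value closes each block, but it checks the key in `$2` rather than the value: every value is printed followed by a space, except on the `Employee.gender` line, where a newline ends the output line. Unlike the gawk one-liner, it produces no leading space. The function name `one_line_per_employee` is hypothetical.

```shell
# Recreate the sample employee1.txt from this thread.
printf 'L|Employee.firstname|%s\nL|Employee.surname|%s\nL|Employee.dob|%s\nL|Employee.gender|%s\n' \
  John Barry 21/04/1988 Male \
  Jessica Smith 16/09/2000 Female \
  Joyce Brown 05/12/1985 Female \
  Kyle Jones 02/10/1945 Male > employee1.txt

# Hypothetical helper: print each value, then a newline after the gender
# value and a space after everything else.
one_line_per_employee() {
  awk -F'|' '{ printf "%s%s", $NF, ($2 == "Employee.gender" ? "\n" : " ") }' "$1"
}

one_line_per_employee employee1.txt
```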

Peasant


06-23-2012

Moderator


Some more options:

Code:

awk -F\| '{print $NF}' infile | sed -n 'N;N;N;/\nSmith\n.*\n.*$/p'
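How the `N;N;N` part works: sed reads one line, and each `N` appends a newline plus the next input line to the pattern space, so every cycle sees one four-line record, with `\n` marking the embedded line breaks that the regex can match. A toy illustration with placeholder letters (not the thread's data), replacing the embedded newlines with commas so the grouping is visible:

```shell
# Each cycle: read line 1, then N;N;N appends lines 2-4 (joined by \n).
# s/\n/,/g makes the four-line grouping visible on one output line.
printf '%s\n' a b c d e f g h | sed -n 'N;N;N;s/\n/,/g;p'
```

This should print `a,b,c,d` and `e,f,g,h`. Because of `-n`, an incomplete final group (if the line count were not a multiple of four) would not be printed, so the approach depends on the data staying strictly four lines per record.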

Code:

awk -F\| '{s=s RS $NF}{c=NR%4}c==2 && $NF=="Smith"{f=1} c==1{f=0;s=$NF} !c{if(f)print s}' infile
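The second one-liner relies on every record being exactly four lines: `c = NR % 4` is 1 on a firstname line, 2 on a surname line and 0 on a gender line, and `s` accumulates the current block's values joined by RS (a newline by default). Below is a commented sketch of the same logic, wrapped in a hypothetical `select_block` function that takes the surname as a regular expression via `-v pat=...` instead of the fixed string "Smith".

```shell
# Recreate the sample employee1.txt from this thread.
printf 'L|Employee.firstname|%s\nL|Employee.surname|%s\nL|Employee.dob|%s\nL|Employee.gender|%s\n' \
  John Barry 21/04/1988 Male \
  Jessica Smith 16/09/2000 Female \
  Joyce Brown 05/12/1985 Female \
  Kyle Jones 02/10/1945 Male > employee1.txt

# Hypothetical helper: $1 is an extended regex for the surname, $2 the input file.
select_block() {
  awk -F'|' -v pat="$1" '
    { s = s RS $NF; c = NR % 4 }     # append the value; c = position within the 4-line block
    c == 1 { f = 0; s = $NF }        # firstname line: start a new block, clear the flag
    c == 2 && $NF ~ pat { f = 1 }    # surname line: flag the block if the surname matches
    c == 0 && f { print s }          # gender line: print the flagged block
  ' "$2"
}

select_block '^Smith$' employee1.txt
```

With `'^Smith$'` the test is equivalent to the original `$NF=="Smith"`; `select_block '^(Smith|Jones)$' employee1.txt` should print both the Smith and the Jones records.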

Scrutinizer


06-26-2012

Registered User


Sed & Awk boolean OR for more than one surname

Hi All,

Both of the following suggested statements work:

Code:

 
$ awk -F\| '{print $NF}' employee1.txt | sed -n 'N;N;N;/\n[Smith]\n.*\n>
Jessica
Smith
16/09/2000
Female
 
$ awk -F\| '{s=s RS $NF}{c=NR%4}c==2 && $NF=="Smith"{f=1} c==1{f=0;s=$NF} !c{if(f)print s}' employee1.txt
Jessica
Smith
16/09/2000
Female

There is one last thing I am looking for:

Code:

 
Pick up records for employees with surname Smith or Jones.
$ awk -F\| '{print $NF}' employee1.txt | sed -n 'N;N;N;/\n[Smith|Jones]\n.*\n> did not work.
 
On the other hand, the equivalent awk suggestion worked:
 
awk -F\| '{s=s RS $NF}{c=NR%4}c==2 && $NF ~ /Smith|Jones/{f=1} c==1{f=0;s=$NF} !c{if(f)print s}' employee1.txt
 
Jessica
Smith
16/09/2000
Female
Kyle
Jones
02/10/1945
Male

Thanks again,

George

gjackson123


06-26-2012

Moderator


Quote:

Originally Posted by gjackson123

Hi All,

Both of the following statements provided are working:

Code:

 
$ awk -F\| '{print $NF}' employee1.txt | sed -n 'N;N;N;/\n[Smith]\n.*\n>
Jessica
Smith
16/09/2000
Female

[..]

The sed statement will not work properly because of the square brackets: [Smith] is a bracket expression, which matches a single character that can be S, m, i, t or h.
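A quick way to see this, using a few made-up input lines: the bracket expression matches only lines that consist of exactly one of those characters, while the plain word matches only the full surname.

```shell
# [Smith] means one character from the set {S, m, i, t, h}.
printf '%s\n' S Smith h Jones | sed -n '/^[Smith]$/p'   # prints S and h

# Without brackets the pattern is the literal word.
printf '%s\n' S Smith h Jones | sed -n '/^Smith$/p'     # prints Smith
```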

Quote:

Code:

There is one last thing that I am looking for is:
 
Pick up records for employee with surname Smith / Jones.
$ awk -F\| '{print $NF}' employee1.txt | sed -n 'N;N;N;/\n[Smith|Jones]\n.*\n> did not work.
 
[..]

Standard sed does not support alternation, so you would need to do something like this:

Code:

awk -F\| '{print $NF}' infile | sed -n 'N;N;N;/\nSmith\n.*\n.*$/p;/\nJones\n.*\n.*$/p'

If your sed has an extension that supports extended regular expressions (-E in GNU and BSD sed), you could do this:

Code:

awk -F\| '{print $NF}' infile | sed -En 'N;N;N;/\n(Smith|Jones)\n.*\n.*$/p'

Scrutinizer


06-27-2012

Registered User


Negating the sed union does not work

Hi,

Code:

 
$ awk -F\| '{print $NF}' employee1.txt | sed -n 'N;N;N;/\nSmith\n.*\n.*$/p;/\nJones\n.*\n.*$/p'
Jessica
Smith
16/09/2000
Female
Kyle
Jones
02/10/1945
Male

Negating Smith alone works as follows:

Code:

 
$ awk -F\| '{print $NF}' employee1.txt | sed -n 'N;N;N;/\nSmith\n.*\n.*$/!p'
John
Barry
21/04/1988
Male
Joyce
Brown
05/12/1985
Female
Kyle
Jones
02/10/1945
Male

However, applying the same negation "!" to both commands of the recommended sed union prints some records twice:

Code:

 
$ awk -F\| '{print $NF}' employee1.txt | sed -n 'N;N;N;/\nSmith\n.*\n.*$/!p;/\nJones\n.*\n.*$/!p'
John
Barry
21/04/1988
Male
John
Barry
21/04/1988
Male
Jessica
Smith
16/09/2000
Female
Joyce
Brown
05/12/1985
Female
Joyce
Brown
05/12/1985
Female
Kyle
Jones
02/10/1945
Male

What is the reason for this, and what is the correct syntax?

Thanks a lot again,

George

gjackson123


06-27-2012

Moderator


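The reason is that the two !p commands are evaluated independently: a record that matches neither pattern is printed by both of them, so it appears twice, while the Smith and Jones records are printed only once (by the command whose pattern they do not match).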

The negation of both using sed would be:

Code:

awk -F\| '{print $NF}' infile | sed 'N;N;N;/\nSmith\n.*\n.*$/d;/\nJones\n.*\n.*$/d'

--
or using an extended regular expression with GNU or BSD sed

Code:

awk -F\| '{print $NF}' infile | sed -E 'N;N;N;/\n(Smith|Jones)\n.*\n.*$/d'
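Another option that keeps `-n` and a single `p`, sketched under the same four-lines-per-record assumption: `b` without a label jumps to the end of the script, so a record matching either surname skips the `p`, and every other record is printed exactly once. The sed commands are on separate lines because POSIX does not allow a semicolon after `b`, and some seds would read it as part of a label. The function name `without_smith_jones` is hypothetical.

```shell
# Recreate the sample employee1.txt from this thread.
printf 'L|Employee.firstname|%s\nL|Employee.surname|%s\nL|Employee.dob|%s\nL|Employee.gender|%s\n' \
  John Barry 21/04/1988 Male \
  Jessica Smith 16/09/2000 Female \
  Joyce Brown 05/12/1985 Female \
  Kyle Jones 02/10/1945 Male > employee1.txt

# Hypothetical helper: print every four-line record except Smith's and Jones's.
# A matching record branches past "p"; the rest reach "p" exactly once.
without_smith_jones() {
  awk -F'|' '{ print $NF }' "$1" | sed -n 'N;N;N
/\nSmith\n.*\n.*$/b
/\nJones\n.*\n.*$/b
p'
}

without_smith_jones employee1.txt
```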

Last edited by Scrutinizer; 06-27-2012 at 02:53 AM..

Scrutinizer


UNIX for Dummies Questions & Answers
