Turning to SED to select specific records

06-21-2012

Try:

Code:

sed -n 'N;N;N;/\nSmith\n.*\n.*$/p' infile

--
Note: In standard awk, RS cannot be anything other than a single character. Only gawk and mawk have an extension that allows RS to be a regex.
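The stock awk on some systems (Solaris among them) cannot use a regex RS. Below is a minimal sketch that avoids RS altogether. It assumes every record is exactly 4 lines with the surname on line 2, as in the sample data shown later in this thread. Like the sed one-liner, it selects whole records, but it compares line 2 exactly. The function name `select_by_surname` is hypothetical, and the data is piped in with printf only for illustration.

```shell
#!/bin/sh
# Sketch: print every 4-line record whose 2nd line is exactly the given surname.
# Uses only POSIX awk features (no regex RS), so older awks should cope.
# Assumption: every record is exactly 4 lines (first name, surname, dob, gender).
# select_by_surname is a hypothetical helper name.
select_by_surname() {
    awk -v want="$1" '
    { line[(NR - 1) % 4 + 1] = $0 }         # slot 1..4 of the current record
    NR % 4 == 0 && line[2] == want {         # record complete, surname matches
        for (i = 1; i <= 4; i++) print line[i]
    }'
}

# Demo with the sample data from this thread.
printf '%s\n' John Barry 21/04/1988 Male Jessica Smith 16/09/2000 Female |
    select_by_surname Smith
```

A trailing partial record (fewer than 4 lines) is never printed, because the print only happens when NR reaches a multiple of 4.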

Last edited by Scrutinizer; 06-21-2012 at 02:51 PM.

Scrutinizer

06-21-2012

I apologize, my mistake: I missed the OS, but I'm aware of the limitation.

Regards
Peasant.

Peasant

06-22-2012

I was under the impression that the string (gender) was part of your sample file.

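Change the code to:

Code:

sed 's/^Male$/&*/; s/^Female$/&*/' file1 | awk -F'\n' '{ sub(/^\n/, "") } $2 ~ /^Smith/ {print}' RS='*'

The sub() call removes the newline left at the start of every record after the first, so that $2 is always the surname. Anchoring the sed patterns with ^...$ marks only whole "Male"/"Female" lines and avoids the GNU-only \| alternation. For an explanation of the other commands, refer to my last post. Hope it helps.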

PS: If your records always consist of 4 rows, Scrutinizer's solution is better, since it runs only one process.

jawsnnn

06-22-2012

Will this do?

Code:

paste - - - - < inputfile|awk '$2 ~ /Smith/ {for(i=1;i<=NF;i++) printf("%s\n",$i)}'
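Below is a sketch of a slightly hardened version of the paste pipeline, under the same 4-line assumption. awk's default field splitting breaks on any blank, so a value containing a space (a two-word first name, say) shifts the fields. Also, /Smith/ matches surnames such as Smithson. Splitting only on the tab that paste inserts and comparing $2 exactly addresses both problems, and tr then puts each field back on its own line. The function name `select_pasted` is hypothetical, and the data itself must not contain tabs.

```shell
#!/bin/sh
# Sketch: join each 4 lines into one tab-separated line (paste),
# keep lines whose 2nd field is exactly the wanted surname (awk),
# then split the fields back onto separate lines (tr).
# Assumptions: records are exactly 4 lines; the data contains no tabs.
# select_pasted is a hypothetical helper name.
select_pasted() {
    paste - - - - |
        awk -F'\t' -v want="$1" '$2 == want' |
        tr '\t' '\n'
}

printf '%s\n' 'Mary Ann' Smithson 01/01/1990 Female Jessica Smith 16/09/2000 Female |
    select_pasted Smith
```

With this input, the "Mary Ann Smithson" record is skipped and only the Jessica Smith record is printed, one field per line.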

elixir_sinari

06-22-2012

Hi.

Quote:

Originally Posted by jawsnnn

I was under the impression that the string (gender) was part of your sample file. ... PS: If your record is always comprised of 4 rows, Scrutinizer's solution is better since it runs only one process.

Observation:
That impression is why it is so important for questions to include a representative sample of data.

I prefer modular solutions because I can often reuse the parts in other solutions. Here is a script that packages 4 lines into a "super line", searches line 2 (field 2 in the super line), and then unpacks. It uses "@" as the stand-in for the embedded newlines. There are two parts: in one the awk is coded directly, and in the other it is wrapped in shell functions. The function version is usable because of how concise awk can be, but the shell functions are not very readable:

Code:

#!/usr/bin/env bash

# @(#) s3	Demonstrate bundling of n lines, search, unbundle.

pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
bundle() { _n=${1-2}; _s=${3-" "}; awk 'ORS=NR%'"$_n"'?'\"$_s\"':"\n"' $2 ; }
search() { _c="$1" ; _s=${2-" "} ; awk '-F'"$_s" " $_c " $3 ; }
unbundle() { _s=${1-" "} ; awk '1' RS="$_s" $2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C awk

FILE=${1-data1}

pl " Sample of the input data file:"
head -5 $FILE

pl " Modular solution, awk directly coded:"
awk 'ORS=NR%4?"@":"\n"' $FILE |
tee f1 |
awk "-F@" '
$2 ~ /Smith|Jackson/
' |
tee f2 |
awk '1' RS='@'

pl " Modular solution, function calls:"
bundle 4 $FILE "@" |
tee f1 |
# awk "-F@" ' $2 ~ /Smith|Jackson/ ' |
search ' $2 ~ /Smith|Jackson/ ' "@" |
tee f2 |
unbundle "@"

exit 0

producing:

Code:

% ./s3

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
awk GNU Awk 3.1.5

-----
 Sample of the input data file:
John
Barry
21/04/1988
Male
Jessica

-----
 Modular solution, awk directly coded:
Jessica
Smith
16/09/2000
Female


-----
 Modular solution, function calls:
Jessica
Smith
16/09/2000
Female

Intermediate results can be seen in the files f1 and f2 that tee writes.

The bundling idea came from https://www.unix.com/shell-programmin...ting-code.html, although there are likely other sources as well. The shell search function is mostly for illustration; I think a directly coded awk script in the pipeline "sandwich" would be preferable.

I think a general bundle utility would be very useful, with two modes: one collecting lines, as done here, and another collecting strings (tokens), as done with xargs in the post cited.
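Here is one way such a two-mode bundle utility could be sketched as a shell function. The `bundle` name, its `-l`/`-t` options and the argument order are invented for illustration; this is not an existing tool. Line mode does the same job as the awk ORS trick in the script above, and it also terminates a short final group with a newline. Token mode simply delegates to `xargs -n`.

```shell
#!/bin/sh
# Sketch of a general "bundle" helper with two modes (hypothetical interface):
#   bundle -l N [SEP]  join every N input lines into one line, separated by SEP
#   bundle -t N        regroup whitespace-separated tokens N per output line
bundle() {
    mode=$1 n=$2 sep=${3-" "}
    case $mode in
    -l)
        awk -v n="$n" -v sep="$sep" '
        { buf = (c++ ? buf sep : "") $0 }    # append line to current bundle
        c == n + 0 { print buf; c = 0 }      # bundle full: emit and start over
        END { if (c) print buf }'            # emit a short final bundle, if any
        ;;
    -t)
        xargs -n "$n"                        # xargs runs echo on each group
        ;;
    *)
        echo "usage: bundle -l N [SEP] | bundle -t N" >&2
        return 2
        ;;
    esac
}

printf '%s\n' a b c d e f g h | bundle -l 4 @
echo 'a b c d e f' | bundle -t 2
```

Because xargs treats quotes and backslashes specially, the token mode is only a rough stand-in for a purpose-built tool.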

Best wishes ... cheers, drl

drl

06-22-2012

Hi Peasant,

Thanks for weighing in on this thread.

Your suggestion is the closest to what I am looking for so far. Let's have a look at how I have applied it to the same example:

Code:

 
$ awk 'BEGIN { RS="Male|Female" } { if ($2 ~ /Smith/) { print $1, $2, $3 } }' employee.txt
Jessica Smith 16/09/2000

This is great, so there is no need to use SED at all. However, is it possible to include the record separator as part of the record as well, i.e. Jessica Smith 16/09/2000 Female?
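gawk has a built-in variable, RT, that holds the text that matched RS for the record just read. mawk and the POSIX awks do not have it. Using RT, the separator can be printed alongside the fields. Below is a minimal sketch, assuming gawk is installed under the name `gawk`; the function name is hypothetical. It compares $2 exactly rather than with /Smith/. Note that the regex Male|Female would also match inside other text, such as a name containing "Male".

```shell
#!/bin/sh
# Sketch (GNU awk only): RT holds the text that matched the regex RS,
# so the gender that terminated each record can be printed with the fields.
# Assumption: gawk is installed as "gawk"; mawk and POSIX awks have no RT.
# select_with_separator is a hypothetical helper name.
select_with_separator() {
    gawk -v want="$1" 'BEGIN { RS = "Male|Female" }
        $2 == want { print $1, $2, $3, RT }'
}

printf '%s\n' John Barry 21/04/1988 Male Jessica Smith 16/09/2000 Female |
    select_with_separator Smith
```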

Just to make the sample data a little closer to the actual data, which I am not able to reveal for confidentiality reasons: the actual data also includes the pipe "|" character as part of the record separator. For instance:

Code:

 
L|Employee.firstname|Jessica
L|Employee.surname|Smith
L|Employee.dob|16/09/2000
L|Employee.gender|Female

As a result, is it still possible to identify the last line using RS=" L\|Employee.gender\|Male| L\|Employee.gender\|Female", or is there some way to search only the second field ($2) of the record-separator line instead of the whole line, which is made up of pipe-separated fields?
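A regex RS is not needed here, assuming the records really look like the four lines above: one `L|attribute|value` line per field, with every record ending in an `Employee.gender` line. The approach is to split each line on `|`, use field 2 to tell which attribute a line holds, and treat the gender line as the end of a record. Below is a minimal POSIX awk sketch. The function name `select_pipe_records` is hypothetical, and the John Barry record is adapted from the earlier sample.

```shell
#!/bin/sh
# Sketch: records are groups of "L|attribute|value" lines, each group ending
# with an Employee.gender line (assumed from the sample in this post).
# Field 2 names the attribute, field 3 holds the value; no regex RS needed.
# select_pipe_records is a hypothetical helper name.
select_pipe_records() {
    awk -F'|' -v want="$1" '
    { buf = buf $0 "\n" }                        # collect the current record
    $2 == "Employee.surname" { surname = $3 }    # remember its surname
    $2 == "Employee.gender" {                    # last line: record complete
        if (surname == want) printf "%s", buf
        buf = ""; surname = ""
    }'
}

printf '%s\n' 'L|Employee.firstname|John' 'L|Employee.surname|Barry' \
    'L|Employee.dob|21/04/1988' 'L|Employee.gender|Male' \
    'L|Employee.firstname|Jessica' 'L|Employee.surname|Smith' \
    'L|Employee.dob|16/09/2000' 'L|Employee.gender|Female' |
    select_pipe_records Smith
```

Only the four Jessica Smith lines, including the Employee.gender line, should be printed.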

Apologies for coming up with a new record format, which may annoy some moderators, but we are very close to wrapping up this thread.

Thanks a lot,

George

gjackson123

06-22-2012

So I take it you have gawk installed on your Solaris box and that is what you are using, because this will absolutely not work with the standard awks on Solaris...

Quote:

Originally Posted by gjackson123

[..]
Just to make the sample data a little closer to the actual data, which I am not able to reveal for confidentiality reasons: the actual data also includes the pipe "|" character as part of the record separator.
[..]
Apologies for coming up with a new record format which may annoy some moderators but we are very close to wrapping up this thread.
[..]

Apologies, yet you still are not providing a representative sample.

Please do not leave people guessing. Show a representative (anonymized) sample of your input and desired output or this thread will need to be closed.

Scrutinizer
