Turning to SED to select specific records


 
# 15  
Old 06-23-2012
Turning to SED to select specific records

Hi Peasant,

Thanks for weighing in on this thread.

Your suggestion is the closest to what I am looking for so far. Let's have a look at how I have applied it to the same example:

Code:
 
$  awk 'BEGIN { RS="Male|Female" } { if ($2 ~ /Smith/) { print $1, $2, $3 } }' employee.txt
Jessica Smith 16/09/2000

This is great, so there is no need to use sed at all. However, is it possible to include the record separator as part of the record as well, i.e. Jessica Smith 16/09/2000 Female?

To bring the sample data a little closer to the actual data, which I am not able to reveal for confidentiality reasons: the real records also contain the pipe "|" character, including in the lines that act as the record separator. For instance:

Code:
 
L|Employee.firstname|Jessica
L|Employee.surname|Smith
L|Employee.dob|16/09/2000
L|Employee.gender|Female

Given this format, is it still possible to identify the last line of each record using RS="L\|Employee.gender\|Male|L\|Employee.gender\|Female", or is there some way to match on the second pipe-separated field ($2) of the separator line instead of on the whole line?

Apologies for coming up with a new record format which may annoy some moderators but we are very close to wrapping up this thread.

Thanks a lot,

George

---------- Post updated 06-24-12 at 12:41 AM ---------- Previous update was 06-23-12 at 08:31 PM ----------

Hi All,

Thank you so much to everyone for providing many suggestions including some that would have taken you quite some time to prepare.

Yes, I am using gawk on Solaris 10:

Code:
 
 $ ls -lt /usr/local/bin/awk
lrwxrwxrwx   1 root     root           4 Jun  5  2008 /usr/local/bin/awk -> gawk

Below is a brief sample of the final input data, employee1.txt:

Code:
 
$ more employee1.txt
L|Employee.firstname|John
L|Employee.surname|Barry
L|Employee.dob|21/04/1988
L|Employee.gender|Male
L|Employee.firstname|Jessica
L|Employee.surname|Smith
L|Employee.dob|16/09/2000
L|Employee.gender|Female
L|Employee.firstname|Joyce
L|Employee.surname|Brown
L|Employee.dob|05/12/1985
L|Employee.gender|Female
L|Employee.firstname|Kyle
L|Employee.surname|Jones
L|Employee.dob|02/10/1945
L|Employee.gender|Male
 
 
( i ) 
$ awk 'BEGIN { RS="L\|Employee.gender\|Male|L\|Employee.gender\|Female" } { if ($2 ~ /Smith/) { print $1, $2, $3 } }' employee1.txt
awk: warning: escape sequence `\|' treated as plain `|'
 
( ii ) $ awk 'BEGIN { RS="^*Employee.gender*$" } { if ($2 ~ /Smith/) { print $1, $2, $3 } }' employee.txt
$ 
 
 
( iii ) $ sed -n 'N;N;N;/\nL|Employee.surname|Smith\n.*\n.*$/p' employee1.txt | awk -F"|" '{ print $NF }'
Jessica
Smith
16/09/2000
Female
 
( iv ) $ paste - - - - < employee1.txt | awk '{ if ($2 ~ /Smith/ && $4  ~ /Employee.gender/) { print $1 "\n" $2 "\n" $3 "\n" $4 } }' | awk -F"|" '{ print $NF }'
Jessica
Smith
16/09/2000
Female

In short, attempts ( i ) and ( ii ), which use a regex in RS with gawk, failed. Attempts ( iii ) and ( iv ), on the other hand, succeeded but could perhaps be simplified further.
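
A possible explanation of the two failures, offered as a sketch rather than a verified diagnosis of this Solaris setup: in ( i ) the `\|` inside a double-quoted awk string is not a recognised escape, so gawk warns and reduces it to a plain `|`, which turns RS into an alternation of short pieces such as `L` and `Male`. In ( ii ), as far as the gawk documentation describes it, `^` and `$` in RS anchor to the start and end of the input rather than to each line. gawk also sets the variable RT to the text that matched RS, which is one way to print the separator line together with its record. The helper name `smith_records` is made up for illustration, and the data is the first two records of employee1.txt.

```shell
# Sample data: the first two records of employee1.txt from the post above.
cat > employee1.txt <<'EOF'
L|Employee.firstname|John
L|Employee.surname|Barry
L|Employee.dob|21/04/1988
L|Employee.gender|Male
L|Employee.firstname|Jessica
L|Employee.surname|Smith
L|Employee.dob|16/09/2000
L|Employee.gender|Female
EOF

# smith_records is a hypothetical helper name. It needs gawk: RT is a gawk extension.
smith_records() {
    gawk '
        # [|] and [.] are literal characters, so no backslash is needed in the string.
        BEGIN { RS = "L[|]Employee[.]gender[|](Male|Female)\n" }
        # With the default FS, $2 is the whole surname line of the record.
        $2 ~ /[|]Smith$/ { printf "%s%s", $0, RT }   # RT = the gender line that ended it
    ' "$1"
}

# Run only where gawk is installed.
if command -v gawk >/dev/null 2>&1; then
    smith_records employee1.txt
fi
```

Piping the result through `awk -F\| '{print $NF}'` would reduce each line to its value, as in attempts ( iii ) and ( iv ).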

We are already there but you may like to improve on existing solutions. This would not have been possible without your help.

Many thanks again,

George
# 16  
Old 06-23-2012
Try this gawk command; it is not too elegant, but I'm out of ideas.

Code:
awk -F"|" ' $0=gensub(/Male|Female/,"&\n","g") { printf (" %s", $NF) } ' inputfile
 John Barry 21/04/1988 Male
 Jessica Smith 16/09/2000 Female
 Joyce Brown 05/12/1985 Female
 Kyle Jones 02/10/1945 Male
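
gensub is a gawk extension. Where only a POSIX awk is available, the same one-line-per-employee output can be sketched by counting lines. This assumes every record is exactly four lines long, which holds for employee1.txt but is an assumption about the real data. The variable names are illustrative.

```shell
# Sample data: the first two records of employee1.txt.
cat > employee1.txt <<'EOF'
L|Employee.firstname|John
L|Employee.surname|Barry
L|Employee.dob|21/04/1988
L|Employee.gender|Male
L|Employee.firstname|Jessica
L|Employee.surname|Smith
L|Employee.dob|16/09/2000
L|Employee.gender|Female
EOF

joined=$(awk -F'|' '
    { rec = rec sep $NF; sep = " " }            # collect the last field of each line
    NR % 4 == 0 { print rec; rec = sep = "" }   # every fourth line completes a record
' employee1.txt)

printf '%s\n' "$joined"
```

Unlike the gensub version, this prints no leading space on each line.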

# 17  
Old 06-23-2012
Some more options:
Code:
awk -F\| '{print $NF}' infile | sed -n 'N;N;N;/\nSmith\n.*\n.*$/p'

Code:
awk -F\| '{s=s RS $NF}{c=NR%4}c==2 && $NF=="Smith"{f=1} c==1{f=0;s=$NF} !c{if(f)print s}' infile
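
The second one-liner is dense, so here it is spread out with comments. This is only a reading of the command above, not a different method, and it relies on each record being exactly four lines. The variable `smith` is just a name chosen for the example.

```shell
# Sample data: the first two records of employee1.txt.
cat > employee1.txt <<'EOF'
L|Employee.firstname|John
L|Employee.surname|Barry
L|Employee.dob|21/04/1988
L|Employee.gender|Male
L|Employee.firstname|Jessica
L|Employee.surname|Smith
L|Employee.dob|16/09/2000
L|Employee.gender|Female
EOF

smith=$(awk -F'|' '
    { s = s RS $NF; c = NR % 4 }          # append the value; c = position in the record (1,2,3,0)
    c == 2 && $NF == "Smith" { f = 1 }    # second line is the surname: flag a match
    c == 1 { f = 0; s = $NF }             # first line: clear the flag and restart the buffer
    !c { if (f) print s }                 # fourth line: print the buffered record if flagged
' employee1.txt)

printf '%s\n' "$smith"
```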

# 18  
Old 06-26-2012
Sed & Awk boolean OR for more than 1 surname

Hi All,

Both of the statements provided work:

Code:
 
$ awk -F\| '{print $NF}' employee1.txt | sed -n 'N;N;N;/\n[Smith]\n.*\n>
Jessica
Smith
16/09/2000
Female
 
$ awk -F\| '{s=s RS $NF}{c=NR%4}c==2 && $NF=="Smith"{f=1} c==1{f=0;s=$NF} !c{if(f)print s}' employee1.txt
Jessica
Smith
16/09/2000
Female



There is one last thing I am looking for: picking up the records for employees with surname Smith or Jones.

Code:
 
$ awk -F\| '{print $NF}' employee1.txt | sed -n 'N;N;N;/\n[Smith|Jones]\n.*\n> did not work.
 
On the other hand, the equivalent awk command suggested worked:
 
$ awk -F\| '{s=s RS $NF}{c=NR%4}c==2 && $NF ~ /Smith|Jones/{f=1} c==1{f=0;s=$NF} !c{if(f)print s}' employee1.txt
 
Jessica
Smith
16/09/2000
Female
Kyle
Jones
02/10/1945
Male


Thanks again,

George


# 19  
Old 06-26-2012
Quote:
Originally Posted by gjackson123
Hi All,

Both of the statements provided work:

Code:
 
$ awk -F\| '{print $NF}' employee1.txt | sed -n 'N;N;N;/\n[Smith]\n.*\n>
Jessica
Smith
16/09/2000
Female

[..]

The sed statement will not work properly because of the square brackets: a bracket expression matches a single character, which here can be S, m, i, t or h.
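
The point about brackets can be checked on a few one-word lines. This is a small demonstration with made-up input, not part of the original exchange:

```shell
# [Smith] is a set: it matches ONE character out of S, m, i, t, h.
printf '%s\n' S Smith t Jones | sed -n '/^[Smith]$/p'
# prints:
# S
# t

# Inside brackets | is an ordinary character, so [Smith|Jones] is just a bigger set.
printf '%s\n' '|' J Smith Jones | sed -n '/^[Smith|Jones]$/p'
# prints:
# |
# J

# Without the brackets the pattern matches the whole word.
printf '%s\n' S Smith t Jones | sed -n '/^Smith$/p'
# prints:
# Smith
```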

Quote:
Code:
There is one last thing I am looking for:
 
Pick up records for employees with surname Smith or Jones.
$ awk -F\| '{print $NF}' employee1.txt | sed -n 'N;N;N;/\n[Smith|Jones]\n.*\n> did not work.
 
[..]

Standard sed uses basic regular expressions, which have no alternation, so you would need to do something like this:

Code:
awk -F\| '{print $NF}' infile | sed -n 'N;N;N;/\nSmith\n.*\n.*$/p;/\nJones\n.*\n.*$/p'

If your sed has an extension that supports extended regular expressions you could do this:

Code:
awk -F\| '{print $NF}' infile | sed -En 'N;N;N;/\n(Smith|Jones)\n.*\n.*$/p'
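
Where neither sed option suits, awk is another route, since awk regular expressions are extended ones and support alternation. The sketch below is a variation on the awk approach earlier in the thread; `select_surnames` and its arguments are hypothetical, and it again assumes four lines per record. Anchoring the pattern with ^ and $ keeps a longer surname such as Smithson from matching Smith.

```shell
# Sample data: three records from employee1.txt.
cat > employee1.txt <<'EOF'
L|Employee.firstname|Jessica
L|Employee.surname|Smith
L|Employee.dob|16/09/2000
L|Employee.gender|Female
L|Employee.firstname|Joyce
L|Employee.surname|Brown
L|Employee.dob|05/12/1985
L|Employee.gender|Female
L|Employee.firstname|Kyle
L|Employee.surname|Jones
L|Employee.dob|02/10/1945
L|Employee.gender|Male
EOF

# select_surnames is a hypothetical helper: $1 = surnames joined by |, $2 = input file.
select_surnames() {
    awk -F'|' -v pat="^($1)\$" '
        { s = s sep $NF; sep = "\n" }            # buffer the value of each line
        NR % 4 == 2 && $NF ~ pat { keep = 1 }    # line 2 of a record is the surname
        NR % 4 == 0 {                            # line 4 ends the record
            if (keep) print s
            s = sep = ""; keep = 0
        }
    ' "$2"
}

select_surnames 'Smith|Jones' employee1.txt
```

Because the surnames are passed as an argument, the same function covers one name (`select_surnames Smith employee1.txt`) or several.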

# 20  
Old 06-27-2012
Negation of the union in sed does not work

Hi,

The union of the two sed commands works as suggested:

Code:
 
$ awk -F\| '{print $NF}' employee1.txt | sed -n 'N;N;N;/\nSmith\n.*\n.*$/p;/\nJones\n.*\n.*$/p'
Jessica
Smith
16/09/2000
Female
Kyle
Jones
02/10/1945
Male


Negating Smith on its own also works:

Code:
 
$ awk -F\| '{print $NF}' employee1.txt | sed -n 'N;N;N;/\nSmith\n.*\n.*$/!p'
John
Barry
21/04/1988
Male
Joyce
Brown
05/12/1985
Female
Kyle
Jones
02/10/1945
Male


However, applying the same negation "!" to the union of sed commands recommended above printed most of the records in the file twice:

Code:
 
$ awk -F\| '{print $NF}' employee1.txt | sed -n 'N;N;N;/\nSmith\n.*\n.*$/!p;/\nJones\n.*\n.*$/!p'
John
Barry
21/04/1988
Male
John
Barry
21/04/1988
Male
Jessica
Smith
16/09/2000
Female
Joyce
Brown
05/12/1985
Female
Joyce
Brown
05/12/1985
Female
Kyle
Jones
02/10/1945
Male


What is the reason for this & what is the correct syntax for it?

Thanks a lot again,

George

# 21  
Old 06-27-2012
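The reason is that the two commands are evaluated independently: each record is printed once for every pattern it does not match, so most records are printed twice, while the Smith and Jones records are printed only once.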
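
The effect is easy to reproduce on three one-line records. A small demonstration with made-up input:

```shell
# Each input line is tested by both commands in turn.
printf '%s\n' Barry Smith Jones | sed -n '/Smith/!p;/Jones/!p'
# prints:
# Barry   (printed by /Smith/!p)
# Barry   (printed again by /Jones/!p)
# Smith   (printed by /Jones/!p only)
# Jones   (printed by /Smith/!p only)
```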

The negation of both using sed would be:
Code:
awk -F\| '{print $NF}' infile | sed 'N;N;N;/\nSmith\n.*\n.*$/d;/\nJones\n.*\n.*$/d'



Or, using an extended regular expression with GNU or BSD sed:
Code:
awk -F\| '{print $NF}' infile | sed -E 'N;N;N;/\n(Smith|Jones)\n.*\n.*$/d'


Last edited by Scrutinizer; 06-27-2012 at 02:53 AM..