Turning to SED to select specific records


 
# 1  
Old 06-18-2012

Hi All,

I am looking for a simple, concise solution, most likely using sed, to process records made up of the following four rows of data and keep a record only if its second row satisfies certain criteria, such as the surname matching Smith or Jackson:

Code:
 
John (firstname)
Smith (surname) 
20/05/1984 (dob)
Male (gender)

It would have been possible to use awk if the data were on a single line with a fixed delimiter.

I have no problem writing many lines of shell script, but I am hoping for a short, simple solution in sed, and I am not familiar with how it could be done.

I am running on the Solaris 10 x86 platform.


Your assistance would be much appreciated,
George

Last edited by gjackson123; 06-18-2012 at 10:36 AM.. Reason: Tidy up code & provide platform detail
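
For reference, a sed-only approach to the question above can be sketched as follows. It is a minimal sketch, assuming every record is exactly four lines long; `sed_select` is a hypothetical helper name, and the surnames are the two mentioned in the question. `N;N;N` joins the next three lines onto the current one, and each pattern requires the surname to be followed by two further newlines, which can only be true for the second line of the record.

```shell
# sed_select: hypothetical helper; reads four-line records on stdin and
# prints only those whose second line starts with Smith or Jackson.
sed_select() {
  sed -n -e 'N;N;N' \
         -e '/\nSmith.*\n.*\n/p' \
         -e '/\nJackson.*\n.*\n/p'
}

# Two sample records; only the second has a matching surname.
printf '%s\n' John Barry 21/04/1988 Male \
              Jessica Smith 16/09/2000 Female | sed_select
```

The same patterns should also work on labelled rows such as `Smith (surname)`, since only the start of the second line is tested.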
# 2  
Old 06-18-2012
Can you please provide a sample input file and the intended output file? I am a bit confused about what you consider to be a record in your file.

Based on my understanding (that a record is always the combination of the above four rows, and that the second row of the set must begin with 'Smith' to be selected), here is my solution:

Code:
 sed 's/(gender)/&*/g' file1 | awk -F'\n' '$2 ~ /^Smith.*/ {print}' RS='*'

Note: This solution assumes that '*' does not appear anywhere in your data. Replace it with another character (which does not occur in your data) if this is not the case.

Last edited by jawsnnn; 06-18-2012 at 10:50 AM..
# 3  
Old 06-20-2012

Hi jawsnnn,

Thanks for your valuable input.

There is no need to provide a sample input file, since your understanding of its composition is correct, as shown in the initial post. Nevertheless, could you provide a brief one-line explanation of how your code works, since my sed knowledge is limited? Also, which of the following minor updates would accommodate more than one surname:

Code:
sed 's/(gender)/&*/g' file1 | awk -F'\n' '$2 ~ /^Smith.*|^Jone.*|^Green.*/ {print}' RS='*'

or

sed 's/(gender)/&*/g' file1 | awk -F'\n' '$2 ~ /^(Smith|Jone|Green).*/ {print}' RS='*'

or

sed 's/(gender)/&*/g' file1 | awk -F'\n' '$2 ~ /^(?:Smith|Jone|Green).*/ {print}' RS='*'

I will test each of these statements to see which ones work and let you know.
Thanks again,
George
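
A note on the three variants above: awk uses POSIX extended regular expressions, in which both `^Smith.*|^Jone.*|^Green.*` and `^(Smith|Jone|Green).*` are valid and select the same lines. The `(?:...)` form is Perl-style non-capturing syntax, which is not part of POSIX extended regular expressions, so it should not be relied on in awk. The sketch below exercises the grouped form on single surname lines; `matches_surname` is a hypothetical helper name, and the trailing `.*` is left out because a prefix match does not need it.

```shell
# matches_surname: hypothetical helper; prints "yes" if the given line
# starts with one of the listed surnames, otherwise "no".
matches_surname() {
  printf '%s\n' "$1" |
  awk '{ print ($0 ~ /^(Smith|Jone|Green)/ ? "yes" : "no") }'
}

matches_surname "Jones (surname)"   # "Jone" is a prefix of "Jones"
matches_surname "Brown (surname)"
```

Because the pattern is anchored only at the start, `Jone` also selects `Jones`, and `Green` would also select `Greenwood`.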
# 4  
Old 06-20-2012
I think the first variation should work fine for multiple surnames. Let me explain the solution:

Code:
sed 's/(gender)/&*/g' file1 | awk -F'\n' '$2 ~ /^Smith.*/ {print}' RS='*'

1. Using sed, I appended an asterisk '*' to the string (gender), i.e. to the end of each record:
Code:
sed 's/(gender)/&*/g'

Here & is replaced by the matched string.

2. Then I divide the output of this command into records separated by '*', with fields separated by '\n', the newline character. This lets me treat the four lines in each set as four different fields in the awk command. I achieve this with two settings:
Code:
RS='*'
and
-F'\n'

3. Then, I simply match the second field (i.e. the second row of all sets) to the pattern
Code:
^Smith.*

which matches fields starting with the string Smith followed by any sequence of characters. In retrospect, the .* in the pattern is probably not needed.
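
One caveat with this approach, offered as a sketch rather than a fix tested on Solaris: sed leaves the newline that follows each `*` in place, so every record after the first begins with a newline. awk then sees an empty `$1` for those records, and the surname ends up in `$3` instead of `$2`. The version below strips that leading newline before testing `$2`. `select_labelled` is a hypothetical helper name, the sample rows assume the labelled layout from the first post, and on Solaris the `-v` option and `sub()` probably require nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk.

```shell
# select_labelled: hypothetical helper; reads labelled four-line records
# on stdin and prints those whose surname line starts with Smith.
select_labelled() {
  sed 's/(gender)/&*/' |
  awk -F'\n' -v RS='*' '
    { sub(/^\n/, "") }   # drop the newline left over after the previous "*"
    $2 ~ /^Smith/        # no action given, so the whole record is printed
  '
}

# The matching record is deliberately second, where the unpatched
# command would look at the first-name line instead of the surname.
select_labelled <<'EOF'
Ann (firstname)
Jones (surname)
01/02/1970 (dob)
Female (gender)
John (firstname)
Smith (surname)
20/05/1984 (dob)
Male (gender)
EOF
```

The appended `*` stays out of the output because awk consumes it as the record separator.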
# 5  
Old 06-20-2012
Hi, gjackson123.
Quote:
Originally Posted by gjackson123
...There is no need to provide sample input data file ...
Meta-advice.

If one were to want more than one suggested solution, one would supply sample data. That allows consistency among results. Otherwise, you are putting an additional burden on the responders to come up with sample data, which, in addition to being likely different from one another, may not be representative of the real set. In general, if faced with the task of creating sample data in addition to a solution, then I probably will move on to other questions without attempting to solve the problem.

Best wishes ... cheers, drl

Last edited by drl; 06-22-2012 at 11:51 AM..
# 6  
Old 06-21-2012

Hi jawsnnn & drl,

Below is the employee.txt as requested:

Code:
$ more employee.txt
John
Barry
21/04/1988
Male
Jessica
Smith
16/09/2000
Female
Joyce
Brown
05/12/1985
Female
Kyle
Jones
02/10/1945
Male

Code:
$ sed 's/(gender)/&*/g' employee.txt | more 
John 
Barry 
21/04/1988 
Male 
Jessica 
Smith 
16/09/2000 
Female 
Joyce 
Brown 
05/12/1985 
Female 
Kyle 
Jones 
02/10/1945 
Male

It doesn't look like the sed statement is doing anything to the data. Should (gender) be replaced with something else? What should I expect the data to look like coming out of sed and going into awk, which I am more comfortable with?

I am interested in getting a solution with everyone's help.

Thanks again,

George

---------- Post updated 06-22-12 at 12:19 AM ---------- Previous update was 06-21-12 at 06:22 PM ----------

Hi,

Below are some more attempts to figure out how your SED & AWK statements work:


Code:
$ uname -a
SunOS startrek 5.10 Generic_141444-09 sun4v sparc SUNW,SPARC-Enterprise-T5220

Code:
$ more employee.txt
John
Barry
21/04/1988
Male
Jessica
Smith
16/09/2000
Female
Joyce
Brown
05/12/1985
Female
Kyle
Jones
02/10/1945
Male

## Returned the same list & order
Code:
$ sed 's/(gender)/&*/g' employee.txt
John
Barry
21/04/1988
Male
Jessica
Smith
16/09/2000
Female
Joyce
Brown
05/12/1985
Female
Kyle
Jones
02/10/1945
Male

## Returned the same list & order
Code:
$ sed 's/(Male)/&*/g' employee.txt  
John
Barry
21/04/1988
Male
Jessica
Smith
16/09/2000
Female
Joyce
Brown
05/12/1985
Female
Kyle
Jones
02/10/1945
Male

## Returned the same list & order
Code:
$ sed 's/(Female)/&*/g' employee.txt
John
Barry
21/04/1988
Male
Jessica
Smith
16/09/2000
Female
Joyce
Brown
05/12/1985
Female
Kyle
Jones
02/10/1945
Male


## Awk is not getting the right output from SED
Code:
$ sed 's/(Male)/&*/g' employee.txt | awk -F'\n' '$2 ~ /^Smith.*/ { print }' RS='*'
$

## Same input to AWK as from SED
Code:
$ awk -F'\n' '$2 ~ /^Smith.*/ { print }' RS='*' employee.txt                  
$


I suspect the problem is from
Code:
sed 's/(gender)/&*/g'

but I am still trying to wrap my head around it.

Also, what is the purpose of the round brackets () around gender, and of the & and *? The sed statement appears to be doing a global replacement of (gender) with &*, but I am not clear whether gender should be replaced with something else.

Thanks a lot,

George

Last edited by gjackson123; 06-22-2012 at 02:27 AM.. Reason: Cleaned out spurious formatting
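
A possible reading of the output above: in sed's basic regular expressions, unescaped round brackets match literal brackets, so `(gender)`, `(Male)` and `(Female)` only match text that actually contains the brackets. employee.txt contains no such text, which would explain why every command returned the file unchanged. In the replacement, `&` stands for whatever was matched and `*` is a literal asterisk. The sketch below marks the end of each record by matching the bare gender line instead; `mark_records` is a hypothetical helper name, and it assumes the gender line is exactly Male or Female with no trailing spaces.

```shell
# mark_records: hypothetical helper; appends "*" to each gender line so
# that a later awk stage can use "*" as its record separator.
mark_records() {
  sed -e 's/^Male$/&*/' -e 's/^Female$/&*/'
}

# Shows what the data looks like coming out of sed and going into awk.
printf '%s\n' John Barry 21/04/1988 Male \
              Jessica Smith 16/09/2000 Female | mark_records
```

With this input the fourth and eighth lines come out as `Male*` and `Female*`, and all other lines are unchanged.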
# 7  
Old 06-21-2012
Perhaps this is what you need:

Code:
$ cat input
John
Barry
21/04/1988
Male
Jessica
Smith
16/09/2000
Female
Joyce
Brown
05/12/1985
Female
Kyle
Jones
02/10/1945
Male


$ awk 'BEGIN { RS="Male|Female" } { print $1,$2,$3 } ' input
John Barry 21/04/1988
Jessica Smith 16/09/2000
Joyce Brown 05/12/1985
Kyle Jones 02/10/1945
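
A note on the command above: a multi-character RS such as "Male|Female" is treated as a regular expression by gawk and mawk, but POSIX leaves that behaviour unspecified, so it may not work with every awk. The command also drops the gender and does not filter on surname. A more portable alternative, sketched below, counts lines instead and assumes every record is exactly four lines long. `select_by_surname` is a hypothetical helper name, and its argument is an extended regular expression tested against the second line of each record.

```shell
# select_by_surname: hypothetical helper; reads four-line records on
# stdin and prints, on one line each, those whose surname matches $1.
select_by_surname() {
  awk -v pat="$1" '
    { rec[NR % 4] = $0 }               # slots 1, 2, 3, 0 hold lines 1 to 4
    NR % 4 == 0 && rec[2] ~ pat {      # record complete: test the surname
      print rec[1], rec[2], rec[3], rec[0]
    }
  '
}

printf '%s\n' John Barry 21/04/1988 Male \
              Jessica Smith 16/09/2000 Female \
              Joyce Brown 05/12/1985 Female \
              Kyle Jones 02/10/1945 Male |
select_by_surname '^(Smith|Jones)'
```

For the sample data this prints the Jessica Smith and Kyle Jones records, one per line, with the gender kept as the last field.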
