UNIX for Dummies Questions & Answers If you're not sure where to post a UNIX or Linux question, post it here. All UNIX and Linux newbies welcome !!

    #1  
Old 06-18-2012
Registered User
 
Join Date: Jul 2011
Posts: 26
Thanks: 19
Thanked 0 Times in 0 Posts
Turning to SED to select specific records

Hi All,

I am looking for a simple, concise solution, most likely using sed, to process the following four rows of data, which together form one record, and keep the record only if its second row satisfies certain criteria, such as the surname matching Smith or Jackson:


Code:
 
John (firstname)
Smith (surname) 
20/05/1984 (dob)
Male (gender)

It would have been possible to use AWK if the data were on the same line with a fixed delimiter.
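One way to get the four rows onto a single delimited line, as described above, is `paste`. This is a minimal sketch, assuming every record is exactly four lines long and that `|` never occurs in the data; the file name `people.txt` and the second sample record are made up for illustration.

```shell
# Hypothetical sample: two four-line records in the layout shown above.
cat > people.txt <<'EOF'
John (firstname)
Smith (surname)
20/05/1984 (dob)
Male (gender)
Mary (firstname)
Brown (surname)
01/02/1990 (dob)
Female (gender)
EOF

# paste reads four lines at a time (one per "-") and joins them with "|".
# awk then sees one record per line and tests the second field (the surname).
matches=$(paste -d'|' - - - - < people.txt |
          awk -F'|' '$2 ~ /^(Smith|Jackson)/')

printf '%s\n' "$matches"
```

Each selected record comes out as one `|`-delimited line; piping the result through `tr '|' '\n'` would restore the four-line layout if that is needed.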

I have no problem writing many lines of shell script, but I am hoping to find a short, easy solution in SED; I am just not familiar with how it could be done.

I am running on the Solaris 10 x86 platform.


Your assistance would be much appreciated,
George

Last edited by gjackson123; 06-18-2012 at 09:36 AM.. Reason: Tidy up code & provide platform detail
    #2  
Old 06-18-2012
Registered User
 
Join Date: May 2012
Posts: 58
Thanks: 5
Thanked 9 Times in 9 Posts
Can you please provide a sample input file and the intended output? I am a bit confused about what you consider to be a record in your file.

Based on my understanding (that a record is always the combination of the above 4 rows, and that the second row in the set should begin with 'Smith' to be selected), here is my solution:


Code:
 sed 's/(gender)/&*/g' file1 | awk -F'\n' '$2 ~ /^Smith.*/ {print}' RS='*'

Note: This solution assumes that '*' does not appear anywhere in your data. Replace it with another character (which does not occur in your data) if this is not the case.

Last edited by jawsnnn; 06-18-2012 at 09:50 AM..
    #3  
Old 06-20-2012
Registered User
 
Join Date: Jul 2011
Posts: 26
Thanks: 19
Thanked 0 Times in 0 Posts
Turning to SED to select specific records

Hi jawsnnn,

Thanks for your valuable input.

There is no need to provide a sample input file, since your understanding of its composition is correct, as shown in my initial post. Nevertheless, I wonder whether you could give a brief explanation of how your code works, since my SED knowledge is limited. Also, which of the following minor updates would accommodate more than one surname:


Code:
sed 's/(gender)/&*/g' file1 | awk -F'\n' '$2 ~ /^Smith.*|^Jone.*|^Green.*/ {print}' RS='*'

or

sed 's/(gender)/&*/g' file1 | awk -F'\n' '$2 ~ /^(Smith|Jone|Green).*/ {print}' RS='*'

or

sed 's/(gender)/&*/g' file1 | awk -F'\n' '$2 ~ /^(?:Smith|Jone|Green).*/ {print}' RS='*'

I will test each of these statements to see which ones work and let you know.
Thanks again,
George
    #4  
Old 06-20-2012
Registered User
 
Join Date: May 2012
Posts: 58
Thanks: 5
Thanked 9 Times in 9 Posts
I think the first variation should work fine for multiple surnames. Let me explain the solution:


Code:
sed 's/(gender)/&*/g' file1 | awk -F'\n' '$2 ~ /^Smith.*/ {print}' RS='*'

1. I used sed to append an asterisk '*' to the string (gender), i.e. to the end of each record:

Code:
sed 's/(gender)/&*/g'

Here & is replaced by the matched string.

2. Then I divide the output of this command into records separated by '*', with fields separated by '\n', the newline character. This lets me treat the four lines in each set as four different fields in the awk command. I achieve this by setting the record and field separators:

Code:
RS='*'
and
-F'\n'

3. Then, I simply match the second field (i.e. the second row of all sets) to the pattern

Code:
^Smith.*

which matches fields starting with the string Smith followed by any sequence of characters. In retrospect, the .* in the pattern is not needed, since the match is not anchored at the end of the field.
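The three steps above can be tried on labelled data with the sketch below. It is not the command as originally posted. Every record after the first begins with the newline that followed the '*', which would make $1 empty and push the surname into $3, so the sketch strips that newline first. It also uses a grouped alternation for several surnames. The sample file contents are made up for illustration.

```shell
# Hypothetical labelled sample in the layout from post #1.
cat > file1 <<'EOF'
John (firstname)
Barry (surname)
21/04/1988 (dob)
Male (gender)
Jessica (firstname)
Smith (surname)
16/09/2000 (dob)
Female (gender)
EOF

# Step 1: sed marks the end of each record with "*".
# Steps 2-3: awk splits records on "*" and fields on newline.
# sub() removes the leading newline left over from the previous record;
# changing $0 makes awk re-split the fields, so $2 is the surname line
# for every record, not just the first one.
selected=$(sed 's/(gender)/&*/' file1 |
           awk 'BEGIN { RS = "*"; FS = "\n" }
                { sub(/^\n/, "") }
                $2 ~ /^(Smith|Jones|Green)/')

printf '%s\n' "$selected"
```

The `(Smith|Jones|Green)` group is ordinary extended-regex alternation, which awk supports. The `(?:...)` form is Perl syntax and is not part of the extended regular expressions that awk uses.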
    #5  
Old 06-20-2012
drl (Forum Advisor)
Registered Voter
 
Join Date: Apr 2007
Location: Saint Paul, MN USA / BSD, CentOS, Debian, OS X, Solaris
Posts: 1,480
Thanks: 15
Thanked 134 Times in 122 Posts
Hi, gjackson123.
Quote:
Originally Posted by gjackson123 View Post
...There is no need to provide sample input data file ...
Meta-advice.

If one were to want more than one suggested solution, one would supply sample data. That allows consistency among results. Otherwise, you are putting an additional burden on the responders to come up with sample data, which, in addition to being likely different from one another, may not be representative of the real set. In general, if faced with the task of creating sample data in addition to a solution, then I probably will move on to other questions without attempting to solve the problem.

Best wishes ... cheers, drl

Last edited by drl; 06-22-2012 at 10:51 AM..
The Following User Says Thank You to drl For This Useful Post:
Scrutinizer (06-21-2012)
    #6  
Old 06-21-2012
Registered User
 
Join Date: Jul 2011
Posts: 26
Thanks: 19
Thanked 0 Times in 0 Posts
Turning to SED to select specific records

Hi jawsnnn & drl,

Below is the employee.txt as requested:

$ more employee.txt

Code:
John
Barry
21/04/1988
Male
Jessica
Smith
16/09/2000
Female
Joyce
Brown
05/12/1985
Female
Kyle
Jones
02/10/1945
Male


Code:
$ sed 's/(gender)/&*/g' employee.txt | more 
John 
Barry 
21/04/1988 
Male 
Jessica 
Smith 
16/09/2000 
Female 
Joyce 
Brown 
05/12/1985 
Female 
Kyle 
Jones 
02/10/1945 
Male

It doesn’t look like the sed statement is doing anything with it. Should the (gender) be replaced with something else? What should I expect the data to look like out of sed and into awk which I am more comfortable with?

I am interested in getting to a solution with everyone’s help.

Thanks again,

George

---------- Post updated 06-22-12 at 12:19 AM ---------- Previous update was 06-21-12 at 06:22 PM ----------

Hi,

Below are some more attempts to figure out how your SED & AWK statements work:



Code:
$ uname -a
SunOS startrek 5.10 Generic_141444-09 sun4v sparc SUNW,SPARC-Enterprise-T5220


Code:
$ more employee.txt
John
Barry
21/04/1988
Male
Jessica
Smith
16/09/2000
Female
Joyce
Brown
05/12/1985
Female
Kyle
Jones
02/10/1945
Male

## Returned the same list & order

Code:
$ sed 's/(gender)/&*/g' employee.txt
John
Barry
21/04/1988
Male
Jessica
Smith
16/09/2000
Female
Joyce
Brown
05/12/1985
Female
Kyle
Jones
02/10/1945
Male

## Returned the same list & order

Code:
$ sed 's/(Male)/&*/g' employee.txt  
John
Barry
21/04/1988
Male
Jessica
Smith
16/09/2000
Female
Joyce
Brown
05/12/1985
Female
Kyle
Jones
02/10/1945
Male

## Returned the same list & order

Code:
$ sed 's/(Female)/&*/g' employee.txt
John
Barry
21/04/1988
Male
Jessica
Smith
16/09/2000
Female
Joyce
Brown
05/12/1985
Female
Kyle
Jones
02/10/1945
Male


## Awk is not getting the right output from SED

Code:
$ sed 's/(Male)/&*/g' employee.txt | awk -F'\n' '$2 ~ /^Smith.*/ { print }' RS='*'
$

## Same input to AWK as from SED

Code:
$ awk -F'\n' '$2 ~ /^Smith.*/ { print }' RS='*' employee.txt                  
$


I suspect the problem is from
Code:
sed 's/(gender)/&*/g'

but I am still trying to wrap my head around it.

Also, what is the purpose of the round brackets () around gender, and of the & and *? The sed statement appears to do a global replacement of (gender) with &*, but I am not clear whether gender should be replaced with something else.
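On the questions above: in sed's basic regular expressions, unescaped ( and ) are literal characters, so `(gender)` matches only the literal text "(gender)", `&` stands for whatever was matched, and `*` in the replacement is just the marker character. employee.txt contains no such label, which is why sed passes it through unchanged. For unlabelled data, a minimal sketch that groups lines by count instead, assuming every record is exactly four lines:

```shell
# Sample data copied from employee.txt as posted above.
cat > employee.txt <<'EOF'
John
Barry
21/04/1988
Male
Jessica
Smith
16/09/2000
Female
Joyce
Brown
05/12/1985
Female
Kyle
Jones
02/10/1945
Male
EOF

# Store each line in slot 1..4 of rec; rec[2] is the surname line.
# When the fourth line of a group arrives, print the whole group
# only if the surname matches.
selected=$(awk '{ rec[(NR - 1) % 4 + 1] = $0 }
     NR % 4 == 0 && rec[2] ~ /^(Smith|Jones)/ {
         print rec[1]; print rec[2]; print rec[3]; print rec[4]
     }' employee.txt)

printf '%s\n' "$selected"
```

This keeps the four-line layout of each selected record and needs no marker character, so it does not matter whether '*' occurs in the data.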

Thanks a lot,

George

Last edited by gjackson123; 06-22-2012 at 01:27 AM.. Reason: Cleaned out spurious formatting
    #7  
Old 06-21-2012
Peasant
Registered User
 
Join Date: Mar 2011
Posts: 509
Thanks: 14
Thanked 104 Times in 102 Posts
Perhaps this is what you need:


Code:
$ cat input
John
Barry
21/04/1988
Male
Jessica
Smith
16/09/2000
Female
Joyce
Brown
05/12/1985
Female
Kyle
Jones
02/10/1945
Male


$ awk 'BEGIN { RS="Male|Female" } { print $1,$2,$3 } ' input
John Barry 21/04/1988
Jessica Smith 16/09/2000
Joyce Brown 05/12/1985
Kyle Jones 02/10/1945
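The command above prints every record. A possible extension that adds the surname test from the original question is sketched below. It assumes an awk that accepts a regular expression as RS, as gawk and mawk do; POSIX leaves a multi-character RS unspecified, so this may not work with every awk. The gender line is consumed as the separator and therefore does not appear in the output.

```shell
# Same sample data as the "input" file shown above.
cat > input <<'EOF'
John
Barry
21/04/1988
Male
Jessica
Smith
16/09/2000
Female
Joyce
Brown
05/12/1985
Female
Kyle
Jones
02/10/1945
Male
EOF

# Records end at "Male" or "Female". With the default field separator,
# newlines count as whitespace, so $1..$3 are first name, surname, dob.
# Only records whose surname matches are printed.
selected=$(awk 'BEGIN { RS = "Male|Female" }
     $2 ~ /^(Smith|Jones)/ { print $1, $2, $3 }' input)

printf '%s\n' "$selected"
```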

The Following User Says Thank You to Peasant For This Useful Post:
gjackson123 (06-30-2012)