Turning to SED to select specific records


 
# 8  
Old 06-21-2012
Try:
Code:
sed -n 'N;N;N;/\nSmith\n.*\n.*$/p' infile



--
Note: in standard awk, RS cannot be anything other than a single character. Only gawk and mawk have an extension that allows RS to be a regex.
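To see what the sed one-liner above does, here is a minimal sketch. It assumes every record is exactly four lines (first name, surname, date of birth, gender) and reuses the two sample records that appear later in this thread. The helper names `sample` and `select_smith` are made up for the illustration.

```shell
# Sample data: two 4-line records (first name, surname, date of birth, gender).
sample() {
  printf '%s\n' John Barry 21/04/1988 Male Jessica Smith 16/09/2000 Female
}

# N;N;N appends the next three lines to the pattern space, so each cycle
# holds one whole record as "line1\nline2\nline3\nline4".
# The address then matches only when the second line is exactly "Smith",
# and p prints the whole four-line record.
select_smith() {
  sed -n 'N;N;N;/\nSmith\n.*\n.*$/p'
}

sample | select_smith
```

If the input is not a multiple of four lines, the grouping drifts, so this only suits strictly fixed-size records.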

Last edited by Scrutinizer; 06-21-2012 at 02:51 PM..
# 9  
Old 06-21-2012
I apologize, my mistake: I missed the OS, but I'm aware of the limitation.

Regards
Peasant.
# 10  
Old 06-22-2012
I was under the impression that the string (gender) was part of your sample file.

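Change the code to:
Code:
# Two s commands avoid the GNU-only \| alternation. The sub() drops the newline
# that starts every record after the first, so that $2 is always the surname.
sed 's/^Male/&*/;s/^Female/&*/' file1 | awk -F'\n' '{sub(/^\n/, "")} $2 ~ /^Smith/ {print}' RS='*'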

For explanation of the commands refer to my last post. Hope it helps.

PS: If your record is always comprised of 4 rows, Scrutinizer's solution is better since it runs only one process.
This User Gave Thanks to jawsnnn For This Post:
# 11  
Old 06-22-2012
Will this do?

Code:
paste - - - - < inputfile|awk '$2 ~ /Smith/ {for(i=1;i<=NF;i++) printf("%s\n",$i)}'
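A variation on the paste approach, sketched under the assumption that a field may itself contain spaces. paste joins the four lines with tabs, so telling awk to split on tabs only keeps such a field intact. The "Mary Ann" record and the helper names are hypothetical, added only for the illustration.

```shell
# Hypothetical sample: the first record has a first name containing a space.
sample() {
  printf '%s\n' 'Mary Ann' Smith 01/02/1990 Female John Barry 21/04/1988 Male
}

# paste joins each group of four lines with tabs; -F'\t' makes awk split
# only on those tabs, so "Mary Ann" stays a single field and $2 is the surname.
select_smith_tabs() {
  paste - - - - | awk -F'\t' '$2 ~ /Smith/ { for (i = 1; i <= NF; i++) print $i }'
}

sample | select_smith_tabs
```

With awk's default whitespace splitting, $2 of the first record would be "Ann" and the record would be missed.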

This User Gave Thanks to elixir_sinari For This Post:
# 12  
Old 06-22-2012
Hi.
Quote:
Originally Posted by jawsnnn
I was under the impression that the string (gender) was part of your sample file. ... PS: If your record is always comprised of 4 rows, Scrutinizer's solution is better since it runs only one process.
Observation:
That impression is why it is so important for questions to include a representative sample of data.


I prefer modular solutions because I can often reuse the parts in other solutions. Here is a script that packages up 4 lines into a "super line", searches line 2 (field 2 in the super line), and then unpacks. It uses "@" as the stand-in for the embedded newlines. There are two parts: one where the awk is coded directly, and another done with shell functions (possible because awk can be so concise, although the shell functions are not very readable):
Code:
#!/usr/bin/env bash

# @(#) s3	Demonstrate bundling of n lines, search, unbundle.

pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
bundle() { _n=${1-2}; _s=${3-" "}; awk 'ORS=NR%'"$_n"'?'\"$_s\"':"\n"' $2 ; }
search() { _c="$1" ; _s=${2-" "} ; awk '-F'"$_s" " $_c " $3 ; }
unbundle() { _s=${1-" "} ; awk '1' RS="$_s" $2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C awk

FILE=${1-data1}

pl " Sample of the input data file:"
head -5 $FILE

pl " Modular solution, awk directly coded:"
awk 'ORS=NR%4?"@":"\n"' $FILE |
tee f1 |
awk "-F@" '
$2 ~ /Smith|Jackson/
' |
tee f2 |
awk '1' RS='@'

pl " Modular solution, function calls:"
bundle 4 $FILE "@" |
tee f1 |
# awk "-F@" ' $2 ~ /Smith|Jackson/ ' |
search ' $2 ~ /Smith|Jackson/ ' "@" |
tee f2 |
unbundle "@"

exit 0

producing:
Code:
% ./s3

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
awk GNU Awk 3.1.5

-----
 Sample of the input data file:
John
Barry
21/04/1988
Male
Jessica

-----
 Modular solution, awk directly coded:
Jessica
Smith
16/09/2000
Female


-----
 Modular solution, function calls:
Jessica
Smith
16/09/2000
Female

Intermediate results can be seen from the files that tee produces.
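As a rough illustration of those intermediate results, the sketch below runs the bundle step on its own and then the full bundle, search, unbundle pipeline. The helper names `sample` and `bundle4` are made up here, and the two records are the ones from the sample data.

```shell
sample() {
  printf '%s\n' John Barry 21/04/1988 Male Jessica Smith 16/09/2000 Female
}

# Bundle: ORS is "@" after lines 1-3 of each group and a newline after line 4,
# so every 4-line record becomes one "super line".
bundle4() { awk 'ORS = NR % 4 ? "@" : "\n"'; }

sample | bundle4
# John@Barry@21/04/1988@Male
# Jessica@Smith@16/09/2000@Female

# Search field 2 of each super line, then unbundle by splitting on "@".
sample | bundle4 | awk -F@ '$2 ~ /Smith|Jackson/' | awk '1' RS='@'
```

The unbundle step leaves the super line's trailing newline inside the last field, which is where the extra blank line after "Female" in the output above comes from.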

The bundling idea came from https://www.unix.com/shell-programmin...ting-code.html although there are likely other sources as well. The shell search function is more for illustration. I think a directly coded awk script in the pipeline "sandwich" would be preferable.

I think that a general bundle utility would be very useful: two modes, one collecting lines as done here, another with strings (tokens) as done with xargs in the post cited.

Best wishes ... cheers, drl
# 13  
Old 06-22-2012
Hi Peasant,

Thanks for weighing in on this thread.

Your suggestion is the closest to what I am looking for so far. Let's look at how I have applied it to the same example:

Code:
 
$ awk 'BEGIN { RS="Male|Female" } { if ($2 ~ /Smith/) { print $1, $2, $3 } }' employee.txt
Jessica Smith 16/09/2000

This is great, so there is no need to use sed at all. However, is it possible to include the record separator as part of the record as well, i.e. Jessica Smith 16/09/2000 Female?

To make the sample data a little closer to the actual data, which I am not able to reveal for confidentiality reasons: the real data also includes the pipe "|" character as part of the record separator. For instance:

Code:
 
L|Employee.firstname|Jessica
L|Employee.surname|Smith
L|Employee.dob|16/09/2000
L|Employee.gender|Female

As a result, is it still possible to identify the last line of each record using RS="L\|Employee.gender\|Male|L\|Employee.gender\|Female", and is there some way to search the second field ($2) of the record instead of the whole line, which is made up of pipe-separated fields?
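One possible approach, sketched under heavy assumptions: every record consists of exactly the four `L|Employee.*` lines shown above, in that order, starting with `Employee.firstname` and ending with `Employee.gender`. The John Barry lines are the thread's other sample record rewritten in the same layout, and the helper names are hypothetical. It splits each line on "|" and avoids a regex RS altogether.

```shell
# Assumed layout: four pipe-separated lines per record, as in the example above.
sample() {
  printf '%s\n' \
    'L|Employee.firstname|John' 'L|Employee.surname|Barry' \
    'L|Employee.dob|21/04/1988' 'L|Employee.gender|Male' \
    'L|Employee.firstname|Jessica' 'L|Employee.surname|Smith' \
    'L|Employee.dob|16/09/2000' 'L|Employee.gender|Female'
}

# Print every record whose surname equals the first argument.
select_surname() {
  awk -F'|' -v want="$1" '
    $2 == "Employee.firstname" { rec = ""; surname = "" }  # a new record starts
    { rec = rec $0 "\n" }                                  # collect every line
    $2 == "Employee.surname"   { surname = $3 }            # remember the surname
    $2 == "Employee.gender"    { if (surname == want) printf "%s", rec }
  '
}

sample | select_surname Smith
```

Because the gender line is collected like any other line, it is printed as part of the record instead of being consumed as a separator.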

Apologies for coming up with a new record format which may annoy some moderators but we are very close to wrapping up this thread.

Thanks a lot,

George
# 14  
Old 06-22-2012
So I take it you have gawk installed on your Solaris box, and that is what you are using, because this will absolutely not work with the standard awks on Solaris....
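For comparison, a sketch that does not depend on a regex RS: it collects lines until a gender line closes the record and keeps that line in the output. It assumes the plain four-line sample format with the surname on the second line, and it should run in any POSIX awk (on Solaris that would likely mean nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk). The helper names are made up.

```shell
sample() {
  printf '%s\n' John Barry 21/04/1988 Male Jessica Smith 16/09/2000 Female
}

# Join the lines of a record with spaces; n counts lines within the record.
# A line that is exactly Male or Female ends the record.
select_smith_portable() {
  awk '
    { rec = (n++ ? rec " " : "") $0; if (n == 2) surname = $0 }
    /^(Male|Female)$/ { if (surname ~ /Smith/) print rec; n = 0 }
  '
}

sample | select_smith_portable
# Jessica Smith 16/09/2000 Female
```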

Quote:
Originally Posted by gjackson123
[..]
To make the sample data a little closer to the actual data, which I am not able to reveal for confidentiality reasons: the real data also includes the pipe "|" character as part of the record separator.
[..]
Apologies for coming up with a new record format which may annoy some moderators but we are very close to wrapping up this thread.
[..]
Apologies, yet you still are not providing a representative sample.
Please do not leave people guessing. Show a representative (anonymized) sample of your input and desired output or this thread will need to be closed.
 