I am looking for a simple, concise solution, most likely using sed, to process the following 4 rows of data (which together make up one record) and keep the record only if its second row satisfies certain criteria, such as the surname matching Smith or Jackson:
It would be easy to use AWK if the data were on a single line with a fixed delimiter.
I have no problem writing many lines of shell script, but I am hoping to find a short, simple solution in SED, which I am not familiar with.
I am running on Solaris 10 x86 platform.
Your assistance would be much appreciated,
George
Last edited by gjackson123; 06-18-2012 at 10:36 AM..
Reason: Tidy up code & provide platform detail
Can you please provide a sample input file and the intended output file? I am a bit confused about what you consider to be a record in your file.
Based on my understanding (that a record is always the combination of the above 4 rows, and that the second row of the set must begin with 'Smith' for the record to be selected), here is my solution:
Note: This solution assumes that '*' does not appear anywhere in your data. Replace it with another character (which does not occur in your data) if this is not the case.
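A quick way to check that assumption before relying on '*' as a separator is to grep for a literal asterisk. In a basic regular expression, \* matches the asterisk character itself. contains_star is a hypothetical helper name used only for illustration.

```shell
# Succeed (exit status 0) if the file named in $1 contains a literal '*'.
# "\*" in a basic regular expression matches the asterisk character itself.
contains_star() {
  grep '\*' "$1" >/dev/null
}
```

For example, `contains_star employee.txt && echo "pick another separator"` would warn if '*' occurs anywhere in the file.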
There is no need to provide a sample input file, since your understanding of its layout, as shown in my initial post, is correct. Nevertheless, could you give a brief one-line explanation of how your code works, since my SED knowledge is limited? Also, which of the following minor changes would accommodate more than one surname:
I will test each of these statements to see which one works and let you know.
Thanks again,
George
I think the first variation should work fine for multiple surnames. Let me explain the solution:
1. I used sed to append an asterisk '*' to the string (gender), i.e. to the end of each record, with
Here & in the replacement stands for the matched string.
2. Then I split the output of this command into records separated by '*', with fields separated by '\n', the newline character. This lets the awk command treat the four lines of each set as four separate fields. I achieve this by setting two awk variables, the record separator (RS) and the field separator (FS):
3. Then I simply match the second field (i.e. the second row of each set) against the pattern
which matches fields that start with the string Smith followed by any sequence of characters, possibly empty. In retrospect, the .* in the pattern is probably not needed.
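The three steps above can be sketched as follows. This is a reconstruction from the description, not necessarily the exact original command. It assumes the fourth line of every record literally contains the text (gender). It uses the hypothetical function name select_by_surname and takes the surname(s) as an extended regular expression such as 'Smith|Jackson', which is one way to handle more than one surname.

The description leaves out one detail. After the first record, each record begins with the newline that followed the previous '*'. Without stripping that newline, $2 would be the first name rather than the surname, so the sketch removes it. On Solaris 10 you will probably want nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk, for -v and dynamic regular expressions.

```shell
# Hypothetical reconstruction of the three steps described above.
# Assumes the 4th line of every record literally contains "(gender)".
# $1: extended regular expression for the wanted surname(s), e.g. 'Smith|Jackson'.
#
# Step 1 (sed): append "*" after "(gender)" to mark the end of each record.
# Step 2 (awk): RS="*" makes each 4-line block one record; FS="\n" makes
#               each line of the block a separate field.
# Step 3 (awk): print the record if its 2nd field starts with a wanted surname.
select_by_surname() {
  sed 's/(gender)/&*/' | awk -v pat="^($1)" '
    BEGIN { RS = "*"; FS = "\n" }
    # Every record after the first starts with the newline that followed the
    # previous "*"; strip it so that $2 really is the second line (the surname).
    { sub(/^\n/, "") }
    $2 ~ pat { print }
  '
}
```

Usage: `select_by_surname 'Smith|Jackson' < employee.txt`. The match is case-sensitive.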
Hi, gjackson123.
Quote:
Originally Posted by gjackson123
...There is no need to provide sample input data file ...
Meta-advice.
If you want more than one suggested solution, supply sample data; that keeps the results consistent. Otherwise you put an additional burden on the responders to come up with sample data, which will likely differ from one responder to the next and may not be representative of the real set. In general, if I am faced with creating sample data in addition to a solution, I will probably move on to other questions without attempting the problem.
$ more employee.txt
It doesn't look like the sed statement is doing anything to it. Should (gender) be replaced with something else? What should the data look like coming out of sed and going into awk, which I am more comfortable with?
I am interested in getting a solution with everyone's help.
Thanks again,
George
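The substitution only touches lines that contain the literal text (gender). Suppose the fourth line of the real data holds an actual value (say M or F) instead of that placeholder. Then sed passes everything through unchanged and no '*' markers are added. With RS="*", awk then sees the whole file as a single record.

Since awk is the more familiar tool here, a sketch that skips sed and simply counts lines may be easier to follow. It assumes every record is exactly four lines with no blank separator lines. pick_by_surname is a hypothetical name, and the match is case-sensitive.

```shell
# Keep each 4-line record whose 2nd line starts with one of the surnames
# in $1, an extended regular expression such as 'Smith|Jackson'.
# Assumes every record is exactly 4 lines, with no blank separator lines.
pick_by_surname() {
  awk -v pat="^($1)" '
    {
      n = (NR - 1) % 4 + 1    # position of this line within its record: 1..4
      line[n] = $0
    }
    # On the last line of a record, print the record if its 2nd line matches.
    n == 4 && line[2] ~ pat {
      print line[1]; print line[2]; print line[3]; print line[4]
    }
  '
}
```

Usage: `pick_by_surname 'Smith|Jackson' < employee.txt`. Because it never looks for a marker, the contents of the fourth line do not matter.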
---------- Post updated 06-22-12 at 12:19 AM ---------- Previous update was 06-21-12 at 06:22 PM ----------
Hi,
Below are some more attempts to figure out how your SED & AWK statements work:
## Returned the same list & order
## Returned the same list & order
## Returned the same list & order
## Awk is not getting the right output from SED
## Same input to AWK as from SED
I suspect the problem is from
but I am still trying to wrap my head around it.
Also, what is the purpose of the round brackets () around gender, and of & and *? The sed statement appears to replace (gender) with &* globally, but I am not clear whether gender should be replaced with something else.
Thanks a lot,
George
Last edited by gjackson123; 06-22-2012 at 02:27 AM..
Reason: Cleaned out spurious formatting
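On the questions about the brackets, & and *:

- In sed's basic regular expressions, round brackets are ordinary characters, so (gender) matches only that literal text, brackets included.
- In the replacement, & stands for the whole matched text and * is just an asterisk, so &* writes the match back with * after it.

The sketch below demonstrates this, assuming the substitution had the form s/(gender)/&*/. It also shows an alternative, mark_every_fourth (a hypothetical name, as is mark_gender), that appends * to every fourth line whatever that line contains.

```shell
# Assumed form of the original substitution. "(" and ")" are ordinary
# characters in a sed basic regular expression, so this matches the literal
# text "(gender)"; "&" in the replacement is the whole match and "*" is a
# plain asterisk, so the match is written back with "*" appended.
mark_gender() {
  sed 's/(gender)/&*/'
}

# Alternative that does not depend on the line contents: each "n" prints the
# current line and reads the next, so after three of them the pattern space
# holds the 4th line of the record, and "s/$/*/" appends "*" to it.
# Assumes every record is exactly 4 lines.
mark_every_fourth() {
  sed 'n;n;n;s/$/*/'
}

printf 'F (gender)\n' | mark_gender    # prints: F (gender)*
printf 'F\n' | mark_gender             # prints: F  (no match, line unchanged)
```

Piping mark_every_fourth into the awk step (RS="*", FS="\n") would produce the '*'-separated records the original approach expects, even when no line contains (gender).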