Search/grep on row and column wise


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Search/grep on row and column wise
# 1  
Old 06-20-2015
Search/grep on row and column wise

Hello,
I have a comma seperate metadata as follows:

Code:
CITY        ,COUNTY,STATE,COUNTRY
NEW_YORK,NYC     ,NY     ,USA
NEWARK    ,ESSEX ,NJ      ,USA
CHICAGO   ,COOK  ,IL       ,USA
SEATTLE   ,MINER ,WA     ,USA


In my process, I get two key values ie
CITY NAME (can be one of the values in first column shown above)
COLUMN NAME (I get one of the values in the 1st row, ie either CITY or COUNTY or STATE or COUNTRY)

For EX:
If my keys are CHICAGO & STATE, then I need my ouput as IL
If my keys are NEWARK & COUNTRY, then I need my ouput as USA


Please note that the column position can change. ie the order in which I get data in my columns is not necessarily in the order shown above. I may get my columns in this order also -> CITY,STATE,COUNTY,STATE

Now to implement this solution, I can grep on the CITY_NAME but then I need to know where the other column is. I am trying different solutions but without much success.

Any simple solutions to find this using sed or awk is appreciated.
Thanks.

Last edited by Don Cragun; 06-21-2015 at 12:18 AM.. Reason: Add CODE and ICODE tags.
# 2  
Old 06-21-2015
None of your examples match your sample data.

You don't have a field named CITY; you have a field named CITY followed by eight spaces. If you try to look for a value in the CITY field, it should fail because there is no CItY field.

With a comma separated data file, you don't have a city name of CHICAGO; you have a city name of CHICAGO followed by three spaces, and you don't have a city name of NEWARK; you have a city name of NEWARK followed by four spaces.

And, if someone entered the keys, CHICAGO followed by three spaces and STATE they should get IL followed by 7 spaces; not just IL.

And, if you have columns in the order CITY,STATE,COUNTY,STATE (which you said is possible) and the user gives keys NEW_YORK and STATE are you supposed to return the value in column 2, column 4, or both???

How are the keys supplied to your script?

What operating system and shell are you using?

What have you tried so far?

Is this a homework assignment?
# 3  
Old 06-21-2015
Quote:
Originally Posted by Don Cragun
None of your examples match your sample data.

You don't have a field named CITY; you have a field named CITY followed by eight spaces. If you try to look for a value in the CITY field, it should fail because there is no CItY field.

With a comma separated data file, you don't have a city name of CHICAGO; you have a city name of CHICAGO followed by three spaces, and you don't have a city name of NEWARK; you have a city name of NEWARK followed by four spaces.

And, if someone entered the keys, CHICAGO followed by three spaces and STATE they should get IL followed by 7 spaces; not just IL.

And, if you have columns in the order CITY,STATE,COUNTY,STATE (which you said is possible) and the user gives keys NEW_YORK and STATE are you supposed to return the value in column 2, column 4, or both???

How are the keys supplied to your script?

What operating system and shell are you using?

What have you tried so far?

Is this a homework assignment?



Sorry for the confusion but there are no space in the data. I only put spaces to make it easier to understand. Here's how the data will appear:

Code:
CITY,COUNTY,STATE,COUNTRY
NEW_YORK,NYC,NY,USA
NEWARK,ESSEX,NJ,USA
CHICAGO,COOK,IL,USA
SEATTLE,MINER,WA,USA

Q. How are the keys supplied to your script? A value from first column and a column name
Ex: NEWARK & STATE, SEATTLE & COUNTRY, NEWARK & COUNTY

Q. What operating system and shell are you using? LINUX and ksh

So far, I have tried doing a egrep on both CITY and the value from the first row (ex: NEW_YORK or CHICAGO) to get two rows. But after that, I need to narrow down the value to a specific column name which can be either COUNTY or STATE or COUNTRY and here's where I am struck.
Moderator's Comments:
Mod Comment If you don't start using CODE tags (as required by forum rules) you may be banned from The UNIX & Linux Forums.


No, this is not a homework but rather I need to mine a metadata file to narrow down the value to a specific row and column.

Hope my explantion helps.

Last edited by Don Cragun; 06-21-2015 at 01:16 AM.. Reason: Add CODE tags, again.
# 4  
Old 06-21-2015
Quote:
Originally Posted by calredd
Sorry for the confusion but there are no space in the data. I only put spaces to make it easier to understand. Here's how the data will appear:

Code:
CITY,COUNTY,STATE,COUNTRY
NEW_YORK,NYC,NY,USA
NEWARK,ESSEX,NJ,USA
CHICAGO,COOK,IL,USA
SEATTLE,MINER,WA,USA

Q. How are the keys supplied to your script? A value from first column and a column name
Ex: NEWARK & STATE, SEATTLE & COUNTRY, NEWARK & COUNTY


Q. What operating system and shell are you using? LINUX and ksh

So far, I have tried doing a egrep on both CITY and the value from the first row (ex: NEW_YORK or CHICAGO) to get two rows. But after that, I need to narrow down the value to a specific column name which can be either COUNTY or STATE or COUNTRY and here's where I am struck.
Moderator's Comments:
Mod Comment If you don't start using CODE tags (as required by forum rules) you may be banned from The UNIX & Linux Forums.


No, this is not a homework but rather I need to mine a metadata file to narrow down the value to a specific row and column.

Hope my explantion helps.
I asked "How are the keys supplied to your script?" You responded "A value from first column and a column name
Ex: NEWARK & STATE, SEATTLE & COUNTRY, NEWARK & COUNTY
"
That response is an answer to the question "What is supplied to your script?", not "How".
  1. Is your script passed two operands?
  2. Is your script passed one operand with a literal <space><ampersand><space> between the two keys?
  3. Are the values read from a file? (If so what is the file format?)
  4. Can more than one pair of keys be supplied to your script to be processed during a single invocation of your script?
  5. Exactly what format should be used for the output?
    1. Are the keys supposed to be included in the output?
    2. If more than one pair of keys are input, is the output supposed to be a single line or one line for each pair of keys?
  6. Is the data built into the script or is it in a file?
  7. If the data to be processed is in a file, is the pathname of the file built into the script (If so, what is that pathname?), or is it passed in as an option argument (If so, what is the option letter and what is the default pathname if no option is provided?), or is it passed in as an operand?
Please help us help you. Don't make us guess at what you're trying to do. Show us that you have thought about the design of your script and aren't just asking us to not only guess at how your script is supposed to work but also what the output is supposed to look like.

And, PLEASE, use CODE tags where appropriate in your posts.
This User Gave Thanks to Don Cragun For This Post:
# 5  
Old 06-21-2015
Quote:
Originally Posted by Don Cragun
I asked "How are the keys supplied to your script?" You responded "A value from first column and a column name
Ex: NEWARK & STATE, SEATTLE & COUNTRY, NEWARK & COUNTY
"
That response is an answer to the question "What is supplied to your script?", not "How".
  1. Is your script passed two operands?
  2. Is your script passed one operand with a literal <space><ampersand><space> between the two keys?
  3. Are the values read from a file? (If so what is the file format?)
  4. Can more than one pair of keys be supplied to your script to be processed during a single invocation of your script?
  5. Exactly what format should be used for the output?
    1. Are the keys supposed to be included in the output?
    2. If more than one pair of keys are input, is the output supposed to be a single line or one line for each pair of keys?
  6. Is the data built into the script or is it in a file?
  7. If the data to be processed is in a file, is the pathname of the file built into the script (If so, what is that pathname?), or is it passed in as an option argument (If so, what is the option letter and what is the default pathname if no option is provided?), or is it passed in as an operand?
Please help us help you. Don't make us guess at what you're trying to do. Show us that you have thought about the design of your script and aren't just asking us to not only guess at how your script is supposed to work but also what the output is supposed to look like.

And, PLEASE, use CODE tags where appropriate in your posts.



1. Is your script passed two operands?
Yes, I get two parameters passed.

2. Is your script passed one operand with a literal <space><ampersand><space> between the two keys?
No. I get two parameters passed.

3. Are the values read from a file? (If so what is the file format?)
Yes, it is a comma seperated file as shown in my prior post.

4. Can more than one pair of keys be supplied to your script to be processed during a single invocation of your script?
No, I only get one pair of keys.

5. Exactly what format should be used for the output?
I just need to fetch the value which can assigned to a variable.

6. Are the keys supposed to be included in the output?
No

7. If more than one pair of keys are input, is the output supposed to be a single line or one line for each pair of keys?
Only one pair of keys are input.

8. Is the data built into the script or is it in a file?
Data is an external file.

9. If the data to be processed is in a file, is the pathname of the file built into the script (If so, what is that pathname?), or is it passed in as an option argument (If so, what is the option letter and what is the default pathname if no option is provided?), or is it passed in as an operand?
Data is a pathname. For sake of simplicity, pls assume that data is residing at /tmp/metadata.csv



My solution so far has been:

$ cat /tmp/metadata.csv | egrep "NEWARK|COUNTY"
CITY,COUNTY,STATE,COUNTRY
NEWARK,ESSEX,NJ,USA



Where /tmp/metadata.csv is the file path.
NEWARK is the first parameter
COUNTY is the second parameter

Now, I am trying to use awk to get my final output as ESSEX.



Hope this explantion helps.
Thanks again for your questions.
# 6  
Old 06-21-2015
Try
Code:
awk -F,    '
FNR==1          {for (n=1; n<=NF; n++) {if ($n==COL) KEYCOL=n}
                 next
                }
$1==KEY         {print $KEYCOL}
'  KEY=CHICAGO COL=COUNTY   file
COOK

In case you have one parameter with the keywords separated by "&", try

Code:
awk -F,    '
FNR==1          {gsub (" ", "", KEY)
                 split (KEY, Karr, "&")
                 for (n=1; n<=NF; n++) {if ($n==Karr[2]) KEYCOL=n}
                 next
                }
$1==Karr[1]     {print $KEYCOL}
'  KEY="CHICAGO & COUNTY"   file

This User Gave Thanks to RudiC For This Post:
# 7  
Old 06-21-2015
In case the database might be switched to use expected spaces in names (such as NEW YORK instead of underscores (as in NEW_YORK), you might want to change:
Code:
FNR==1          {gsub (" ", "", KEY)
                 split (KEY, Karr, "&")

in the 2nd script to:
Code:
FNR==1          {split (KEY, Karr, " *& *")

I don't know how this database is going to account for the fact that the five boroughs of New York City are five different counties (The Bronx is in Bronx county, Brooklyn is in Kings county, Manhattan is in New York county, Queens is in Queens county, and Staten Island is in Richmond county).
This User Gave Thanks to Don Cragun For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Search string in multiple files and display column wise

I have 3 files. Each of those files have the same number of records, however certain records have different values. I would like to grep the field in ALL 3 files and display the output with only the differences in column wise and if possible line number File1 Name = Joe Age = 33... (3 Replies)
Discussion started by: sidnow
3 Replies

2. UNIX for Dummies Questions & Answers

Print row wise

Hi Help, I have an I/p, which looks like --- FF GG HH I want the o/p to be like --- FF GG HH. How we can do that? Thanks (7 Replies)
Discussion started by: Indra2011
7 Replies

3. Shell Programming and Scripting

Grep the column for a row

I have a problem caught and need to discuss with all you guys. I have a file containing a rows which are separated by “~” For eg. Server~321~UP~Linux~Member 121213~5778~Down~Unix~Provider I want to use the grep which search the row on the basis of columns like the grep... (6 Replies)
Discussion started by: adisky123
6 Replies

4. Shell Programming and Scripting

calculating row-wise standard deviation using awk

Hi, I have a file containing 100,000 rows-by-120 columns and I need to compute for the standard deviation for each row. Any idea on how to calculate row-wise standard deviation using awk? My sample data looks like this: input data: 23 35 12 25 16 17 18 19 29 12 12 26 15 14 15 23 12 12... (2 Replies)
Discussion started by: ida1215
2 Replies

5. Shell Programming and Scripting

search column and delete row if greater than value

Hi, as the title states i need to find a way to search a column for values great than 1000, and if it is, then delete that row. An example 1 7.021 6.967 116.019 4 U 6.980E+07 0.000E+00 e 0 0 0 0 2 8.292 7.908 118.063 3 U 1.440E+07 0.000E+00 e 0 821 814 ... (3 Replies)
Discussion started by: olifu02
3 Replies

6. Shell Programming and Scripting

awk-gsub on column-wise on each row

awk '{ gsub(/....=/,""); print }' want.dat >final.dat the above awk command which removes all the chars before and including '=' on the entire row. --thats what it meant be.:) but i need to remove text on column-wise on each row. many thanks, EM ---------- Post updated at 10:00 AM... (4 Replies)
Discussion started by: elamurugu
4 Replies

7. Shell Programming and Scripting

Search a file column wise and delete it

Scottn, m really sorry but i have not got my answer yet. my concern is how to delete the row !!! i have a file which has a column that is unique i am intending to serach it and if it is there to remove the row. the file looks like ROLLNO,NAME ,SUB1,SUB2,SUB3,TOTAL,PERCENTAGE,RESULT... (9 Replies)
Discussion started by: gotam
9 Replies

8. Shell Programming and Scripting

Search a file column wise and delete it

i have a file which has a column that is unique i am intending to serach it and if it is there to remove the row. the file looks like ROLLNO,NAME ,SUB1,SUB2,SUB3,TOTAL,PERCENTAGE,RESULT 15 ,rig ,34 ,56 ,87 ,177 ,59 % ,PASS 23 ,wel ,45 ,76 ,56 ,177 ,59 % ... (0 Replies)
Discussion started by: gotam
0 Replies

9. Shell Programming and Scripting

Row to column converter using Awk or grep?

Hello, Can someone please help me on this.:confused: I have a file which has more than 1 million lines (XML file). What I need is: Search for "abcd" in the input file > output the result into a output.txt (colloum1) Search for "efghi" in the input file > output the result in to... (3 Replies)
Discussion started by: Needhelp2
3 Replies

10. Shell Programming and Scripting

Unix addition ( Row wise)

Hi I have a file like a,1 b,2 d,3 a,2 b,3 c,7 Result Desired: a,3 b,5 d,3 c,7 i.e on the bases of 1st field the addition is done of the 2nd field and result printed out. (3 Replies)
Discussion started by: gauravgoel
3 Replies
Login or Register to Ask a Question