Filter a .CSV file based on the 5th column values

10-22-2013

I have a .CSV file with the below format:

Code:

"column 1","column 2","column 3","column 4","column 5","column 6","column 7","column 8","column 9","column 10
"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23455","12312255564","string, with, multiple, commas","string with or, without commas","string 2","USD","433","70%","07/15/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""
"46476","15467534544","lengthy string, with commas, multiple: colans","string with or, without commas","string 2","CAND","388","70%","09/21/2013",""

The 5th column of the file holds different strings. I need to filter the file on the value of that column. Let's say I need a new file containing only the records whose fifth field is "string 1".

For this I tried the command below:

Code:

awk -F"," ' { if toupper($5) == "STRING 1") PRINT  }' file1.csv > file2.csv

but it threw the following error:

Code:

awk:  { if toupper($5) == "STRING 1") PRINT  }
awk:       ^ syntax error
awk:  { if toupper($5) == "STRING 1") PRINT  }
awk:                                ^ syntax error

I also tried the following but it did not help much:

Code:

awk -F"," ' { if ($5 == "string 1") print  }' file1.csv > file2.csv

I then used the following, which gives odd output:

Code:

awk -F"," '$5="string 1" {print}' file1.csv > file2.csv

Output:

Code:

"column 1" "column 2" "column 3" "column 4" string 1 "column 6" "column 7" "column 8" "column 9" "column 10
"12310" "42324564756" "a simple string with a   comma" string 1  without commas" "string 1" "USD" "12" "70%" "08/01/2013" ""
"23455" "12312255564" "string  with string 1  commas" "string with or  without commas" "string 2" "USD" "433" "70%" "07/15/2013" ""
"23525" "74535243123" "string   with commas string 1 "string with or  without commas" "string 1" "CAND" "744" "70%" "05/06/2013" ""
"46476" "15467534544" "lengthy string  with commas string 1 "string with or  without commas" "string 2" "CAND" "388" "70%" "09/21/2013" ""

P.S.: I used toupper to be on the safe side, as I am not sure whether the string will be in lower or upper case. Also, please advise whether the space in the string matters when searching for a pattern with awk. Thanks in advance.

dhruuv369

10-22-2013

You have at least two problems here:

First, you are telling awk that your field separator is a comma, but some commas in your input file are not field separators.

Second, you are telling awk to look for lines where the string string 1 is the 5th field in the line; but that string never appears between commas in your input (even if we counted only the commas that are meant to be field separators). Each of your intended fields contains double quotes, and the strings "string 1" and string 1 are not the same. To look for a string containing quotes, you have to put escaped quotes in your match string. For example:

Code:

$5 == "\"string 1\""
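On its own, the escaped-quote comparison does not fix the first problem: with -F"," the commas inside quoted fields still shift the field numbers. A minimal sketch, using the first data row from the sample file (the variable names row and split_on_comma are only for illustration):

```shell
# First data row from the sample file in the question.
row='"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""'

# Show what awk treats as field 5 and field 7 when every comma is a separator.
split_on_comma=$(printf '%s\n' "$row" |
    awk -F',' '{ printf "5=[%s] 7=[%s]\n", $5, $7 }')
printf '%s\n' "$split_on_comma"
```

For this row the intended 5th column ends up in $7, because the 3rd and 4th columns each contain one embedded comma. The offset changes from row to row, so no fixed field number works with a plain comma separator.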

The output you got from the command:

Code:

awk -F"," '$5="string 1" {print}' file1.csv > file2.csv

may have seemed strange to you, but it is exactly what I would have expected. Note the difference between $5="string 1" and $5=="string 1". With a single equal sign, you set the 5th field to string 1 rather than testing if the 5th field was string 1.
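The effect can be reproduced on a single short record. This is only a sketch with made-up data (the variables line, assigned and compared are illustrative names, not part of the original commands):

```shell
# One made-up record with six comma-separated, quoted fields.
line='"a","b","c","d","string 2","e"'

# Assignment: $5 is overwritten, the pattern evaluates to a non-empty
# string (true), and awk rebuilds the record using the default OFS,
# a single space, so the commas disappear.
assigned=$(printf '%s\n' "$line" | awk -F',' '$5="string 1" {print}')
printf '%s\n' "$assigned"

# Comparison: nothing is modified; this record does not match,
# so nothing is printed.
compared=$(printf '%s\n' "$line" | awk -F',' '$5=="\"string 1\"" {print}')
printf '[%s]\n' "$compared"
```

The first command prints every record with field 5 replaced and the separators turned into spaces, which is the pattern visible in the odd output above. The second prints only matching records and leaves them untouched.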

Try the following:

Code:

awk -F'"' 'toupper($10) == "STRING 1"' OFS='"' file1.csv

Using double quote as the field separator, $1 and $NF will be empty strings, other odd fields will be commas, and the even fields will be the data between pairs of double quotes. So $2 will be the data between the 1st pair of double quotes, $4 will be the data between the second pair of double quotes, ..., $10 will be the data between the 5th pair of double quotes, ...
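To check the field numbering described above, the even-numbered fields of one record can be listed. A small sketch, using a shortened copy of the first sample row (the variables line and fields are illustrative names):

```shell
# First six columns of the first sample row.
line='"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD"'

# With a double quote as FS, the data sits in the even-numbered fields.
fields=$(printf '%s\n' "$line" |
    awk -F'"' '{ for (i = 2; i <= NF; i += 2) printf "$%d=%s\n", i, $i }')
printf '%s\n' "$fields"
```

The fifth line of the listing is $10=string 1, and the embedded commas in the 3rd and 4th columns stay inside $6 and $8. This approach assumes every field is quoted and that no field contains an embedded double quote.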

Don Cragun

10-22-2013

If your awk version supports a multi-character field separator, try:

Code:

awk -F'","' 'toupper($5)=="STRING 1"' OFS='","' file
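One thing to keep in mind with this separator: the first field keeps its leading double quote and the last field keeps its trailing one, so the bare comparison only works as written for the columns in between. A short sketch with a shortened, made-up record (the variables line, first, fifth and last are illustrative names):

```shell
# Shortened record: six quoted columns, two of them with embedded commas.
line='"12310","42324564756","a , b","c, d","string 1","USD"'

# Split on the three-character sequence quote-comma-quote.
first=$(printf '%s\n' "$line" | awk -F'","' '{ print $1 }')
fifth=$(printf '%s\n' "$line" | awk -F'","' '{ print $5 }')
last=$(printf '%s\n' "$line" | awk -F'","' '{ print $NF }')

printf 'first=[%s] fifth=[%s] last=[%s]\n' "$first" "$fifth" "$last"
```

Here $5 is string 1 without quotes, while $1 is "12310 and $NF is USD", so a test against the first or last column would need to account for the leftover quote.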

RudiC
