Filter a .CSV file based on the 5th column values


 
Thread Tools Search this Thread
Operating Systems Linux Filter a .CSV file based on the 5th column values
# 1  
Old 10-22-2013
Filter a .CSV file based on the 5th column values

I have a .CSV file with the below format:

Code:
"column 1","column 2","column 3","column 4","column 5","column 6","column 7","column 8","column 9","column 10
"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23455","12312255564","string, with, multiple, commas","string with or, without commas","string 2","USD","433","70%","07/15/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""
"46476","15467534544","lengthy string, with commas, multiple: colans","string with or, without commas","string 2","CAND","388","70%","09/21/2013",""

5th column of the file has different strings. I need to filter out the file based on the 5th column value. Lets say, I need a new file from the current file which has records only with the value "string 1" in its fifth field.

For this I tried the below command,

Code:
awk -F"," ' { if toupper($5) == "STRING 1") PRINT  }' file1.csv > file2.csv

but it was throwing me an error as following:

Code:
awk:  { if toupper($5) == "STRING 1") PRINT  }
awk:       ^ syntax error
awk:  { if toupper($5) == "STRING 1") PRINT  }
awk:                                ^ syntax error

I also tried the following but it did not help much:

Code:
awk -F"," ' { if ($5 == "string 1") print  }' file1.csv > file2.csv

I then used the following which gives me an odd output.

Code:
awk -F"," '$5="string 1" {print}' file1.csv > file2.csv

Output:

Code:
"column 1" "column 2" "column 3" "column 4" string 1 "column 6" "column 7" "column 8" "column 9" "column 10
"12310" "42324564756" "a simple string with a   comma" string 1  without commas" "string 1" "USD" "12" "70%" "08/01/2013" ""
"23455" "12312255564" "string  with string 1  commas" "string with or  without commas" "string 2" "USD" "433" "70%" "07/15/2013" ""
"23525" "74535243123" "string   with commas string 1 "string with or  without commas" "string 1" "CAND" "744" "70%" "05/06/2013" ""
"46476" "15467534544" "lengthy string  with commas string 1 "string with or  without commas" "string 2" "CAND" "388" "70%" "09/21/2013" ""

P.S: I used toupper command to be on the safe side, as I am not sure if the string will be in lower or higher case. Also, Please advise if the space in the string matters while searching for a pattern using AWK... Thanks in advance.

Last edited by Don Cragun; 10-22-2013 at 01:35 AM.. Reason: Use CODE tags for input and output samples as well as for code samples.
# 2  
Old 10-22-2013
You have at least two problems here:

First, you are telling awk that your field separator is a comma, but some commas in your input file are not field separators.

Second, you are telling awk to look for lines where the string string 1 is the 5th field in the line; but that string is never between commas in your input (even if we only counted the commas that are meant to be field separators. Each of your intended fields contains double quotes and the strings "string 1" and string 1 are not the same. To look for a string containing quotes, you have to put escaped quotes in your match string. For example:
Code:
$5 == "\"string 1\""

The output you got from the command:
Code:
awk -F"," '$5="string 1" {print}' file1.csv > file2.csv

may have seemed strange to you, but it is exactly what I would have expected. Note the difference between $5="string 1" and $5=="string 1". With a single equal sign, you set the 5th field to string 1 rather than testing if the 5th field was string 1.

Try the following:
Code:
awk -F'"' 'toupper($10) == "STRING 1"' OFS='"' file1.csv

Using double quote as the field separator, $1 and $NF will be empty strings, other odd fields will be commas, and the even fields will be the data between pairs of double quotes. So $2 will be the data between the 1st pair of double quotes, $4 will be the data between the second pair of double quotes, ..., $10 will be the data between the 5th pair of double quotes, ...
This User Gave Thanks to Don Cragun For This Post:
# 3  
Old 10-22-2013
If your awk version allows for multi-char-FS, try
Code:
awk -F'","' 'toupper($5)=="STRING 1"' OFS='","' file

This User Gave Thanks to RudiC For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Filter duplicate records from csv file with condition on one column

I have csv file with 30, 40 columns Pasting just three column for problem description I want to filter record if column 1 matches CN or DN then, check for values in column 2 if column contain 1235, 1235 then in column 3 values must be sequence of 2345, 2345 and if column 2 contains 6789, 6789... (5 Replies)
Discussion started by: as7951
5 Replies

2. Shell Programming and Scripting

Get maximum per column from CSV file, based on date column

Hello everyone, I am using ksh on Solaris 10 and I'm gathering data in a CSV file that looks like this: 20170628-23:25:01,1,0,0,1,1,1,1,55,55,1 20170628-23:30:01,1,0,0,1,1,1,1,56,56,1 20170628-23:35:00,1,0,0,1,1,2,1,57,57,2 20170628-23:40:00,1,0,0,1,1,1,1,58,58,2... (6 Replies)
Discussion started by: ejianu
6 Replies

3. Shell Programming and Scripting

Filter tab file based on column value

Hello I have a tab text file with many columns and have to filter rows ONLY if column 22 has the value of '0', '1', '2' or '3' (out of 0-5). If Column 22 has value '0','1', '2' or '3' (highlighted below), then remove anything less than 10 and greater 100 (based on column 5) AND remove anything... (1 Reply)
Discussion started by: nans
1 Replies

4. Shell Programming and Scripting

Filter file to remove duplicate values in first column

Hello, I have a script that is generating a tab delimited output file. num Name PCA_A1 PCA_A2 PCA_A3 0 compound_00 -3.5054 -1.1207 -2.4372 1 compound_01 -2.2641 0.4287 -1.6120 3 compound_03 -1.3053 1.8495 ... (3 Replies)
Discussion started by: LMHmedchem
3 Replies

5. Shell Programming and Scripting

UNIX command -Filter rows in fixed width file based on column values

Hi All, I am trying to select the rows in a fixed width file based on values in the columns. I want to select only the rows if column position 3-4 has the value AB I am using cut command to get the column values. Is it possible to check if cut -c3-4 = AB is true then select only that... (2 Replies)
Discussion started by: ashok.k
2 Replies

6. Linux

To get all the columns in a CSV file based on unique values of particular column

cat sample.csv ID,Name,no 1,AAA,1 2,BBB,1 3,AAA,1 4,BBB,1 cut -d',' -f2 sample.csv | sort | uniq this gives only the 2nd column values Name AAA BBB How to I get all the columns of CSV along with this? (1 Reply)
Discussion started by: sanvel
1 Replies

7. Shell Programming and Scripting

Remove the values from a certain column without deleting the Column name in a .CSV file

(14 Replies)
Discussion started by: dhruuv369
14 Replies

8. Shell Programming and Scripting

Fetching values in CSV file based on column name

input.csv: Field1,Field2,Field3,Field4,Field4 abc ,123 ,xyz ,000 ,pqr mno ,123 ,dfr ,111 ,bbb output: Field2,Field4 123 ,000 123 ,111 how to fetch the values of Field4 where Field2='123' I don't want to fetch the values based on column position. Instead want to... (10 Replies)
Discussion started by: bharathbangalor
10 Replies

9. Shell Programming and Scripting

Script for extracting data from csv file based on column values.

Hi all, I am new to shell script.I need your help to write a shell script. I need to write a shell script to extract data from a .csv file where columns are ',' separated. The file has 5 columns having values say column 1,column 2.....column 5 as below along with their valuesm.... (3 Replies)
Discussion started by: Vivekit82
3 Replies
Login or Register to Ask a Question