Filtering records of a file based on a value of a column


 
# 1  
Old 09-24-2008

Hi all,
I would like to extract records of a file based on a condition. The file contains 47 fields, and I would like to extract only those records that match a certain value in one of the columns, e.g.

COL1 COL2 COL3 ............... COL47
1 XX 45 N
2 YY 34 y
3 ZZ 44 N
4 XX 89 Y
5 XX 45 N
6 YY 84 D
7 ZZ 22 S

From this file, I would like to extract all records whose COL2=XX or YY, and all other records will be excluded (as shown below).

COL1 COL2 COL3 ............... COL47
1 XX 45 N
2 YY 34 y
4 XX 89 Y
5 XX 45 N
6 YY 84 D

Does anybody know how to do this using sed or awk or any other UNIX tool? Thank you.
# 2  
Old 09-24-2008
try:
Code:
awk '($2 == "XX"  || $2 == "YY")' filename
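
Note that the expected output in the question keeps the header line, which the command above would drop because COL2 is neither XX nor YY. A minimal sketch of one way to keep it, assuming a whitespace-separated file whose first line is the header (the sample rows below are a shortened, hypothetical copy of the data in post #1):

```shell
# Shortened sample, whitespace-separated, with a header line (hypothetical data)
result=$(printf '%s\n' \
  'COL1 COL2 COL3 COL47' \
  '1 XX 45 N' \
  '2 YY 34 y' \
  '3 ZZ 44 N' \
  '4 XX 89 Y' |
  # NR == 1 keeps the header; the other tests keep rows whose 2nd field is exactly XX or YY
  awk 'NR == 1 || $2 == "XX" || $2 == "YY"')

printf '%s\n' "$result"
```

With no action given, awk prints every line for which the pattern is true, so `{ print }` is optional.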

# 3  
Old 09-24-2008
Thanks for the reply, Yogesh. I'm sorry, I forgot to mention that my file is tilde (~) separated. I found other threads similar to what I want to accomplish; the only remaining problem is how to combine two conditions using OR.

I used:

awk -F"~" '$2 ~ /XX/{print }' inputfile > outputfile

This script works and filters all records with XX in field 2. However, I would also like to include YY. The script below

awk -F"~" '$2 ~ /YY/{print }' inputfile >> outputfile

will append all records with YY values in field 2 to the output file, but the order of the records will be different.

I would like to know how I can include OR field2="YY" in the original script:
awk -F"~" '$2 ~ /XX/{print }' inputfile > outputfile

Thank you.
# 4  
Old 09-24-2008
Code:
awk -F"~" '$2 ~ /XX|YY/ {print }' inputfile > outputfile

More generally, if you cannot put both conditions in a single regular expression, the syntax of a proper "or" is

Code:
awk '$1 ~ /foo/ || $2 ~ /bar/ { print }'
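
One caveat: `$2 ~ /XX|YY/` is a substring match, so it also accepts values that merely contain XX or YY. If field 2 must equal XX or YY exactly, a string comparison or an anchored regex may be safer. A small sketch, assuming tilde-separated input; the rows, including the XXL value, are made up for illustration:

```shell
# Hypothetical tilde-separated sample; row 2 has XXL in field 2
data='1~XX~45
2~XXL~34
3~YY~44
4~ZZ~22'

# Unanchored regex: also keeps the XXL row, because XX occurs inside it
loose=$(printf '%s\n' "$data" | awk -F"~" '$2 ~ /XX|YY/')

# Exact string comparison: keeps only rows whose field 2 is exactly XX or YY
exact=$(printf '%s\n' "$data" | awk -F"~" '$2 == "XX" || $2 == "YY"')

# Anchored regex: same result as the exact comparison for this data
anchored=$(printf '%s\n' "$data" | awk -F"~" '$2 ~ /^(XX|YY)$/')

printf 'loose:\n%s\nexact:\n%s\n' "$loose" "$exact"
```

Because everything is tested in a single pass, the matching records come out in their original order, which avoids the reordering caused by running two commands with `>>`.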

# 5  
Old 09-24-2008
Thanks so much, era! This script works!

awk '$1 ~ /foo/ || $2 ~ /bar/ { print }'
 