Awk/sed/cut to filter out records from a file based on criteria


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Awk/sed/cut to filter out records from a file based on criteria
# 1  
Old 06-26-2017
Awk/sed/cut to filter out records from a file based on criteria

I have two files and would need to filter out records based on certain criteria, these column are of variable lengths, but the lengths are uniform throughout all the records of the file. I have shown a sample of three records below. Line 1-9 is the item number "0227546_1" in the case of the first record, the department number is line 361-369, which would be 3101_1_1 in the first record. I would want to compare two files for item numbers (keep in mind they could be at different records in these two files) and if they have a different department number I would need the entire record to be filtered out in a third file.

I have also included a sample of the second file and expected output as there was a change in dept #, it only shows two records.

Any help will be greatly appreciated, I have been struggling with this for a while.


First file:

Code:
0227546_1           0227546,OIL FILTER      PH3675     HONEYWELL                                                        
0227546A            0227546,OIL FILTER      PH3675     HONEYWELL                
0227546             0227546,OIL FILTER      PH3675     HONEYWELL                
0                   0                                                           
3101_1_1            Under Car                                                   
3101_1              Parts and Repair                                            
3101                Auto Retail                                                 
301                 Automotive                                                  
2                   Hardlines                                                   
9000                WORLDWIDE                                                   
EA  EACH                0                   0                                                                                                   
0227546A            0227546,OIL FILTER      PH3675     HONEYWELL                
0227546A            0227546,OIL FILTER      PH3675     HONEYWELL                
0                   0                                                           
3101_1_1            Under Car                                                   
3101_1              Parts and Repair                                            
3101                Auto Retail                                                 
301                 Automotive                                                  
2                   Hardlines                                                   
9000                WORLDWIDE                                                   
0                   0                                                                                                   


0249068_1           0249068,Fresco Castiglione Chianti                                                                  
0249068A            0249068,Fresco Castiglione Chianti                          
0249068             0249068,Fresco Castiglione Chianti                          
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                 
9000                WORLDWIDE                                                   
EA  EACH                0                   0                                                                                                   
0249068A            0249068,Fresco Castiglione Chianti                          
0249068A            0249068,Fresco Castiglione Chianti                          
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                 
9000                WORLDWIDE                                                   
0                   0                                                                                                   


0274321_1           0274321,SIMPLY PERFECT SILVER  KING                                                                 
0274321A            0274321,SIMPLY PERFECT SILVER  KING                         
0274321             0274321,SIMPLY PERFECT SILVER  KING                         
0                   0                                                           
1478_3_2            Down Pillows                                                
1478_3              Down Products                                               
1478                White Goods                                                 
478                 Domestics                                                   
2                   Hardlines                                                   
9000                WORLDWIDE                                                   
EA  EACH                0                   0                                                                                                   
0274321A            0274321,SIMPLY PERFECT SILVER  KING                         
0274321A            0274321,SIMPLY PERFECT SILVER  KING                         
0                   0                                                           
1478_3_2            Down Pillows                                                
1478_3              Down Products                                               
1478                White Goods                                                 
478                 Domestics                                                   
2                   Hardlines                                                   
9000                WORLDWIDE                                                   
0                   0

Second File:

Code:
0249068_1           0249068,Fresco Castiglione Chianti                                                                  
0249068A            0249068,Fresco Castiglione Chianti                          
0249068             0249068,Fresco Castiglione Chianti                          
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                 
9000                WORLDWIDE                                                   
EA  EACH                0                   0                                                                                                   
0249068A            0249068,Fresco Castiglione Chianti                          
0249068A            0249068,Fresco Castiglione Chianti                          
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                 
9000                WORLDWIDE                                                   
0                   0                                                                                                   


0274321_1           0274321,SIMPLY PERFECT SILVER  KING                                                                 
0274321A            0274321,SIMPLY PERFECT SILVER  KING                         
0274321             0274321,SIMPLY PERFECT SILVER  KING                         
0                   0                                                           
1478_5_1            Down Pillows                                                
1478_3              Down Products                                               
1478                White Goods                                                 
478                 Domestics                                                   
2                   Hardlines                                                   
9000                WORLDWIDE                                                   
EA  EACH                0                   0                                                                                                   
0274321A            0274321,SIMPLY PERFECT SILVER  KING                         
0274321A            0274321,SIMPLY PERFECT SILVER  KING                         
0                   0                                                           
1478_3_2            Down Pillows                                                
1478_3              Down Products                                               
1478                White Goods                                                 
478                 Domestics                                                   
2                   Hardlines                                                   
9000                WORLDWIDE                                                   
0                   0                                                                                                   


0308148_1           0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE                                                     
0308148A            0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE             
0308148             0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE             
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                  
9000                WORLDWIDE                                                   
EA  EACH                0                   0                                                                                                   
0308148A            0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE             
0308148A            0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE             
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                 
9000                WORLDWIDE                                                   
0                   0

Third file (Output):

Code:
0249068_1           0249068,Fresco Castiglione Chianti                                                                  
0249068A            0249068,Fresco Castiglione Chianti                          
0249068             0249068,Fresco Castiglione Chianti                          
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                 
9000                WORLDWIDE                                                   
EA  EACH                0                   0                                                                                                   
0249068A            0249068,Fresco Castiglione Chianti                          
0249068A            0249068,Fresco Castiglione Chianti                          
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                 
9000                WORLDWIDE                                                   
0                   0                                                                                                   


0308148_1           0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE                                                     
0308148A            0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE             
0308148             0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE             
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                  
9000                WORLDWIDE                                                   
EA  EACH                0                   0                                                                                                   
0308148A            0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE             
0308148A            0308148,CLEMENTE VII/BARRIQUE 750ML GASTELGREVE             
0                   0                                                           
1022_1_2            Table                                                       
1022_1              Red                                                         
1022                Wine                                                        
422                 Wine                                                        
3                   Consumables                                                 
9000                WORLDWIDE                                                   
0                   0

# 2  
Old 06-26-2017
a bit of a longhand - making some assumptions based on the file samples.....
Code:
awk -f mia.awk file1.txt file2.txt

where mia.awk is:
Code:
FNR==NR {
  if (split($1,a, "_") ==3)
     f1[$1]
  next
}
NF{
  if (split($1, a, "_") ==3 && $1 in f1)
     p=1
  rec=(!rec)?$0:rec ORS $0
  next
}
p{
    print rec ORS
    p=0; rec=""
}


Last edited by vgersh99; 06-26-2017 at 06:02 PM..
This User Gave Thanks to vgersh99 For This Post:
# 3  
Old 06-26-2017
Thanks @vgersh99

Tried to do so but got the following error:
Code:
mia.awk: {:  not found
mia.awk[2]: syntax error at line 2 : `(' unexpected

Can you please break down to me what is happening in the script?
# 4  
Old 06-26-2017
Quote:
Originally Posted by MIA651
Thanks @vgersh99

Tried to do so but got the following error:
Code:
mia.awk: {:  not found
mia.awk[2]: syntax error at line 2 : `(' unexpected

Can you please break down to me what is happening in the script?
make sure you're copy/pasting the entire script without missing anything.
Works here just fine...
# 5  
Old 06-26-2017
Thanks again. I think there is a misunderstanding, each "paragraph" in the file represents a record (1 line). So item is columns 1-9 and department is columns 361-369. Of course, I tried to copy-paste but it just turned it into a paragraph. For readability purposes, I sperated the records by a space.
# 6  
Old 06-26-2017
Quote:
Originally Posted by MIA651
Thanks again. I think there is a misunderstanding, each "paragraph" in the file represents a record (1 line). So item is columns 1-9 and department is columns 361-369. Of course, I tried to copy-paste but it just turned it into a paragraph. For readability purposes, I sperated the records by a space.


Well... then provide A REAL sample file withOUT any editing...


Sent from my iPhone using Tapatalk
# 7  
Old 06-27-2017
Quote:
Originally Posted by vgersh99
Well... then provide A REAL sample file withOUT any editing...


Sent from my iPhone using Tapatalk
It would be impossible to do so as there is a limitation on the width when I try to post it. Any chance you can please explain the steps so I can try to edit it to suit my needs?
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Filter records from a log file based on timestamp

Dear Experts, I have a log file that contains a timestamp, I would like to filter record from that file based on timestamp. For example refer below file - cat sample.txt Jan 19 20:51:48 mukul-Vostro-14-3468 systemd: pam_unix(systemd-user:session): session opened for user root by (uid=0)... (6 Replies)
Discussion started by: mukulverma2408
6 Replies

2. Shell Programming and Scripting

awk to filter file based on seperate conditions

The below awk will filter a list of 30,000 lines in the tab-delimited file. What I am having trouble with is adding a condition to SVTYPE=CNV that will only print that line if CI= must be >.05 . The other condition to add is if SVTYPE=Fusion, then in order to print that line READ_COUNT must... (3 Replies)
Discussion started by: cmccabe
3 Replies

3. Shell Programming and Scripting

awk to print specific line in file based on criteria

In the file below I am trying to extract a specific instance of path, if the adjacent plugin": "/rundb/api/v1/plugin/49/. Thank you :). file "path": "/results/analysis/output/Home/Auto_user_S5-00580-4-Medexome_65_028/plugin_out/FileExporter_out.52", "plugin": "/rundb/api/v1/plugin/49/",... (8 Replies)
Discussion started by: cmccabe
8 Replies

4. Shell Programming and Scripting

Filter records based on 2nd file

Hello, I want to filter records of a file if they fall in range associated with a second file. First the chr number (2nd col of 1st file and 1st col of 2nd file) needs to be matched. Then if the 3rd col of the first file falls within any of the ranges specified by the 2nd and 3rd cols , then... (4 Replies)
Discussion started by: ritakadm
4 Replies

5. Shell Programming and Scripting

Extract error records based on specific criteria from Unix file

Hi, I look for a awk one liner for below issue. input file ABC 1234 abc 12345 ABC 4567 678 XYZ xyz ght 678 ABC 787 yyuu ABC 789 7890 777 zxr hyip hyu mno uii 678 776 ABC ty7 888 All lines should be started with ABC as first field. If a record has another value for 1st... (7 Replies)
Discussion started by: ratheesh2011
7 Replies

6. Shell Programming and Scripting

Filter/remove duplicate .dat file with certain criteria

I am a beginner in Unix. Though have been asked to write a script to filter(remove duplicates) data from a .dat file. File is very huge containig billions of records. contents of file looks like 30002157,40342424,OTC,mart_rec,100, ,0 30002157,40343369,OTC,mart_rec,95, ,0... (6 Replies)
Discussion started by: mukeshguliao
6 Replies

7. Shell Programming and Scripting

awk - splitting 1 large file into multiple based on same key records

Hello gurus, I am new to "awk" and trying to break a large file having 4 million records into several output files each having half million but at the same time I want to keep the similar key records in the same output file, not to exist accross the files. e.g. my data is like: Row_Num,... (6 Replies)
Discussion started by: kam66
6 Replies

8. Shell Programming and Scripting

Filter records in a file using AWK

I want to filter records in one of my file using AWK command (or anyother command). I am using the below code awk -F@ '$1=="0003"&&"$2==20100402" print {$0}' $INPUT > $OUTPUT I want to pass the 0003 and 20100402 values through a variable. How can I do this? Any help is much... (1 Reply)
Discussion started by: gpaulose
1 Replies

9. UNIX for Dummies Questions & Answers

Select records based on search criteria on first column

Hi All, I need to select only those records having a non zero record in the first column of a comma delimited file. Suppose my input file is having data like: "0","01/08/2005 07:11:15",1,1,"Created",,"01/08/2005" "0","01/08/2005 07:12:40",1,1,"Created",,"01/08/2005"... (2 Replies)
Discussion started by: shashi_kiran_v
2 Replies

10. Shell Programming and Scripting

Selecting records from file on criteria.

Can I have 2 files as in input to the awk command? Situation is somewhat below, File A contains number & value delimited by a space. File B contains number as a part of a line. I am not supposed to retrieve more than 1 number from a line. If number from file B matches with number from... (7 Replies)
Discussion started by: videsh77
7 Replies
Login or Register to Ask a Question