Script to parse and compare information in two fields of file


 
# 1  
Old 08-28-2015

Hello,
I am parsing a large input file, file1, which contains a CFA field.
I have to compare that field (CFA, bytes 88-96) with the content of file2 (which contains only one field) and write the matching rows
to another file.
Here is my code and sample input file:


Code:
#########################################
# Function: CheckNBS
#########################################

function CheckNBS
{
    writeInfo "************************************************************************************************"
    writeInfo "----------------- CHECK FILE NBS FILE2 Start: $d ------------------"
    FILE2="$DIR_OUT/FILE2_${NamingDate}.data"
    FILE_OUT="$DIR_OUT/OUT_CAMPIONE_NBS_${NamingDate}.ctrl"

    # Loop over the compressed input files; the * must stay unquoted to expand.
    for FILE1 in "$NBSPATH"/*"${DATA_RIFERIMENTO}"*
    do
        FILE1=${FILE1##*/}    # keep only the file name
        writeInfo "Elaborazione FILE1 : ${FILE1}"

        # One zcat | grep | awk pipeline per CFA value: this is the slow part.
        while read -r CFA
        do
            zcat "$NBSPATH"/"$FILE1" | grep -- "$CFA" | awk '$1 == "201" { print $0 }' >> "$FILE_OUT"
        done < "$FILE2"
    done
}

Execution is very slow. Can I also use awk on compressed files?

Can you help me?

Last edited by Don Cragun; 08-28-2015 at 03:52 PM.. Reason: Add CODE tags.
# 2  
Old 08-28-2015
You could remove the grep (untested):
Code:
zcat "$NBSPATH"/"$FILE1" | awk -v CFA="$CFA" '$0 ~ CFA && $1 == "201" { print $0 }' >> "$FILE_OUT"

Not sure if this would bring you much of a performance gain though.
Hope this helps.
This User Gave Thanks to sea For This Post:
# 3  
Old 08-28-2015
You're zcatting "$NBSPATH"/"$FILE1" and running grep | awk once for every CFA in $FILE2. That consumes a lot of resources. Why don't you uncompress once into a temp file and use e.g. grep -f $FILE2 on the temp file? Does your system offer the zgrep command?
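
A minimal sketch of that suggestion, assuming grep, awk and gzip are available. The file names and the three sample records are made up for illustration, and `gzip -dc` stands in for zcat. Note that grep -F -f matches the strings anywhere in the line, not only at a fixed position.

```shell
#!/bin/sh
# Hypothetical demo data: two search strings and three gzip-compressed records.
workdir=$(mktemp -d)
printf '%s\n' 888011181727 999001166011 > "$workdir/file2"
printf '%s\n' \
    '201 aaa 888011181727 xxx' \
    '201 bbb 888010471777 yyy' \
    '900 ccc 999001166011 zzz' | gzip -c > "$workdir/file1.gz"

# Decompress once, then let a single grep test every line against all strings.
# -F treats the patterns as fixed strings, -f reads them from file2.
gzip -dc "$workdir/file1.gz" > "$workdir/file1.tmp"
grep -F -f "$workdir/file2" "$workdir/file1.tmp" |
    awk '$1 == "201"' > "$workdir/file_out"

cat "$workdir/file_out"
```

The compressed file is read once instead of once per search string, which is where most of the time should be saved.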
This User Gave Thanks to RudiC For This Post:
# 4  
Old 08-28-2015
But the CFA field in FILE1 is at a fixed position: it starts at byte 88 and is 12 bytes long.
It is not identified that way, which is why I used grep.

Can I take a substring with awk?
Thanks a lot.
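
For what it's worth, awk does have a substring function: substr(string, start, length), with positions counted from 1. A small sketch on a made-up fixed-width record (the record layout and the variable names `record` and `key` are assumptions for the demo, not the real data):

```shell
#!/bin/sh
# Build one hypothetical fixed-width record: "201" plus 84 spaces (87 chars),
# then a 12-character key starting at column 88, then trailing data.
record=$(printf '201%84s%s%s' '' '888011181727' 'NES000')

# substr(s, start, len) is 1-based, so this extracts columns 88-99.
key=$(printf '%s\n' "$record" | awk '{ print substr($0, 88, 12) }')
echo "$key"
```

The extracted key can then be compared directly, which avoids matching the same digits elsewhere in the line.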

Last edited by Don Cragun; 08-28-2015 at 03:53 PM.. Reason: Add ICODE tags.
# 5  
Old 08-28-2015
I'm not sure I understand. Please provide (abbreviated, reasonable) samples of the input files.
# 6  
Old 08-28-2015
I'll explain:

FILE2:

Code:
888011193163
888011087843
888011198112
888011126841
888010319633
888010887347
888011103891
888011174045
888011181727
999001166011
888010522751
888010534587
888010751405
888010824309
888000744563
888000995836
888010941118
888011026395
888010224776
888010344784

FILE1 (compressed):

Code:
12015060700000009     
201  3358447808                      2015-05-14-02.07.22.000000000000012000000000000024999000875455LOS000000001020001320119   P          0SNG000CO  000000001                    10000004  GUNB-GST01  GPRSN20150603000000000000024132||||    01444617541448263017051
201  3666678887                      2015-05-12-14.28.06.000000000000009000000000000024999000875455LOS000000001020001320119   P          0SNG000CO  000000000                    10000004  GUNB-GST01  GPRSN20150603000000000000024132||||    01444617541425923651051
201  3666678887                      2015-05-14-10.57.54.000000000000010000000000000024999000875455LOS000000001020001320119   P          0SNG000CO  000000000                    10000004  GUNB-GST01  GPRSN20150603000000000000024132||||    01444617541448351096051
201  335357257                       2015-05-12-17.15.43.000000000000005000000000000024999000875455LOS000000001020001320119   P          0SNG000CO  000000000                    10000004  GUNB-GST01  GPRSN20150603000000000000024132||||    01444617541425957517051
201  3389474079                      2015-05-13-01.22.00.000000000000010000000000000024999000875455LOS000000001020001320119   P          0SNG000CO  000000000                    10000004  GUNB-GST01  GPRSN20150603000000000000024132||||    01444617541426042602051
201  3389474079                      2015-05-14-16.19.01.000000000000009000000000000024999000875455LOS000000001020001320119   P          0SNG000CO  000000000                    10000004  GUNB-GST01  GPRSN20150603000000000000024132||||    01444617541448418547051
201  3389474079                      2015-05-14-05.28.12.000000000000010000000000000024999000875455LOS000000001020001320119   P          0SNG000CO  000000000                    10000004  GUNB-GST01  GPRSN20150603000000000000024132||||    01444617541448287851051
201  3356067312                      2015-05-14-23.08.56.000000000000009000000000000024999000875455LOS000000001020001320119   P          0SNG000CO  000000000                    10000004  GUNB-GST01  GPRSN20150603000000000000024132||||    01444617541448499799051
201  3386401372                      2015-05-10-13.20.33.000000000000025000000000000001888010471777NES000000003112701320119EC9P          0PAM001CO  000000003                    10000004  GUNB-GSTTZ  GPRSN20150603000000000000001132||||    01444617541413713855098
201  3386401372                      2015-05-11-07.40.33.000000000000024000000000000001888010471777NES000000003112701320119EC9P          0PAM001CO  000000003                    10000004  GUNB-GSTTZ  GPRSN20150603000000000000001132||||    01444617541413895397098
900000891

I have to look for the values from FILE2 inside FILE1 (starting at byte 88, for 12 bytes);
when they are equal, write the entire line of FILE1 to FILE_OUT.

Thanks a lot.
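
One way this requirement could be sketched in a single awk pass, reading the compressed data from a pipe since awk cannot open gzip files itself. The demo files below are made up and only imitate the layout described above (key at columns 88-99); `gzip -dc` is used in place of zcat, and file2 is assumed to be non-empty.

```shell
#!/bin/sh
# Hypothetical demo data laid out like the samples: key at columns 88-99.
workdir=$(mktemp -d)
printf '%s\n' 888011181727 999001166011 > "$workdir/file2"
{
    printf '%s\n' '12015060700000009'
    printf '201%84s%s%s\n' '' '888011181727' 'NES-match'
    printf '201%84s%s%s\n' '' '888010471777' 'NES-nomatch'
    printf '%s\n' '900000891'
} | gzip -c > "$workdir/file1.gz"

# First input (file2): remember every key as an array index.
# Second input ("-", the decompressed stream): print the "201" records whose
# columns 88-99 are one of the remembered keys.
gzip -dc "$workdir/file1.gz" |
    awk 'NR == FNR { keys[$1]; next }
         $1 == "201" && (substr($0, 88, 12) in keys)' "$workdir/file2" - \
    > "$workdir/file_out"

cat "$workdir/file_out"
```

Each compressed file is decompressed once and every record costs a single array lookup, however many values file2 holds.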

Last edited by Don Cragun; 08-28-2015 at 03:54 PM.. Reason: Add CODE tags.
# 7  
Old 08-28-2015
Please use code tags as required by forum rules!

None of the strings in FILE2 are found in FILE1. Should strings from file2, by sheer coincidence, exist in file1, this might work:
Code:
grep -Ff file2 file1
201 3386401372 2015-05-11-07.40.33.000000000000024000000000000001888010471777NES0000000031127888011181727 0PAM001CO 000000003 10000004 GUNB-GSTTZ GPRSN20150603000000000000001132|||| 01444617541413895397098
