Complex file matching


 
# 1  
Old 04-27-2011

Hello All,

I have two files on which I need to do pattern-based matching, writing each record to either a "matched" or an "unmatched" output file.

Here we go:

Code:
 
cat file1
ft , * , *, prem , odacc
ftpr , * ,* , prem , odacc
ft,aa,*,*,odacc
ft,*,*,*,*
*,*,*,odacc
 
cat file2 
abc,*,*,prem,odacc
* , bcd , * , prem , odacc

Now, I have to check each record of file1 against file2: if every field of the file1 record is equal to the corresponding field of a file2 record, or one of the two fields is "*", the file1 record goes into "matched"; otherwise it goes into "unmatched".

Here, "*" is a universal wildcard, so it always matches, irrespective of the corresponding field value in the other file.
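
The rule described above can be sketched as two small shell functions. This is only an illustration under assumptions, not part of the original post: the names `field_matches` and `record_matches` are made up here, records are assumed to be comma-separated with no blanks around the commas, and records with different field counts are treated as not matching.

```shell
#!/bin/sh
# Hypothetical helper: succeed when two fields are compatible,
# i.e. they are equal or either one is the wildcard "*".
field_matches() {
    [ "$1" = "$2" ] || [ "$1" = "*" ] || [ "$2" = "*" ]
}

# Hypothetical helper: compare two comma-separated records field by field.
# Assumption: no blanks around the commas; differing field counts fail.
record_matches() {
    rest_a=$1
    rest_b=$2
    while :; do
        fa=${rest_a%%,*}          # first remaining field of record A
        fb=${rest_b%%,*}          # first remaining field of record B
        field_matches "$fa" "$fb" || return 1
        case $rest_a in *,*) more_a=1 ;; *) more_a=0 ;; esac
        case $rest_b in *,*) more_b=1 ;; *) more_b=0 ;; esac
        [ "$more_a" -eq "$more_b" ] || return 1   # field counts differ
        [ "$more_a" -eq 1 ] || return 0           # last field compared
        rest_a=${rest_a#*,}       # drop the field just compared
        rest_b=${rest_b#*,}
    done
}
```

For example, `record_matches 'ft,*,*,prem,odacc' '*,bcd,*,prem,odacc'` succeeds, while `record_matches 'ft,aa,*,*,odacc' '*,bcd,*,prem,odacc'` fails on the second field (aa against bcd).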

Required content of the "matched" file:
Code:
 
ft , * , *, prem , odacc ## matches with * , bcd , * , prem , odacc from file2
ftpr , * ,* , prem , odacc ## matches with * , bcd , * , prem , odacc from file2
ft,*,*,*,* ## matches with * , bcd , * , prem , odacc from file2
*,*,*,odacc ## matches with * , bcd , * , prem , odacc from file2

Required content of the "unmatched" file:
Code:
 
ft,aa,*,*,odacc

# 2  
Old 04-27-2011
Well, without the *, this would be a straight sort and comm, but with the wildcards it is more of a Cartesian product (N x M) problem. The usual JDBC/unixODBC SQL solutions do not work cleanly, since LIKE is unidirectional and this wildcard is bidirectional.

If one file is much shorter, it could be loaded into a two-dimensional string array, and the longer file could then be filtered against that array to decide which report each record is written to. For each incoming record, iterate through all the records and fields in the array, testing a=* or b=* or a=b.

Would empty cells be treated as *, or is the following line a typo, since prem does not match odacc in column 4?
*,*,*,odacc ## matches with * , bcd , * , prem , odacc from file2

Even with the wildcards, some small optimization could be had by sorting so that the * sort low, and giving up once the input's first field > the array's first field. The spaces in one file might mess this up a bit.
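
The array idea above could look roughly like the following awk sketch wrapped in a shell function. It is a sketch under assumptions rather than a tested solution for the real data: the function name `classify` is made up, blanks around fields are trimmed before comparing, records whose field counts differ are treated as unmatched, and the sorting optimization is left out.

```shell
#!/bin/sh
# Hypothetical usage: classify PATTERN_FILE DATA_FILE
# Loads PATTERN_FILE into an array, then writes each DATA_FILE record
# to "matched" or "unmatched" in the current directory.
classify() {
    : > matched      # start with empty output files
    : > unmatched
    awk -F',' '
    # Strip leading/trailing blanks so "ft , *" and "ft,*" compare equal.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

    NF == 0 { next }                 # skip blank lines in either file

    # First file: store every field in a two-dimensional array.
    FILENAME == ARGV[1] {
        n++
        nf[n] = NF
        for (i = 1; i <= NF; i++) pat[n, i] = trim($i)
        next
    }

    # Second file: test the record against each stored record.
    {
        hit = 0
        for (r = 1; r <= n && !hit; r++) {
            if (nf[r] != NF) continue    # assumption: field counts must agree
            ok = 1
            for (i = 1; i <= NF && ok; i++) {
                a = trim($i); b = pat[r, i]
                ok = (a == b || a == "*" || b == "*")
            }
            hit = ok
        }
        print > (hit ? "matched" : "unmatched")
    }' "$1" "$2"
}
```

With the sample files from the first post this would be called as `classify file2 file1`. Each data record is compared against at most every stored record, so the cost is still N x M in the worst case.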
# 3  
Old 04-28-2011
Hello DGPickett,

Sorry, it's a typo.

I tried the script below and it seems to be working OK so far (it might produce duplicates, however, which I would need to get rid of).

Code:
 
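#!/usr/bin/ksh
# Split file1 into "matched" and "unmatched" by comparing each record
# against every record of file2; "*" in either file matches anything.
rm -f matched unmatched
while IFS=, read -r f1 f2 f3 f4
do
    c=0    # becomes 1 as soon as a file2 record matches
    while IFS=, read -r e1 e2 e3 e4
    do
        # All four fields must agree (the last two tests were commented out before).
        if [[ "$e1" = "$f1" || "$e1" = "*" || "$f1" = "*" ]] &&
           [[ "$e2" = "$f2" || "$e2" = "*" || "$f2" = "*" ]] &&
           [[ "$e3" = "$f3" || "$e3" = "*" || "$f3" = "*" ]] &&
           [[ "$e4" = "$f4" || "$e4" = "*" || "$f4" = "*" ]]
        then
            c=1
            # Quoted so that "*" is printed literally, not expanded as a glob.
            print -r -- "$f1,$f2,$f3,$f4" >> matched
            break    # one match is enough, so no duplicates are written
        fi
    done < file2
    if [ "$c" -eq 0 ]; then
        print -r -- "$f1,$f2,$f3,$f4" >> unmatched
    fi
done < file1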

As the number of fields is fixed in my case and there is no possibility of extra spaces, the solution seems to be OK. I have yet to check its performance.
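
If the real files did carry blanks around the commas, as the samples in the first post do, they could be stripped before the loop runs. A minimal sketch, assuming blanks are never significant inside a field (the function name `normalize` is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: remove whitespace around commas and at line ends,
# so that "ft , * , *, prem , odacc" becomes "ft,*,*,prem,odacc".
# Reads the named files, or standard input when no file is given.
normalize() {
    sed -e 's/[[:space:]]*,[[:space:]]*/,/g' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' "$@"
}
```

It could be used as `normalize file1 > file1.clean`, with the loop then reading the cleaned copies instead of the originals.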

Thanks for looking into this.

Regards
Ravi