Very Challenging Problem. Please read fully.


 
# 1  
07-18-2008

Hi,

This is the third thread I'm posting here about the same problem.

I'm trying a script like the one below, but it takes a long time: about 3 days to complete.

Code:
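#!/bin/ksh
# Usage: script keys_file data_file
# For each key "f7|f13|f14" in the first file, print the first record of the
# second file whose fields 7, 13 and 14 match it.

if [ $# -ne 2 ]
then
    print -u2 "usage: $0 keys_file data_file"
    exit 1
fi

while IFS= read -r i
do
    # The second file is re-read from the top for every key.
    while IFS= read -r j
    do
        # "$j" is quoted: unquoted, fields such as ****** would be glob-expanded.
        field7=$(print -r -- "$j" | cut -d "|" -f7)
        field13=$(print -r -- "$j" | cut -d "|" -f13)
        field14=$(print -r -- "$j" | cut -d "|" -f14)
        if [ "$i" = "$field7|$field13|$field14" ]
        then
            print -r -- "$j"
            break
        fi
    done < "$2"
done < "$1"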

The first file is:

Code:
1TVAO|OVEPT|VO
1TVAO|OVPDM|VO
6NFXE|17CLP|DH
6NFXE|NRZO4|EQ
6NFXE|SMOSA|EQ
ACA15|11X1W|DX
ACA15|1LN88|DX
ACA15|1LNSK|DX
ACA15|1LNVX|DX
ACA15|1LNVX|FD
ACA15|1ZOAA|DX
ACA15|NRLAF|DX
ACA15|NRZCN|DX
ACA15|NRZFC|DX
ACA15|NRZX8|DX
ACA15|O41AC|DX
ACA17|1LN88|DX
ACA17|NRZX8|DX
ACA1E|11X2W|DX
ACA1E|1LN88|DH


The second file is:
Code:
1TVAO|S3WS0306|45101000|4513000|AJGJ|CB10|1TVAO|S3WS033306|4513101000|4513201000|AJBFGJ|CB10|OVEPT|VO|430300|430300|430300|009|IC    |Z
1TVAO|S3WS0306|45101000|451000|AJFGJ|CB10|1TVAO|S3WS033306|4513101000|4513201000|AJBFGJ|CB10|OVPDM|VO|430300|430300|430300|009|IC    |Z
6NFXE|S3SN0201|41101000|451101000|B7HT|CB10|6NFXE|S3SN021401|4511101000|4511201000|B7BXHT|CB10|17CLP|DH|******|6670NI|410402|011|LQ    |Z
AGRJE|NA|NA|NA|NA|NA|6NFXE|S3021401|4511101000|4511201000|B7BXHT|CB10|NRZO4|EQ|402100|6670DC|410402|001|EQ|Z|U|Y|VT
6NFXE|S3SN0201|41101000|45111000|BXHT|CB10|6NFXE|S3SN021401|4511101000|4511201000|B7BXHT|CB10|SMOSA|EQ|******|6670NI|410402|016|EQ    |Z
ACA15|S3BW1120|41101000|4511000|AEHZ|CB10|ACA15|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|11X1W|DX|410312|410312|410312|011|LQ    |Z
ACA15|S3BW1120|41101000|45112000|AEZ|CB10|ACA15|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|1LN88|DX|410312|410312|410312|A14|IOC   |Z
ARCXE|NA|NA|NA|NA|NA|A5|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|1LN88|DX|410312|420100|420100|A14|IOC   |Z
ACA15|NA|NA|NA|NA|NA|A15|NA|NA|NA|NA|NA|1LNSK|DX|410312|410312|410312|A14|TC    |Z
ACA15|NA|NA|NA|NA|NA|A15|NA|NA|NA|NA|NA|1LNVX|DX|410312|410312|410312|009|IOC   |Z
ALBBE|S3BW1118|41101000|4511201000|KSBL|CB20|ACA15|S3BW100120|4511101000|4511201000|KPASBL|CB20|1LNVX|FD|410312|410210|410210|A14|IOC|Z|N|Y|IS
ACA15|NA|NA|NA|NA|NA|ACA15|NA|NA|NA|NA|NA|1ZOAA|DX|410312|410312|410312|011|LQ|Z|A|Y|IS
ACA15|NA|NA|NA|NA|NA|ACA15|NA|NA|NA|NA|NA|NRLAF|DX|410312|410312|410312|A15|EQ    |Z
ACA15|S3BW1120|41101000|4511201000|AEHZ|CB10|ACA15|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|NRZCN|DX|410312|410312|410312|009|NQ    |Z
ACA15|NA|NA|NA|NA|NA|ACA15|NA|NA|NA|NA|NA|NRZFC|DX|410312|410312|410312|009|NQ    |Z
ACA15|S3BW1120|41101000|4511201000|AEHZ|CB10|ACA15|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|NRZX8|DX|410312|410312|410312|A14|NQ    |Z
ACA15|S3BW1120|41101000|4511201000|AEDHZ|CB10|ACA15|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|O41AC|DX|410312|410312|410312|009|NQ-AC|Z|N|Y|IS
ACA17|S3BW1120|42111000|4512201000|AEDHZ|CB10|ACA17|S3BW100120|4512111000|4512201000|AEBDHZ|CB10|1LN88|DX|410325|410312|410312|A14|IOC   |Z
ACA17|S3BW1120|42111000|4512201000|AHZ|CB10|ACA17|S3BW100120|4512111000|4512201000|AEBDHZ|CB10|NRZX8|DX|410325|410312|410312|009|NQ    |Z
ACA1E|S3BW1120|41101000|4511201000|ADHZ|CB10|ACA1E|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|11X2W|DX|410312|410312|410312|011|LQ    |Z


The expected result is:

Code:
1TVAO|S3WS033306|4513101000|4513201000|AJBFGJ|CB10|1TVAO|S3WS033306|4513101000|4513201000|AJBFGJ|CB10|OVEPT|VO|430300|430300|430300|009|IC    |Z
1TVAO|S3WS033306|4513101000|4513201000|AJBFGJ|CB10|1TVAO|S3WS033306|4513101000|4513201000|AJBFGJ|CB10|OVPDM|VO|430300|430300|430300|009|IC    |Z
6NFXE|S3SN021401|4511101000|4511201000|B7BXHT|CB10|6NFXE|S3SN021401|4511101000|4511201000|B7BXHT|CB10|17CLP|DH|******|6670NI|410402|011|LQ    |Z
AGRJE|NA|NA|NA|NA|NA|6NFXE|S3SN021401|4511101000|4511201000|B7BXHT|CB10|NRZO4|EQ|402100|6670DC|410402|001|EQ|Z|U|Y|VT
6NFXE|S3SN021401|4511101000|4511201000|B7BXHT|CB10|6NFXE|S3SN021401|4511101000|4511201000|B7BXHT|CB10|SMOSA|EQ|******|6670NI|410402|016|EQ    |Z
ACA15|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|ACA15|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|11X1W|DX|410312|410312|410312|011|LQ    |Z
ACA15|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|ACA15|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|1LN88|DX|410312|410312|410312|A14|IOC   |Z
ACA15|NA|NA|NA|NA|NA|ACA15|NA|NA|NA|NA|NA|1LNSK|DX|410312|410312|410312|A14|TC    |Z
ACA15|NA|NA|NA|NA|NA|ACA15|NA|NA|NA|NA|NA|1LNVX|DX|410312|410312|410312|009|IOC   |Z
ALBBE|S3BW1118|451000|45111000|KPASBL|CB20|ACA15|S3BW100120|4511101000|4511201000|KPASBL|CB20|1LNVX|FD|410312|410210|410210|A14|IOC|Z|N|Y|IS
ACA15|NA|NA|NA|NA|NA|ACA15|NA|NA|NA|NA|NA|1ZOAA|DX|410312|410312|410312|011|LQ|Z|A|Y|IS
ACA15|NA|NA|NA|NA|NA|ACA15|NA|NA|NA|NA|NA|NRLAF|DX|410312|410312|410312|A15|EQ    |Z
ACA15|S3BW1120|45111000|45112000|AEZ|CB10|ACA15|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|NRZCN|DX|410312|410312|410312|009|NQ    |Z
ACA15|NA|NA|NA|NA|NA|ACA15|NA|NA|NA|NA|NA|NRZFC|DX|410312|410312|410312|009|NQ    |Z
ACA15|S0120|4101000|4511000|AEHZ|CB10|ACA15|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|NRZX8|DX|410312|410312|410312|A14|NQ    |Z
ACA15|S30120|41101000|45112000|AEHZ|CB10|ACA15|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|O41AC|DX|410312|410312|410312|009|NQ-AC|Z|N|Y|IS
ACA17|S3100120|4111000|45122000|AEHZ|CB10|ACA17|S3BW100120|4512111000|4512201000|AEBDHZ|CB10|1LN88|DX|410325|410312|410312|A14|IOC   |Z
ACA17|S3BW1120|411000|4512201000|AEBDHZ|CB10|ACA17|S3BW100120|4512111000|4512201000|AEBDHZ|CB10|NRZX8|DX|410325|410312|410312|009|NQ    |Z
ACA1E|S3BW1120|4101000|4511201000|AEBDHZ|CB10|ACA1E|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|11X2W|DX|410312|410312|410312|011|LQ    |Z
ACA1E|S3BW20|4111000|4511201000|AEBDHZ|CB10|ACA1E|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|1LN88|DH|410312|410312|410312|A14|IOC   |Z

The above script was posted in this forum by fpmurphy. Thanks for that, fpmurphy, but the script takes about 3 days to match the 48,000 records in the first file against the 77,000 records in the second file.

The earlier threads I posted, which contain more description of the problem, are:

https://www.unix.com/shell-programmin...wk-script.html

https://www.unix.com/shell-programmin...ther-file.html


Can we use some kind of counter, so that the script resumes scanning the second file from where it stopped?
I believe it is slow because every time it reads a record from the first file, it scans the second file from the beginning. But the second file is sorted on the 7th, 13th and 14th columns.


Thanks,
RRVARMA
# 2  
07-18-2008
thinking out loud

Would there only ever be a one-to-one relationship between those key fields?

Could the 2nd file be made into a new temp file where each line is
3keyfields"~"full_data_record
and then sorted?

Then perhaps a grep or other command,
and if successful, cut -d"~" -f2 >>new_file
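A minimal sketch of that decorate, sort and match idea, using the standard `sort`, `join` and `cut` tools in place of grep. Assumptions not stated in the thread: `~` never appears in the data, the function name `keyjoin` is made up for illustration, and `mktemp -d` is available (it is on Linux and the BSDs, but it is not POSIX). Every sort and the join run with `LC_ALL=C`, because `join` needs both inputs sorted in the same order.

```shell
# keyjoin KEYS_FILE DATA_FILE  (hypothetical helper name)
# Print every DATA_FILE record whose fields 7, 13 and 14, joined by "|",
# equal a line of KEYS_FILE. Assumes "~" never occurs in the data.
keyjoin() {
    tmp=$(mktemp -d) || return 1
    # 1. The first file's lines already are the key: sort them byte-wise.
    LC_ALL=C sort -u "$1" > "$tmp/keys"
    # 2. Decorate each record as "f7|f13|f14~record" and sort on the key only.
    awk -F'|' '{ print $7 "|" $13 "|" $14 "~" $0 }' "$2" |
        LC_ALL=C sort -t'~' -k1,1 > "$tmp/data"
    # 3. Merge the two sorted files on field 1, then drop the "key~" prefix.
    LC_ALL=C join -t'~' "$tmp/keys" "$tmp/data" | cut -d'~' -f2-
    rm -rf "$tmp"
}
```

Call it as `keyjoin file1 file2 > new_file`. Unlike the original loop, it prints every record that matches a key, not only the first, and the output comes out in key order rather than in the second file's order.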
# 3  
07-18-2008
Hi joeyg,

Thanks for the feedback, but I don't follow you.

Both files here are sorted.

Thanks,
RRVARMA
# 4  
07-18-2008
Just as an aside, when I process the first two files with that script, I don't get the output shown in the third code block (the expected result). I get:
Code:
1TVAO|S3WS0306|45101000|4513000|AJGJ|CB10|1TVAO|S3WS033306|4513101000|4513201000|AJBFGJ|CB10|OVEPT|VO|430300|430300|430300|009|IC    |Z
1TVAO|S3WS0306|45101000|451000|AJFGJ|CB10|1TVAO|S3WS033306|4513101000|4513201000|AJBFGJ|CB10|OVPDM|VO|430300|430300|430300|009|IC    |Z
6NFXE|S3SN0201|41101000|451101000|B7HT|CB10|6NFXE|S3SN021401|4511101000|4511201000|B7BXHT|CB10|17CLP|DH|******|6670NI|410402|011|LQ    |Z
AGRJE|NA|NA|NA|NA|NA|6NFXE|S3021401|4511101000|4511201000|B7BXHT|CB10|NRZO4|EQ|402100|6670DC|410402|001|EQ|Z|U|Y|VT
6NFXE|S3SN0201|41101000|45111000|BXHT|CB10|6NFXE|S3SN021401|4511101000|4511201000|B7BXHT|CB10|SMOSA|EQ|******|6670NI|410402|016|EQ    |Z
ACA15|S3BW1120|41101000|4511000|AEHZ|CB10|ACA15|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|11X1W|DX|410312|410312|410312|011|LQ    |Z
ACA15|S3BW1120|41101000|45112000|AEZ|CB10|ACA15|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|1LN88|DX|410312|410312|410312|A14|IOC   |Z
ALBBE|S3BW1118|41101000|4511201000|KSBL|CB20|ACA15|S3BW100120|4511101000|4511201000|KPASBL|CB20|1LNVX|FD|410312|410210|410210|A14|IOC|Z|N|Y|IS
ACA15|NA|NA|NA|NA|NA|ACA15|NA|NA|NA|NA|NA|1ZOAA|DX|410312|410312|410312|011|LQ|Z|A|Y|IS
ACA15|NA|NA|NA|NA|NA|ACA15|NA|NA|NA|NA|NA|NRLAF|DX|410312|410312|410312|A15|EQ    |Z
ACA15|S3BW1120|41101000|4511201000|AEHZ|CB10|ACA15|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|NRZCN|DX|410312|410312|410312|009|NQ    |Z
ACA15|NA|NA|NA|NA|NA|ACA15|NA|NA|NA|NA|NA|NRZFC|DX|410312|410312|410312|009|NQ    |Z
ACA15|S3BW1120|41101000|4511201000|AEHZ|CB10|ACA15|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|NRZX8|DX|410312|410312|410312|A14|NQ    |Z
ACA15|S3BW1120|41101000|4511201000|AEDHZ|CB10|ACA15|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|O41AC|DX|410312|410312|410312|009|NQ-AC|Z|N|Y|IS
ACA17|S3BW1120|42111000|4512201000|AEDHZ|CB10|ACA17|S3BW100120|4512111000|4512201000|AEBDHZ|CB10|1LN88|DX|410325|410312|410312|A14|IOC   |Z
ACA17|S3BW1120|42111000|4512201000|AHZ|CB10|ACA17|S3BW100120|4512111000|4512201000|AEBDHZ|CB10|NRZX8|DX|410325|410312|410312|009|NQ    |Z
ACA1E|S3BW1120|41101000|4511201000|ADHZ|CB10|ACA1E|S3BW100120|4511101000|4511201000|AEBDHZ|CB10|11X2W|DX|410312|410312|410312|011|LQ    |Z

That script is inefficient, but is it even doing what you want it to do? We want to get the algorithm right.
# 5  
07-18-2008
You can give this a try, but it uses a significant amount of memory if you have big files:

Code:
awk 'BEGIN{FS=OFS="|"}
NR==FNR{a[$0]=1;next}
a[$7FS$13FS$14]{print}' file1 file2

If you get errors, use nawk, gawk or /usr/xpg4/bin/awk on Solaris.

Regards
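The one-liner above prints every record whose key is listed, whereas the `break` in the original loop stops at the first match. A sketch of a variant that keeps only the first match per key, wrapped in a shell function with a made-up name (`firstmatch`). Testing with `in` also avoids a side effect of `a[$7FS$13FS$14]`: in awk, merely referencing a missing element creates it, so the original adds an empty entry for every non-matching record of the second file.

```shell
# firstmatch KEYS_FILE DATA_FILE  (hypothetical helper name)
firstmatch() {
    awk 'BEGIN { FS = OFS = "|" }
         NR == FNR { want[$0] = 1; next }        # first file: remember each key
         ($7 FS $13 FS $14) in want {            # second file: build the same key
             print
             delete want[$7 FS $13 FS $14]       # like "break": first match only
         }' "$1" "$2"
}
```

Call it as `firstmatch file1 file2`. Output follows the order of the second file, and memory use grows only with the number of keys in the first file.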
# 6  
07-18-2008
Franklin's script runs really fast. If memory is an issue, though, try the following:
Code:
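#!/bin/ksh
# Usage: script keys_file data_file
# Reads the data file once (outer loop); for each record it scans the key
# file. No external command is started per line, unlike the cut version.

if [ $# -ne 2 ]
then
exit 1
fi

set -f          # no globbing: fields such as ****** must stay literal
IFS="|"         # split records on "|" (used by set -A below)

while read -r i
do
    set -A array $i                                  # ksh: 0-based array of fields
    key="${array[6]}|${array[12]}|${array[13]}"      # fields 7, 13 and 14

    while read -r j
    do
        if [ "$j" = "$key" ]
        then
            print -r -- "$i"
            break
        fi
    done < "$1"
done < "$2"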

# 7  
07-18-2008
If I understand your problem, you're doing a global search and match, with output conditional on that match. The reason it takes so long is that you go through your big file once for each line of your control file: with the above script, you process all the "J" records for each "I" record. If you start by sorting your "J" records by the 7th, 13th and 14th fields, then you can "ladder walk" the "I" records and the "J" records together, going through each list only once, so the run time grows with the combined length of the two files instead of their product. Your script will be done almost immediately. The "I" records must, of course, also be sorted the same way.

Last edited by fsahog; 07-18-2008 at 11:29 PM. Reason: To make it better
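A minimal sketch of that ladder walk in awk, wrapped in a shell function with a hypothetical name (`ladder`). It assumes, without checking, that both files are already sorted in byte order (`LC_ALL=C`) on the same composite key string `field7|field13|field14`. Sorting on the composite string matters: `sort -t'|' -k7,7 -k13,13 -k14,14` can order keys differently from a whole-line sort of the first file (byte-wise, `ACA15X|...` sorts before `ACA15|...`, but as a field `ACA15` sorts before `ACA15X`), and a ladder walk over mismatched orders silently skips records.

```shell
# ladder KEYS_FILE DATA_FILE  (hypothetical helper name)
# Assumes both files are sorted with LC_ALL=C on the key "f7|f13|f14";
# each line of KEYS_FILE is one such key.
ladder() {
    LC_ALL=C awk -F'|' -v keys="$1" '
        BEGIN { have = ((getline key < keys) > 0) }   # read the first key
        {
            k = $7 "|" $13 "|" $14                     # key of this data record
            # Advance through the keys until they catch up with this record.
            while (have && (key "") < k)
                have = ((getline key < keys) > 0)
            if (!have) exit                            # keys used up: stop early
            if (key == k) print                        # match: print whole record
        }' "$2"
}
```

Call it as `ladder file1 file2`. Each file is read once, and every record matching a key is printed. If the second file is not already in composite-key order, the decorate, sort and `cut -d"~" -f2` steps from the "thinking out loud" reply produce it.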