awk, associative array, compare files


 
# 1  
Old 10-18-2011

I have a file like this:
Code:
< '393200103052';'H3G';'20081204'
< '393200103059';'TIM';'20110111'
< '393200103061';'TIM';'20060206'
< '393200103064';'OPI';'20110623'
> '393200103052';'HKG';'20081204'
> '393200103056';'TIM';'20110111'
> '393200103088';'TIM';'20060206'

Now I have to generate a file with records like this:
For: '393200103052' field 2 differs: 'H3G' 'HKG'
I also have to generate a second file like this, containing the '<' records that have no matching record with the '>' symbol:
Code:
< '393200103059';'TIM';'20110111'
< '393200103061';'TIM';'20060206'
< '393200103064';'OPI';'20110623'

If anyone can help, thanks in advance.
Moderator's Comments:
Mod Comment Please use code tags when posting data and code samples!

Last edited by vgersh99; 10-18-2011 at 02:13 PM.. Reason: code tags, please!
# 2  
Old 10-18-2011
It's better to post a descriptive topic than "help needed please".

Quote:
Now I have to generate a file with records like this:
For: '393200103052' field 2 differs: 'H3G' 'HKG'
What rationale picks out H3G and HKG as different? They look the same to me.

How should the program be able to know which records don't exist? Is there some other file which contains all the records which must exist?
# 3  
Old 10-18-2011

I have the following two files, file1.txt and file2.txt.
File1.txt
Code:
Field1|Filed2|Filed3
'393200103001';'TIM';'20080205'
'393200103017';'TIM';'20040521'
'393200103025';'OPI';'20041025'
'393200103032';'OPI';'20080218'
'393200103048';'OPI';'20101122'
'393200103052';'H3G';'20081204'
'393200103059';'TIM';'20110111'
'393200103061';'TIM';'20060206'
'393200103064';'OPI';'20110623'

File2.txt
Code:
Field1|Filed2|Filed3
'393200103001';'TIM';'20080205'
'393200103017';'TIM';'20040521'
'393200103025';'OPI';'20041025'
'393200103032';'OPI';'20080218'
'393200103048';'OPI';'20101122'
'393200103052';'HKG';'20081204'
'393200103056';'TIM';'20110111'
'393200103088';'TIM';'20060206'

My requirement is to generate three more files:
missed_file1.txt: the records that are in file2 but not in file1.
missed_file2.txt: the records that are in file1 but not in file2.
common.txt: one line per shared key whose field 2 differs, like this:
For '393200103052' field2 differs file1:'H3G' file2:'HKG'

I think that is clearer. I have already written a script for this, and it works fine for small files, but the original files have millions of records, so it takes a very long time.

---------- Post updated at 12:26 PM ---------- Previous update was at 12:22 PM ----------
Code:
awk -F '[;$]+' '
        # first file: load every field into an array indexed by the record
        NR == FNR {
                for (i = 2; i <= NF; i++) {
                        file1[$1, $2, $3, i] = $i
                }
                # store the number of fields for this key (field 1)
                file1nf[$1] = NF
                next
        }
        # second file: print records whose key was not seen in the first file
        {
                if (!($1 in file1nf)) {
                        print $1 ";" $2 ";" $3 >> "missed_file1.txt"
                        next
                }
        }
' file1.txt file2.txt

I used the same script to generate missed_file2.txt, changing the last line like this:
file2.txt file1.txt

Last edited by Scott; 10-18-2011 at 02:30 PM.. Reason: Use code tags, please...
# 4  
Old 10-18-2011
Code:
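# records in file2.txt whose key (field 1) is not in file1.txt
awk -F \; 'NR==FNR{a[$1];next} !($1 in a)' file1.txt file2.txt > missed_file1.txt

# resulting missed_file1.txt:
'393200103056';'TIM';'20110111'
'393200103088';'TIM';'20060206'

# records in file1.txt whose key is not in file2.txt
awk -F \; 'NR==FNR{a[$1];next} !($1 in a)' file2.txt file1.txt > missed_file2.txt

# resulting missed_file2.txt:
'393200103059';'TIM';'20110111'
'393200103061';'TIM';'20060206'
'393200103064';'OPI';'20110623'

# keys present in both files whose field 2 differs (file1 value, then file2 value)
awk -F \; 'NR==FNR{a[$1]=$2;next} $1 in a && a[$1]!=$2{print $1, $2, a[$1]}' file2.txt file1.txt > common.txt

# resulting common.txt:
'393200103052' 'H3G' 'HKG'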
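The three commands above each read both files. As a rough sketch only, the same three outputs could be produced in a single awk run, which roughly halves the amount of input read. It assumes that keys (field 1) are unique within each file and that file1.txt fits in memory; the sample data is a subset of the records from post #3 with the header line left out, and the scratch directory is only there so that nothing existing is overwritten.

```shell
# Work in a scratch directory with a small sample (assumption: no header line)
cd "$(mktemp -d)" || exit 1
cat > file1.txt <<'EOF'
'393200103048';'OPI';'20101122'
'393200103052';'H3G';'20081204'
'393200103059';'TIM';'20110111'
'393200103061';'TIM';'20060206'
'393200103064';'OPI';'20110623'
EOF
cat > file2.txt <<'EOF'
'393200103048';'OPI';'20101122'
'393200103052';'HKG';'20081204'
'393200103056';'TIM';'20110111'
'393200103088';'TIM';'20060206'
EOF

awk -F ';' '
    # file1.txt: remember the whole record under its key (field 1)
    NR == FNR { rec[$1] = $0; next }

    # file2.txt: key never seen in file1.txt
    !($1 in rec) { print > "missed_file1.txt"; next }

    # file2.txt: key present in both files, so compare field 2
    {
        split(rec[$1], f1, ";")
        if (f1[2] != $2)
            print "For " $1 " field2 differs file1:" f1[2] " file2:" $2 > "common.txt"
        delete rec[$1]      # handled; whatever remains is only in file1.txt
    }

    # keys that never appeared in file2.txt (iteration order is unspecified)
    END { for (k in rec) print rec[k] > "missed_file2.txt" }
' file1.txt file2.txt
```

Because `for (k in rec)` visits keys in no particular order, missed_file2.txt may need a `sort` afterwards if the order matters.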

# 6  
Old 10-19-2011
Thank you

It works really well. But my original file1.txt and file2.txt have millions of records, so will it take a huge amount of time, or is it efficient enough?

---------- Post updated at 11:21 PM ---------- Previous update was at 11:12 PM ----------

I need to do the same comparison for field 3 as well. I am a Java developer and have no experience with shell scripting, so please help me with this too. That is, if field 2 or field 3 changes, I need to show each change.
File1.txt
'393200103088';'TIM';'20060207'
File2.txt
'393200103088';'TIM';'20060208'
common.txt (field 3 has changed for this field 1)
'393200103088' '20060207' '20060208'
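One possible way to extend the field 2 comparison to field 3 is sketched below. It is not from the thread: the `field2`/`field3` label in the output is an assumption added so the two kinds of change can be told apart, and the sample records are taken from the posts above. As in post #4, file2.txt is read first and each output line shows the file1 value followed by the file2 value.

```shell
# Scratch directory with a three-record sample (assumption: no header line)
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
    "'393200103052';'H3G';'20081204'" \
    "'393200103088';'TIM';'20060207'" \
    "'393200103001';'TIM';'20080205'" > file1.txt
printf '%s\n' \
    "'393200103052';'HKG';'20081204'" \
    "'393200103088';'TIM';'20060208'" \
    "'393200103001';'TIM';'20080205'" > file2.txt

awk -F ';' '
    # file2.txt is read first: remember fields 2 and 3 under the key
    NR == FNR { f2[$1] = $2; f3[$1] = $3; next }

    # file1.txt: for keys present in both files, report each differing field
    $1 in f2 {
        if ($2 != f2[$1]) print $1, "field2", $2, f2[$1]
        if ($3 != f3[$1]) print $1, "field3", $3, f3[$1]
    }
' file2.txt file1.txt > common.txt

cat common.txt
```

A record in which both fields changed produces two lines, one per field.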

---------- Post updated at 11:39 PM ---------- Previous update was at 11:21 PM ----------

I generated file3.txt with the following command:
diff file1.txt file2.txt > file3.txt
If you look at file3.txt:
The records missing from file2 start with the < symbol.
The records missing from file1 start with the > symbol.
It also contains the records that share a first field but differ in another field. Using this file, I have to generate the same three files I mentioned earlier:
1. missed_file1.txt
2. missed_file2.txt
3. common.txt
Code:
< '393200103052';'H3G';'20081204'
< '393200103059';'TIM';'20110111'
< '393200103061';'TIM';'20060206'
< '393200103064';'OPI';'20110623'
> '393200103052';'HKG';'20081204'
> '393200103056';'TIM';'20110111'
> '393200103088';'TIM';'20060206'

I think the > and < symbols are helpful for generating the required files.
Please help me with this.
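A minimal sketch of how the `<` and `>` lines could be split into the three files follows. It assumes the normal `diff` output format, so the hunk header and the `---` separator in the sample are what `diff` would emit for the files in post #3, and it assumes keys are unique within each file. Both sides are collected first and only resolved in the `END` block, so it does not matter which hunk a key appears in.

```shell
# Scratch directory with a sample of normal-format diff output
cd "$(mktemp -d)" || exit 1
cat > file3.txt <<'EOF'
7,10c7,9
< '393200103052';'H3G';'20081204'
< '393200103059';'TIM';'20110111'
< '393200103061';'TIM';'20060206'
< '393200103064';'OPI';'20110623'
---
> '393200103052';'HKG';'20081204'
> '393200103056';'TIM';'20110111'
> '393200103088';'TIM';'20060206'
EOF

awk '
    # Only "< ..." and "> ..." lines carry records; hunk headers are skipped
    /^[<>] / {
        side = substr($0, 1, 1)
        line = substr($0, 3)
        split(line, p, ";")
        if (side == "<") left[p[1]] = line      # record from file1.txt
        else             right[p[1]] = line     # record from file2.txt
    }
    END {
        for (k in left) {
            if (k in right) {
                # same key on both sides: report field 2 if it differs
                split(left[k], a, ";")
                split(right[k], b, ";")
                if (a[2] != b[2])
                    print "For " k " field2 differs file1:" a[2] " file2:" b[2] > "common.txt"
            } else {
                print left[k] > "missed_file2.txt"   # only in file1.txt
            }
        }
        for (k in right)
            if (!(k in left)) print right[k] > "missed_file1.txt"   # only in file2.txt
    }
' file3.txt
```

The output order within each file is unspecified because of `for (k in ...)`. Running `diff` on two very large files can itself be slow and memory-hungry, so this only helps if file3.txt already exists.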

---------- Post updated 10-19-11 at 12:23 AM ---------- Previous update was 10-18-11 at 11:39 PM ----------

My original file sizes:
file1.txt: 604 MB
file2.txt: 422 MB
I tested your code with the original files, and it takes a long time to process them. Please help me reduce the run time.
# 7  
Old 10-19-2011
How long is it taking? You aren't going to process 600 MB of data in a split second.
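If the slowdown comes from the awk array no longer fitting in memory, one alternative worth trying is to sort both files on the key and let `join` do the matching, since `sort` spills to temporary files and `join` streams its input. This is only a sketch under those assumptions and has not been timed against the real data. The `.sorted` file names are made up for the example, and the header line is assumed to be absent.

```shell
# Scratch directory with a small sample (assumption: no header line)
cd "$(mktemp -d)" || exit 1
cat > file1.txt <<'EOF'
'393200103048';'OPI';'20101122'
'393200103052';'H3G';'20081204'
'393200103059';'TIM';'20110111'
'393200103061';'TIM';'20060206'
'393200103064';'OPI';'20110623'
EOF
cat > file2.txt <<'EOF'
'393200103048';'OPI';'20101122'
'393200103052';'HKG';'20081204'
'393200103056';'TIM';'20110111'
'393200103088';'TIM';'20060206'
EOF

# Byte-wise collation so that sort and join agree on the ordering
export LC_ALL=C

sort -t ';' -k1,1 file1.txt > file1.sorted
sort -t ';' -k1,1 file2.txt > file2.sorted

# -v 2: records of the second file with no matching key in the first
join -t ';' -v 2 file1.sorted file2.sorted > missed_file1.txt
# -v 1: records of the first file with no matching key in the second
join -t ';' -v 1 file1.sorted file2.sorted > missed_file2.txt

# Shared keys: emit key, field 2 of file1, field 2 of file2; keep differing ones
join -t ';' -o 0,1.2,2.2 file1.sorted file2.sorted |
    awk -F ';' '$2 != $3 { print $1, $2, $3 }' > common.txt
```

The sorting itself takes time, so whether this beats the awk array approach depends on the machine and on how much memory is available.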