comm - sorted result issues


 
Top Forums UNIX for Dummies Questions & Answers comm - sorted result issues
# 1  
11-28-2005

On AIX 5.2, we are trying to create a delta file by comparing the prior extract with the new extract. Some records appear as new when we wouldn't expect them to.

The problem appears to be related to a new record whose key is wholly contained at the start of another record's key (123 is a prefix of 12345 and 12346). We're not sure why the key would even matter, since we are comparing whole records...

Ultimately, we are looking to understand:
1) Why are we getting the additional records back?
2) Why does switching to sort -n (noted below) resolve the issue?


An example probably illustrates the issue best. NOTE: both extracts are sorted with the same syntax.

Previous Extract (sorted result named prev_extract.dat.srt)
11111|Value A1|Value A2
12345|Value A3|Value A4
12346|Value A5|Value A6
9999|Value A7|Value A8

New Extract (pre-sort)
11111|Value A1|Value A2
12345|Value A3|Value A4
12346|Value A5|Value A6
9999|Value A7|Value A8
123|Value A9|Value A10

Sort New Extract
sort -t"|" -k1,1 New_Extract.dat > New_Extract.dat.srt

New Extract (sorted result)
11111|Value A1|Value A2
123|Value A9|Value A10
12345|Value A3|Value A4
12346|Value A5|Value A6
9999|Value A7|Value A8
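comm compares entire lines and expects both inputs to be in whole-line order for the current locale's collating sequence, not in key order. POSIX sort -c checks a file against that order without sorting it, so it can flag the problem before comm runs. A minimal sketch, assuming a POSIX shell, mktemp -d, and byte-order collation (LC_ALL=C), since the thread does not say which locale was in effect; check_sorted is a hypothetical helper name:

```shell
#!/bin/sh
# Check whether a file is in whole-line order, the order comm expects.
# Assumptions: POSIX sh and sort, mktemp -d, LC_ALL=C (byte order).
LC_ALL=C
export LC_ALL

tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# The sorted new extract exactly as shown in the thread.
printf '%s\n' '11111|Value A1|Value A2' '123|Value A9|Value A10' \
    '12345|Value A3|Value A4' '12346|Value A5|Value A6' \
    '9999|Value A7|Value A8' > New_Extract.dat.srt

# check_sorted FILE (hypothetical helper): sort -c only checks, never
# sorts; it exits non-zero if FILE is not in whole-line order.
check_sorted() {
    sort -c "$1" 2>/dev/null
}

if check_sorted New_Extract.dat.srt; then
    echo "New_Extract.dat.srt: in whole-line order"
else
    echo "New_Extract.dat.srt: NOT in whole-line order (comm may misreport)"
fi
```

Under these assumptions the key-sorted file fails the check at the 12345 line, which follows "123|..." even though it is smaller as a whole line.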

Compare the Files
comm -23 New_Extract.dat.srt prev_extract.dat.srt > Extract_addchg.dat


Based on our understanding, comm -23 should output only the lines unique to the first file, i.e. records that are new (adds) or have been modified. So Extract_addchg.dat should look like:
123|Value A9|Value A10

However, our Extract_addchg.dat actually looks like:
123|Value A9|Value A10
12345|Value A3|Value A4
12346|Value A5|Value A6
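A likely explanation, offered as a sketch rather than a confirmed diagnosis of the AIX system: comm compares entire lines and assumes both files are sorted on the entire line, but sort -t"|" -k1,1 orders by field 1 only. As keys, 123 sorts before 12345 because it is a prefix. As whole lines, "123|Value A9..." and "12345|Value A3..." first differ at the fourth byte, where '|' (124) is greater than '4' (52), so comm expects the 123 line after 12345 and 12346. During its merge, comm therefore reports the old file's 12345 and 12346 as unmatched and moves on to 9999; when the new file's 12345 and 12346 lines arrive, there is nothing left to pair them with, so they come out as "new" too. The script below rebuilds the thread's data and runs comm -23 after the key sort and after a plain whole-line sort. It assumes a POSIX shell, mktemp -d, and LC_ALL=C; delta is a hypothetical helper name.

```shell
#!/bin/sh
# Rebuild the two extracts from the thread and run comm -23 after
# (a) the thread's key sort and (b) a plain whole-line sort.
# Assumptions: POSIX sh, mktemp -d, LC_ALL=C (byte-order collation;
# the locale used on the AIX box is not stated in the thread).
LC_ALL=C
export LC_ALL

tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '%s\n' '11111|Value A1|Value A2' '12345|Value A3|Value A4' \
    '12346|Value A5|Value A6' '9999|Value A7|Value A8' > prev_extract.dat
cp prev_extract.dat New_Extract.dat
printf '%s\n' '123|Value A9|Value A10' >> New_Extract.dat

# delta SORT_OPTIONS... (hypothetical helper): sort both extracts the
# same way, then print the lines comm -23 reports as only in the new one.
delta() {
    sort "$@" prev_extract.dat > prev_extract.dat.srt
    sort "$@" New_Extract.dat > New_Extract.dat.srt
    # GNU comm warns on stderr and exits 1 when its input is out of order.
    comm -23 New_Extract.dat.srt prev_extract.dat.srt 2>/dev/null || true
}

by_key=$(delta -t"|" -k1,1)   # the thread's sort: field 1 only
by_line=$(delta)              # whole-line sort, the order comm expects

printf 'sort -t"|" -k1,1:\n%s\n\nplain sort:\n%s\n' "$by_key" "$by_line"
```

Under these assumptions the key-sorted run prints the same three lines the thread reports, while the whole-line run prints only the 123 record. The design point is simply that sort and comm must agree on the order: sort both files with no -t/-k options, in the same locale comm will run in.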


If we change our sort commands to sort -n ..., the 123 record moves well before 12345 and 12346, and comm returns only the desired record.
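Why sort -n seems to help, under the same byte-order assumption: it puts the 123 line first, and comm's first comparison against the old file ("123|..." versus "9999|...") also finds it smaller, since '1' < '9'. Every remaining line is identical in both files, so comm pairs them up one by one. The numeric order is still not whole-line order (9999 sorts before 11111), so with other data comm can misreport again. The sketch below uses made-up extracts, not data from the thread, in which 9999 is dropped and a 20000 record is added; delta is a hypothetical helper name, and LC_ALL=C and mktemp -d are assumptions.

```shell
#!/bin/sh
# Hypothetical extracts (not from the thread): 9999 is dropped and a new
# 20000 record is added, so only 20000 should be reported as new.
# Assumptions: POSIX sh, mktemp -d, LC_ALL=C for both sort and comm.
LC_ALL=C
export LC_ALL

tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '%s\n' '9999|Value A7|Value A8' '11111|Value A1|Value A2' > prev.dat
printf '%s\n' '11111|Value A1|Value A2' '20000|Value B1|Value B2' > new.dat

# delta SORT_OPTIONS... (hypothetical helper): sort both files the same
# way, then print the lines comm -23 reports as only in new.dat.
delta() {
    sort "$@" prev.dat > prev.srt
    sort "$@" new.dat > new.srt
    # GNU comm warns on stderr and exits 1 when its input is out of order.
    comm -23 new.srt prev.srt 2>/dev/null || true
}

numeric=$(delta -n)   # 9999 before 11111: not whole-line byte order
whole=$(delta)        # plain sort: the order comm expects

printf 'sort -n:\n%s\n\nplain sort:\n%s\n' "$numeric" "$whole"
```

With sort -n, the unchanged 11111 record is reported as new alongside 20000; with a plain sort, only 20000 is reported.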


Any explanations?
# 2  
11-29-2005
I'd be quite interested in an explanation for that too. Changing the pipes (|) to anything else gives the correct result, but I've no idea why this should be.
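A possible reason the delimiter matters, assuming byte-order collation: a comma (44) or a space (32) sorts below every digit (48 to 57), so "123," still sorts before "12345," and field-1 order coincides with whole-line order for these keys. The pipe (124) sorts above the digits, so the two orders diverge. It is probably not literally any character: a delimiter above '9', such as ':' (58), should behave like '|'. The sketch compares both orders for '|' and ','; make_lines and orders_agree are hypothetical helper names, and LC_ALL=C is an assumption.

```shell
#!/bin/sh
# For each delimiter, check whether sorting on field 1 gives the same
# order as sorting whole lines (the order comm relies on).
# Assumption: LC_ALL=C, i.e. plain byte-value comparison.
# Byte values: ',' = 44, '0'-'9' = 48-57, ':' = 58, '|' = 124.
LC_ALL=C
export LC_ALL

# make_lines D (hypothetical helper): the five keys from the thread,
# each followed by delimiter D and some text.
make_lines() {
    for k in 11111 123 12345 12346 9999; do
        printf '%s%sValue\n' "$k" "$1"
    done
}

# orders_agree D (hypothetical helper): succeed if the field-1 sort and
# the whole-line sort produce the same order.
orders_agree() {
    [ "$(make_lines "$1" | sort -t "$1" -k1,1)" = "$(make_lines "$1" | sort)" ]
}

for d in '|' ','; do
    if orders_agree "$d"; then
        echo "delimiter '$d': key order matches whole-line order"
    else
        echo "delimiter '$d': orders differ, so comm may misreport"
    fi
done
```

Under these assumptions the pipe reports differing orders and the comma reports matching ones. Changing the delimiter only hides the mismatch for keys like these; sorting both files on the whole line removes it.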
 