To get Non matching records for current day


 
# 1  
Old 12-16-2013

My objective is to get the records in the current day's file that do not match any record in the previous day's file.
For example, file1 contains:
Code:
1 a
2 b

and file2 contains:
Code:
2 b
3 c

then expected output is
Code:
3 c

Another example:

file 1 contains:

Code:
1 a
2 b

file 2 contains
Code:
1 c
2 b

results expected:
Code:
1 c

I used
Code:
comm -13

to achieve this, which works fine for small files. But when the files are large it fails. I am attaching the two input files and the output file. The generated output file is wrong.

In the output file, only the record starting with 246 should appear. Please advise.

Command executed:
Code:
comm -13<(sort data01012013.txt)<(sort data02012013.txt) > output.txt

My OS is Solaris.

Thanks for your help.

Last edited by Don Cragun; 12-16-2013 at 05:54 PM.. Reason: Add several more CODE tags.
# 2  
Old 12-16-2013
Works for me if I add some spaces:
Code:
comm -13 <(sort /tmp/data01012013.txt) <(sort /tmp/data02012013.txt)
246    martin    Paul    Maas    NULL    NULL    Q6eXufheKL....

# 3  
Old 12-16-2013
You might need to convert these files to Unix format first, using dos2unix data02012013.txt and dos2unix data01012013.txt

or:

Code:
comm -13 <(sort /tmp/data01012013.txt | tr -d '\015') <(sort /tmp/data02012013.txt | tr -d '\015')
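
Before converting, it may be worth checking whether the files contain carriage returns at all. The following is only a minimal sketch: has_cr is a made-up helper name, not a standard utility, and it uses the same octal escape (\015) as the tr command above.

```shell
#!/bin/sh
# has_cr FILE: exit status 0 if FILE contains at least one carriage return.
# has_cr is a hypothetical helper name used for illustration only.
has_cr() {
    # tr -cd deletes every byte except CR; wc -c counts what is left.
    # The arithmetic expansion strips any padding that wc adds to its output.
    [ $(( $(tr -cd '\015' < "$1" | wc -c) )) -gt 0 ]
}

# Possible usage (file name taken from the first post):
#   if has_cr data01012013.txt; then echo "needs dos2unix"; fi
```

If has_cr reports nothing for either file, DOS line endings are probably not the cause of the wrong output.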

# 4  
Old 12-16-2013
In addition to RudiC's and Chubler_XL's comments, the comm utility is only guaranteed to work on text files. Text files have lines that are no longer than the number of bytes reported by the command:
Code:
getconf LINE_MAX

on your system. You didn't say which Solaris system version you're using, but I don't remember any Solaris system having a maximum text file line length that is as big as the 57,810 byte longest line length that you have in both of your input files.
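
To see whether a particular file is within that limit, something along these lines might help. It is a rough sketch under assumptions: longest_line and exceeds_line_max are made-up names, and the awk used must itself cope with long records (on Solaris that probably means nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk, which I have not verified against lines this long).

```shell
#!/bin/sh
# longest_line FILE: print the length of the longest line in FILE.
# Hypothetical helper; LC_ALL=C is set so that length() counts bytes.
longest_line() {
    LC_ALL=C awk '{ n = length($0); if (n > max) max = n } END { print max + 0 }' "$1"
}

# exceeds_line_max FILE: exit status 0 if some line is too long for LINE_MAX.
# LINE_MAX includes room for the trailing newline, so a line whose content
# is LINE_MAX bytes or more is already over the limit.
exceeds_line_max() {
    [ "$(longest_line "$1")" -ge "$(getconf LINE_MAX)" ]
}

# Possible usage (file name taken from the first post):
#   exceeds_line_max data01012013.txt && echo "not a text file as far as comm is concerned"
```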
# 5  
Old 12-17-2013
Thanks for giving your valuable time on this issue.

@RudiC / Chubler_XL, I tried your solutions, but without success.

@Don Cragun, the txt file contains dumps of character large object data in one of the columns, which is why the records are so long.

Can we change this LINE_MAX value? Would changing it have any impact on the system?

Here are the details:
Code:
$ uname
SunOS
$ uname -v
Generic_125100-06
$ getconf LINE_MAX
2048

Thanks.

Last edited by newbie2014; 12-17-2013 at 04:01 AM..
# 6  
Old 12-17-2013
Hi.

Here is a demonstration of the failure and a work-around using a perl version of comm:
Code:
#!/usr/bin/env bash

# @(#) s1       Demonstrate comparison of comm, perl comm on Solaris.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
untab() { perl -wp -e 's/\t/        /g' "$@" ; }
flip() { perl -wp -e 's/\r//g' "$@" ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C pll

rm -f f[12]
pl " Input data files data[12]"
file data[12]
wc data[12]

pl " Data file f[12] after conversion:"
cp data1 f1
cp data2 f2

flip f1 > x1 ; mv x1 f1
flip f2 > x1 ; mv x1 f2

file f[12]
wc f[12]

pl " Script ./comm.pl:"
what ./comm.pl

pl " Results, perl comm:"
./comm.pl -13 <(sort f1) <(sort f2) > f4
wc f4
file f4
untab f4 | pll 78

pl " Results, standard comm:"
comm -13 <(sort f1) <(sort f2) > f3
wc f3
file f3
untab f3 | pll 78

pl " Short lines in small files, result from perl comm:"
head data[34]
pe
./comm.pl -13 <( sort data3 ) <( sort data4 )

exit 0

producing:
Code:
$ ./s1

Environment: LC_ALL = POSIX, LANG = POSIX
(Versions displayed with local utility "version")
OS, ker|rel, machine: SunOS, 5.10, i86pc
Distribution        : Solaris 10 10/08 s10x_u6wos_07b X86
bash GNU bash 3.00.16
pll (local) 1.19

-----
 Input data files data[12]
data1:          ascii text
data2:          ascii text
       2    3760   74398 data1
       2    2918   58115 data2
       4    6678  132513 total

-----
 Data file f[12] after conversion:
f1:             ascii text
f2:             ascii text
       2    3760   74396 f1
       2    2918   58113 f2
       4    6678  132509 total

-----
 Script ./comm.pl:
comm.pl Compare two sorted files line by line, perl.

-----
 Results, perl comm:
       1      51     305 f4
f4:             ascii text
 (Longest line: 640; fit into lines of length 78)
         1         2         3     ...     61        62        63        
12345678901234567890123456789012345...45678901234567890123456789012345678
        246        martin        Pa...NULL        NULL        6        NU

-----
 Results, standard comm:
      15    1485   29004 f3
f3:             ascii text
 (Longest line: 2364; fit into lines of length 78)
         1         2         3     ... 233        234        235        2
12345678901234567890123456789012345...89012345678901234567890123456789012
243        Williamss        Serena ...</AcctId>\n    <EventId>\n      <Va
>300852</Value>\n      <Modified>fa...\n    <Val3>285963</Val3>\n    <Whe
009-04-27T09:57:42Z</When>\n    <Wh...eue_item>\n  <queue_item xmlns="urn
j.api.facebook.com">\n    <AcctId>2...ified>false</Modified>\n    </Event
\n    <Msg>Goedendag, op 18 maart j...pe>\n    <Val1>3</Val1>\n    <Val2>
Val2>\n    <Val3>285661</Val3>\n   ...<queue_item xmlns="urn:obj.api.face
k.com">\n    <AcctId>243</AcctId>\n...jn facebook scherp abonnement verle
. Mij...</Msg>\n    <Type>4</Type>\..."urn:obj.api.facebook.com">\n    <A
Id>243</AcctId>\n    <EventId>\n   ...ype>\n    <Val1>1</Val1>\n    <Val2
</Val2>\n    <Val3>287325</Val3>\n ...:59Z</When>\n    <WhenLT>2009-04-28
:47:59+02:00</WhenLT>\n    <Modifie.../Msg>\n    <Type>4</Type>\n    <Val
</Val1>\n    <Val2>88</Val2>\n    <...d>\n      <Value>310313</Value>\n  
 <Modified>false</Modified>\n    </...<Val1>1</Val1>\n    <Val2>131</Val2
    <Val3>286913</Val3>\n    <When>...   <EventId>\n      <Value>312158</
246        martin        Paul      ...NULL        NULL        6        NU

-----
 Short lines in small files, result from perl comm:
==> data3 <==
1 a
2 b

==> data4 <==
2 b
1 c

        1 c

Some comments. It looks like the sort is OK; possibly sort is managing long lines on its own. The trouble is with comm, which appears to mangle long lines. The perl version of comm can handle long lines, at the added cost of the overhead of an interpreted language.

This output, while busy, shows the input files, then the results of a perl script in a shell function, flip, to remove carriage returns. The Solaris version of file is not as useful as the Linux version, just showing ascii as opposed to:
Code:
data1: UTF-8 Unicode text, with very long lines, with CRLF line terminators
data2: ASCII text, with very long lines, with CRLF line terminators

for example.

The converted files are then sorted and fed into comm and comm.pl. The result is that comm appears to split long lines into chunks, whereas comm.pl handles them without such a flaw.

The long lines are presented in an abbreviated style by a local code, pll, that we use here. The TABS were converted to runs of blanks by another perl code in function untab.

The final run just shows that comm.pl -13 works as one expects comm -13 to work.
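
For the small example, the result one expects from comm -13 can also be approximated with awk alone. This is a different technique from both comm and comm.pl, shown only as a sketch: only_in_second is a made-up name, the inputs need not be sorted, duplicates are treated as a set rather than counted, and I have not tested it against the 58,000-byte lines (on Solaris, nawk or /usr/xpg4/bin/awk would probably be needed, and they may have their own record-length limits).

```shell
#!/bin/sh
# only_in_second FILE1 FILE2: print the lines of FILE2 that never occur in FILE1.
# Hypothetical helper; output keeps FILE2's original order.
only_in_second() {
    # While reading the first file, remember each line as an array key.
    # For the second file, print only lines that were not remembered.
    awk 'FILENAME == ARGV[1] { seen[$0] = 1; next } !($0 in seen)' "$1" "$2"
}

# Possible usage with the small files from the script above:
#   only_in_second data3 data4
```

With data3 and data4 as shown in the output above, this prints the single line "1 c", without the leading TAB that comm -13 adds.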

You can find comm.pl at http://cpansearch.perl.org/src/CWEST.../comm/comm.mjd along with many other perl versions of common *nix commands. There are also many GNU-style commands in /usr/sfw/bin/ on Solaris systems, but comm does not appear to be among them.

Best wishes ... cheers, drl

PS Just so that you can see the entire sample input files, here is the result of the display of converted files f1 and f2 above:
Code:
$ untab f1 | pll 78 
 (Longest line: 58137; fit into lines of length 78)
         1         2         3     ...0        5811        5812        58
12345678901234567890123456789012345...12345678901234567890123456789012345
242        Mandella        Martina ...ent_queue_arr>        213        NU
243        Williamss        Serena ...ent_queue_arr>        216        NU

$ untab f2 | pll 78 
 (Longest line: 58137; fit into lines of length 78)
         1         2         3     ...0        5811        5812        58
12345678901234567890123456789012345...12345678901234567890123456789012345
243        Williamss        Serena ...ent_queue_arr>        216        NU
246        martin        Paul      ...NULL        NULL        6        NU


Last edited by drl; 12-18-2013 at 07:14 AM.. Reason: Minor typo.
# 7  
Old 12-18-2013
Thanks for your valuable time and the workaround. This will compel me to learn perl. No issues.
Quote:
There are also many GNU-style commands in /usr/sfw/bin/ on Solaris systems, but comm does not appear to be among them.
Is there any way to work around this using only Unix shell commands?
Many thanks!