Check/print missing number in a consecutive range and remove duplicate numbers

#1  03-23-2017, newbie_01 (Registered User)

Hi,

In an ideal scenario, I have a listing of the db transaction logs that get copied to a DR site, and if I have them all, they are numbered consecutively like below.



Code:
1_79811_01234567.arc
1_79812_01234567.arc
1_79813_01234567.arc
1_79814_01234567.arc
1_79815_01234567.arc
2_86754_01234567.arc
2_86755_01234567.arc
2_86756_01234567.arc
2_86757_01234567.arc
2_86758_01234567.arc
3_82692_01234567.arc
3_82693_01234567.arc
3_82694_01234567.arc
3_82695_01234567.arc
3_82696_01234567.arc

There will be scenarios where files are not copied for some reason, for example a network failure, so there will be gaps in the list of files.

So the list above may look like the one below, where there are gaps in what is supposed to be a consecutive list.



Code:
1_79811_01234567.arc
1_79812_01234567.arc
1_79815_01234567.arc
2_86754_01234567.arc
2_86755_01234567.arc
2_86757_01234567.arc
2_86758_01234567.arc
3_82692_01234567.arc
3_82694_01234567.arc
3_82696_01234567.arc

Does anyone know a quick way of checking which numbers are missing from the consecutive range?

At the moment, I am cutting the list into 3 separate lists based on the first character. The first field is the transaction group, the second field is the transaction number and the third field is the db id, which is a constant.

Then I read each new list, set the first number as a 'base', increment it by 1, assign it to a variable and compare that number with what I read next. If they don't match, then I print that as the missing number or gap. It is a very long, tedious process.

I am hoping someone knows a trick for checking which numbers are missing in the consecutive range and printing them.

I don't want to insert the missing numbers into the existing list. I will be redirecting them to an exception list so I know which transaction logs are missing and have to be re-copied.

Also, in some instances, I will have a listing that contains duplicates, like below.



Code:
1_79811_01234567.arc
1_79812_01234567.arc
1_79812_01234567.arc
1_79813_01234567.arc
1_79812_01234567.arc
1_79814_01234567.arc
1_79815_01234567.arc
2_86754_01234567.arc
2_86756_01234567.arc
2_86755_01234567.arc
2_86756_01234567.arc
2_86757_01234567.arc
2_86756_01234567.arc
2_86758_01234567.arc
3_82692_01234567.arc
3_82692_01234567.arc
3_82693_01234567.arc
3_82694_01234567.arc
3_82695_01234567.arc
3_82694_01234567.arc
3_82696_01234567.arc

This is where the database log was sent to the DR site multiple times. Is there a way to check which lines are duplicates and how many lines/rows there are of each?

For example, from the latest listing above, 1_79812_01234567.arc has 3 entries, 2_86756_01234567.arc has 3 entries, 3_82694_01234567.arc has 2 entries and so on.

Any advice will be much appreciated. Thanks in advance.
#2  03-23-2017, RudiC (Forum Staff, Moderator)
I'm afraid there's no "quick way" to do what you request. It has to be done like what you describe, line by line, value by value. Why don't you post your attempt to be discussed, analysed, and hopefully improved? And, post the desired output for problem 2.
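
A line-by-line check of that kind can still be fairly compact. The following is only a minimal sketch, not the original poster's script: it assumes the names always have the form group_number_dbid.arc, that the db id is constant, and it uses a hypothetical file name gap_list.txt filled with the sample data from post #1. It reports only numbers missing between the first and last number seen in each group, so logs missing at either end of a range go unnoticed.

```shell
# Sample listing with gaps, copied from the question; the file name is hypothetical.
cat > gap_list.txt <<'EOF'
1_79811_01234567.arc
1_79812_01234567.arc
1_79815_01234567.arc
2_86754_01234567.arc
2_86755_01234567.arc
2_86757_01234567.arc
2_86758_01234567.arc
3_82692_01234567.arc
3_82694_01234567.arc
3_82696_01234567.arc
EOF

# Sort by group, then numerically by transaction number; -u drops repeated entries.
# awk remembers the last number seen per group and prints every number skipped.
missing=$(sort -t_ -k1,1n -k2,2n -u gap_list.txt |
  awk -F_ '
    $1 in last {
      for (n = last[$1] + 1; n < $2; n++)
        print $1 "_" n "_" $3
    }
    { last[$1] = $2 }
  ')

printf '%s\n' "$missing"
```

Redirecting the output to a file instead of a variable would give the exception list asked for in post #1.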

By the way, a similar problem has been solved here.

And, for your second problem, how about

Code:
sort file | uniq -c
      1 1_79811_01234567.arc
      3 1_79812_01234567.arc
      1 1_79813_01234567.arc
      1 1_79814_01234567.arc
      1 1_79815_01234567.arc
      1 2_86754_01234567.arc
      1 2_86755_01234567.arc
      3 2_86756_01234567.arc
      1 2_86757_01234567.arc
      1 2_86758_01234567.arc
      2 3_82692_01234567.arc
      1 3_82693_01234567.arc
      2 3_82694_01234567.arc
      1 3_82695_01234567.arc
      1 3_82696_01234567.arc
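
If only the duplicated entries are of interest, the count column can be filtered. This is a small sketch under the same assumptions as above; dup_list.txt and dedup_list.txt are hypothetical file names, and the sample is a shortened version of the listing in post #1.

```shell
# Shortened sample with duplicates; the file name is hypothetical.
cat > dup_list.txt <<'EOF'
1_79811_01234567.arc
1_79812_01234567.arc
1_79812_01234567.arc
1_79813_01234567.arc
1_79812_01234567.arc
1_79814_01234567.arc
3_82692_01234567.arc
3_82692_01234567.arc
EOF

# Keep only names that occur more than once, printed as "name count".
dups=$(sort dup_list.txt | uniq -c | awk '$1 > 1 { print $2, $1 }')
printf '%s\n' "$dups"

# A de-duplicated copy of the listing, leaving the input untouched.
sort -u dup_list.txt > dedup_list.txt
```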


Last edited by RudiC; 03-23-2017 at 05:45 AM..
#3  03-23-2017, rbatte1 (Forum Staff)
To look for 'missing' files, could you get a list of the files you have into a temporary file and then generate another that contains the names you think you should have? You can then do the following:-


Code:
grep -vf  found_files  expected_files

You have to be careful that the two files match exactly. The lines of found_files are used as patterns, so if you expect a file called a123 and another called a12345, searching this way will not report a12345 as missing when a123 is present, because the pattern a123 also matches the line a12345.

If this is a concern, build the pattern list (found_files) like this:-

Code:
^abc123$
^abc12345$

This anchors each pattern to the beginning and end of the line, so only a complete match counts.
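
A possible alternative that avoids editing either list is to ask grep for whole-line, fixed-string matches with the standard -x and -F options. The sketch below reuses the a123 / a12345 names from the explanation above as made-up sample data and compares the two behaviours.

```shell
# Made-up sample data: one file was found, two were expected.
printf '%s\n' a123 > found_files
printf '%s\n' a123 a12345 > expected_files

# Plain patterns: "a123" also matches inside "a12345", so the missing file is hidden.
# grep exits non-zero when it selects no lines, hence the "|| true".
loose=$(grep -vf found_files expected_files || true)

# -F treats each pattern as a fixed string and -x requires it to match the whole line.
strict=$(grep -vxFf found_files expected_files || true)

printf 'loose:  [%s]\nstrict: [%s]\n' "$loose" "$strict"
```

With -F the dot in names such as 1_79811_01234567.arc is also taken literally rather than as a regular expression wildcard.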



Does that help at all? It's good to share and we might be able to suggest some improvements.



Robin
#4  03-30-2017, newbie_01 (Registered User)
Hi all,

I've uploaded the script that I am using at the moment and some test data. It works as I intend it to; I just thought maybe it could be improved somehow. All the numbers marked FAILED are the ones missing from the consecutive range.

I didn't know I could use sort | uniq -c to check for duplicates, thanks to RudiC.

I am checking whether the approach in the link below can be used instead.

https://www.unix.com/shell-programmin...tern-file.html



Code:
$ ./x.ksh

-------------------------------------------------
- Running check_gap on test_clean.txt ...
-------------------------------------------------

- [ test_clean.txt.1.uniq ] ... starting gap check
   ... = Checking for 79811 => PASSED
   ... = Checking for 79812 => PASSED
   ... = Checking for 79813 => PASSED
   ... = Checking for 79814 => PASSED
   ... = Checking for 79815 => PASSED

- [ test_clean.txt.2.uniq ] ... starting gap check
   ... = Checking for 86754 => PASSED
   ... = Checking for 86755 => PASSED
   ... = Checking for 86756 => PASSED
   ... = Checking for 86757 => PASSED
   ... = Checking for 86758 => PASSED

- [ test_clean.txt.3.uniq ] ... starting gap check
   ... = Checking for 82692 => PASSED
   ... = Checking for 82693 => PASSED
   ... = Checking for 82694 => PASSED
   ... = Checking for 82695 => PASSED
   ... = Checking for 82696 => PASSED



-------------------------------------------------
- Running check_gap on test_gap.txt ...
-------------------------------------------------

- [ test_gap.txt.1.uniq ] ... starting gap check
   ... = Checking for 79811 => PASSED
   ... = Checking for 79812 => FAILED
   ... = Checking for 79813 => PASSED
   ... = Checking for 79814 => PASSED
   ... = Checking for 79815 => PASSED
   ... = Checking for 79816 => FAILED
   ... = Checking for 79817 => FAILED
   ... = Checking for 79818 => FAILED
   ... = Checking for 79819 => PASSED

- [ test_gap.txt.2.uniq ] ... starting gap check
   ... = Checking for 86754 => PASSED
   ... = Checking for 86755 => PASSED
   ... = Checking for 86756 => FAILED
   ... = Checking for 86757 => PASSED
   ... = Checking for 86758 => PASSED
   ... = Checking for 86759 => FAILED
   ... = Checking for 86760 => FAILED
   ... = Checking for 86761 => FAILED
   ... = Checking for 86762 => FAILED
   ... = Checking for 86763 => FAILED
   ... = Checking for 86764 => FAILED
   ... = Checking for 86765 => PASSED

- [ test_gap.txt.3.uniq ] ... starting gap check
   ... = Checking for 82692 => PASSED
   ... = Checking for 82693 => PASSED
   ... = Checking for 82694 => FAILED
   ... = Checking for 82695 => PASSED
   ... = Checking for 82696 => PASSED
   ... = Checking for 82697 => FAILED
   ... = Checking for 82698 => FAILED
   ... = Checking for 82699 => FAILED
   ... = Checking for 82700 => PASSED



-------------------------------------------------
- Running check_gap on test_dup.txt ...
-------------------------------------------------

- loggrp = 1 contains duplicated copies of the log
      3 1_79812_01234567.arc
      2 1_79819_01234567.arc

- [ test_dup.txt.1.uniq ] ... starting gap check
   ... = Checking for 79811 => PASSED
   ... = Checking for 79812 => PASSED
   ... = Checking for 79813 => PASSED
   ... = Checking for 79814 => PASSED
   ... = Checking for 79815 => PASSED
   ... = Checking for 79816 => FAILED
   ... = Checking for 79817 => FAILED
   ... = Checking for 79818 => FAILED
   ... = Checking for 79819 => PASSED

- loggrp = 2 contains duplicated copies of the log
      3 2_86756_01234567.arc
      2 2_86758_01234567.arc

- [ test_dup.txt.2.uniq ] ... starting gap check
   ... = Checking for 86754 => PASSED
   ... = Checking for 86755 => PASSED
   ... = Checking for 86756 => PASSED
   ... = Checking for 86757 => PASSED
   ... = Checking for 86758 => PASSED
   ... = Checking for 86759 => FAILED
   ... = Checking for 86760 => PASSED

- loggrp = 3 contains duplicated copies of the log
      2 3_82692_01234567.arc
      4 3_82696_01234567.arc
      2 3_82700_01234567.arc

- [ test_dup.txt.3.uniq ] ... starting gap check
   ... = Checking for 82692 => PASSED
   ... = Checking for 82693 => PASSED
   ... = Checking for 82694 => PASSED
   ... = Checking for 82695 => PASSED
   ... = Checking for 82696 => PASSED
   ... = Checking for 82697 => FAILED
   ... = Checking for 82698 => PASSED
   ... = Checking for 82699 => FAILED
   ... = Checking for 82700 => PASSED

Attached Files
File Type: txt test_gap.txt (328 Bytes, 3 views)
File Type: txt test_clean.txt (328 Bytes, 2 views)
File Type: txt test_dup.txt (658 Bytes, 2 views)
File Type: sh x.sh (2.6 KB, 4 views)