Check/print missing number in a consecutive range and remove duplicate numbers


 
# 1  
03-23-2017

Hi,

Ideally, I have a listing of the DB transaction logs that get copied to a DR site, and if they all arrived, they are numbered consecutively like below.

Code:
1_79811_01234567.arc
1_79812_01234567.arc
1_79813_01234567.arc
1_79814_01234567.arc
1_79815_01234567.arc
2_86754_01234567.arc
2_86755_01234567.arc
2_86756_01234567.arc
2_86757_01234567.arc
2_86758_01234567.arc
3_82692_01234567.arc
3_82693_01234567.arc
3_82694_01234567.arc
3_82695_01234567.arc
3_82696_01234567.arc

In some scenarios files are not copied, for example because of a network failure, so there will be gaps in the list of files.

So the list above may instead look like the one below, with gaps in what should be a consecutive list.

Code:
1_79811_01234567.arc
1_79812_01234567.arc
1_79815_01234567.arc
2_86754_01234567.arc
2_86755_01234567.arc
2_86757_01234567.arc
2_86758_01234567.arc
3_82692_01234567.arc
3_82694_01234567.arc
3_82696_01234567.arc

Does anyone know a quick way of finding which numbers are missing from the consecutive range?

At the moment, I split the list into 3 separate lists based on the first field. The first field is the transaction group, the second is the transaction number, and the third is the DB id, which is a constant.

Then I read each new list, take the first number as a 'base', increment it by 1, store it in a variable and compare it with the next number I read. If they don't match, I print that as the missing number, or gap. It is a very long, tedious process.

I am hoping someone knows a trick for finding the missing numbers in the consecutive range and printing them.

I don't want to insert the missing numbers into the existing list; I will redirect them to an exception list so I know which transaction logs are missing and need to be re-copied.
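
A minimal sketch of the same base-and-increment check, done in one pass instead of splitting the list per group. It assumes names follow the <group>_<number>_<dbid>.arc pattern shown above; find_gaps is a hypothetical function name. Sorting numerically by group and then by number (and dropping exact duplicates with -u) lets awk simply compare each number with the previous one in the same group:

```shell
#!/bin/sh
# find_gaps (hypothetical name): print the full name of every log missing
# between the lowest and highest number seen in each group.
# Assumes names of the form <group>_<number>_<dbid>.arc, read from the
# files given as arguments or from standard input.
find_gaps() {
    # Sort numerically by group, then by number; -u drops repeated copies.
    sort -t_ -k1,1n -k2,2n -u "$@" |
    awk -F_ '
        # Same group as the previous line: print every number skipped over.
        NR > 1 && $1 == grp {
            for (n = prev + 1; n < $2; n++)
                printf "%s_%d_%s\n", $1, n, $3
        }
        # Remember this line as the base for the next comparison.
        { grp = $1; prev = $2 }
    '
}
```

For example, find_gaps test_gap.txt > missing_logs.txt (file names hypothetical) would write the missing log names straight to an exception list. It can only see gaps between the lowest and highest number received for each group; logs missing after the last one copied will not show up.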

Also, in some instances I will have a listing with duplicates, like below

Code:
1_79811_01234567.arc
1_79812_01234567.arc
1_79812_01234567.arc
1_79813_01234567.arc
1_79812_01234567.arc
1_79814_01234567.arc
1_79815_01234567.arc
2_86754_01234567.arc
2_86756_01234567.arc
2_86755_01234567.arc
2_86756_01234567.arc
2_86757_01234567.arc
2_86756_01234567.arc
2_86758_01234567.arc
3_82692_01234567.arc
3_82692_01234567.arc
3_82693_01234567.arc
3_82694_01234567.arc
3_82695_01234567.arc
3_82694_01234567.arc
3_82696_01234567.arc

This is where the database log was sent to the DR site multiple times. Is there a way to find which lines are duplicates and how many copies of each there are?

For example, from the latest listing above, 1_79812_01234567.arc has 3 entries, 2_86756_01234567.arc has 3 entries, 3_82694_01234567.arc has 2 entries and so on.

Any advice will be much appreciated. Thanks in advance.
# 2  
03-23-2017
I'm afraid there's no "quick way" to do what you request. It has to be done like what you describe, line by line, value by value. Why don't you post your attempt to be discussed, analysed, and hopefully improved? And, post the desired output for problem 2.

By the way, a similar problem has been solved here.

And, for your second problem, how about
Code:
sort file | uniq -c
      1 1_79811_01234567.arc
      3 1_79812_01234567.arc
      1 1_79813_01234567.arc
      1 1_79814_01234567.arc
      1 1_79815_01234567.arc
      1 2_86754_01234567.arc
      1 2_86755_01234567.arc
      3 2_86756_01234567.arc
      1 2_86757_01234567.arc
      1 2_86758_01234567.arc
      2 3_82692_01234567.arc
      1 3_82693_01234567.arc
      2 3_82694_01234567.arc
      1 3_82695_01234567.arc
      1 3_82696_01234567.arc
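
If only the duplicated names are of interest, the uniq -c output can be filtered on its count column, and sort -u removes the extra copies. A minimal sketch; show_dups and dedupe are hypothetical names, and LC_ALL=C is set only to make the sort order byte-wise and predictable:

```shell
#!/bin/sh
# show_dups (hypothetical name): report only the names that occur more than
# once, keeping the "count name" layout of uniq -c.
show_dups() {
    LC_ALL=C sort "$@" | uniq -c | awk '$1 > 1'
}

# dedupe (hypothetical name): keep a single copy of every name, which is the
# same as sort | uniq.
dedupe() {
    LC_ALL=C sort -u "$@"
}
```

uniq -d is another option: it prints one copy of each repeated line, without the count.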


# 3  
03-23-2017
To look for 'missing' files, could you get a list of the files you have into a temporary file and then generate another that contains the names you think you should have? You can then do the following:-
Code:
grep -vf  found_files  expected_files

You have to be careful that the lines match exactly between the two files. If you expect one file named abc123 and another named abc12345, searching this way will not report abc12345 as missing when abc123 is present, because the pattern abc123 also matches abc12345.

If this is a concern, build the pattern file (found_files, the one read by -f) to be like this:-
Code:
^abc123$
^abc12345$

The ^ and $ anchor each pattern to the beginning and end of the line, so only a complete match counts.
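
A minimal sketch of this approach, assuming the naming pattern from the first post. make_expected and missing_from are hypothetical helper names. Instead of adding ^ and $ to every pattern, grep's -x option requires each pattern to match a whole line, and -F treats the patterns as fixed strings:

```shell
#!/bin/sh
# make_expected (hypothetical name): print the names that should exist for
# numbers FIRST..LAST of one group, e.g. 1_79811_01234567.arc.
# Usage: make_expected GROUP FIRST LAST DBID
make_expected() {
    awk -v g="$1" -v f="$2" -v l="$3" -v d="$4" 'BEGIN {
        for (n = f + 0; n <= l + 0; n++)
            printf "%s_%d_%s.arc\n", g, n, d
    }'
}

# missing_from (hypothetical name): print each expected name that is not
# present in the found list.
#   -v  keep only lines that match none of the patterns
#   -x  a pattern must match the whole line, so no ^...$ anchors are needed
#   -F  patterns are fixed strings, so the "." in ".arc" is not a wildcard
#   -f  read the patterns (one per line) from the found list
missing_from() {
    grep -vxF -f "$1" "$2"
}
```

For instance, make_expected 1 79811 79819 01234567 > expected_files followed by missing_from found_files expected_files; the group, range and file names here are only illustrative.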



Does that help at all? It's good to share and we might be able to suggest some improvements.



Robin
# 4  
03-30-2017
Hi all,

I've uploaded the script I am using at the moment and some test data. It works as intended; I just thought it could perhaps be improved. All the numbers marked FAILED are the ones missing from the consecutive range.

I didn't know I could use sort | uniq -c to check for duplicates; thanks, RudiC.

I'm also checking whether the approach in the thread below can be used instead:

awk to insert missing string based on pattern in file

Code:
$ ./x.ksh

-------------------------------------------------
- Running check_gap on test_clean.txt ...
-------------------------------------------------

- [ test_clean.txt.1.uniq ] ... starting gap check
   ... = Checking for 79811 => PASSED
   ... = Checking for 79812 => PASSED
   ... = Checking for 79813 => PASSED
   ... = Checking for 79814 => PASSED
   ... = Checking for 79815 => PASSED

- [ test_clean.txt.2.uniq ] ... starting gap check
   ... = Checking for 86754 => PASSED
   ... = Checking for 86755 => PASSED
   ... = Checking for 86756 => PASSED
   ... = Checking for 86757 => PASSED
   ... = Checking for 86758 => PASSED

- [ test_clean.txt.3.uniq ] ... starting gap check
   ... = Checking for 82692 => PASSED
   ... = Checking for 82693 => PASSED
   ... = Checking for 82694 => PASSED
   ... = Checking for 82695 => PASSED
   ... = Checking for 82696 => PASSED



-------------------------------------------------
- Running check_gap on test_gap.txt ...
-------------------------------------------------

- [ test_gap.txt.1.uniq ] ... starting gap check
   ... = Checking for 79811 => PASSED
   ... = Checking for 79812 => FAILED
   ... = Checking for 79813 => PASSED
   ... = Checking for 79814 => PASSED
   ... = Checking for 79815 => PASSED
   ... = Checking for 79816 => FAILED
   ... = Checking for 79817 => FAILED
   ... = Checking for 79818 => FAILED
   ... = Checking for 79819 => PASSED

- [ test_gap.txt.2.uniq ] ... starting gap check
   ... = Checking for 86754 => PASSED
   ... = Checking for 86755 => PASSED
   ... = Checking for 86756 => FAILED
   ... = Checking for 86757 => PASSED
   ... = Checking for 86758 => PASSED
   ... = Checking for 86759 => FAILED
   ... = Checking for 86760 => FAILED
   ... = Checking for 86761 => FAILED
   ... = Checking for 86762 => FAILED
   ... = Checking for 86763 => FAILED
   ... = Checking for 86764 => FAILED
   ... = Checking for 86765 => PASSED

- [ test_gap.txt.3.uniq ] ... starting gap check
   ... = Checking for 82692 => PASSED
   ... = Checking for 82693 => PASSED
   ... = Checking for 82694 => FAILED
   ... = Checking for 82695 => PASSED
   ... = Checking for 82696 => PASSED
   ... = Checking for 82697 => FAILED
   ... = Checking for 82698 => FAILED
   ... = Checking for 82699 => FAILED
   ... = Checking for 82700 => PASSED



-------------------------------------------------
- Running check_gap on test_dup.txt ...
-------------------------------------------------

- loggrp = 1 contains duplicated copies of the log
      3 1_79812_01234567.arc
      2 1_79819_01234567.arc

- [ test_dup.txt.1.uniq ] ... starting gap check
   ... = Checking for 79811 => PASSED
   ... = Checking for 79812 => PASSED
   ... = Checking for 79813 => PASSED
   ... = Checking for 79814 => PASSED
   ... = Checking for 79815 => PASSED
   ... = Checking for 79816 => FAILED
   ... = Checking for 79817 => FAILED
   ... = Checking for 79818 => FAILED
   ... = Checking for 79819 => PASSED

- loggrp = 2 contains duplicated copies of the log
      3 2_86756_01234567.arc
      2 2_86758_01234567.arc

- [ test_dup.txt.2.uniq ] ... starting gap check
   ... = Checking for 86754 => PASSED
   ... = Checking for 86755 => PASSED
   ... = Checking for 86756 => PASSED
   ... = Checking for 86757 => PASSED
   ... = Checking for 86758 => PASSED
   ... = Checking for 86759 => FAILED
   ... = Checking for 86760 => PASSED

- loggrp = 3 contains duplicated copies of the log
      2 3_82692_01234567.arc
      4 3_82696_01234567.arc
      2 3_82700_01234567.arc

- [ test_dup.txt.3.uniq ] ... starting gap check
   ... = Checking for 82692 => PASSED
   ... = Checking for 82693 => PASSED
   ... = Checking for 82694 => PASSED
   ... = Checking for 82695 => PASSED
   ... = Checking for 82696 => PASSED
   ... = Checking for 82697 => FAILED
   ... = Checking for 82698 => PASSED
   ... = Checking for 82699 => FAILED
   ... = Checking for 82700 => PASSED
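
The script itself isn't shown here, so this is only a sketch of how a similar report could be produced in a single awk pass, without splitting the listing into per-group .uniq files. It assumes the <group>_<number>_<dbid>.arc naming and integer numbers; check_gap_report is a hypothetical name, and the output only roughly mimics the report above. Each group is checked from its lowest to its highest number, in the order the groups first appear:

```shell
#!/bin/sh
# check_gap_report (hypothetical name): per group, list duplicated logs and
# then print PASSED/FAILED for every number from lowest to highest seen.
# Input order does not matter.
check_gap_report() {
    awk -F_ '
    {
        g = $1; s = $2 + 0                 # group and numeric log number
        if (!(g in lo)) {                  # first log seen for this group
            order[++ng] = g; lo[g] = s; hi[g] = s
        }
        if (s < lo[g]) lo[g] = s
        if (s > hi[g]) hi[g] = s
        cnt[g, s]++                        # copies of this exact log
        name[g, s] = $0
    }
    END {
        for (i = 1; i <= ng; i++) {
            g = order[i]
            hdr = 0
            # Duplicates first, in the same "%7d name" layout as uniq -c.
            for (n = lo[g]; n <= hi[g]; n++)
                if (((g, n) in cnt) && cnt[g, n] > 1) {
                    if (!hdr++)
                        print "- loggrp = " g " contains duplicated copies of the log"
                    printf "%7d %s\n", cnt[g, n], name[g, n]
                }
            print "- loggrp = " g " ... starting gap check"
            # Every number in the range: present (PASSED) or missing (FAILED).
            for (n = lo[g]; n <= hi[g]; n++) {
                st = ((g, n) in cnt) ? "PASSED" : "FAILED"
                printf "   ... = Checking for %d => %s\n", n, st
            }
        }
    }' "$@"
}
```

Because the numbers are held in awk arrays, the listing does not need to be sorted or split first; the trade-off is that the whole listing is kept in memory, which should be small for a list of log names.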
