Help with awk script to get missing numbers in column 1

11-05-2013

Code:

awk -F, 'length($1)<8{for (i=a+1; i<$1; i++) print i; a=$1}' file.txt

Since the first example was comma-delimited, -F, (the field separator) was used, but it is not needed for the file posted.

length($1)<8 processes only records whose first field is shorter than 8 characters.

for (i=a+1; i<$1; i++) print i; a=$1 prints every value of i from a+1 (a holds field 1 of the previous record) up to, but not including, the current first field, then stores field 1 in a.
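
To see the effect on a small input, here is a minimal sketch that runs the same logic on a made-up sample. The function name missing_numbers and the sample records are assumptions for illustration only, not the poster's data. One record has an 8-character first field to show that it is skipped.

```shell
# Same logic as the one-liner above, wrapped in a function.
# With no file argument, awk reads standard input.
missing_numbers() {
    awk -F, 'length($1) < 8 {
        for (i = a + 1; i < $1; i++) print i   # values skipped since the previous record
        a = $1                                 # remember the current value
    }' "$@"
}

# Hypothetical sample: 3, 6 and 7 are missing; the 8-character record is ignored.
printf '%s\n' '1,a' '2,b' '4,c' '12345678,bad' '5,d' '8,e' '9,f' | missing_numbers
```

On this sample it prints 3, 6 and 7, one per line. The approach relies on column 1 being in ascending order.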

rdrtx1

11-05-2013

Quote:

Originally Posted by rdrtx1

Thank you, I think I get the idea

Ophiuchus

11-06-2013

Quote:

Originally Posted by Ophiuchus

Hello Akshay Hegde,

Thank you for your help.

It works with the sample file but not with the real file.

The real file has 5,440,177 lines and the last number in column 1 is 5440255, so subtracting the two, 78 numbers are missing from column 1.
But with your script I get more than 33 million lines of output, and I stopped it since it seems to enter an infinite loop.

Is there a way to preload an array with 1 to N (N=13 in the sample, N=5440255 in the real file) and then check which of those values do not appear in column 1?

Thanks again

Try this. You did not supply the real input, and did not mention that the length of $1 should not exceed 7. Getting a wrong result does not mean the script entered an infinite loop. Also, in #1 you showed comma-separated input, but the real input is not.

The missing numbers and a running count are shown below; change print x,++n to print x once testing is done.

Code:

$ awk  'length($1)<8{while(++x<$1)print x,++n}' file.txt 
65330 1
130866 2
196402 3
261938 4
327474 5
393010 6
458546 7
524082 8
589618 9
655154 10
720690 11
786226 12
851762 13
916097 14
917298 15
982834 16
1048370 17
1092661 18
1113906 19
1179442 20
1244978 21
1310514 22
1376050 23
1441586 24
1507122 25
1572658 26
1637741 27
1638194 28
1703730 29
1722211 30
1769266 31
1834802 32
1900338 33
1965874 34
2031410 35
2096946 36
2162482 37
2228018 38
2293554 39
2359090 40
2424626 41
2490162 42
2555698 43
2621234 44
2686770 45
2752306 46
2817842 47
2883378 48
2948914 49
3014450 50
3079986 51
3090613 52
3145522 53
3211058 54
3276594 55
3322017 56
3342130 57
3407666 58
3473202 59
3538738 60
3604274 61
3669810 62
3735346 63
3800882 64
3866418 65
3931954 66
3997490 67
4063026 68
4128562 69
4194098 70
4216084 71
4259634 72
4325170 73
4390706 74
4456242 75
4521778 76
4587314 77
4652850 78
4718386 79
4783922 80
4807884 81
4849458 82
4860937 83
4914994 84
4980530 85
5046066 86
5051204 87
5111602 88
5177138 89
5242674 90
5308210 91
5373746 92
5439282 93
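
On the question of preloading 1 to N: a minimal sketch of a lookup-based variant, assuming column 1 might be unsorted or contain duplicates. Instead of preloading, it records every value seen and walks 1..max in the END block. The function name missing_by_lookup and the sample values are hypothetical, and -F, is kept only so that comma-delimited input also works.

```shell
# Lookup-based variant: does not depend on the order of column 1.
# With no file argument, awk reads standard input.
missing_by_lookup() {
    awk -F, 'length($1) < 8 {
        seen[$1 + 0] = 1                 # mark the value as present (numeric key)
        if ($1 + 0 > max) max = $1 + 0   # track the largest value seen
    }
    END {
        for (i = 1; i <= max; i++)
            if (!(i in seen)) print i, ++n   # missing value and running count
    }' "$@"
}

# Hypothetical unsorted sample with a duplicate and one over-long record.
printf '%s\n' 4 1 2 2 9 12345678 5 8 | missing_by_lookup
```

It keeps one array entry per distinct value, so on a file with millions of lines it needs more memory than the streaming while loop, which only remembers the previous value.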

Akshay Hegde

11-06-2013

Quote:

Originally Posted by Akshay Hegde

Hello Akshay,

Thanks for your help. I provided a simple sample since the logic should work for a small sample and in general. The length limit of 7 for column 1 was introduced by rdrtx1, who found 10 bad records that I didn't know existed.

The real file is comma-delimited. I only uploaded the first column because the file is too big with the other columns, and the script would be the same apart from removing the field separator.

Your last code seems to work fine with the real file now, and the addition of the count is great.

Many thanks

Ophiuchus
