Help with awk script to get missing numbers in column 1


 
# 15  
Old 11-05-2013
Quote:
awk -F, 'length($1)<8{for (i=a+1; i<$1; i++) print i; a=$1}' file.txt
Since the first example was comma-delimited, -F, (the field delimiter) was used, but it is not needed for the file posted.

length($1)<8: use only records whose first field is shorter than 8 characters.

for (i=a+1; i<$1; i++) print i; a=$1: for each value of i from a + 1 (a holds the first field of the previous record) up to, but not including, the current first field, print i; then store the current first field in a.
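
To make the explanation above concrete, here is a minimal sketch that runs the same one-liner on a made-up sample. The wrapper name missing_for and the sample values are assumptions for illustration only; they are not from the poster's file. The -F, option is left out because the sample has a single column.

```shell
# Hypothetical wrapper around the one-liner above; reads files given as
# arguments, or standard input when there are none.
missing_for() {
    # a is empty (numerically 0) before the first record, so counting starts at 1.
    awk 'length($1) < 8 { for (i = a + 1; i < $1; i++) print i; a = $1 }' "$@"
}

# Made-up sample: 3, 7, 8 and 12 are absent, and 12345678 stands in for a
# malformed record whose first field is 8 characters long (it is skipped).
printf '%s\n' 1 2 4 5 6 9 10 11 12345678 13 | missing_for
# prints 3, 7, 8 and 12, one per line
```

Note that this relies on column 1 being sorted in ascending order; if a value is lower than the one before it, a is reset to the lower value and later gaps are reported again.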
# 16  
Old 11-05-2013
Quote:
Originally Posted by rdrtx1
Since the first example was comma-delimited, -F, (the field delimiter) was used, but it is not needed for the file posted.

length($1)<8: use only records whose first field is shorter than 8 characters.

for (i=a+1; i<$1; i++) print i; a=$1: for each value of i from a + 1 (a holds the first field of the previous record) up to, but not including, the current first field, print i; then store the current first field in a.
Thank you, I think I get the idea.
# 17  
Old 11-06-2013
Quote:
Originally Posted by Ophiuchus
Hello Akshay Hegde,

Thank you for your help.

It works with the sample file but is not working with the real file.

The real file has 5,440,177 lines and the last number in column 1 is 5440255. So, subtracting one from the other, 78 numbers are missing from column 1.
But with your script I get more than 33 million lines, and I stopped it since it seems to enter an infinite loop.

Is there a way to preload an array from 1 to N (N=13 in this case; in the real file N=5440255), so that the array can be compared against column 1 to find which values are missing?

Thanks again

Try this, since you had not supplied the real input and had not mentioned that the length of $1 should not exceed 7. You were getting a wrong result; that does not mean it entered an infinite loop. Also, in #1 you showed that your input is comma-separated, but the real input is not.

The missing numbers and a running count are shown below; change print x,++n to print x once testing is done.

Code:
$ awk  'length($1)<8{while(++x<$1)print x,++n}' file.txt 
65330 1
130866 2
196402 3
261938 4
327474 5
393010 6
458546 7
524082 8
589618 9
655154 10
720690 11
786226 12
851762 13
916097 14
917298 15
982834 16
1048370 17
1092661 18
1113906 19
1179442 20
1244978 21
1310514 22
1376050 23
1441586 24
1507122 25
1572658 26
1637741 27
1638194 28
1703730 29
1722211 30
1769266 31
1834802 32
1900338 33
1965874 34
2031410 35
2096946 36
2162482 37
2228018 38
2293554 39
2359090 40
2424626 41
2490162 42
2555698 43
2621234 44
2686770 45
2752306 46
2817842 47
2883378 48
2948914 49
3014450 50
3079986 51
3090613 52
3145522 53
3211058 54
3276594 55
3322017 56
3342130 57
3407666 58
3473202 59
3538738 60
3604274 61
3669810 62
3735346 63
3800882 64
3866418 65
3931954 66
3997490 67
4063026 68
4128562 69
4194098 70
4216084 71
4259634 72
4325170 73
4390706 74
4456242 75
4521778 76
4587314 77
4652850 78
4718386 79
4783922 80
4807884 81
4849458 82
4860937 83
4914994 84
4980530 85
5046066 86
5051204 87
5111602 88
5177138 89
5242674 90
5308210 91
5373746 92
5439282 93
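
For readers who want to see how the while(++x<$1) loop behaves without the real file, here is a small sketch on a made-up sample. The function names missing_count and missing_only and the sample values are assumptions for illustration, not part of the original post.

```shell
# Hypothetical wrappers around the one-liner above.
# x is a running counter: it is incremented once per test, and every value
# that is still below the current first field is reported as missing.
missing_count() {
    awk 'length($1) < 8 { while (++x < $1) print x, ++n }' "$@"
}

# Same loop without the running count, as suggested for use after testing.
missing_only() {
    awk 'length($1) < 8 { while (++x < $1) print x }' "$@"
}

# Made-up sample: 3, 7, 8 and 12 are absent; 12345678 is skipped by the
# length test.
printf '%s\n' 1 2 4 5 6 9 10 11 12345678 13 | missing_count
# prints:
# 3 1
# 7 2
# 8 3
# 12 4
```

As far as I can tell, this form assumes column 1 is sorted in ascending order with no repeated values: x advances once for every accepted record, so a duplicate pushes the counter ahead and a later gap can go unreported.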

# 18  
Old 11-06-2013
Quote:
Originally Posted by Akshay Hegde
Try this, since you had not supplied the real input and had not mentioned that the length of $1 should not exceed 7. You were getting a wrong result; that does not mean it entered an infinite loop. Also, in #1 you showed that your input is comma-separated, but the real input is not.
Hello Akshay,

Thanks for your help. I provided a simple sample since the logic should work both for a small sample and in general. The length limit of 7 for column 1 was introduced by rdrtx1, since he found 10 bad records that I didn't know existed.

The real file is comma-delimited; I only uploaded the first column, since the file is too big with more columns, and the script would be the same apart from removing the field separator.

Your last code seems to work fine with the real file now, and the addition of the count is great.

Many thanks
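
On the earlier question about using an array: one possible sketch, not something posted in this thread, is to record every value seen in column 1 and then walk from 1 to the largest value at the end. The function name missing_seen and the sample values are assumptions for illustration. Unlike the loops above, this should not depend on the input being sorted or free of duplicates, at the cost of holding one array entry per distinct value in memory.

```shell
# Hypothetical sketch: mark each first field as seen, then report the gaps.
missing_seen() {
    awk '
        length($1) < 8 {
            v = $1 + 0                 # force a numeric value for the index
            seen[v] = 1
            if (v > max) max = v       # remember the largest value so far
        }
        END {
            for (i = 1; i <= max; i++)
                if (!(i in seen)) print i
        }
    ' "$@"
}

# Made-up, unsorted sample with a duplicate and one over-long record.
printf '%s\n' 5 1 2 2 9 4 12345678 6 | missing_seen
# prints 3, 7 and 8, one per line
```

For a comma-delimited file with more than one column, -F, would have to be added before the program text, as in the first one-liner of this thread.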