awk not working for calculating no of lines with criteria

07-16-2013

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

OK. I have looked up the tar header format. The tar header contains lots of NUL bytes, so any attempt to process a tar archive using the shell, awk, sed, or any other Linux or UNIX text processing utilities produces undefined results. The 1st 100 bytes in a tar header may contain the file's name (if it is <= 100 bytes long), may contain one or more directory names from the file's pathname (if they fit along with the file's name in 100 bytes), and may contain complete garbage left over from archiving a previous file. If the pathname is longer than 100 bytes, the ustar format splits it at a slash: the leading directory names (up to 155 bytes) are saved in bytes 345-499 (with the 1st byte numbered 0), and the rest of the pathname (including the final component) stays in the 1st 100 bytes. So your awk script seems to be looking for "02" and "07" at specific points in the middle of a pathname that ends with a newline character and that is somewhere between 86 and 100 bytes long. If these conditions are met in the 1st file archived in the tar file, you may get the results you want for that file; otherwise, all bets are off.
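To make the NUL-byte point concrete, here is a minimal sketch that archives one small file and counts the NUL bytes in the first 512-byte header block. The function name, the scratch directory, and the sample file are all hypothetical, and the exact count depends on the tar implementation and header format in use.

```shell
# Hypothetical demo: archive one small file in a scratch directory and count
# the NUL bytes in the first 512-byte header block of the archive.
count_header_nuls() {
    workdir=$(mktemp -d) || return 1
    printf 'hello\n' > "$workdir/sample.txt"
    (cd "$workdir" && tar -cf demo.tar sample.txt) || return 1
    # dd copies only the first 512-byte block; tr -cd deletes every byte
    # except NUL; wc -c counts what is left.
    dd if="$workdir/demo.tar" bs=512 count=1 2>/dev/null |
        LC_ALL=C tr -cd '\000' | wc -c | tr -d ' '
    rm -rf "$workdir"
}

count_header_nuls
```

With a short file name like this one, most of the name, link name, and prefix fields are unused padding, so the count should come out well over 100. That is why a line-oriented tool cannot be expected to treat the header as text.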

If you will show us what I asked for in my last message (or at least the 1st several lines of output from the tar command and the corresponding output you want to be produced for those lines), we may be able to help you parse the output of a tar archive listing command to get what you want. Otherwise, I don't see how we can help.

Don Cragun


07-16-2013

Registered User

100, 0

Join Date: Mar 2012

Last Activity: 26 July 2017, 2:30 AM EDT

Posts: 100

Thanks Given: 22

Thanked 0 Times in 0 Posts

Hi Don,
as requested, the output from zcat and the output needed from the code are as follows:

OUTPUT FROM ZCAT filename.tar.gz

Code:

20130701/
0001750020745500000000000082010060000000000                                                USSDlike                                        0000000000000429496704040
5899136999995
000000000000002148063927402YD-MTSBAL               519132008926477227        1120130701074546201307020745460000000001121005060000000001
  405891369335696         MTSCHNAOC               2471                    00000000000004294967040405899136999995
000000000000003148064263403YD-MTSBAL               519131878925724626        1120130701074550201307020745500000000001134005060000000000
                          MTSCHNAOC                                       00000000000004294967040405899136999995

Output from the code

Code:

000000000000002155850114502YD-MTSBAL               519132008641092603        1120130714101521201307151015210000000001038005060000000001
  405891743536224         MTSCHNAOC               2458                    00000000000004294967040405899136999995
000000000000003155849253702YD-MTSBAL               519132009153053234        1120130714101512201307151015120000000001122005050820600001
  405891360052922         MTSCHNAOC               2471                    00000000000004294967040405899136999995

Moderator's Comments:

You have repeatedly been asked to use CODE tags. Without CODE tags, spacing gets lost in the HTML output. Given the context, the data shown here is probably incorrect.

Last edited by Don Cragun; 07-16-2013 at 04:41 AM.. Reason: Add CODE tags

siramitsharma


07-16-2013



OK. This is not what I asked for, but it is informative.

I take back everything I said before. I made the wild assumption that your filename filename.tar.gz followed normal UNIX and Linux conventions (i.e., it was a tar output file that had been compressed using gzip). But the output from zcat clearly shows that this is not a tar archive. So, exactly what command line was used to create filename.tar.gz?

And, no matter what created this file, the awk script you have been showing us would never produce the four lines of output you have shown above. Two of these lines seem to meet your criteria, although the text I marked in red (that you showed in bold) can't both be from input columns 84 and 85. (Although both lines do contain 07 in columns 84 and 85.) But, the other two lines don't contain the strings "02" or "07" anywhere that I can see.

So. Forget about the awk code. Tell us in English what criteria you used to decide that the four lines of output shown above are the output that you want?

Don Cragun


07-16-2013


The command line used to create filename.tar.gz is as follows:

Code:

tar -zcvf filename.tar.gz file*.*

OUTPUT FROM ZCAT filename.tar.gz

Code:

20130701/
0001750020745500000000000082010060000000000                                                USSDlike                                        0000000000000429496704040
5899136999995
000000000000002148063927402YD-MTSBAL               519132008926477227        1120130701074546201307020745460000000001121005060000000001
  405891369335696         MTSCHNAOC               2471                    00000000000004294967040405899136999995
000000000000003148064263403YD-MTSBAL               519131878925724626        1120130701074550201307020745500000000001134005060000000000
                          MTSCHNAOC                                       00000000000004294967040405899136999995

Above is the input

Now, for the required output, I have placed a check that prints only those lines that have 02 starting at column 26 of the input line and 07 (2 characters long) starting at column 84.

Code:

 if(substr($0,26,2)=="02" && substr($0,84,2) == mon)

If a line matches, I print it to a file, count the match, and also record the name of the file in which the condition matched. That is, if filename.tar.gz contains 10 files named, say, file1, file2 ... file10, then every matching line should be printed, something like this:

Code:

000000000000002155850114502YD-MTSBAL               519132008641092603        1120130714101521201307151015210000000001038005060000000001
  405891743536224         MTSCHNAOC               2458                    00000000000004294967040405899136999995

000000000000002155850114502YD-MTS               519132008641092603        1120130715101521201307151015210000000001038005060000000001
  405891743536224         MTSCHNAOC               2458                    00000000000004294967040405899136999995

Since the lines above match the condition, the match counter is incremented accordingly.

In the end I need the match and non-match counts for each file, with the matching lines written to a.txt. The content of countfile should look something like this:

Code:

file1 match count, notmatch count
file2 match count, notmatch count
file3 match count, notmatch count
.
.
.
file10 match count, notmatch count
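As a minimal sketch of the counting described above, assuming each record is a single text line and the data arrives on standard input: the function name `count_records`, its two arguments, and the variable names `hit` and `miss` are hypothetical, while a.txt and countfile are the file names from the post.

```shell
# Sketch: read fixed-width records on stdin, append matching lines to a.txt,
# and append one "name match-count, notmatch-count" line to countfile.
# count_records and its arguments (label, month) are hypothetical names.
count_records() {
    name=$1
    mon=$2
    awk -v name="$name" -v mon="$mon" '
        # substr() positions are 1-based: columns 26-27 and 84-85.
        substr($0, 26, 2) == "02" && substr($0, 84, 2) == mon {
            hit++
            print >> "a.txt"    # matching record goes to a.txt
            next
        }
        { miss++ }              # every other line counts as a non-match
        END { printf "%s %d, %d\n", name, hit, miss }
    ' >> countfile
}
```

It could be called as `count_records file1 07 < file1`. Both output files are opened in append mode so that results accumulate across several calls; remove them before a fresh run.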

The content of a.txt should look like the matched lines shown above.

Since I have space constraints, the archive cannot be extracted to disk.

Hope this clarifies.

siramitsharma


07-21-2013


Since you sent me private mail asking me to help you on this again, I take it that you ignored my previous messages in this thread. The archive files produced by tar contain lots of NUL bytes; so by definition tar archive files are binary files, not text files. The shell and awk utilities are built to work with text files, not binary files, so there is no way to do what you're trying to do with a standard awk. (Some implementations may provide extensions to awk enabling it to work on binary files, but I do not have access to any such implementation. You might also be able to write a perl program to do this, but I am not fluent enough in perl to help you try this.)

It would be easy to extract the files from the archive and walk through the regular files in the extracted file hierarchy to get what you want. But, you say you don't have the room to do that.

The output format produced by tar -t and tar -tv is not standardized (and varies from implementation to implementation). It may be possible for you to use tar -t or tar -tv to get a list of regular files stored in the archive and then use tar -xO pathname in a loop with pathname set to a different regular file in the archive each time through the loop so you can feed the contents of that file through your awk script without saving a copy of the file on disk.

That will require reading the archive n+1 times if there are n regular files in the archive, and even this only works if all of the regular files in the archive are text files. I encourage you to play with tar to see if you can make this work. (On some implementations, tar -tf archive will list directories in the archive with a trailing slash on the name and other files without a trailing slash. If the implementation of tar on your system does this, you can use the trailing slash to determine whether to skip that file or to extract it and feed it to your awk script.)
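The approach described above might be sketched as follows. This is only a sketch under stated assumptions: the tar in use accepts -z and -O (GNU tar and bsdtar do, but neither option is standardized), directories are listed with a trailing slash, member names contain no newlines, and every regular file in the archive is a text file. The function name `scan_archive` and the variable names are hypothetical.

```shell
# Sketch: scan every regular file in a gzip-compressed tar archive without
# unpacking anything to disk. scan_archive is a hypothetical name.
scan_archive() {
    archive=$1
    mon=$2
    # Pass 1 lists the members; each loop iteration rereads the archive.
    tar -tzf "$archive" | while IFS= read -r member; do
        case $member in
            */) continue ;;     # trailing slash: a directory, skip it
        esac
        # Stream one member to stdout and count its lines in awk.
        tar -xzOf "$archive" "$member" |
        awk -v name="$member" -v mon="$mon" '
            substr($0, 26, 2) == "02" && substr($0, 84, 2) == mon {
                hit++; print >> "a.txt"; next
            }
            { miss++ }
            END { printf "%s %d, %d\n", name, hit, miss }
        '
    done > countfile
}
```

A call such as `scan_archive filename.tar.gz 07` would write one line per regular file to countfile and append the matching records to a.txt. Because a.txt is opened in append mode, remove it before rerunning.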


Don Cragun


07-21-2013


Hey Don,
thanks for the input. When I am in need, I don't ignore other people's remarks. I went through your earlier comments and was looking for ways to crack this on binary files, which is how I learned that the archive I am searching is in ustar format. Anyway, I am working through your comments and will get back to you if any further help is required.

siramitsharma
