awk not working for calculating no of lines with criteria


 
# 8  
Old 07-16-2013
OK. I have looked up the tar header format. The tar header contains lots of NUL bytes, so any attempt to process a tar archive using the shell, awk, sed, or any other Linux or UNIX text processing utility produces undefined results. The 1st 100 bytes in a tar header may contain the file's name (if it is <= 100 bytes long), may contain the file's name preceded by one or more directory names from the file's pathname (if they fit along with the file's name in 100 bytes), and may contain complete garbage left over from archiving a previous file. If the complete pathname is longer than 100 bytes, the ustar format allows the leading directories of the pathname (up to 155 bytes) to be saved in the prefix field in bytes 345-499 (with the 1st byte numbered 0), with the rest of the pathname kept in the 1st 100 bytes. So your awk script seems to be looking for "02" and "07" at specific points in the middle of a pathname that ends with a newline character and that is somewhere between 86 and 100 bytes long. If these conditions are met in the 1st file archived in the tar file, you may get the results you want for that file; otherwise, all bets are off.
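
To see those header fields for yourself, something like the following may help. It is only a sketch: the function name tar_first_header_fields is made up for illustration, it assumes the archive is gzip-compressed, and it only looks at the 1st header in the archive. The NUL bytes are stripped with tr before printing, which is exactly the step a plain awk script cannot be relied on to handle.

```shell
#!/bin/sh
# Hypothetical helper: print the 100-byte field at offset 0 and the
# 155-byte field at offset 345 of the first tar header in a gzip'd archive.
tar_first_header_fields() {
    archive=$1
    # Bytes 0-99 of the header: the name field.
    name=$(gzip -dc "$archive" | dd bs=1 count=100 2>/dev/null | tr -d '\000')
    # Bytes 345-499 of the header: the ustar prefix field (often empty).
    prefix=$(gzip -dc "$archive" | dd bs=1 skip=345 count=155 2>/dev/null | tr -d '\000')
    printf 'name:   %s\n' "$name"
    printf 'prefix: %s\n' "$prefix"
}
```

Calling tar_first_header_fields filename.tar.gz prints the two fields of the 1st header only. What shows up in the second field depends on the archive format your tar writes, so treat the output as a diagnostic, not as a parser.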

If you will show us what I asked for in my last message (or at least the 1st several lines of output from the tar command and the corresponding output you want to be produced for those lines), we may be able to help you parse the output of a tar archive listing command to get what you want. Otherwise, I don't see how we can help.
# 9  
Old 07-16-2013
Hi Don,
As requested, the output from zcat and the output needed from the code are as follows:

OUTPUT FROM ZCAT filename.tar.gz

Code:
20130701/
0001750020745500000000000082010060000000000                                                USSDlike                                        0000000000000429496704040
5899136999995
000000000000002148063927402YD-MTSBAL               519132008926477227        1120130701074546201307020745460000000001121005060000000001
  405891369335696         MTSCHNAOC               2471                    00000000000004294967040405899136999995
000000000000003148064263403YD-MTSBAL               519131878925724626        1120130701074550201307020745500000000001134005060000000000
                          MTSCHNAOC                                       00000000000004294967040405899136999995

Output from the code
Code:
000000000000002155850114502YD-MTSBAL               519132008641092603        1120130714101521201307151015210000000001038005060000000001
  405891743536224         MTSCHNAOC               2458                    00000000000004294967040405899136999995
000000000000003155849253702YD-MTSBAL               519132009153053234        1120130714101512201307151015120000000001122005050820600001
  405891360052922         MTSCHNAOC               2471                    00000000000004294967040405899136999995

Moderator's Comments:
Mod Comment You have repeatedly been asked to use CODE tags. Without CODE tags, spacing gets lost in the HTML output. Given the context, the data shown here is probably incorrect.

Last edited by Don Cragun; 07-16-2013 at 04:41 AM. Reason: Add CODE tags
# 10  
Old 07-16-2013
OK. This is not what I asked for, but it is informative.

I take back everything I said before. I made the wild assumption that your filename filename.tar.gz followed normal UNIX and Linux conventions (i.e., that it was a tar output file that had been compressed using gzip). But the output from zcat clearly shows that this is not a tar archive. So, exactly what command line was used to create filename.tar.gz?

And, no matter what created this file, the awk script you have been showing us would never produce the four lines of output you have shown above. Two of these lines seem to meet your criteria, although the text I marked in red (that you showed in bold) can't both be from input columns 84 and 85. (Although both lines do contain 07 in columns 84 and 85.) But, the other two lines don't contain the strings "02" or "07" anywhere that I can see.

So. Forget about the awk code. Tell us in English what criteria you used to decide that the four lines of output shown above are the output that you want.
# 11  
Old 07-16-2013
The command line used to create filename.tar.gz is as follows:
Code:
tar -zcvf filename.tar.gz file*.*

OUTPUT FROM ZCAT filename.tar.gz
Code:
20130701/
0001750020745500000000000082010060000000000                                                USSDlike                                        0000000000000429496704040
5899136999995
000000000000002148063927402YD-MTSBAL               519132008926477227        1120130701074546201307020745460000000001121005060000000001
  405891369335696         MTSCHNAOC               2471                    00000000000004294967040405899136999995
000000000000003148064263403YD-MTSBAL               519131878925724626        1120130701074550201307020745500000000001134005060000000000
                          MTSCHNAOC                                       00000000000004294967040405899136999995

The above is the input.

For the required output, I have placed a check that prints only those lines that have 02 in the 2 characters starting at column 26 of the input line and the month (07 here) in the 2 characters starting at column 84.
Code:
 if(substr($0,26,2)=="02" && substr($0,84,2) == mon)
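
Before counting anything, it may be worth checking what actually sits at those two column positions in the data. The following is a small sketch for that purpose; the function name show_key_columns is made up for illustration, and it only assumes the positions used in the check above (columns 26-27 and 84-85).

```shell
#!/bin/sh
# Hypothetical helper: for every input line, print the line number and the
# two 2-character substrings that the check above compares.
show_key_columns() {
    awk '{ printf "%d: [%s] [%s]\n", NR, substr($0, 26, 2), substr($0, 84, 2) }' "$@"
}
```

It reads the files named as arguments, or standard input if none are given. Lines shorter than 84 characters show an empty second bracket, which makes misaligned or wrapped records easy to spot.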

When a line matches, I print it to a file, count the match, and record the name of the file in which the condition matched. For example, if filename.tar.gz contains 10 files named file1, file2, ... file10, then every line that matches the condition above should be printed like this:
Code:
000000000000002155850114502YD-MTSBAL               519132008641092603        1120130714101521201307151015210000000001038005060000000001
  405891743536224         MTSCHNAOC               2458                    00000000000004294967040405899136999995

000000000000002155850114502YD-MTS               519132008641092603        1120130715101521201307151015210000000001038005060000000001
  405891743536224         MTSCHNAOC               2458                    00000000000004294967040405899136999995

Since the lines above match the condition, the match counter is increased accordingly.

In the end I need the match and not-match counts for each file, and the matching lines written to a.txt. The content of countfile should look something like this:
Code:
file1 match count, notmatch count
file2 match count, notmatch count
file3 match count, notmatch count
.
.
.
file10 match count, notmatch count

The content of a.txt should look like the matched lines shown above.

Since I have space constraints, the archive cannot be extracted to disk.
Hope this clarifies.
# 12  
Old 07-21-2013
Since you sent me private mail asking me to help you on this again, I take it that you ignored my previous messages in this thread. The archive files produced by tar contain lots of NUL bytes, so by definition tar archive files are binary files, not text files. The shell and awk utilities are built to work with text files, not binary files, so there is no way to do what you're trying to do with a standard awk. (Some implementations may provide extensions to awk enabling it to work on binary files, but I do not have access to any such implementation. You might also be able to write a perl program to do this, but I am not fluent enough in perl to help you try this.)

It would be easy to extract the files from the archive and walk through the regular files in the extracted file hierarchy to get what you want. But, you say you don't have the room to do that.

The output format produced by tar -t and tar -tv is not standardized (and varies from implementation to implementation). It may be possible for you to use tar -t or tar -tv to get a list of regular files stored in the archive and then use tar -xO pathname in a loop with pathname set to a different regular file in the archive each time through the loop so you can feed the contents of that file through your awk script without saving a copy of the file on disk.

That will require reading the archive n+1 times if there are n regular files in the archive, and even this only works if all of the regular files in the archive are text files. I encourage you to play with tar to see if you can make this work. (On some implementations, tar -tf archive will list directories in the archive with a trailing slash on the name and other files without a trailing slash. If the implementation of tar on your system does this, you can use the trailing slash to determine whether to skip that file or to extract it and feed it to your awk script.)
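
A minimal sketch of that loop follows. It assumes a tar that accepts -z and -O (GNU tar and bsdtar do), that directories are listed with a trailing slash, and that member names contain no newline characters. The function name count_matches and its three arguments are made up for illustration; the column test is the one posted earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: stream each regular file in a gzip'd tar archive
# through awk without extracting anything to disk.
#   $1 = archive, $2 = two-digit month to match, $3 = file for matching lines
count_matches() {
    archive=$1
    mon=$2
    matched=$3
    : > "$matched"                      # start with an empty output file
    tar -tzf "$archive" | while IFS= read -r member; do
        case $member in
            */) continue ;;             # skip directories
        esac
        # -O writes the member to standard output instead of to disk.
        tar -xzOf "$archive" "$member" |
        awk -v mon="$mon" -v name="$member" -v out="$matched" '
            substr($0, 26, 2) == "02" && substr($0, 84, 2) == mon {
                print >> out            # append the matching line
                match_count++
                next
            }
            { nomatch++ }
            END { printf "%s %d, %d\n", name, match_count, nomatch }
        '
    done
}
```

For the case in this thread, the call would be something like count_matches filename.tar.gz 07 a.txt > countfile. The archive is decompressed once for the listing and once per regular file, so this trades run time for disk space.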
# 13  
Old 07-21-2013
Hey Don,
Thanks for the input. When I am in need, I don't ignore other remarks. I went through your earlier comments and was looking for ways to crack this on binary files, which is how I learned that the archive I am searching is in ustar format. Anyway, I am working on your comments and will get back to you in case any further help is required.