Getting the lines with nth column non-null


 
# 1  
Old 02-08-2016
Hi,

I have a huge list of gzip archives (.gz), each about 40 MB. A new file is generated every minute, so analyzing one hour of data already means 60 files.
The uncompressed files are text, ';'-separated, with about 300 fields (columns) per line.

What I need to do is extract only the lines where a given field (e.g. 252) is non-null. I can then export all these lines to a text file and analyze the data.

I tried the cut command, but it does not allow any condition on the field value. So I turned to awk, but on my system it seems to be limited to 99 fields, and I need field 252, for example.
In the end I am using both commands in a pipe.
This is what I am using:

Code:
gzip -c -d *.gz | cut -d";" -f2,252 | awk -F';' '$2!=""'

This gives me only 2 fields, not the entire line. Even so, that is OK: field 2 is the ID of the line, so I can fetch the entire line later.

My questions are:
1. How can I make the above command faster?
2. How can I directly get the entire lines for which a given field is non-null? I might need to do the same for other fields too, and I would like to get all the lines in one shot from several .gz files.

I must say I am pretty new to bash...

Thank you!
Nenad.

Last edited by Don Cragun; 02-08-2016 at 10:59 PM.. Reason: Fix existing CODE tags; add missing CODE and ICODE tags.
# 2  
Old 02-08-2016
What operating system are you using?

What version of awk are you using?

What is the maximum number of bytes in a line (including the terminating <newline> character) in your unzipped files?

What is the output from the command:
Code:
getconf LINE_MAX

on your system?
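
In case it helps with the line-length question, here is a minimal sketch of one way to measure the longest line. The function name max_line_bytes is made up for illustration, and it assumes an awk that accepts long records (on Solaris that may mean nawk or /usr/xpg4/bin/awk rather than the default awk).

```shell
# Hypothetical helper (not from the thread): print the length in bytes of the
# longest line, counting the terminating newline, across the given .gz files.
max_line_bytes() {
    # LC_ALL=C so that length() counts bytes rather than multibyte characters.
    gzip -c -d "$@" | LC_ALL=C awk '
        { n = length($0) + 1; if (n > max) max = n }
        END { print max + 0 }'
}
```

Calling max_line_bytes *.gz prints a single number that can be compared with the value reported by getconf LINE_MAX.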
# 3  
Old 02-09-2016
If you have zcat and Perl, please try the following:

Code:
#!/usr/bin/env perl

use strict;

for (glob '*.gz') {
    open my $fh, sprintf("zcat %s |", $_ ) or die $!;

    {
        local $_;
        while (<$fh>) {
            print if (split ";")[251];
        }
    }
    close $fh or die $!;
}

Save as search.pl
Run as perl search.pl > result.txt

Last edited by Aia; 02-09-2016 at 02:23 AM.. Reason: Added separated by ";"
# 4  
Old 02-09-2016
Hi Don,

Thank you for editing and updating my post.
The command you sent gives this:

Code:
bash-3.2$ getconf LINE_MAX
2048

I am not sure what the maximum number of bytes in one line is.
How can I find this out?

If I try to use awk to get field 252, I get this:
Code:
awk: trying to access field 252

If I try to get field number 99:
Code:
bash-3.2$ awk -F';' '{print $99}' myFile
awk: record `9;02bbba84;311:480:f...' has too many fields
 record number 1

The OS version is SunOS 5.10.

I tried to get the awk version too, but couldn't:
Code:
bash-3.2$ awk -W version
awk: syntax error near line 1
awk: bailing out near line 1
bash-3.2$

Nenad
# 5  
Old 02-09-2016
Hello Nenad,

On a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk; otherwise you will see errors like the following:
Quote:
awk: syntax error near line 1
awk: bailing out near line 1
Thanks,
R. Singh
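
A rough sketch of how a script could choose among those implementations automatically. The function name pick_awk and the order of the candidate list are assumptions made for illustration, and plain awk is kept only as a last resort.

```shell
# Hypothetical helper (not from the thread): print the first awk found from a
# list of candidates; the list and its order are assumptions.
pick_awk() {
    for candidate in /usr/xpg4/bin/awk /usr/xpg6/bin/awk nawk gawk awk; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

It could then be used as AWK=$(pick_awk) followed by "$AWK" -F';' ... in place of a bare awk call.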
# 6  
Old 02-09-2016
Aia, I tried the Perl script, but it gives the following error:
Code:
bash-3.2$ perl searchField.pl >resPerl.txt
myFile.gz.Z: No such file or directory
Died at searchField.pl line 14.
bash-3.2$

# 7  
Old 02-09-2016
See what happens if you try the commands:
Code:
gzip -c -d *.gz | /usr/xpg4/bin/awk -F';' '$252 != ""'

and:
Code:
gzip -c -d *.gz | nawk -F';' '$252 != ""'

If neither of them works, check whether the GNU utilities have been installed on your system. If they have, try gawk (from whatever directory it was installed into) instead of /usr/xpg4/bin/awk and nawk.
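
Since the same filter may be needed for other fields, the commands above can be wrapped so that the field number is a parameter. This is only a sketch: the function name nonnull_field and the AWK override variable are invented here, and it assumes an awk that supports -v (nawk, /usr/xpg4/bin/awk and gawk do). It reads each archive once and drops the cut stage, printing the entire matching lines; whether that is noticeably faster has not been measured here.

```shell
# Hypothetical helper (not from the thread): print whole lines whose field
# number $1 is non-empty; remaining arguments are the .gz files to scan.
# Set AWK to choose the implementation, e.g. AWK=nawk or AWK=/usr/xpg4/bin/awk.
nonnull_field() {
    field=$1
    shift
    gzip -c -d "$@" | "${AWK:-awk}" -F';' -v n="$field" '$n != ""'
}
```

For example, AWK=nawk nonnull_field 252 *.gz > result.txt collects the matching lines from all archives into one file.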
 