Sponsored Content
Top Forums UNIX for Dummies Questions & Answers Getting the lines with nth column non-null Post 302966145 by Nenad on Monday 8th of February 2016 09:29:22 PM
Old 02-08-2016
Getting the lines with nth column non-null

Hi,

I have a huge list of archives (.gz). Each archive is about 40MB. A file is generated every minute so if I want to analyze the data for 1 hour I get already 60 files for example.
These are text files, ';' separated, each line having about 300 fields (columns).

What I need to do is to extract only the lines where a given field (e..g 252) is non-null. Then all this lines I can export in a txt file and analyze the data.

I tried to use cut command, but this one does not allow to put any condition on the field value. So I took the awk command. But this one is limited to 99 fields and I need field 252 for example.
Finally I am using both commands in pipe.
This is what I am using:

Code:
gzip -c -d *.gz | cut -d";" -f2,252 | awk -F';' '$2!=""'

Now this gives me only 2 fields, but not entire line. Even so it is ok as f2 is the ID of the line so I can get later the entire line.

My question is:
1. how to make the above command work faster?
2. how to get directly the entire lines for which a given field is non-null? I might need to do the same for other fields too. And I would like to get all the lines in one shot from several .gz files.

I must say I am pretty new in bash...

Thank you!
Nenad.

Last edited by Don Cragun; 02-08-2016 at 10:59 PM.. Reason: Fix existing CODE tags; add missing CODE and ICODE tags.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to check Null values in a file column by column if columns are Not NULLs

Hi All, I have a table with 10 columns. Some columns(2nd,4th,5th,7th,8th and 10th) are Not Null columns. I'll get a tab-delimited file and want to check col by col and generate seperate error code for each col eg:102 if 2nd col value is NULL and 104 if 4th col value is NULL so on... I am a... (7 Replies)
Discussion started by: Mandab
7 Replies

2. Shell Programming and Scripting

Editing 1st or nth column

Hi, I have a file whick is pipe delimited : 100| alpha| tabgo|watch| |||| 444444 | alpha| tabgo|watch| |||| 444444 | sweden |tabgo|watch| |||| 444444 | US| tabgo|watch| |||| 444444 100| factory| tabgo|watch| |||| 444444 | ABC| tabgo|watch| |||| 444444 | launch| tabgo|watch| ||||... (4 Replies)
Discussion started by: darshanw
4 Replies

3. Shell Programming and Scripting

get 3rd column of nth line

hi; i have a file.txt and its 9th, 10th and 11th line lines are: RbsLocalCell=S2C1 maxPortIP 4 (this is 9th line) RbsLocalCell=S3C1 maxPortIP 4 (this is 10th line) RbsLocalCell=S1C1 ... (11 Replies)
Discussion started by: gc_sw
11 Replies

4. Shell Programming and Scripting

Finding Nth Column

Please help me how can I display every nth field present in a "|" delimited file. Ex: If a have a file with data as a|b|c|d|e|f|g|h|k|l|m|n I want to display every 3rd feild which means the output should be c f k n Please help me. (1 Reply)
Discussion started by: ngkumar
1 Replies

5. Shell Programming and Scripting

Using AWK to find top Nth values in Nth column

I have an awk script to find the maximum value of the 2nd column of a 2 column datafile, but I need to find the top 5 maximum values of the 2nd column. Here is the script that works for the maximum value. awk 'BEGIN { subjectmax=$1 ; max=0} $2 >= max {subjectmax=$1 ; max=$2} END {print... (3 Replies)
Discussion started by: ncwxpanther
3 Replies

6. Shell Programming and Scripting

Calculating average for every Nth line in the Nth column

Is there an awk script that can easily perform the following operation? I have a data file that is in the format of 1944-12,5.6 1945-01,9.8 1945-02,6.7 1945-03,9.3 1945-04,5.9 1945-05,0.7 1945-06,0.0 1945-07,0.0 1945-08,0.0 1945-09,0.0 1945-10,0.2 1945-11,10.5 1945-12,22.3... (3 Replies)
Discussion started by: ncwxpanther
3 Replies

7. Shell Programming and Scripting

Find for line with not null values at nth place in pipe delimited file

Hi, I am trying to find the lines in a pipe delimited file where 11th column has not null values. Any help is appreciated. Need help asap please. thanks in advance. (3 Replies)
Discussion started by: manikms
3 Replies

8. Shell Programming and Scripting

How to remove mth and nth column from a file?

Hi, i need to remove mth and nth column from a csv file. here m and n is not a specific number. it is a variable ex. m=2 n=5 now i need to remove the 2nd and 5th line.. Please help how to do that. Thanks!!! (18 Replies)
Discussion started by: zaq1xsw2
18 Replies

9. Shell Programming and Scripting

Break Column nth in a CSV file into two

Hi Guys, Need help with logic to break Column nth in a CSV file into two for e.g Refer below the second column as the nth column "abcd","","type/beta-version" need output in a following format "abcd","/place/asia/india/mumbai","/product/sw/tomcat","type/beta-version" ... (5 Replies)
Discussion started by: awk-admirer
5 Replies

10. Shell Programming and Scripting

Taking nth column and putting its value in n+1 column using awk

Hello Members, Need your expert opinion how to tackle below. I have an input file that looks like below: USS|AWCC|AFGAW|93|70 USSAA|Roshan TDCA|AFGTD|93|72,79 ALB|Vodafone|ALBVF|355|69 ALGEE|Wataniya (Nedjma)|DZAWT|213|50,550 I like output file in below format: ... (7 Replies)
Discussion started by: umarsatti
7 Replies
CUT(1)							    BSD General Commands Manual 						    CUT(1)

NAME
cut -- cut out selected portions of each line of a file SYNOPSIS
cut -b list [-n] [file ...] cut -c list [file ...] cut -f list [-w | -d delim] [-s] [file ...] DESCRIPTION
The cut utility cuts out selected portions of each line (as specified by list) from each file and writes them to the standard output. If no file arguments are specified, or a file argument is a single dash ('-'), cut reads from the standard input. The items specified by list can be in terms of column position or in terms of fields delimited by a special character. Column and field numbering start from 1. The list option argument is a comma or whitespace separated set of increasing numbers and/or number ranges. Number ranges consist of a num- ber, a dash ('-'), and a second number and select the columns or fields from the first number to the second, inclusive. Numbers or number ranges may be preceded by a dash, which selects all columns or fields from 1 to the last number. Numbers or number ranges may be followed by a dash, which selects all columns or fields from the last number to the end of the line. Numbers and number ranges may be repeated, overlap- ping, and in any order. It is not an error to select columns or fields not present in the input line. The options are as follows: -b list The list specifies byte positions. -c list The list specifies character positions. -d delim Use delim as the field delimiter character instead of the tab character. -f list The list specifies fields, separated in the input by the field delimiter character (see the -d option). Output fields are separated by a single occurrence of the field delimiter character. -n Do not split multi-byte characters. Characters will only be output if at least one byte is selected, and, after a prefix of zero or more unselected bytes, the rest of the bytes that form the character are selected. -s Suppress lines with no field delimiter characters. Unless specified, lines with no delimiters are passed through unmodified. -w Use whitespace (spaces and tabs) as the delimiter. Consecutive spaces and tabs count as one single field separator. ENVIRONMENT
The LANG, LC_ALL and LC_CTYPE environment variables affect the execution of cut as described in environ(7). EXIT STATUS
The cut utility exits 0 on success, and >0 if an error occurs. EXAMPLES
Extract users' login names and shells from the system passwd(5) file as ``name:shell'' pairs: cut -d : -f 1,7 /etc/passwd Show the names and login times of the currently logged in users: who | cut -c 1-16,26-38 SEE ALSO
colrm(1), paste(1) STANDARDS
The cut utility conforms to IEEE Std 1003.2-1992 (``POSIX.2''). HISTORY
A cut command appeared in AT&T System III UNIX. BSD
August 8, 2012 BSD
All times are GMT -4. The time now is 01:53 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy