Checking the Column values in a file.


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Checking the Column values in a file.
# 1  
Old 08-02-2011
Checking the Column values in a file.

Hi All,

I have a file that has ~2.9Millions lines with 32 columns respectively.
The columns numbers 23,27 are the primary Keys for the file.
The fields are delimited by TAB.

I need to check the condition If
Column number: 20 is NOT NULL
Column number: 21 is not 0
Column number: 22 is not 0

If the above is TRUE (Satisfied) Then Write the entire line to an OUTFILE..
Else
Ignore the line and proceed further.

In order to check the above, I thought the following:

Since I have the primary keys for the file i.e.23,27 , I would load the file into a hash and then check the conditions i.e.
Column number: 20 is NOT NULL
Column number: 21 is not 0
Column number: 22 is not 0

i.e.
Code:
$FIELD_SEPARATOR = '    ';
$FIELD_NUMBER_LIST = $[                                   -- (23,27)
my @primary_set= split(/,/, $field_number_list);
my $line = '';
my %hash;

open(FILE, "<$old_file") or die "Can't open file $old_file";
my @file = <FILE>;
close(FILE);

my @vals;
foreach (@file) {
$line = $_;
@vals = split(/$field_separator/, $line);

   my $key;
   my $primary = "";
   foreach $key (@primary_set) {
      $primary = $primary.$vals[$key];
   }

   $hash{$primary} = $line;
}

while (<FILE>) {
$line = $_;
@vals = split(/$field_separator/, $line);
    my $key;
    my $primary = "";
    foreach $key (@primary_set) {
        $primary = $primary.$vals[$key];
    }
 
<<<< Here I need to check the condition for the Columns i.e.
Column number: 20 is NOT NULL
 Column number: 21 is not 0
 Column number: 22 is not 0
>>>>>>>>>>>>>

Could someone please help me out how to check the condition for a particular column. and how do I check for NOT NULL.

Really appreciate your thoughts.

Last edited by pludi; 08-02-2011 at 05:47 PM..
# 2  
Old 08-02-2011
Seems more an awkish problem! There is no NULL outside RDBMS, there is only a zero pointer to a string object or character array (in C/C++), a good string pointer to a null 0x00 character (empty string "") and blank (empty string or either space characters only or white space only -- definitions vary). String compares vary on whether '"" = " " and such. Try some and see. The safest is to make a max length blanks and say: > " " Thus, it has to have a higher character than ' ' somewhere. Similarly, any 9 digit number in a string not zero is: > "000000000" In RDBMS settings, this approach also should exclude NULLs, as NULL is not supposed to ever admit anything true, but some play fast and loose with blank and empty and NULL, so just pick a test that does not care, like: >= " !" (! is the low char above space in ASCII/UTF-8).
# 3  
Old 08-02-2011
See if this works for you:
Code:
#!/usr/bin/ksh
while read mLine; do
  mFld20=$(echo ${mLine} | cut -d'|' -f20)
  mFld21=$(echo ${mLine} | cut -d'|' -f21)
  mFld22=$(echo ${mLine} | cut -d'|' -f22)
  if [[ "${mFld20}" != "" && "${mFld21}" != "0" && "${mFld21}" != "0" ]]; then
    echo ${mLine}
  fi
done < Input_File

This User Gave Thanks to Shell_Life For This Post:
# 4  
Old 08-02-2011
For big files, the fork and exec cost per line can get ugly. If you want to try shell, read to many variables with $IFS including the separator. If the spaces and zeros are well behaved, sed might be a big winner, with a very busy regex.

The real slick trick is to map it into RDBMS as a file table and use SQL. There are free JDBC, ODBC and unixODBC tools for this, and MS Access, of course, short of getting a real RDBMS running.
# 5  
Old 08-03-2011
Hi,
Thanks a lot for your replies.
Quote:
Seems more an awkish problem! There is no NULL outside RDBMS, there is only a zero pointer to a string object or character array (in C/C++), a good string pointer to a null 0x00 character (empty string "") and blank (empty string or either space characters only or white space only -- definitions vary). String compares vary on whether '"" = " " and such. Try some and see. The safest is to make a max length blanks and say: > " " Thus, it has to have a higher character than ' ' somewhere. Similarly, any 9 digit number in a string not zero is: > "000000000" In RDBMS settings, this approach also should exclude NULLs, as NULL is not supposed to ever admit anything true, but some play fast and loose with blank and empty and NULL, so just pick a test that does not care, like: >= " !" (! is the low char above space in ASCII/UTF-8).
Yeah you are correct there is no NULL outside the RDBMS. I think the match should be done with respect to the space ("").

Hi Shell_Life,
The script was taking very long to complete. Using Perl regx would be more faster.
Could you please suggest me whether this can be done using PERL.

Quote:
For big files, the fork and exec cost per line can get ugly. If you want to try shell, read to many variables with $IFS including the separator. If the spaces and zeros are well behaved, sed might be a big winner, with a very busy regex.

The real slick trick is to map it into RDBMS as a file table and use SQL. There are free JDBC, ODBC and unixODBC tools for this, and MS Access, of course, short of getting a real RDBMS running.
I am not sure how this can be done using Perl . I thought shell would be slower.
I cannot use the slick trick because i cannot connect to the RDBMS.

Could anyone please suggest me how this can be done using Perl. (may be using regx).
# 6  
Old 08-03-2011
How abt this?

Code:
 
awk -F"\t" '$20!=""&&$21>0&&$22>0' input_file

My only concern is whether awk handles that big file or not!!

You might need to do a loop for bunch of lines ( like around 5 lakh rows) and repeat it for next five lakhs etc.

Last edited by panyam; 08-03-2011 at 01:18 PM.. Reason: changed the parameter names
This User Gave Thanks to panyam For This Post:
# 7  
Old 08-03-2011
In PERL, use split to make line into array of strings:
Delimited text files

The process does not care about file size, generally, being a line at a time thing. Some old machines do not properly read all of files over 2GB without an open64(), but is some cases we fixed this by letting the shell open the file with '<' !

However, to be practical, speed is good. Shell loops are usually not as fast as PERL, and both can be hobbled by shell-out (`cmd`) costs, which in any case may lose out ot JAVA, and none can touch C, especially with a mmap64() of the input to avoid copying cost.

Last edited by DGPickett; 08-03-2011 at 01:20 PM..
This User Gave Thanks to DGPickett For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Replace column values from other file

I have one file as it has the following format File1 S No Site IP Address 1 Australia 192.168.0.1/26 2 Australia 192.168.0.2/26 3 Australia 192.168.0.3/26 I need awk/sed command to replace the column2 value ( under Site) with some other... (8 Replies)
Discussion started by: samaritan
8 Replies

2. Shell Programming and Scripting

Remove the values from a certain column without deleting the Column name in a .CSV file

(14 Replies)
Discussion started by: dhruuv369
14 Replies

3. Shell Programming and Scripting

Adding column values in a file

Hi, I am having a file in the following format. for aaaa 1111 1234 2222 3434 for bbbb 1111 3434.343 2222 2343 for cccc 3333 2343.343 4444 89000 for dddd 1111 5678.343 2222 890.3 aaaa 2343.343 bbbb 34343.343 (5 Replies)
Discussion started by: jpkumar10
5 Replies

4. UNIX for Dummies Questions & Answers

shift values in one column as header for values in another column

Hi Gurus, I have a tab separated text file with two columns. I would like to make the first column values as headings for the second column values. Ex. >value1 subjects >value2 priorities >value3 requirements ...etc and I want to have a file >value1 subjects >value2 priorities... (4 Replies)
Discussion started by: Unilearn
4 Replies

5. UNIX for Dummies Questions & Answers

replace a column with values from another file

Dear all, I have a file1.pdb in pdb format and a dat file2 containing values, corresponding to the atoms in the pdb file. these values (file2.dat) need to be in the column instead of the 0.00 (file1) values for each atom in file1.pdb .(the red values must be replaced by the blue ones,in order)... (11 Replies)
Discussion started by: chen.xiao.po
11 Replies

6. UNIX for Dummies Questions & Answers

Replace values in a specified column of a file

Hello, I have a file with four columns and I would like to replace values in the second column only. An arbitrary example is: 100 A 105 B 200 B 205 C 300 C 305 D 400 D 405 E 500 E 505 F I need to replace the second column as shown below: ... (4 Replies)
Discussion started by: Gussifinknottle
4 Replies

7. Shell Programming and Scripting

print unique values of a column and sum up the corresponding values in next column

Hi All, I have a file which is having 3 columns as (string string integer) a b 1 x y 2 p k 5 y y 4 ..... ..... Question: I want get the unique value of column 2 in a sorted way(on column 2) and the sum of the 3rd column of the corresponding rows. e.g the above file should return the... (6 Replies)
Discussion started by: amigarus
6 Replies

8. UNIX for Dummies Questions & Answers

Create file with column values

Hi, I have a data file looks like the following ID STARTDATE ENDDATE 101 20090520 20090521 102 20090521 20090522 103 20090522 20090523 104 20090523 20090524 105 20090524 20090525 106 20090525 20090526 107 ... (3 Replies)
Discussion started by: naveen.kuppili
3 Replies

9. Shell Programming and Scripting

How to check Null values in a file column by column if columns are Not NULLs

Hi All, I have a table with 10 columns. Some columns(2nd,4th,5th,7th,8th and 10th) are Not Null columns. I'll get a tab-delimited file and want to check col by col and generate seperate error code for each col eg:102 if 2nd col value is NULL and 104 if 4th col value is NULL so on... I am a... (7 Replies)
Discussion started by: Mandab
7 Replies
Login or Register to Ask a Question