I have a file that has ~2.9Millions lines with 32 columns respectively.
The columns numbers 23,27 are the primary Keys for the file.
The fields are delimited by TAB.
I need to check the condition If
Column number: 20 is NOT NULL
Column number: 21 is not 0
Column number: 22 is not 0
If the above is TRUE (Satisfied) Then Write the entire line to an OUTFILE..
Else
Ignore the line and proceed further.
In order to check the above, I thought the following:
Since I have the primary keys for the file i.e.23,27 , I would load the file into a hash and then check the conditions i.e.
Column number: 20 is NOT NULL
Column number: 21 is not 0
Column number: 22 is not 0
i.e.
Could someone please help me out how to check the condition for a particular column. and how do I check for NOT NULL.
Seems more an awkish problem! There is no NULL outside RDBMS, there is only a zero pointer to a string object or character array (in C/C++), a good string pointer to a null 0x00 character (empty string "") and blank (empty string or either space characters only or white space only -- definitions vary). String compares vary on whether '"" = " " and such. Try some and see. The safest is to make a max length blanks and say: > " " Thus, it has to have a higher character than ' ' somewhere. Similarly, any 9 digit number in a string not zero is: > "000000000" In RDBMS settings, this approach also should exclude NULLs, as NULL is not supposed to ever admit anything true, but some play fast and loose with blank and empty and NULL, so just pick a test that does not care, like: >= " !" (! is the low char above space in ASCII/UTF-8).
For big files, the fork and exec cost per line can get ugly. If you want to try shell, read to many variables with $IFS including the separator. If the spaces and zeros are well behaved, sed might be a big winner, with a very busy regex.
The real slick trick is to map it into RDBMS as a file table and use SQL. There are free JDBC, ODBC and unixODBC tools for this, and MS Access, of course, short of getting a real RDBMS running.
Seems more an awkish problem! There is no NULL outside RDBMS, there is only a zero pointer to a string object or character array (in C/C++), a good string pointer to a null 0x00 character (empty string "") and blank (empty string or either space characters only or white space only -- definitions vary). String compares vary on whether '"" = " " and such. Try some and see. The safest is to make a max length blanks and say: > " " Thus, it has to have a higher character than ' ' somewhere. Similarly, any 9 digit number in a string not zero is: > "000000000" In RDBMS settings, this approach also should exclude NULLs, as NULL is not supposed to ever admit anything true, but some play fast and loose with blank and empty and NULL, so just pick a test that does not care, like: >= " !" (! is the low char above space in ASCII/UTF-8).
Yeah you are correct there is no NULL outside the RDBMS. I think the match should be done with respect to the space ("").
Hi Shell_Life,
The script was taking very long to complete. Using Perl regx would be more faster.
Could you please suggest me whether this can be done using PERL.
Quote:
For big files, the fork and exec cost per line can get ugly. If you want to try shell, read to many variables with $IFS including the separator. If the spaces and zeros are well behaved, sed might be a big winner, with a very busy regex.
The real slick trick is to map it into RDBMS as a file table and use SQL. There are free JDBC, ODBC and unixODBC tools for this, and MS Access, of course, short of getting a real RDBMS running.
I am not sure how this can be done using Perl . I thought shell would be slower.
I cannot use the slick trick because i cannot connect to the RDBMS.
Could anyone please suggest me how this can be done using Perl. (may be using regx).
The process does not care about file size, generally, being a line at a time thing. Some old machines do not properly read all of files over 2GB without an open64(), but is some cases we fixed this by letting the shell open the file with '<' !
However, to be practical, speed is good. Shell loops are usually not as fast as PERL, and both can be hobbled by shell-out (`cmd`) costs, which in any case may lose out ot JAVA, and none can touch C, especially with a mmap64() of the input to avoid copying cost.
Last edited by DGPickett; 08-03-2011 at 01:20 PM..
I have one file as it has the following format
File1
S No Site IP Address
1 Australia 192.168.0.1/26
2 Australia 192.168.0.2/26
3 Australia 192.168.0.3/26
I need awk/sed command to replace the column2 value ( under Site) with some other... (8 Replies)
Hi,
I am having a file in the following format.
for aaaa
1111 1234
2222 3434
for bbbb
1111 3434.343
2222 2343
for cccc
3333 2343.343
4444 89000
for dddd
1111 5678.343
2222 890.3
aaaa 2343.343
bbbb 34343.343 (5 Replies)
Hi Gurus,
I have a tab separated text file with two columns. I would like to make the first column values as headings for the second column values.
Ex.
>value1 subjects
>value2 priorities
>value3 requirements
...etc
and I want to have a file
>value1
subjects
>value2
priorities... (4 Replies)
Dear all,
I have a file1.pdb in pdb format and a dat file2 containing values, corresponding to the atoms in the pdb file. these values (file2.dat) need to be in the column instead of the 0.00 (file1) values for each atom in file1.pdb .(the red values must be replaced by the blue ones,in order)... (11 Replies)
Hello,
I have a file with four columns and I would like to replace values in the second column only.
An arbitrary example is:
100 A 105 B
200 B 205 C
300 C 305 D
400 D 405 E
500 E 505 F
I need to replace the second column as shown below:
... (4 Replies)
Hi All,
I have a file which is having 3 columns as (string string integer)
a b 1
x y 2
p k 5
y y 4
.....
.....
Question:
I want get the unique value of column 2 in a sorted way(on column 2) and the sum of the 3rd column of the corresponding rows. e.g the above file should return the... (6 Replies)
Hi,
I have a data file looks like the following
ID STARTDATE ENDDATE
101 20090520 20090521
102 20090521 20090522
103 20090522 20090523
104 20090523 20090524
105 20090524 20090525
106 20090525 20090526
107 ... (3 Replies)
Hi All,
I have a table with 10 columns. Some columns(2nd,4th,5th,7th,8th and 10th) are Not Null columns. I'll get a tab-delimited file and want to check col by col and generate seperate error code for each col eg:102 if 2nd col value is NULL and 104 if 4th col value is NULL so on... I am a... (7 Replies)