AIX to RHEL migration - awk treating 0e[0-9]+ as 0 instead of string issue

Tags
interpretation, linux, migration, solved

#1  chill3chee, 05-14-2017

Greetings Experts,
We are migrating from AIX to RHEL Linux. I have written a script that checks a set of large "|"-delimited files and reports NULLs and spaces in the key columns, as well as duplicates on the key combination. The following code ran successfully on AIX.


Code:
awk -F "|" 'BEGIN { OFS = "|"; null_blank_flag = "NO"; duplicate_flag = "NO" }
{
  # Three-letter file type that precedes the 4-character extension
  type = substr(FILENAME, length(FILENAME) - 6, 3)
  if (FNR == 1) {   # Header record, get the key column names (FNR so every file header is recognized)
    if (type == "abc") { a_key_cols[$1 OFS $2] = $1 "," $2 }
    if (type == "def") { a_key_cols[$1 OFS $2 OFS $3] = $1 "," $2 "," $3 }
    if (type == "ghi") { a_key_cols[$1 OFS $2 OFS $3 OFS $4] = $1 "," $2 "," $3 "," $4 }
  }
  if (FNR >= 2) {   # Data record, count each key combination
    if (type == "abc") { a[$1 OFS $2]++ }
    if (type == "def") { a[$1 OFS $2 OFS $3]++ }
    if (type == "ghi") { a[$1 OFS $2 OFS $3 OFS $4]++ }
  }
}
END {
  for (i in a) {
    n = split(i, arry, OFS)
    for (k = 1; k <= n; k++) {
      gsub(" ", "", arry[k])
      # Null/blank test; arry[k] != "0" keeps a literal 0 valid
      if ((! arry[k]) && (arry[k] != "0")) { null_blank_flag = "YES" }
    }
    if (a[i] >= 2) { duplicate_flag = "YES" }
  }
  print "Filename " FILENAME " null/blank flag: " null_blank_flag " and duplicate flag: " duplicate_flag
}' file_name_*.txt

I had to add the condition arry[k] != "0" because awk treats 0 as NULL, and 0 may be a valid value in the file. Please note that the key columns change depending on the file name, and that the key column holds SHA-encrypted account numbers such as 02djfdf93ikdkjdfkdf3 and 0e123458939393. Please ignore any syntax issues in the script: I cannot copy/paste the working code because it is on another machine, and the number of characters in the encrypted account_number

When I ran this script on Linux, encrypted account numbers matching 0e[0-9]+ were interpreted as scientific notation, i.e. 0 * 10^[0-9]+, which is 0. On AIX they are treated as plain strings, so null_blank_flag is not set to YES.


Code:
**AIX**
echo "abc|0e123456789|xyz|1234|kdkd|dfs" | awk -F "|" '{ if ( ! $2 ) { print $2 " is empty or NULL " } else { print $2 " is not empty "}}'
output: 0e123456789 is not empty

**RHEL LINUX**
echo "abc|0e123456789|xyz|1234|kdkd|dfs" | awk -F "|" '{ if ( ! $2 ) { print $2 " is empty or NULL " } else { print $2 " is not empty "}}'
output: 0e123456789 is empty or NULL

grep might be an option, but I think I would need to read each file twice, once to check for nulls/blanks and once for duplicates on the keys, which is why I resorted to awk.

Since the array index in a[$1 OFS $2 OFS $3 OFS $4]++ is a string for most of the data in the file, I expected the index to be built as the string $1 OFS $2 OFS $3 OFS $4. Can it change from string to integer even though the first record after the header is a string? Are the strings converted to integers during the split, n=split(i,arry,OFS); ...; arry[k]? Interestingly, the output of the Linux version also prints 0e123456789, which confirms that $2 is 0e123456789. So where does the value of $2 get transformed to 0?

Can you please explain how to make awk/shell interpret 0e[0-9]+ as a string instead of a number? Thank you for your time.
#2  RudiC (Moderator), 05-14-2017
Try a string conversion:

Code:
echo "abc|0e123456789|xyz|1234|kdkd|dfs" | awk -F "|" '{ if ( ! ($2 "")) { print $2 " is empty or NULL " } else { print $2 " is not empty "}}'
0e123456789 is not empty

But no numeric conversion is done for plain assignments, or for usage such as constructing an array index.
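
To illustrate the point about array indexes, here is a minimal sketch with made-up sample records (the three input lines and the variable name dup_report are assumptions for illustration, not data from the original files). It counts occurrences of $2 and reports keys seen more than once; the key text is used as the index unchanged.

```shell
# Hypothetical sample data: two of the three records share the key 0e123.
dup_report=$(printf '%s\n' 'abc|0e123|x' 'abc|0e456|x' 'abc|0e123|x' |
awk -F '|' '
  { count[$2]++ }          # the index is the field text, stored as a string
  END {
    for (key in count)
      if (count[key] >= 2) print key " appears " count[key] " times"
  }')
echo "$dup_report"
```

Here 0e123 and 0e456 stay distinct keys, which would not be the case if both were converted to the number 0 when building the index.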
#3  MadeInGermany (Forum Advisor), 05-14-2017
Comparing with a string should work as well.

Code:
echo "abc|0e123456789|xyz|1234|kdkd|dfs" | awk -F "|" '{ if ($2=="") { print $2 " is empty or NULL " } else { print $2 " is not empty "}}'
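
Applied to the null/blank key check from the first post, the string comparison might look like the sketch below. The sample records, the variable names check and key, and the message texts are assumptions for illustration only.

```shell
# Hypothetical sample: a header line, then keys 0e123456789, 0 and a blank-only key.
check=$(printf '%s\n' 'id|acct' 'a1|0e123456789' 'a2|0' 'a3|   ' |
awk -F '|' '
  NR >= 2 {
    key = $2
    gsub(/ /, "", key)     # drop spaces so a blank-only key becomes empty
    if (key == "") print "line " NR ": null/blank key"
    else           print "line " NR ": ok (" key ")"
  }')
echo "$check"
```

Because the test compares against the string constant "", both 0e123456789 and a literal 0 are reported as ok, and only the blank-only key on line 4 is flagged. The extra arry[k] != "0" condition is then no longer needed.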

#4  chill3chee, 05-14-2017
Thank you RudiC and MadeInGermany. That works. I would just like to understand: when the record is read, $2 is 0e123456789, so why does awk evaluate $2 as 0, i.e. treat it as scientific notation, in the ! $2 test of the earlier code? Can you please explain this behavior? Is it specific to Linux, given that the script ran without issues on AIX? Thank you for your time.
#5  RudiC (Moderator), 05-14-2017
As I said - usage alone, be it for assignment or reference in an index, takes the value as is. Only evaluation, e.g. for a boolean expression or a numerical computation, converts the string to a number, using the starting characters up to a non-convertible one.
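
A small sketch of the three contexts, using the sample record from this thread (the variable names contexts and copy are just illustrative): a plain assignment, an arithmetic evaluation, and a boolean test on an explicitly forced string.

```shell
contexts=$(echo 'abc|0e123456789|xyz' |
awk -F '|' '{
  copy = $2                                       # plain assignment keeps the text as is
  print "copied:  " copy
  print "numeric: " ($2 + 0)                      # arithmetic forces a numeric conversion
  print "string:  " (($2 "") ? "true" : "false")  # a non-empty string tests as true
}')
echo "$contexts"
```

The first and third lines show the original text and "true"; the second line shows the numeric value that 0e123456789 converts to, which is what a bare ! $2 ends up testing on the Linux awk.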
#6  MadeInGermany (Forum Advisor), 05-14-2017
awk has automatic type conversion.
The ! operator is a boolean operator, and a boolean is a number (1 or 0) rather than a string, so it is a strong hint to treat the operand as a number.
In borderline cases there can be differences between awk versions. E.g. the awk in AIX is a derivative of nawk, whereas awk on Linux is mostly gawk, but some Linux distributions have mawk, which is again a little different.
The bottom line is: make the data type explicit. Typical casts are
var ""   appends a null string; the result is a string
var+0    adds a zero; the result is a number
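
As a quick sketch of the two casts side by side (the input "10 9" and the variable names are made up for illustration), the same pair of fields compares differently depending on which cast is applied:

```shell
casts=$(echo '10 9' | awk '{
  as_string = (($1 "") < ($2 ""))     # string comparison: "10" sorts before "9"
  as_number = (($1 + 0) < ($2 + 0))   # numeric comparison: 10 is not less than 9
  print as_string, as_number
}')
echo "$casts"
```

This prints 1 0: the comparison is true when both sides are forced to strings and false when both are forced to numbers.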