How to replace and remove a few junk characters from a specific field?

09-30-2014

Registered User

1,781, 705

Join Date: May 2008

Last Activity: 10 November 2021, 5:38 PM EST

Posts: 1,781

Thanks Given: 62

Thanked 705 Times in 653 Posts

Maybe one of these two will do:

Code:

awk -v OFS="\t" '{$1=$1;sub(/%.*\)/, "", $4)}1' filename

perl -wnla -e '$F[3] =~ s/%.*\)// if $#F > 2; print join "\t", @F;' filename

If not, please post the unadulterated output of the following command:

Code:

head filename | od -c

Aia

09-30-2014

Registered User

5, 0

Join Date: Sep 2014

Last Activity: 1 October 2014, 11:59 AM EDT

Posts: 5

Thanks Given: 0

Thanked 0 Times in 0 Posts

Aia,
Thank you so much - both of your latest solutions work.
As I am still learning, please help me interpret some of your magical code.
I would appreciate a few words on these pieces of code:

1)
'{$1=$1;sub(/%.*\)/, "", $4)}1'
2)
perl -wnla -e '$F[3] =~ s/%.*\)// if $#F > 2; print join "\t", @F;'

snemuk14

10-01-2014

As I suspected, you do not use tabs between fields; you use multiple spaces imitating a tab.
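The `od -c` check is what tells the two cases apart: POSIX `od -c` prints a tab as the escape `\t`, while spaces show up as blank columns. A minimal sketch on two made-up lines (not the poster's data; the function names are hypothetical):

```shell
#!/bin/sh
# Two made-up lines (not the poster's data): the first separates its fields
# with a real tab, the second with three spaces imitating one.
show_tab()    { printf 'a\tb\n'  | od -c; }
show_spaces() { printf 'a   b\n' | od -c; }

show_tab      # the separator appears as the escape \t
show_spaces   # the separator appears as three blank columns
```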

awk

Code:

awk -v OFS="\t" '{$1=$1;sub(/%.*\)/, "", $4)}1' filename

-v OFS="\t" # sets the built-in Output Field Separator to a tab instead of the default single space; awk uses it when it rebuilds the record in the next part
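To see OFS on its own, a small sketch (the input line and the `ofs_demo` name are hypothetical): OFS is what awk places between the comma-separated arguments of `print`.

```shell
#!/bin/sh
# OFS goes between the comma-separated arguments of print
# (and between fields whenever awk rebuilds $0). Input line is made up.
ofs_demo() {
    printf 'a b c\n' | awk -v OFS='\t' '{ print $1, $3 }'
}

ofs_demo   # prints a and c separated by a tab
```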

$1=$1 # assigning to any field makes awk rebuild $0 (the whole record, by default one line) from its fields, joined with OFS; since OFS is now a tab, every run of spaces between fields turns into a single tab (leading and trailing blanks are dropped too)
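The effect of `$1=$1` is easiest to see side by side. In this sketch (hypothetical input line and function names), setting OFS alone leaves the line untouched, because `$0` is printed as read until some field is assigned:

```shell
#!/bin/sh
# Hypothetical line whose fields are separated by runs of spaces.
line='a   b    c'

# Without an assignment, $0 is printed exactly as it was read.
no_rebuild() { printf '%s\n' "$line" | awk -v OFS='\t' '1'; }
# $1=$1 forces awk to rebuild $0 from its fields, joined with OFS.
rebuild()    { printf '%s\n' "$line" | awk -v OFS='\t' '{ $1 = $1 } 1'; }

no_rebuild   # spaces kept as they were
rebuild      # each run of spaces replaced by one tab
```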

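sub(/%.*\)/, "", $4) # sub() is a built-in awk function that takes up to three arguments: 1st the regular expression to match, 2nd the replacement string, and 3rd the field or variable to search (optional, it defaults to $0), in this case the 4th field. It replaces only the first match, but because .* is greedy that match runs from the first % to the last ) in the field, so everything in that range is deleted.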
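A sketch of `sub()` on the 4th field, using a made-up record (not the poster's data) and hypothetical function names. The second function also shows that `sub()` returns the number of substitutions (0 or 1) and that the greedy match swallows everything up to the last `)`:

```shell
#!/bin/sh
# Hypothetical record: the 4th field carries a "%(...)" suffix to remove.
clean4() {
    printf 'r1   abc   def   45%%(old)   end\n' |
        awk -v OFS='\t' '{ $1 = $1; sub(/%.*\)/, "", $4) } 1'
}

# sub() on a plain variable: one (greedy) match, so n is 1 and s becomes "12".
awk_greedy() {
    awk 'BEGIN { s = "12%(a)34%(b)"; n = sub(/%.*\)/, "", s); print n, s }'
}

clean4       # r1, abc, def, 45, end separated by tabs
awk_greedy   # prints: 1 12
```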

1 # a pattern with no action; it always evaluates to true, and the default action is to print $0, so every (modified) record is printed
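A quick check that the lone `1` is shorthand for `{ print }` (the input and function names are hypothetical); a false pattern such as `0` prints nothing:

```shell
#!/bin/sh
# A pattern with no action prints the record whenever the pattern is true.
input='x y
z'
with_one()   { printf '%s\n' "$input" | awk '1'; }
with_print() { printf '%s\n' "$input" | awk '{ print }'; }
with_zero()  { printf '%s\n' "$input" | awk '0'; }

with_one     # prints both lines unchanged
with_zero    # prints nothing
```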

Perl

Code:

perl -wnla -e '$F[3] =~ s/%.*\)// if $#F > 2; print join "\t", @F;' filename

-wnla -e # -w turns on warnings; -n wraps the code in a loop that reads the input line by line without printing automatically; -l, used with -n as here, removes the newline (input record separator) from each line and adds it back after every print; -a turns on autosplit, splitting each line on whitespace into the array @F; -e tells Perl that the next argument is the Perl code to execute.

$F[3] =~ s/%.*\)// # takes the 4th field, stored at index 3 of @F because Perl arrays start at 0, and replaces the first match of the regex between the first two slashes with the (empty) text between the last two, i.e. deletes it

if $#F > 2 # do the previous only if @F has more than 3 elements (we are looking for the 4th); $#F is the index of the last element, so $#F > 2 means there are at least 4

print join "\t", @F # join the elements of @F with a tab between each pair, then print the result (-l supplies the trailing newline)

Aia

10-01-2014

Aia,
Simply awesome! The way you explained it enlightened me. Your support is much appreciated, as I am always trying to learn.
Thanks again.

snemuk14

UNIX for Dummies Questions & Answers

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Find records with specific characters in 2nd field

Discussion started by: ashwin3086

2. UNIX for Beginners Questions & Answers

Need to remove Junk characters

Discussion started by: spradeep86

3. Shell Programming and Scripting

Remove all junk characters from a text file

Discussion started by: Talari

4. Shell Programming and Scripting

Remove first n characters from specific columns

Discussion started by: pathunkathunk

5. Shell Programming and Scripting

[Solved] Counting specific characters within each field

Discussion started by: Homa

6. UNIX for Dummies Questions & Answers

How to remove JUNK characters (FROMï¿½)

Discussion started by: arukuku

7. Shell Programming and Scripting

Remove the special characters from field

Discussion started by: koti_rama

8. Shell Programming and Scripting

Replace specific field on specific line sed or awk

Discussion started by: crownedzero

9. Shell Programming and Scripting

Remove junk characters using Perl

Discussion started by: mohan_xunil

10. HP-UX

extract field of characters after a specific pattern - using UNIX shell script

Discussion started by: jansat