Unix Remove repetitive alphabets


 
# 22  
Old 03-15-2010
What about a field that legitimately starts with a Z, like "ZZZZimbabwe Imports"? Will the padding always be two or more Zs at the start or end, with the field itself never beginning or ending in a Z?
# 23  
Old 03-15-2010
It will always be -Z<number>. The word can appear at the beginning, in the middle, or at the end of the line, and it needs to be taken out. I am only looking for -Z<number>.
Please advise.
# 24  
Old 03-15-2010
Quote:
Originally Posted by msalam65
...I now have a rule which indicates removing anything that starts with a -Z followed by number, or -Z by itself (word). Do not remove -Z followed by alphas (-ZXXXX should be retained). ...
Here's a Perl solution (that also takes into account the previous requirement) -

Code:
perl -lne 's/^Z{2,}| Z{2,}$|-Z\d+|-Z(?=\b)//g; print' your_file

Here's a test run -

Code:
$ 
$ cat -n f4
     1  HHS/CDC
     2  55304
     3  ZZZIBM Corporation
     4  ZZZIBM Corporation ZZZZZ
     5  IBM ZZZ Corporation
     6  IBM ZZZ Corporation ZZ
     7  ZZIBM ZZZ Corporation ZZ
     8  -Z123IBM -Z12X Corporation
     9  IBM -Z123 Corporation
    10  IBM Corporation -Z123
    11  -ZZZZZ IBM Corporation
    12  IBM -ZZZZZ Corporation
    13  IBM Corporation -ZZZZZ
    14  -Z IBM Corporation
    15  IBM -Z Corporation
    16  IBM Corporation -Z
$ 
$ perl -lne 'chomp($x=$_); s/^Z{2,}| Z{2,}$|-Z\d+|-Z(?=\b)//g; printf("%-30s <==>  %-30s\n",$x,$_)' f4
HHS/CDC                        <==>  HHS/CDC                       
55304                          <==>  55304                         
ZZZIBM Corporation             <==>  IBM Corporation               
ZZZIBM Corporation ZZZZZ       <==>  IBM Corporation               
IBM ZZZ Corporation            <==>  IBM ZZZ Corporation           
IBM ZZZ Corporation ZZ         <==>  IBM ZZZ Corporation           
ZZIBM ZZZ Corporation ZZ       <==>  IBM ZZZ Corporation           
-Z123IBM -Z12X Corporation     <==>  IBM X Corporation             
IBM -Z123 Corporation          <==>  IBM  Corporation              
IBM Corporation -Z123          <==>  IBM Corporation               
-ZZZZZ IBM Corporation         <==>  -ZZZZZ IBM Corporation        
IBM -ZZZZZ Corporation         <==>  IBM -ZZZZZ Corporation        
IBM Corporation -ZZZZZ         <==>  IBM Corporation -ZZZZZ        
-Z IBM Corporation             <==>   IBM Corporation              
IBM -Z Corporation             <==>  IBM  Corporation              
IBM Corporation -Z             <==>  IBM Corporation               
$ 
$

HTH,
tyler_durden
# 25  
Old 03-16-2010
Thanks Tyler. This worked, but I need to take out the entire word that starts with -Z<number>. In the above case, the rest of the word is left behind after -Z<number> is removed. For example:

Code:
Input : 
IBM -Z12X Corporation

Current Output : 
IBM X Corporation

Expected Output : 
IBM Corporation

Please advise.
# 26  
Old 03-16-2010
Hi,

Perhaps this tweaked version of durden_tyler's perl one-liner will do the trick:
Code:
perl -lne 's/^Z{2,} *| *Z{2,}$|(^|(?<= ))-Z(\d+[^ ]*)?( +|$)//g; print' your_file

Sample run:
Code:
$ cat data
ZZZ ZZZ
ZZZIBM Corporation
IBM -Z12X Corporation
-Z12X IBM -Z12X Corporation
ZZZIBM Corporation ZZZZZ
IBM ZZZ Corporation -ZAB ZZZ
IBM ZZZ Corporation -Z1AB ZZZ

$ perl -lne 's/^Z{2,} *| *Z{2,}$|(^|(?<= ))-Z(\d+[^ ]*)?( +|$)//g; print' data
 
IBM Corporation
IBM Corporation
IBM Corporation
IBM Corporation 
IBM ZZZ Corporation -ZAB 
IBM ZZZ Corporation

Regards,
Alister

Last edited by alister; 03-16-2010 at 10:59 PM..
# 27  
Old 03-16-2010
Quote:
Originally Posted by msalam65
... I need to take out the entire word that is -Z<number>. In the above case it is leaving the words after cleaning up -Z<number>. For example.

Code:
Input : 
IBM -Z12X Corporation

Current Output : 
IBM X Corporation

Expected Output : 
IBM Corporation

...
One side-effect of this rule would be that "IBM" would be wiped out from a line like this -

Code:
-Z123IBM -Z12X Corporation

I hope this is what you want.

The modified script is as follows:

Code:
perl -lne 's/^Z{2,}| Z{2,}$|-Z\d+.*?\b|-Z(?=\b)//g; print' your_file

Test run:

Code:
$ 
$ cat -n f4
     1  HHS/CDC
     2  55304
     3  ZZZIBM Corporation
     4  ZZZIBM Corporation ZZZZZ
     5  IBM ZZZ Corporation
     6  IBM ZZZ Corporation ZZ
     7  ZZIBM ZZZ Corporation ZZ
     8  -Z123IBM -Z12X Corporation
     9  IBM -Z123 Corporation
    10  IBM Corporation -Z123
    11  -ZZZZZ IBM Corporation
    12  IBM -ZZZZZ Corporation
    13  IBM Corporation -ZZZZZ
    14  -Z IBM Corporation
    15  IBM -Z Corporation
    16  IBM Corporation -Z
$ 
$ perl -lne 'chomp($x=$_); s/^Z{2,}| Z{2,}$|-Z\d+.*?\b|-Z(?=\b)//g; printf("%-30s <==>  %-30s\n",$x,$_)' f4
HHS/CDC                        <==>  HHS/CDC                       
55304                          <==>  55304                         
ZZZIBM Corporation             <==>  IBM Corporation               
ZZZIBM Corporation ZZZZZ       <==>  IBM Corporation               
IBM ZZZ Corporation            <==>  IBM ZZZ Corporation           
IBM ZZZ Corporation ZZ         <==>  IBM ZZZ Corporation           
ZZIBM ZZZ Corporation ZZ       <==>  IBM ZZZ Corporation           
-Z123IBM -Z12X Corporation     <==>    Corporation                 
IBM -Z123 Corporation          <==>  IBM  Corporation              
IBM Corporation -Z123          <==>  IBM Corporation               
-ZZZZZ IBM Corporation         <==>  -ZZZZZ IBM Corporation        
IBM -ZZZZZ Corporation         <==>  IBM -ZZZZZ Corporation        
IBM Corporation -ZZZZZ         <==>  IBM Corporation -ZZZZZ        
-Z IBM Corporation             <==>   IBM Corporation              
IBM -Z Corporation             <==>  IBM  Corporation              
IBM Corporation -Z             <==>  IBM Corporation               
$ 
$

tyler_durden