Post 302688983 by gary_w on Monday 20th of August 2012 03:26:57 PM (Shell Programming and Scripting forum)
Perl regex to remove a segment in a line

Hello, ksh on SunOS 5.8 here. I have a pipe-delimited, variable-length record file, received from a source outside our control, in which segments are marked with a tilde. The records are huge, and Perl seems to be the only tool available that can handle such long lines. I am new to Perl and am trying to come up with a regex that finds ~DRG segments numbered above 15 and removes them. Some of these segments have sub-segments that should be treated as part of the parent segment, e.g. a ~DRG segment can have multiple ~DCT segments, and it is followed by other segments, some of which are optional.

Here's a sample BEFORE:

Code:
|~DRG|15|qwe|qwe|qwe|~DCT|efs|efs|243545|~DRG|16|qwe|qwe|qwe|~DCT|efs|efs|243545|~DRG|17|fgh|fgg|dfg|~DCT|fgg|fhh|`123|~MSP|etc|

And the desired AFTER:
Code:
|~DRG|15|qwe|qwe|qwe|~DCT|efs|efs|243545|~MSP|etc|

What I need to do is match each ~DRG segment whose next field is greater than 15, up to the next segment that is neither ~DRG nor ~DCT. I think I am getting tangled up choosing between a negative match and a lookahead.
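The rule stated above can also be expressed without one big regex, as a small state machine over the segments. This is a minimal sketch, not a tested production filter; it assumes every segment starts with `|~` (as in the sample) and that ~DCT is the only sub-segment type owned by a ~DRG. The sub name `drop_drg_over_15` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: drop each ~DRG segment numbered above 15 together
# with the ~DCT sub-segments that follow it. Assumes segments start with "|~".
sub drop_drg_over_15 {
    my ($line) = @_;
    my $skip = 0;
    my @kept;
    # A zero-width split before every "|~" keeps the delimiters attached
    # to each segment, so join('', ...) rebuilds the record unchanged.
    for my $seg (split /(?=\|~)/, $line) {
        if ($seg =~ /^\|~DRG\|(\d+)/) {
            $skip = $1 > 15;     # numeric test, so 30 or 100 count too
        } elsif ($seg !~ /^\|~DCT\|/) {
            $skip = 0;           # any other segment type ends the ~DRG group
        }
        # ~DCT segments simply inherit $skip from their ~DRG
        push @kept, $seg unless $skip;
    }
    return join '', @kept;
}

my $before = '|~DRG|15|qwe|qwe|qwe|~DCT|efs|efs|243545'
           . '|~DRG|16|qwe|qwe|qwe|~DCT|efs|efs|243545'
           . '|~DRG|17|fgh|fgg|dfg|~DCT|fgg|fhh|`123|~MSP|etc|';

# prints the desired AFTER line: |~DRG|15|qwe|qwe|qwe|~DCT|efs|efs|243545|~MSP|etc|
print drop_drg_over_15($before), "\n";
```

Because each segment is examined on its own, a ~DRG numbered 15 or lower that follows a high-numbered one is kept automatically, which is awkward to express with lookaheads alone.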

I have tried many ways, with this one being the closest:
Code:
 $str =~ s/\|~DRG\|(1[6-9]|2[0-9]).*?\((?!~DRG)|(?!~DCT)\)/\1/g;

But this does not go all the way to the next segment that is neither ~DRG nor ~DCT. In the output below, ~DRG 15 has only one ~DCT of its own, but the match stops short of the ~MSP segment:
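Part of the trouble with the pattern above is precedence: a top-level `|` has the lowest priority in a regex, so the pattern is really two unrelated alternatives, and `\(` / `\)` match literal parentheses, which the sample data does not contain. This sketch runs the pattern against the BEFORE sample, assuming the post shows the pattern exactly as it was run:

```perl
use strict;
use warnings;

# The pattern from the post, copied unchanged. It parses as two alternatives:
#   A: \|~DRG\|(1[6-9]|2[0-9]).*?\((?!~DRG)   (needs a literal "(")
#   B: (?!~DCT)\)                             (needs a literal ")")
my $posted = qr/\|~DRG\|(1[6-9]|2[0-9]).*?\((?!~DRG)|(?!~DCT)\)/;

my $sample = '|~DRG|15|qwe|qwe|qwe|~DCT|efs|efs|243545'
           . '|~DRG|16|qwe|qwe|qwe|~DCT|efs|efs|243545'
           . '|~DRG|17|fgh|fgg|dfg|~DCT|fgg|fhh|`123|~MSP|etc|';

# The sample contains no parentheses, so neither alternative can match.
my $matches = () = $sample =~ /$posted/g;
print "matches: $matches\n";    # prints "matches: 0"
```

Grouping the lookaheads fixes the precedence, but they also need to be tested where the next segment begins, i.e. right after a `|~` (something like `\|~(?!DRG\||DCT\|)`), rather than after a literal parenthesis. Note too that `(1[6-9]|2[0-9])` only recognises numbers whose first two digits fall in 16 to 29, so values such as 30 or 100 would slip through.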

Output (wrong, because it still contains a ~DCT belonging to one of the ~DRG segments numbered above 15)
(Lines wrapped for readability)
Code:
|~DRG|15|03|599942600|DECYL METHYL SULFOXIDE|0.060|I|0|O|
DECYL METHYL SULFOXIDE POWDER|1|99
|MISCELLANEOUS|U6W|BULK CHEMICALS|960000
|PHARMACEUTICAL AIDS|O||MISCELL.|POWDER|89.8 %|0|N||||~DCT|STD|0.00|01|AWPA|AWPA|38.5000016G||0|N
||||~DCT|STD|0.00|09||AWPA|2.50000
|~MSP|1|93392900|~MSP|2|72900
|~MSP|3|7512900
|~MSP|4|964850|~MSP|5|96500
|~MSP|6|96802900
|~MSP|7|6610000|~MSP|8|967900|~MSP|9|9932900
|~MSP|10|9680002900|~MSP|11|9662900
|~MSP|12
|79403800|~MSP|13|964900|~MSP|14|96700
|~MSP|15|9640|~MSP|16|96200|~MSP|17|96200037

If you have a suggestion on the regex, or if there is a better approach, I would be grateful!
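If you would rather keep a single substitution, one option is to capture the ~DRG number and let the replacement decide with the `/e` modifier, instead of encoding "greater than 15" in a character class. A minimal sketch, assuming segments start with `|~` and that ~DCT is the only sub-segment a ~DRG owns; `strip_high_drg` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: remove every ~DRG group numbered above 15.
sub strip_high_drg {
    my ($line) = @_;
    # $1 = the whole ~DRG group, $2 = its number.
    # The lazy .*? stops at the next "|~" that does NOT start a ~DCT
    # segment, or just before an optional trailing "|" at end of line.
    # Groups numbered 15 or lower are replaced with themselves.
    $line =~ s{(\|~DRG\|(\d+).*?)(?=\|~(?!DCT\|)|\|?\z)}
              {$2 > 15 ? '' : $1}eg;
    return $line;
}

my $before = '|~DRG|15|qwe|qwe|qwe|~DCT|efs|efs|243545'
           . '|~DRG|16|qwe|qwe|qwe|~DCT|efs|efs|243545'
           . '|~DRG|17|fgh|fgg|dfg|~DCT|fgg|fhh|`123|~MSP|etc|';

# prints: |~DRG|15|qwe|qwe|qwe|~DCT|efs|efs|243545|~MSP|etc|
print strip_high_drg($before), "\n";
```

Because the match stops in front of any following ~DRG, consecutive high-numbered groups are removed one per `/g` iteration, and a lower-numbered ~DRG after them survives. To filter a whole file, the sub can be called from a loop such as `while (my $line = <>) { chomp $line; print strip_high_drg($line), "\n"; }`.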

Gary

Last edited by gary_w; 08-20-2012 at 05:27 PM.
 

All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.