Hello, ksh on Sun5.8 here. I have a pipe-delimited, variable-length record file, with sub-segments identified by a tilde, that we receive from a source outside of our control. The records are huge, and Perl seems to be the only tool that can handle such long lines. I am new to Perl, and am trying to come up with a regex to find segments numbered > 15 and remove them. Some of these segments have sub-segments that have to go with them. For example, ~DRG segments can have multiple ~DCT segments, and are followed by other segments, some of which are optional.
Here's a sample BEFORE:
And the desired AFTER:
What I need to do is match each ~DRG segment where the next field is > 15, together with everything up to the next segment that is neither ~DRG nor ~DCT. I believe I am getting caught up on whether to use a negated match or a lookahead.
I have tried many ways, with this one being the closest:
But the match does not go all the way to the next segment that is neither ~DRG nor ~DCT. In the output below, ~DRG 15 has only one ~DCT, yet the match stops short of the ~MSP segment:
Output (bad, because it shows a ~DCT left over from one of the ~DRG segments > 15)
(Lines wrapped for readability)
If you have a suggestion on the regex or if there is a better approach, I will be grateful!
Your first sample shows a single line record, but the second, larger sample appears to span multiple lines. Can a single record span multiple lines? If so, how is the end of record determined?
In the second sample you highlight a segment that according to your explanation should not be modified. The field after ~DRG is not greater than 15. Also, there do appear to be two ~DCT segments highlighted in blue, but your text mentions only one.
Edit: Hmm. Perhaps there was a ~DRG|16 in that second sample that was deleted, and the second ~DCT belongs to it. Not sure. This is why it's good to show both the before and after with sample data.
In addition to answers for those questions, it would help if for each data sample you showed us the before and the (desired) after.
The first record is just a sample showing the DRG and DCT layout.
The second example output is split across multiple lines for readability. The actual records are separated by carriage returns and have TONS of columns, so this is just a sample of the relevant part of the record. DRG 15 has only one DCT, but the output shows another DCT left over from one of the DRG segments > 15. My regex is not going all the way to the MSP segment.
I will update my first example to show a before and desired after. The record is too huge to show the whole thing.
To confirm that your AWK can't handle these records, does the following fail to print the number of fields in each record?
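The command itself is not preserved in this copy of the post. A check along these lines would do the job, assuming the fields are pipe-delimited as described. The `AWK` variable, the `count_fields` name and the two sample records are placeholders made up for illustration:

```shell
# Pick the awk to test; on Solaris, set AWK=/usr/xpg4/bin/awk before running.
AWK=${AWK:-awk}

# Print the number of pipe-delimited fields in each record
# of the given files (or of standard input if no file is given).
count_fields() {
    "$AWK" -F'|' '{ print NF }' "$@"
}

# Made-up two-record input, only to show the call.
fields=$(printf 'a|b|c\nx|y\n' | count_fields)
printf '%s\n' "$fields"
```

If awk gives up with a "record too long" style error on the real file instead of printing a count per record, that confirms the line-length limit.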
On Solaris, make sure to test with /usr/xpg4/bin/awk.
Also, do the records always begin and/or end with a pipe symbol? If yes, be specific, begin or end or both.
Sweet! It works on my test file. I would be grateful if you could explain the regex. I need to do some similar operations on other parts of the file and want to understand it.