Perl: How do I remove leading non-alpha characters


 
# 1  
02-22-2008

Hi,

Sorry for the silly question, but I'm trying to write a Perl script to process a log file in the following format:

(4)ab=1234/(10)bc=abcdef9876/cd=0....

The number in brackets is the length of the field value, and "/" is the field separator. Not every field has a leading number in brackets.

What I'm trying to do is print the log in this format:

ab=1234
bc=abcdef9876
cd=0

So far I've written the code below:

Code:
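#!/bin/perl
use strict;
use warnings;

my $LOGFILE = "/path/to/logfile/filename.txt";
open(my $fh, '<', $LOGFILE) or die("Could not open log file: $!");
while (my $line = <$fh>) {
    chomp $line;                        # drop the trailing newline
    my @splitted = split(/\//, $line);  # split on the "/" field separator

    foreach my $element (@splitted) {
        print "$element\n";
    }
}
close($fh);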

However, this also prints the leading brackets, e.g. (4)ab=1234 instead of ab=1234.

How can I get rid of the leading brackets?

Also, a field value may itself contain "/", e.g. "ef=a/b". How do I avoid it being misinterpreted as the field separator?

Thanks!
# 2  
02-22-2008
Please show some of the real data; there might be a clue in it that helps figure out a rule for splitting the lines correctly. From what you posted, it looks like you could use the field "names" (ab, bc, cd) to split the fields correctly, but I have a feeling that is pseudo data, not real data.

What kind of log file is this? There might already be a module written that understands the log format.
# 3  
02-22-2008
Anyway, for lines with no forward slash in the values:

Code:
#!/bin/perl
use strict;
use warnings;

my $LOGFILE = "/path/to/logfile/filename.txt";
open(my $fh, '<', $LOGFILE) or die "Could not open log file: $!";
while (<$fh>) {
   chomp;
   my @fields = split(/\//);    # split the current line ($_) on "/"
   s/^\(\d+\)// for @fields;    # strip a leading "(N)" length prefix
   print "$_\n" for @fields;
}
close($fh);

# 4  
02-22-2008
Quote:
Originally Posted by Juha
Also, a field value may itself contain "/", e.g. "ef=a/b". How do I avoid it being misinterpreted as the field separator?
The simplest way to tackle this problem is at the source, by not using "/" as the field separator.
# 5  
02-22-2008
Input File:
Code:
$ cat line.txt
(4)ab=1234/(10)bc=abcdef9876/cd=0
(4)ty=5234/(10)bc=abcdef9876/cd=0

Code:
perl -nle '/(\w+)=(\w+)/ && print "$1=$2" foreach split "/"' < line.txt

Output:
Code:
ab=1234
bc=abcdef9876
cd=0
ty=5234
bc=abcdef9876
cd=0

HTH
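For readers unfamiliar with the switches, the one-liner above can be unpacked into a script roughly like the following. This is a sketch, not rikxik's own code: "-n" wraps the code in a while (<>) loop, and "-l" chomps each input line and appends a newline to every print. The sub name name_value_pairs is made up for illustration, and the lines of line.txt are hard-coded so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Long-hand version of:
#   perl -nle '/(\w+)=(\w+)/ && print "$1=$2" foreach split "/"' < line.txt
sub name_value_pairs {    # hypothetical helper name
    my ($line) = @_;
    my @pairs;
    # Split the line on every "/", as split "/" does in the one-liner.
    foreach my $field (split m{/}, $line) {
        # The pattern is unanchored; "(" and ")" are not \w, so the
        # "(4)" prefix is never part of the match, only name=value is.
        push @pairs, "$1=$2" if $field =~ /(\w+)=(\w+)/;
    }
    return @pairs;
}

# Hard-coded copies of the lines in line.txt; a real script would loop over <>.
my @sample = (
    '(4)ab=1234/(10)bc=abcdef9876/cd=0',
    '(4)ty=5234/(10)bc=abcdef9876/cd=0',
);
foreach my $line (@sample) {
    print "$_\n" for name_value_pairs($line);
}
```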
# 6  
02-22-2008
Quote:
Originally Posted by rikxik
Considering he said the values can contain a forward slash, it seems doubtful this will work.
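To see the problem concretely, here is the same split-then-match logic as the one-liner applied to a line containing the "ef=a/b" field from the original question. The input line is a made-up sample built from the question's examples, and the sub name naive_pairs is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same logic as the one-liner: split on every "/", keep word=word matches.
sub naive_pairs {    # hypothetical helper name
    my ($line) = @_;
    my @pairs;
    for my $field (split m{/}, $line) {
        push @pairs, "$1=$2" if $field =~ /(\w+)=(\w+)/;
    }
    return @pairs;
}

# Made-up sample: the value of ef is "a/b".
print "$_\n" for naive_pairs('(4)ab=1234/ef=a/b/cd=0');
# The "/" inside the value is treated as a separator, so ef comes out
# as "ef=a" and the leftover "b" piece is silently dropped.
```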
# 7  
02-22-2008
Thanks for the good tips so far!

The data is just records of users accessing data. I don't think there are existing modules for it, as the format is very specific to this log and not generally used.

The fields could contain values like mt=image/gif (the media type downloaded; it could really be any MIME type).

There is also a field for the browser type, e.g. "bt=Mozilla/4".

A real example would look like this:

at=200802221200/cs=59278/(9)mt=image/gif/(9)bt=Mozilla/4...

This records the time of access (at), the content size (cs), the media type (mt) and the browser used to access the content (bt). There are about 100 different field names; each is a two-letter combination followed by "=" and then the value, which ends at the field separator "/". Unfortunately, I can't change the separator.
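Judging from this example, the number in brackets appears to equal the length of the value: "image/gif" and "Mozilla/4" are both 9 characters, just as 1234 is 4 and abcdef9876 is 10, and both values that contain "/" carry a prefix. If that holds for the whole log (an assumption the thread does not confirm), a parser can read exactly N characters for prefixed values and read up to the next "/" otherwise. A minimal sketch; parse_line is a hypothetical name, and field names are assumed to be exactly two ASCII letters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse one log line into [name, value] pairs.
# Assumes: field names are exactly two ASCII letters; a "(N)" prefix, when
# present, is the length of the value; unprefixed values contain no "/".
sub parse_line {
    my ($line) = @_;
    my @pairs;
    my $pos   = 0;
    my $total = length $line;
    while ($pos < $total) {
        # Read the optional "(N)" prefix and the "xx=" name at $pos.
        my $rest = substr($line, $pos);
        $rest =~ /^(?:\((\d+)\))?([A-Za-z]{2})=/
            or die "Unexpected data at offset $pos: $rest\n";
        my ($len, $name) = ($1, $2);
        $pos += $+[0];    # $+[0] is the offset just past the match in $rest
        my $value;
        if (defined $len) {
            # Length known: take exactly $len characters, "/" included.
            $value = substr($line, $pos, $len);
            $pos += $len;
        }
        else {
            # No prefix: the value runs up to the next "/" or end of line.
            my $end = index($line, '/', $pos);
            $end = $total if $end < 0;
            $value = substr($line, $pos, $end - $pos);
            $pos = $end;
        }
        push @pairs, [ $name, $value ];
        # Step over the "/" separator that follows the value.
        $pos++ if $pos < $total && substr($line, $pos, 1) eq '/';
    }
    return @pairs;
}

for my $line ('(4)ab=1234/(10)bc=abcdef9876/cd=0',
              'at=200802221200/cs=59278/(9)mt=image/gif/(9)bt=Mozilla/4') {
    print "$_->[0]=$_->[1]\n" for parse_line($line);
}
```

For a real log, the hard-coded list would be replaced by a loop such as while (my $line = <$fh>) { chomp $line; ... }. Reading by length rather than by pattern means a "/" inside a prefixed value can never be mistaken for a separator.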

Maybe the data could be split using the field names, as KevinADC suggested.
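Splitting on the field names can be sketched with a lookahead in split: only a "/" that is immediately followed by an optional (N) prefix and a two-letter name plus "=" counts as a separator. This is a sketch under that assumption, with a hypothetical sub name split_fields; it would mis-split a value that itself contains "/" followed by two letters and "=" (e.g. a made-up "ef=a/bc=d").

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub split_fields {    # hypothetical helper name
    my ($line) = @_;
    # Split on "/" only when the next thing is "(N)xx=" or "xx=".
    my @fields = split m{/(?=(?:\(\d+\))?[A-Za-z]{2}=)}, $line;
    # Remove the optional leading length prefix from each field.
    s/^\(\d+\)// for @fields;
    return @fields;
}

my $line = 'at=200802221200/cs=59278/(9)mt=image/gif/(9)bt=Mozilla/4';
print "$_\n" for split_fields($line);
```

Compared with reading by length, this needs no position bookkeeping, but it relies on values never looking like the start of another field.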

Thanks