Reg Ex question


 
# 1  
Old 12-02-2008
Reg Ex question

Hi All,

Suppose I have a string that combines plain text and quoted text. For example:

String: This "sentence is" a combination of "multiple words"

How can I write a regex that splits the above string into the following?

result[0] = This
result[1] = sentence is
result[2] = a
result[3] = combination
result[4] = of
result[5] = multiple words

Any help is welcome, thanks.

Regards,
garric
# 2  
Old 12-02-2008
It doesn't make sense why you would want to do that for this sentence.

Is this a homework question?
# 3  
Old 12-02-2008
No, that was just an example. Basically, I want to split a string on /\s+/ but not split the text inside double quotes.

I guess it's too tough for homework, or at least I feel so. Anyway, I'm too old for homework.
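
The requirement stated above (split on whitespace, but keep each double-quoted phrase as one item with the quotes removed) can be sketched with a single alternation: try a quoted run first, otherwise take a run of non-whitespace. This is a minimal sketch in Python, used here only for illustration; the pattern itself should carry over to Perl or Java largely unchanged. The names `TOKEN` and `split_keep_quoted` are made up for this example, and escaped quotes inside a phrase are assumed not to occur.

```python
import re

# Group 1: the contents of a double-quoted run (quotes excluded).
# Group 2: a run of non-whitespace characters.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def split_keep_quoted(text):
    """Split on whitespace, keeping double-quoted phrases as single items."""
    result = []
    for match in TOKEN.finditer(text):
        quoted, bare = match.groups()
        # Exactly one of the two groups participates in each match.
        result.append(quoted if quoted is not None else bare)
    return result


if __name__ == "__main__":
    sample = 'This "sentence is" a combination of "multiple words"'
    for i, item in enumerate(split_keep_quoted(sample)):
        print(f"result[{i}] = {item}")
```

A quote that opens in the middle of a word (for example `foo"bar baz"`) is not treated as a phrase by this pattern, and an unterminated quote falls through to the non-whitespace branch, so the input is assumed to be well formed.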
# 4  
Old 12-02-2008
echo 'This "sentence is" a combination of "multiple words"' | nawk -f garric.awk

garric.awk:
Code:
# setcsv(str, sep) - parse CSV (MS specification) input
# str, the string to be parsed. (Most likely $0.)
# sep, the separator between the values.
#
# After a call to setcsv the parsed fields are found in $1 to $NF.
# setcsv returns 1 on success and 0 on failure.
#
# By Peter Strömberg aka PEZ.
# Based on setcsv by Adrian Davis. Modified to handle a separator
# of choice and embedded newlines. The basic approach is to take the
# burden off of the regular expression matching by replacing ambiguous
# characters with characters unlikely to be found in the input. For
# this the character "\035" is used.
#
# Note 1. Prior to calling setcsv you must set FS to a character which
#         can never be found in the input. (Consider SUBSEP.)
# Note 2. If setcsv can't find the closing double quote for the string
#         in str it will consume the next line of input by calling
#         getline and call itself until it finds the closing double
#         quote or no more input is available (considered a failure).
# Note 3. Only the "" representation of a literal quote is supported.
# Note 4. setcsv will probably misbehave if sep used as a regular
#         expression can match anything else than a call to index()
#         would match.
BEGIN { FS=SUBSEP; OFS="|" }

{
  result = setcsv($0, " ")
  for(i=1;i<=NF;i++)
    printf("result[%d] = %s\n", i-1, $i)
  #print
}

function setcsv(str, sep, i) {
  gsub(/""/, "\035", str)
  gsub(sep, FS, str)

  while (match(str, /"[^"]*"/)) {
    middle = substr(str, RSTART+1, RLENGTH-2)
    gsub(FS, sep, middle)
    str = sprintf("%.*s%s%s", RSTART-1, str, middle,
      substr(str, RSTART+RLENGTH))
  }

  if (index(str, "\"")) {
    return ((getline) > 0) ? setcsv(str (RT != "" ? RT : RS) $0, sep) : !setcsv(str "\"", sep)
  } else {
    gsub(/\035/, "\"", str)
    $0 = str

    for (i = 1; i <= NF; i++)
      if (match($i, /^"+$/))
        $i = substr($i, 2)

    $1 = $1 ""
    return 1
  }
}

# 5  
Old 12-02-2008
Thanks, but I was looking for a simpler regex, in Perl or Java.
# 6  
Old 12-02-2008
This is about the simplest you can get, I believe:
Code:
my $string = 'This "sentence is" a combination of "multiple words"';

my %items;

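# Copy each match into a lexical before testing it: a later
# successful regex match would overwrite $1.
while ($string =~ /(".*?"|\S+)/g) {
    my $item = $1;
    my $kind = $item =~ /"/ ? 'quoted' : 'unquoted';
    push @{ $items{$kind} }, $item;
}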

print 'Quoted: ',
 join(', ', @{$items{'quoted'}}),
  "\n";

print 'Unquoted: ',
 join(', ', @{$items{'unquoted'}}),
  "\n";

# 7  
Old 12-02-2008
New one:
Code:
#!/usr/bin/perl
use strict;
use warnings;

my $string = 'This "sentence is" a combination of "multiple words"';

my $in_quote = 0;   # true while a quoted phrase is still open
my @list = split /\s+/, $string;
foreach my $w (@list) {
   if ($w =~ /^"([^"]+)"$/) { # starts and ends with double-quotes
      print "$1\n";
   }
   elsif ($w =~ /^"([^"]+)$/) {  # only starts with a double quote
      print "$1 ";
      $in_quote = 1;
   }
   elsif ($w =~ /^([^"]+)"$/) { # only ends with a double-quote
      print "$1\n";
      $in_quote = 0;
   }
   else { # no quotes at all: stay on the same line inside a quoted phrase
      print $w, ($in_quote ? " " : "\n");
   }
}
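
The split-then-rejoin idea in the post above can also collect the items into a list instead of printing them, which matches the `result[n]` form asked for in the first post. The following is a minimal Python sketch of that technique, for illustration only; `split_then_rejoin` is a made-up name. It tracks whether a quoted phrase is currently open so that phrases longer than two words stay together.

```python
def split_then_rejoin(text):
    """Split on whitespace, then glue words back together while inside quotes."""
    result = []
    pending = None  # words of the quoted phrase being assembled, or None
    for word in text.split():
        if pending is None:
            closes_itself = len(word) > 1 and word.endswith('"')
            if word.startswith('"') and not closes_itself:
                pending = [word[1:]]            # a quoted phrase opens here
            else:
                result.append(word.strip('"'))  # bare word or one-word quoted item
        elif word.endswith('"'):
            pending.append(word[:-1])           # the quoted phrase closes here
            result.append(" ".join(pending))
            pending = None
        else:
            pending.append(word)                # middle of a quoted phrase
    if pending is not None:                     # unterminated quote: keep what we have
        result.append(" ".join(pending))
    return result


if __name__ == "__main__":
    sample = 'This "sentence is" a combination of "multiple words"'
    for i, item in enumerate(split_then_rejoin(sample)):
        print(f"result[{i}] = {item}")
```

Because the input is split on whitespace before the phrases are reassembled, any run of whitespace inside a quoted phrase collapses to a single space. If the original spacing matters, a pattern that matches the quoted run directly is the better fit.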


Last edited by Ikon; 12-02-2008 at 06:27 PM..
