How to prevent incorrect string using reg expr in Java?

Posted 07-12-2011

Hi All,

I need your input on how to reject (mask out / ignore) a string that does not match a regular expression in Java. The pattern is still being refined. The code snippet below handles every line correctly except for one known case so far:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class PropertyDetailCheck {
    public static void main(String[] args) {
        String correctPropertyDetail = "Los Angeles 4 Rose St 7 br h $350,000 J M&C Bunker Hill";
        String incorrectPropertyDetail = "Los Angeles 4 Rose St S 7 br h $350,000 J M&C Bunker Hill";
        Pattern pattern1 = Pattern.compile("\\A[A-Z][a-z]*|[A-Z][a-z]* [A-Z][a-z]* [A-Z]?[0-9]{0,4}/?[0-9]{0,4}-?[0-9]{0,4}|[0-9]{0,4}[a-z] [A-Z][a-z]* [A-Z][a-z]* (?:St|Rd|Av|Sq|Cl|Pl|Cr|Dr|La) [0-9] br [hut] \\$([0-9]){0,3},([0-9]){0,3}|\\$([0-9]){0,3},([0-9]){0,3},([0-9]){0,3} ([A-Z][a-z]*){1,}\\Z");
        // pattern2: "(?:" added before St so the parentheses balance and the pattern compiles
        Pattern pattern2 = Pattern.compile("\\A\\b[A-Z][a-z]*\\b|\\b[A-Z][a-z]* [A-Z][a-z]*\\b \\b[A-Z]?[0-9]{0,4}/?[0-9]{0,4}-?[0-9]{0,4}\\b|\\b[0-9]{0,4}[a-z]\\b \\b[A-Z][a-z]*\\b \\b[A-Z][a-z]*\\b \\b(?:St|Rd|Av|Sq|Cl|Pl|Cr|Dr|La)\\b \\b[0-9]\\b \\bbr\\b \\b[hut]\\b \\$([0-9]){0,3},([0-9]){0,3}|\\$([0-9]){0,3},([0-9]){0,3},([0-9]){0,3} ([A-Z][a-z]*){1,}\\Z");
        Pattern pattern3 = Pattern.compile("\\A(?:[A-Z][a-z]*|[A-Z][a-z]* [A-Z][a-z]*) (?:[A-Z]?[0-9]{0,4}/?[0-9]{0,4}-?[0-9]{0,4}|[0-9]{0,4}[a-z]) [A-Z][a-z]* [A-Z][a-z]* \\b(?:St|Rd|Av|Sq|Cl|Pl|Cr|Dr|La)\\b \\b[0-9]\\b br [hut] \\$([0-9]){0,3},([0-9]){0,3}|\\$([0-9]){0,3},([0-9]){0,3},([0-9]){0,3} ([A-Z][a-z]*){1,}\\Z");
        // Try every pattern against both strings; ideally only the correct one is printed.
        for (Pattern pattern : new Pattern[] {pattern1, pattern2, pattern3}) {
            for (String propertyDetail : new String[] {correctPropertyDetail, incorrectPropertyDetail}) {
                Matcher matcher = pattern.matcher(propertyDetail);
                if (matcher.find())
                    System.out.println("Property detail is " + propertyDetail);
            }
        }
    }
}

The only difference between correctPropertyDetail and incorrectPropertyDetail is the extra 'S' after "Rose St". In a sample of a few hundred lines of data, the valid lines are picked up properly, but a few invalid ones manage to slip through. Neither pattern1 nor pattern2 rejects incorrectPropertyDetail, although both appear to accept valid strings such as correctPropertyDetail. pattern3, on the other hand, successfully rejects incorrectPropertyDetail (good!), but it also stops many valid strings from being accepted.
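A likely reason pattern1 and pattern2 let the bad line through: `|` has the lowest precedence in a regular expression, so an ungrouped `|` splits the whole pattern into independent branches, and Matcher.find() succeeds as soon as any one branch matches anywhere. In pattern1 the first branch is just \A[A-Z][a-z]*, which any line starting with a capitalised word satisfies. The sketch below shows the effect with two shortened, hypothetical patterns (a one- or two-word city followed by a number); they stand in for the full patterns and are not the real grammar of the data.

```java
import java.util.regex.Pattern;

// Shortened, hypothetical patterns: city (one or two words), a number, end of input.
class AlternationDemo {
    // Ungrouped: parsed as (\A[A-Z][a-z]*) OR ([A-Z][a-z]* [A-Z][a-z]* [0-9]+\Z).
    static final Pattern UNGROUPED =
            Pattern.compile("\\A[A-Z][a-z]*|[A-Z][a-z]* [A-Z][a-z]* [0-9]+\\Z");
    // Grouped: the alternation is confined to the city, so both anchors always apply.
    static final Pattern GROUPED =
            Pattern.compile("\\A(?:[A-Z][a-z]*|[A-Z][a-z]* [A-Z][a-z]*) [0-9]+\\Z");

    static boolean found(Pattern pattern, String line) {
        return pattern.matcher(line).find();
    }

    public static void main(String[] args) {
        String good = "Los Angeles 4";
        String bad = "Los Angeles 4 S";
        System.out.println(found(UNGROUPED, good)); // true
        System.out.println(found(UNGROUPED, bad));  // true: "Los" alone satisfies the first branch
        System.out.println(found(GROUPED, good));   // true
        System.out.println(found(GROUPED, bad));    // false
    }
}
```

Wrapping each set of alternatives in a non-capturing group (?:...) keeps the \A and \Z anchors attached to the whole line rather than to a single branch.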

Note that the second sub-pattern of pattern3, (?:[A-Z]?[0-9]{0,4}/?[0-9]{0,4}-?[0-9]{0,4}|[0-9]{0,4}[a-z]), is what causes incorrectPropertyDetail to be rejected. However, it also breaks the regular expression, because good strings are no longer accepted either. Can you see what is wrong with it, or suggest an alternative approach to achieving the same objective?
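One possible alternative, offered as a sketch only: build the pattern from small named pieces, group every alternation, and call Matcher.matches() so the whole line has to fit. As far as can be told from the two sample strings, pattern3 rejects good lines because it requires two capitalised words before the street type, so a one-word street name such as "Rose" cannot match; it also still has an ungrouped | in the price part. The field layout below (city, street number, street name, street type, bedrooms, price, free-text tail) is an assumption inferred from those two strings, and the names PropertyLineCheck and isValid are hypothetical.

```java
import java.util.regex.Pattern;

// Sketch under assumptions: the field layout is guessed from the two sample lines.
class PropertyLineCheck {
    static final String CITY = "[A-Z][a-z]*(?: [A-Z][a-z]*)?";        // one or two words
    static final String NUMBER = "[A-Z]?[0-9]{1,4}(?:[/-][0-9]{1,4})?[a-z]?"; // 4, 3/45, 12-14, 12a
    static final String STREET = "[A-Z][a-z]*(?: [A-Z][a-z]*)*";      // one or more words
    static final String TYPE = "(?:St|Rd|Av|Sq|Cl|Pl|Cr|Dr|La)";
    static final String BEDS = "[0-9] br [hut]";
    static final String PRICE = "\\$[0-9]{1,3}(?:,[0-9]{3})*";       // $350,000 or $1,250,000
    static final String TAIL = ".+";                                  // agent and suburb, unchecked

    static final Pattern LINE = Pattern.compile(
            String.join(" ", CITY, NUMBER, STREET, TYPE, BEDS, PRICE, TAIL));

    // matches() requires the entire line to fit, so no \A or \Z anchors are needed.
    static boolean isValid(String line) {
        return LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        String correct = "Los Angeles 4 Rose St 7 br h $350,000 J M&C Bunker Hill";
        String incorrect = "Los Angeles 4 Rose St S 7 br h $350,000 J M&C Bunker Hill";
        System.out.println(isValid(correct));   // true
        System.out.println(isValid(incorrect)); // false: the stray "S" fits no field
    }
}
```

Keeping each field in its own constant makes it easier to tighten one piece (for example the tail) as more bad lines turn up, without disturbing the rest of the pattern.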

Regular expressions are relatively new to me, and I could do with some advice.

Your assistance would be appreciated.

