Sponsored Content
Top Forums UNIX for Advanced & Expert Users Regular expression for finding OCR mistakes. Post 302642519 by Corona688 on Thursday 17th of May 2012 02:18:12 PM
Old 05-17-2012
Code:
$ cat data

This line contains a 1 but is not a mistake
The small brown fox jumped over the lazy dog.
This line contains a 1 but is a m1stake
How are you today?
mi|k
That's fine;  this isn't.
!nert
Hey hey hey!
cra6
chemica(

$ cat ocr.awk

{
        P=0
        for(N=1; (!P) && (N<=NF); N++)
        {
                # Ignore words that are pure numbers?
                if($N ~ /^[0-9]*$/) continue;
                # Flag words that contain non a-zA-Z'
                if($N ~ /[^a-zA-Z']./) P=1;
                # Flag words that end in non a-zA-Z.,;?!
                if($N ~ /[^a-zA-Z.,;?!]$/) P=1;
        }

        $0=NR"\t"$0;
} P

$ awk -f ocr.awk data

3       This line contains a 1 but is a m1stake
5       mi|k
7       !nert
9       cra6
10      chemica(

$

These 2 Users Gave Thanks to Corona688 For This Post:
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Regular Expression + Aritmetical Expression

Is it possible to combine a regular expression with a aritmetical expression? For example, taking a 8-numbers caracter sequece and casting each output of a grep, comparing to a constant. THX! (2 Replies)
Discussion started by: Z0mby
2 Replies

2. Shell Programming and Scripting

regular expression help

hello all.. I'm a bit new to this site.. and I hope to learn alot.. but I've been having a hard time figuring this out. I'm horrible with regular expressions.. so any help would be greatly appreciated. I have a file with a list of names like this: LASTNAME, FIRSTNAME, MIDDLEINITIAL how can... (5 Replies)
Discussion started by: mac2118
5 Replies

3. Shell Programming and Scripting

regular expression

Hi all, My log file is like 19:40:22 INFO :Total time taken to Service External Request---15ms 19:40:22 INFO : External service failed with status KO 19:40:22 FATAL: External service failed with status KO 19:40:22 DEBUG : Batch started with 19:40:22 ERROR: Member: dmidecode.x86_64... (1 Reply)
Discussion started by: subin_bala
1 Replies

4. Linux

Regular expression to extract "y" from "abc/x.y.z" .... i need regular expression

Regular expression to extract "y" from "abc/x.y.z" (2 Replies)
Discussion started by: rag84dec
2 Replies

5. UNIX for Dummies Questions & Answers

Regular expression help

HI All, I want to list a file with the below format : testfile_nnnnn.xxxx where n and x can be any digit 0-9. n repeats 5 times and x 4 times... I tried with something like below: ls -l testfile_/\{5\}/* to start with but its not working. Please could anyone help? Thanks D (1 Reply)
Discussion started by: deepakgang
1 Replies

6. Shell Programming and Scripting

Integer expression expected: with regular expression

CA_RELEASE has a value of 6. I need to check if that this is a numeric value. if not error. source $CA_VERSION_DATA if * ] then echo "CA_RELESE $CA_RELEASE is invalid" exit -1 fi + source /etc/ncgl/ca_version_data ++ CA_PRODUCT_ID=samxts ++ CA_RELEASE=6 ++ CA_WEEK_NO=7 ++... (3 Replies)
Discussion started by: ketkee1985
3 Replies

7. Shell Programming and Scripting

Regular expression

I have a flat tab delimited file of the following format 1 A:23 A:45 A:789 2 A:2 A:47 3 A:78 A:345 A:9 A:10 4 A:34 A:98 I want to modify the file to the following format with insertions of "//" in between 1 A:23 // A:45 // A:789 2 A:2 // A:47 3 A:78 // A:345 // A:9 // A:10 4 A:34... (7 Replies)
Discussion started by: Lucky Ali
7 Replies

8. Programming

Perl: How to read from a file, do regular expression and then replace the found regular expression

Hi all, How am I read a file, find the match regular expression and overwrite to the same files. open DESTINATION_FILE, "<tmptravl.dat" or die "tmptravl.dat"; open NEW_DESTINATION_FILE, ">new_tmptravl.dat" or die "new_tmptravl.dat"; while (<DESTINATION_FILE>) { # print... (1 Reply)
Discussion started by: jessy83
1 Replies

9. UNIX for Dummies Questions & Answers

Finding lines with a regular expression, replacing them with blank lines

So the tag for this forum says all newbies welcome... All I want to do is go through my file and find lines which contain a given string of characters then replace these with a blank line. I really tried to find a simple command to do this but failed. Here's what I did come up with though: ... (2 Replies)
Discussion started by: Golpette
2 Replies

10. UNIX for Advanced & Expert Users

sed: -e expression #1, char 0: no previous regular expression

Hello All, I'm trying to extract the lines between two consecutive elements of an array from a file. My array looks like: problem_arr=(PRS111 PRS213 PRS234) j=0 while } ] do k=`expr $j + 1` sed -n "/${problem_arr}/,/${problem_arr}/p" problemid.txt ---some operation goes... (11 Replies)
Discussion started by: InduInduIndu
11 Replies
regex(3)						     Library Functions Manual							  regex(3)

Name
       re_comp, re_exec - regular expression handler

Syntax
       char *re_comp(s)
       char *s;

       re_exec(s)
       char *s;

Description
       The  subroutine	compiles  a string into an internal form suitable for pattern matching.  The subroutine checks the argument string against
       the last string passed to

       The subroutine returns 0 if the string s was compiled successfully; otherwise a string containing an  error  message  is  returned.  If	is
       passed 0 or a null string, it returns without changing the currently compiled regular expression.

       The  subroutine returns 1 if the string s matches the last compiled regular expression, 0 if the string s failed to match the last compiled
       regular expression, and -1 if the compiled regular expression was invalid (indicating an internal error).

       The strings passed to both and may have trailing or embedded newline characters; they are terminated by	nulls.	 The  regular  expressions
       recognized are described in the manual entry for given the above difference.

Diagnostics
       The subroutine returns -1 for an internal error.

       The subroutine returns one of the following strings if an error occurs:

       No previous regular expression
       Regular expression too long
       unmatched (
       missing ]
       too many () pairs
       unmatched )

See Also
       ed(1), ex(1), egrep(1), fgrep(1), grep(1)

																	  regex(3)
All times are GMT -4. The time now is 11:03 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy