Posted in UNIX for Advanced & Expert Users. Post 302642511 by gencon on Thursday 17th of May 2012 02:03:47 PM
Regular expression for finding OCR mistakes.

I have a large plain-text file that was created with OCR software. Inevitably, some words have been misrecognized. I've been trying to write regular expressions for grep, sed, etc. to find them, but haven't quite managed to get them right. Here's what I'm trying to achieve:

Output all lines containing a word that begins with, or contains, a digit or a non-alphanumeric character, e.g. th1s, mi|k, !nert.

Output all lines containing a word that ends with a digit, or with a non-alphanumeric character that is not a common punctuation symbol such as '.' or ',', e.g. Cra6, Chemica(.

If possible, it would be great to have the line numbers printed as well, but that is not essential.

Can you gurus help please? Thanks.
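
One way the two rules and the line-number request might be combined is sketched below in Python rather than grep or sed. It is only a sketch under several assumptions that the post does not state: the text is ASCII English, words are separated by whitespace, an apostrophe or hyphen inside a word (don't, well-known) is legitimate, a token made only of digits is a real number and not a misread word, and the sets of "common" leading and trailing punctuation are the ones listed in `LEADING_OK` and `TRAILING_OK`. The names `is_suspect` and `suspect_lines` are hypothetical.

```python
import re

# Assumed punctuation that may legitimately wrap a word; tune for your text.
LEADING_OK = "\"'(["
TRAILING_OK = ".,;:!?\"')]"

# A clean word: ASCII letters, optionally joined by apostrophes or hyphens
# (assumption: "don't" and "well-known" are not OCR mistakes).
CLEAN_CORE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def is_suspect(token):
    """Return True if a whitespace-separated token looks like an OCR mistake."""
    # Peel off the punctuation we have decided to tolerate around a word.
    core = token.lstrip(LEADING_OK).rstrip(TRAILING_OK)
    if not core:
        return False  # the token was nothing but tolerated punctuation
    if core.isdigit():
        return False  # assumption: a standalone number is not a misread word
    # Anything else must be a clean word, otherwise it is flagged.
    return CLEAN_CORE.fullmatch(core) is None


def suspect_lines(lines):
    """Yield (line_number, line) for every line with at least one suspect token."""
    for number, line in enumerate(lines, start=1):
        if any(is_suspect(token) for token in line.split()):
            yield number, line.rstrip("\n")
```

To scan a file, iterate over an open file object, for example `for n, line in suspect_lines(open("ocr.txt")): print(f"{n}:{line}")`, where `ocr.txt` is a placeholder name. Working token by token keeps the "begins with", "contains" and "ends with" cases in one check; the trade-off is that tokens such as 3.5 or 1,000 are flagged too, so the tolerated punctuation sets will probably need adjusting for real text.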
 
