Regular expression (regex) clean up text (Emergency UNIX and Linux Support)
Post 302601387 by jim mcnamara on Thursday 23rd of February 2012 01:55:16 PM
Define "Normal". Remove extraneous blank lines? Paginate? That is your view from 10,000 feet; we have to go a lot lower, or we'll trash something you do not want trashed.

According to what I just read, Wikimedia pages are XHTML, and the editor works just like editing a page on Wikipedia. The formatting information simply refers to HTML and XHTML formatting tags, etc.

Where is there any documentation on using a regex to mass-edit documents?
Either I don't get it, or you are barking up the wrong tree.

A priori, I would take the data stream you used for the import, clean it up, remove the junk, and re-import it. But for some reason that does not seem feasible.

Since you want an answer:

    <br />

is the HTML tag for a line break, i.e. a new line in the text (in Windows text files a new line is a carriage return + line feed, CR LF). You apparently have those tags embedded everywhere.

Explain to me what regex you think you need (meaning what it looks for) and how the documentation says to use that regex, and we can help.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Regular Expression + Arithmetical Expression

Is it possible to combine a regular expression with an arithmetical expression? For example, taking an 8-digit character sequence, casting each output of a grep, and comparing it to a constant. THX! (2 Replies)
Discussion started by: Z0mby

2. Linux

Regular expression to extract "y" from "abc/x.y.z"... I need a regular expression

Regular expression to extract "y" from "abc/x.y.z" (2 Replies)
Discussion started by: rag84dec

3. Shell Programming and Scripting

Regular expression (regex) required

I want to block all special characters except alphanumerics and the "." (dot) character; currently I am using // . I also want to block input that is only a single dot or a run of dots, e.g. . or .............. should be blocked. Please provide me the regex. ---------- Post updated at 05:11 AM... (10 Replies)
Discussion started by: shams11

4. UNIX for Advanced & Expert Users

Regular expression / regex substitution on Unicode text

I have a large file encoded in Unicode that I need to convert to CSV. In general, I know how to do this by regular expression substitutions using sed or Perl, but one problem I am having is that I need to put a quotation mark at the end of each line to protect the last field. The usual regex... (1 Reply)
Discussion started by: thomas.hedden

5. Shell Programming and Scripting

Integer expression expected: with regular expression

CA_RELEASE has a value of 6. I need to check that this is a numeric value and, if it is not, report an error. source $CA_VERSION_DATA if * ] then echo "CA_RELESE $CA_RELEASE is invalid" exit -1 fi + source /etc/ncgl/ca_version_data ++ CA_PRODUCT_ID=samxts ++ CA_RELEASE=6 ++ CA_WEEK_NO=7 ++... (3 Replies)
Discussion started by: ketkee1985

6. Shell Programming and Scripting

How can I get the matched text when using a regular expression?

Hello: (exp) matches "exp", and the matched text is stored in automatically named arrays. How can I get the matched text? What is the name of the automatically named array in the Linux shell? (4 Replies)
Discussion started by: 915086731

7. Programming

Perl: How to read from a file, do regular expression and then replace the found regular expression

Hi all, how can I read a file, find the matching regular expression, and overwrite the same file? open DESTINATION_FILE, "<tmptravl.dat" or die "tmptravl.dat"; open NEW_DESTINATION_FILE, ">new_tmptravl.dat" or die "new_tmptravl.dat"; while (<DESTINATION_FILE>) { # print... (1 Reply)
Discussion started by: jessy83

8. Shell Programming and Scripting

passing a regex as a variable to awk and using it as the regular expression for a search

Hi All, I have an sftp session log where I am transferring multiple files by issuing "mput abc*.dat". The contents of the logfile are below: ################################################# Connecting to 10.75.112.194... Changing to: /home/dasd9x/testing1 sftp> mput abc*.dat Uploading... (7 Replies)
Discussion started by: k_bijitesh

9. Shell Programming and Scripting

regular expression with shell script to extract data out of a text file

Hi, I am trying to extract some specific data out of a text file using regular expressions in a shell script that uses a multiline grep. The tool I am using is pcregrep, so that I get compatibility with Perl's regular expressions. For sample data like this, I am trying to grab... (6 Replies)
Discussion started by: vemkiran

10. UNIX for Advanced & Expert Users

sed: -e expression #1, char 0: no previous regular expression

Hello All, I'm trying to extract the lines between two consecutive elements of an array from a file. My array looks like: problem_arr=(PRS111 PRS213 PRS234) j=0 while } ] do k=`expr $j + 1` sed -n "/${problem_arr}/,/${problem_arr}/p" problemid.txt ---some operation goes... (11 Replies)
Discussion started by: InduInduIndu
RE_COMP(3)                Linux Programmer's Manual                RE_COMP(3)

NAME
       re_comp, re_exec - BSD regex functions

SYNOPSIS
       #define _REGEX_RE_COMP
       #include <sys/types.h>
       #include <regex.h>

       char *re_comp(char *regex);

       int re_exec(char *string);

DESCRIPTION
       re_comp() is used to compile the null-terminated regular expression pointed to by regex. The compiled pattern occupies a static area, the pattern buffer, which is overwritten by subsequent use of re_comp(). If regex is NULL, no operation is performed and the pattern buffer's contents are not altered.

       re_exec() is used to assess whether the null-terminated string pointed to by string matches the previously compiled regex.

RETURN VALUE
       re_comp() returns NULL on successful compilation of regex otherwise it returns a pointer to an appropriate error message.

       re_exec() returns 1 for a successful match, zero for failure.

CONFORMING TO
       4.3BSD.

NOTES
       These functions are obsolete; the functions documented in regcomp(3) should be used instead.

SEE ALSO
       regcomp(3), regex(7), GNU regex manual

COLOPHON
       This page is part of release 3.25 of the Linux man-pages project. A description of the project, and information about reporting bugs, can be found at http://www.kernel.org/doc/man-pages/.

GNU                               1995-07-14                       RE_COMP(3)
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.