Regular expression (regex) clean up text
Post 302601387 by jim mcnamara on Thursday 23rd of February 2012 01:55:16 PM
Define "Normal". Remove extraneous blank lines?
Paginate? This is your view from 10,000 feet; we have to go a lot lower, or we'll trash something you do not want trashed.

According to what I just read, Wikimedia pages are XHTML, and the editor works just like editing a page in Wikipedia. The formatting information simply refers to HTML and XHTML formatting tags, etc.

Where is the documentation on using a regex to mass-edit documents?
Either I don't get it or you are barking up the wrong tree.

By default, I would take the data stream you used for the import, clean it up by removing the junk, and re-import it. But that does not seem to be feasible, for some reason.
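If the export route does turn out to be workable, the clean-up step might look something like the sketch below. It assumes the pages have already been dumped to plain text, and the helper name `clean_export` is made up for illustration. It only normalizes CRLF line endings, strips trailing whitespace, and squeezes runs of blank lines down to one, which is just one possible reading of "normal".

```python
import re


def clean_export(text: str) -> str:
    """Tidy an exported text dump before re-importing it (hypothetical helper)."""
    # Normalize Windows CRLF line endings to plain LF.
    text = text.replace("\r\n", "\n")
    # Strip trailing spaces and tabs at the end of every line.
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    # Collapse runs of two or more blank lines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text


sample = "First line   \r\n\r\n\r\n\r\nSecond line\t\r\n"
print(repr(clean_export(sample)))
```

On the sample string this prints `'First line\n\nSecond line\n'`. Run it on a copy of the export and diff the result before re-importing anything.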

Since you want an answer:
Code:
 <br />

is the HTML tag for a line break (the equivalent of a new line in text, which Windows encodes as carriage return + line feed). You apparently have those embedded everywhere.
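If the stray tags are the junk in question, a case-insensitive substitution can turn them back into plain newlines. This is only a sketch, assuming the text has been exported outside the wiki editor; `br_to_newline` is a hypothetical name, and the pattern also covers the `<br>` and `<br/>` spellings. It knows nothing about `<pre>` blocks or other places where a literal tag should survive, which is exactly the kind of thing a blind mass edit trashes.

```python
import re

# Matches <br>, <br/>, <br /> and upper-case variants such as <BR />.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def br_to_newline(text: str) -> str:
    """Replace every HTML line-break tag with a plain newline (hypothetical helper)."""
    return BR_TAG.sub("\n", text)


print(br_to_newline("one<br />two<BR>three<br/>four"))
```

This prints one, two, three and four on separate lines.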

Explain to me what regex you think you need (meaning what it looks for) and how the documentation says to use that regex, and we can help.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Regular Expression + Arithmetical Expression

Is it possible to combine a regular expression with an arithmetical expression? For example, taking an 8-digit character sequence, casting each output of a grep, and comparing it to a constant. THX! (2 Replies)
Discussion started by: Z0mby

2. Linux

Regular expression to extract "y" from "abc/x.y.z" .... I need a regular expression

Regular expression to extract "y" from "abc/x.y.z" (2 Replies)
Discussion started by: rag84dec

3. Shell Programming and Scripting

Regular expression (regex) required

I want to block all special characters except alphanumerics and the "." (dot) character. Currently I am using // . I also want to block a lone dot or a run of dots; e.g., "." or ".............." should be blocked. Please provide me the regex. ---------- Post updated at 05:11 AM... (10 Replies)
Discussion started by: shams11

4. UNIX for Advanced & Expert Users

Regular expression / regex substitution on Unicode text

I have a large file encoded in Unicode that I need to convert to CSV. In general, I know how to do this by regular expression substitutions using sed or Perl, but one problem I am having is that I need to put a quotation mark at the end of each line to protect the last field. The usual regex... (1 Reply)
Discussion started by: thomas.hedden

5. Shell Programming and Scripting

Integer expression expected: with regular expression

CA_RELEASE has a value of 6. I need to check that this is a numeric value and, if not, report an error. source $CA_VERSION_DATA if * ] then echo "CA_RELESE $CA_RELEASE is invalid" exit -1 fi + source /etc/ncgl/ca_version_data ++ CA_PRODUCT_ID=samxts ++ CA_RELEASE=6 ++ CA_WEEK_NO=7 ++... (3 Replies)
Discussion started by: ketkee1985

6. Shell Programming and Scripting

How can I get the matched text when using a regular expression?

Hello: (exp) matches "exp", and the matched text is stored in automatically named arrays. How can I get the matched text? What is the name of the automatically named array in the Linux shell? (4 Replies)
Discussion started by: 915086731

7. Programming

Perl: How to read from a file, do regular expression and then replace the found regular expression

Hi all, how can I read a file, find the matches for a regular expression, and overwrite the same file? open DESTINATION_FILE, "<tmptravl.dat" or die "tmptravl.dat"; open NEW_DESTINATION_FILE, ">new_tmptravl.dat" or die "new_tmptravl.dat"; while (<DESTINATION_FILE>) { # print... (1 Reply)
Discussion started by: jessy83

8. Shell Programming and Scripting

passing a regex as variable to awk and using that as regular expression for search

Hi All, I have an sftp session log where I am transferring multiple files by issuing "mput abc*.dat". The contents of the log file are below: ################################################# Connecting to 10.75.112.194... Changing to: /home/dasd9x/testing1 sftp> mput abc*.dat Uploading... (7 Replies)
Discussion started by: k_bijitesh

9. Shell Programming and Scripting

regular expression with shell script to extract data out of a text file

Hi, I am trying to extract some specific data out of a text file using regular expressions in a shell script that uses a multiline grep. The tool I am using is pcregrep, so that I get compatibility with Perl's regular expressions. For sample data like this, I am trying to grab... (6 Replies)
Discussion started by: vemkiran

10. UNIX for Advanced & Expert Users

sed: -e expression #1, char 0: no previous regular expression

Hello All, I'm trying to extract the lines between two consecutive elements of an array from a file. My array looks like: problem_arr=(PRS111 PRS213 PRS234) j=0 while } ] do k=`expr $j + 1` sed -n "/${problem_arr}/,/${problem_arr}/p" problemid.txt ---some operation goes... (11 Replies)
Discussion started by: InduInduIndu
GO-CLEAN(1)						      General Commands Manual						       GO-CLEAN(1)

NAME
       go - tool for managing Go source code

SYNOPSIS
       go clean [-i] [-r] [-n] [-x] [packages]

DESCRIPTION
       Clean removes object files from package source directories. The go command builds most objects in a temporary directory, so go clean is mainly concerned with object files left by other tools or by manual invocations of go build.

       Specifically, clean removes the following files from each of the source directories corresponding to the import paths:

       _obj/            old object directory, left from Makefiles
       _test/           old test directory, left from Makefiles
       _testmain.go     old gotest file, left from Makefiles
       test.out         old test log, left from Makefiles
       build.out        old test log, left from Makefiles
       *.[568ao]        object files, left from Makefiles
       DIR(.exe)        from go build
       DIR.test(.exe)   from go test -c
       MAINFILE(.exe)   from go build MAINFILE.go

       In the list, DIR represents the final path element of the directory, and MAINFILE is the base name of any Go source file in the directory that is not included when building the package.

OPTIONS
       -i     The -i flag causes clean to remove the corresponding installed archive or binary (what 'go install' would create).

       -n     The -n flag causes clean to print the remove commands it would execute, but not run them.

       -r     The -r flag causes clean to be applied recursively to all the dependencies of the packages named by the import paths.

       -x     The -x flag causes clean to print remove commands as it executes them.

       For more about specifying packages, see go-packages(7).

AUTHOR
       This manual page was written by Michael Stapelberg <stapelberg@debian.org>, for the Debian project (and may be used by others).

2012-05-13                                                    GO-CLEAN(1)
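To make the GO-CLEAN(1) file list concrete, here is a rough Python approximation that reports which entries of one package directory match those names. It is a dry-run illustration only, not how the go command implements clean: `clean_candidates` is a hypothetical helper, and the caller has to supply the MAINFILE sources (Go files excluded from the package build), because deciding that requires the build rules. For the real thing, `go clean -n` prints the remove commands without running them.

```python
import fnmatch
from pathlib import PurePath

# Fixed names from the man page list (leftovers from old Makefile builds).
FIXED_NAMES = {"_obj", "_test", "_testmain.go", "test.out", "build.out"}
# Object files left from Makefiles: *.5, *.6, *.8, *.a, *.o
OBJECT_GLOB = "*.[568ao]"


def clean_candidates(dir_path: str, entries: list[str], main_files: list[str]) -> list[str]:
    """Return the entries of one package directory that match the go clean list.

    Hypothetical dry-run helper: nothing is deleted. `main_files` must list the
    Go source files that are NOT part of the package build (the MAINFILE.go
    files); the caller supplies it because deciding that needs the build rules.
    """
    dir_name = PurePath(dir_path).name  # DIR: final path element
    targets = {dir_name, dir_name + ".exe", dir_name + ".test", dir_name + ".test.exe"}
    for source in main_files:
        stem = PurePath(source).stem  # MAINFILE: base name without ".go"
        targets.update({stem, stem + ".exe"})
    return sorted(
        name
        for name in entries
        if name in FIXED_NAMES
        or name in targets
        or fnmatch.fnmatchcase(name, OBJECT_GLOB)
    )


listing = ["hello", "hello.go", "x.6", "lib.a", "tool", "tool.go", "_obj", "notes.txt"]
print(clean_candidates("src/hello", listing, ["tool.go"]))
```

This prints `['_obj', 'hello', 'lib.a', 'tool', 'x.6']`; the Go sources and `notes.txt` are left alone. In practice `entries` would come from something like `os.listdir()` on the package directory.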
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.