Regular expression (regex) clean up text

 
# 1  
Old 02-23-2012

Hi,

Server: MediaWiki, MySQL, CentOS 5, PHP 5
I have imported close to a million pages from a database into my MediaWiki site.

The format we're left with is not pretty, and I need to find a way to clean it up and present it nicely.

I think regex is the best option, as I can do a search and replace on text only, via a MediaWiki extension, so I need to know some simple regex to accomplish this.

Here is a sample of the text, which shows the problem.

Code:
{{USA Case Law |Court=1st Circuit |Docket No.=94-1950 |Case name=Clarke v. Kentucky Fried |Original Document=http://www.ca1.uscourts.gov/cgi-bin/getopn.pl?OPINION=94-1950.01A }}





July 5, 1995UNITED STATES COURT OF APPEALS
FOR THE FIRST CIRCUIT


No. 94-1950
KARIN CLARKE,

Plaintiff, Appellant,

v.

 KENTUCKY FRIED CHICKEN OF CALIFORNIA, INC.,

 Defendant, Appellee.





 ERRATA SHEET




 Theopinion ofthisCourt issuedonJune 14,1995,is
amended as follows:


 Cover sheet, underlisting ofcounsel, add: NanMyerson
Evans, Bon Tempo & Evans and David A. Robinson on brief of amicus
curiae National Employment Lawyers Association.































[Appendix not attached.Please contact Clerk's Office
for opinion with appendix.]
UNITED STATES COURT OF APPEALS
FOR THE FIRST CIRCUIT

No. 94-1950

KARIN CLARKE,

Plaintiff, Appellant,

v.

 KENTUCKY FRIED CHICKEN OF CALIFORNIA, INC.,

 Defendant, Appellee.



 APPEAL FROM THE UNITED STATES DISTRICT COURT

FOR THE DISTRICT OF MASSACHUSETTS

 [Hon. Edward F. Harrington, U.S. District Judge]


Selya, Circuit Judge,

 Campbell, Senior Circuit Judge,

 and Cyr, Circuit Judge.



 Kevin G. Powers, with whom Robert S. Mantell and LawOffice of Kevin G. Powers were on brief for appellant. Jeffrey G.Huvelle, withwhom MelissaCole,Covington&Burling,TerryPhilipSegal,BrendaR.Sharton and Segal & Feinberg were on brief for appellee.
 Nan Myerson Evans,Bon Tempo &Evans and DavidA.
Robinson on briefof amicuscuriae NationalEmploymentLawyers Association.

June 14, 1995














CYR,Circuit Judge. PlaintiffKarin Clarkeappeals CYR,Circuit Judge.
from adistrict court judgment dismissingher sexual harassment

claimagainst herformeremployer, KentuckyFried Chickenof

California,Inc. ("KFC"), forfailure to exhaust administrative

remedies,and dismissingher relatedstate-law tortclaims on

preemption grounds.We affirm the judgment.


I I

BACKGROUND BACKGROUND
While employed by defendantKFC at a fast-food restau-

rantinSaugus,Massachusetts, Clarkewassexually harassed,

physically assaulted,and subjectedto attempted rapeby other

KFC employees. Clarke quither job andinitiated thepresent

lawsuit in Massachusetts Superior Court,alleging sexual harass-

ment,negligent and recklessinfliction ofemotional distress,

and negligent hiring, retention and supervision.

After removing the caseto federal district court, see
28 U.S.C.1441, 1446; see also id. 1332 (diversity jurisdic-

tion), KFC filed a motion to dismiss all claims, see Fed. R. Civ.

P. 12(b)(6),contending thatthe sexual harassmentclaim under

Mass.Gen. L.Ann. ch.214,1C, wasbarred forfailure to

exhaustmandatory administrativeremedies beforethe Massachu-

setts Commission Against Discrimination ("MCAD"), see Mass.Gen.
L.ch. 151B,5 (prescribingsix-month limitationperiod for

MCAD claims),9 (making section 5procedure "exclusive"), and

that Clarke'scommonlawtortclaims werepreemptedbythe

Massachusetts Workers'Compensation Act,see Mass. Gen.L. ch.

2










152, 1 et seq. (Supp. 1994).The motion to dismiss was granted
in its entirety.Clarke v. Kentucky Fried Chicken of California,
Inc., No. 94-11101-EFH (D. Mass. Aug. 17, 1994).1

II II

DISCUSSION DISCUSSION
A. Sexual Harassment A. Sexual Harassment

Clarkefirst contends thatthe districtcourt should

nothavedismissed hersexualharassmentclaim, becausethe

"jurisdictional" clauseinMass. Gen.L.Ann. ch.214,1C

(1986) ("Thesuperior court shall have jurisdiction in equity to

enforcethisrightand toawarddamages.")evinces aclear

legislative intent to except such claims from compliance with the

otherwise mandatory MCAD exhaustion requirementimposed on other

employment-based discrimination claimsunder Massachusettslaw.

In order to place her contention in context, we examine pertinent

case law and statutes, see infra APPENDIX at pp. (i)-(iii). (1st Cir. 1990).

7

---------- Post updated 23-02-12 at 03:57 AM ---------- Previous update was 22-02-12 at 11:50 PM ----------

I hate to bump this, but I really could use some help here.
I need a regex search and replace to fix the format to just look normal...

thanks guys!

Last edited by lawstudent; 02-23-2012 at 12:58 AM..
# 2  
Old 02-23-2012
Define "Normal". Remove extraneous blank lines?
Paginate? This is your view at 10000 feet - we have to go a lot lower or we'll trash something you do not want trashed.

According to what I just read, MediaWiki pages are XHTML, and the editor works just like editing a page in Wikipedia. The formatting information simply refers to HTML and XHTML formatting tags, etc.

Where is there documentation on using a regex to mass edit documents?
Either I don't get it or you are barking up the wrong tree.

A priori, I would get the datastream you used to import, clean it up, remove the junk and re-import. But that seems not feasible for some reason.

Since you want an answer:
Code:
 <br />

is the HTML tag for a line break, the equivalent of a carriage return + line feed (a new line in text on Windows). You apparently have those embedded everywhere.

Explain to me what regex you think you need (meaning what it looks for) and how the documentation says to use that regex, and we can help.
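
To make the question concrete, here is a minimal sketch of the kind of rules that might count as "normal" for the sample above. It is written in Python purely for illustration; the function name `clean_text` and the choice of rules are assumptions, not anything the MediaWiki extension provides, and the patterns would need adapting to whatever regex flavor that extension accepts. It handles only three things seen in the sample: runs of blank lines, words hyphenated across a line break, and headings printed twice on one line. It cannot restore the missing spaces (e.g. "Theopinion ofthisCourt"), which is not a job a simple regex can do safely.

```python
import re


def clean_text(text: str) -> str:
    """Apply a few conservative cleanup rules to imported court-opinion text."""
    # Normalize Windows line endings and strip trailing spaces/tabs per line.
    text = text.replace("\r\n", "\n")
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)

    # Rejoin a word split by a hyphen at end of line, even when blank lines
    # sit between the halves ("restau-\n\nrant" -> "restaurant").
    # Caveat: a genuine hyphen that happens to end a line is removed too.
    text = re.sub(r"(?<=[a-z])-\n+(?=[a-z])", "", text)

    # Collapse a heading that is printed twice on one line
    # ("BACKGROUND BACKGROUND" -> "BACKGROUND").
    # Caveat: any line of the form "X X" is affected.
    text = re.sub(r"^(.+?) \1$", r"\1", text, flags=re.MULTILINE)

    # Reduce three or more consecutive newlines to a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)

    return text.strip() + "\n"


if __name__ == "__main__":
    sample = (
        "BACKGROUND BACKGROUND\n"
        "While employed at a fast-food restau-\n"
        "\n"
        "rant in Saugus,\n"
        "\n\n\n\n"
        "II II\n"
    )
    print(clean_text(sample))
```

Each rule is a separate substitution on purpose, so any one of them can be tried on a handful of pages before being run across the whole wiki.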