I have OCR'ed text that needs cleaning.
Each line contains a part-of-speech (POS) tag, for example
adj. OR s. f. OR s. m. etc.
I need to uppercase all text before the POS tag,
but all text within parentheses must be lowercase.
Text after (and including) the POS tag must remain as is.
filename: munge
I have uppercased everything before POS with
doup.sed
and tried to lowercase between the parentheses with
but this retains uppercasing up to the first parenthesis and lowercases everything else up to the POS, like:
Any GNU sed 4.2.2 or GAWK 4.1.3 solutions, please.
Thanks in advance
Moderator's Comments:
Please use CODE tags as required by forum rules!
Last edited by RudiC; 09-22-2016 at 04:54 AM.
Reason: Added CODE tags.
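For readers following along, here is a minimal sketch of one way to meet the requirement stated in the post, written in portable awk so it should also run under GAWK. It is not the poster's doup.sed. It assumes the POS tag is one of adj., s. f. or s. m. preceded by a space (the post's list ends in "etc", so the pattern would need extending), and the function name clean_pos is made up for illustration. Handling of accented letters by toupper/tolower depends on the awk implementation and locale.

```shell
#!/bin/sh
# Sketch only: uppercase everything before the POS tag, keep parenthesised
# text lowercase, and leave the POS tag and the rest of the line untouched.
# Assumption: the POS tag is " adj.", " s. f." or " s. m." (extend as needed).
clean_pos() {
    awk '
    {
        # Find the first POS tag; lines without one are printed unchanged.
        if (!match($0, / (adj\.|s\. f\.|s\. m\.)/)) { print; next }
        head = substr($0, 1, RSTART - 1)   # text before the POS tag
        tail = substr($0, RSTART)          # POS tag and everything after it

        out = toupper(head)
        res = ""
        # Lowercase each "(...)" group found in the uppercased head.
        while (match(out, /\([^)]*\)/)) {
            res = res substr(out, 1, RSTART - 1) tolower(substr(out, RSTART, RLENGTH))
            out = substr(out, RSTART + RLENGTH)
        }
        print res out tail
    }' "$@"
}
```

It could be run as clean_pos munge > munge.clean, where munge.clean is a hypothetical output name. Splitting the line at the POS tag first keeps the case changes away from the definition text, and lowercasing the parenthesised groups after uppercasing the head avoids having to track which parts were already converted.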
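USERS="me you jim joe sue"
for user in ${USERS}; do
    # Look up the home directory before rmuser drops the /etc/passwd entry;
    # match the login name exactly rather than grepping for a substring.
    usrdir=$(awk -F: -v u="$user" '$1 == u { print $6 }' /etc/passwd)
    rmuser -p "$user"
    # Refuse to run rm -fr on an empty path or on /.
    if [ -n "$usrdir" ] && [ "$usrdir" != "/" ]; then
        rm -fr "$usrdir"
    fi
    printf 'Deleting: %s\tREMOVING: %s\n' "$user" "$usrdir"
done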
This is for AIX ONLY!!! but easily ported to... (0 Replies)
I wish to clean a text file of the following characters
1/2, 1/4, o (degrees)
I can't display these characters. I have tried ALT+189 etc. (my terminal emulator is set to ASCII). How do I display the above? I am using HP-UX 10. (5 Replies)
I'm trying to find a way to automate cleanup of OCR output for a large number of scanned pages. Due to limitations of the access mechanism where these are to end up, I need to create PDF files that include the background text for searching.
Going in, I have TIFF images too dirty to OCR and re-keyed text... (2 Replies)
Hi,
I am getting the source data as below.
Source Data
CDR_Data,,,,,
F1,F2,F3,F4,F5,F6
5,5,6,7,8,7
6,6,g,,,
7,7,76,,,
8,8,gt,,,
9,9,df ,d,d,d
,,,,, (4 Replies)
Hi,
I have a file with multiple rows. Each row has 8 columns.
Column 8 has entries separated by commas. I want to exclude all rows in which column 8 has more than 3 commas.
1234#0/1 - ABC_1234 3 ATGCATGCATGC HHHIIIGIHVF 1 49:T>C,60:T>C,78:C>A,76:G>T,65:T>G
Thanks,
Diya (3 Replies)
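The row filter described above can be sketched in awk as follows. This assumes the columns are separated by whitespace, as the sample row suggests; the function name filter_col8 is made up for illustration. Rows with fewer than 8 fields have an empty column 8 and are therefore kept.

```shell
#!/bin/sh
# Sketch only: print rows whose 8th column contains at most 3 commas.
# Assumption: columns are whitespace-separated.
filter_col8() {
    awk '{
        col = $8                  # work on a copy so the row itself is not modified
        n = gsub(/,/, "", col)    # gsub returns the number of commas it replaced
        if (n <= 3) print
    }' "$@"
}
```

With the sample row from the post, column 8 holds five entries and so four commas, which means that row would be dropped.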
I have a large file of plain text, created using some OCR software. Some words have inevitably been misrecognized. I've been trying to create regular expressions for grep, sed, etc. to find them, but haven't quite managed to get it right. Here's what I'm trying to achieve:
Output all lines which... (2 Replies)
I am trying to clean up a directory with around 4000 files, using the command below to delete all .gz files older than 60 days, and I am hitting the same "argument list too long" issue. Is there a way I can use the same command to do what I intend?
find /opt/et/logs/Archive/*.log.*.gz... (4 Replies)
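The "argument list too long" error usually comes from the shell expanding the glob into thousands of arguments before find even starts. A minimal sketch of the usual workaround is to pass find only the directory and let it do the matching with -name. The function name purge_old_gz is made up for illustration, and the rest of the original command is truncated in the post, so this is not a reconstruction of it.

```shell
#!/bin/sh
# Sketch only: delete *.log.*.gz files older than a given number of days.
# The quoted -name pattern is matched by find itself, so the shell never
# builds a huge argument list.
purge_old_gz() {
    dir=$1
    days=${2:-60}    # default of 60 days, taken from the post
    find "$dir" -type f -name '*.log.*.gz' -mtime +"$days" -exec rm -f {} +
}
```

It could be called as purge_old_gz /opt/et/logs/Archive. Unlike the glob, find also descends into subdirectories, so it is worth replacing -exec rm -f {} + with -print for a dry run first.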