How to delete corrupted characters and then do fuzzy searches?
Hi All
I have a whole block of pages that have come in from various sources. Unfortunately, in many instances the pages contain blocks of corrupted text. What I'm trying to do is write a sed line that will delete non-alphanumeric characters only if they're in a block of, say, three or four characters, i.e.
constipated would stay the same
con5tipated would stay the same
con%^|pated would stay the same
&*^%%pated would stay the same
^& would stay the same
&^*^ would get deleted
I was thinking along the lines of....
However this seems to delete anything with a punctuation character in the block even if they are valid alphanumerics.
I'm familiar with using \b for word boundaries, but unless I can get the core sed to work I'm stuck.
Could anyone possibly offer some pointers, with an explanation of why my example doesn't work and theirs does? That way it helps me learn.
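One way to read the examples above: delete a whitespace-delimited token only when it is three or more characters long and contains no letters or digits. Here is a minimal sketch under that assumption. The name strip_garbage is made up for illustration, and the threshold of three comes from the "three or four characters" wording. It anchors on whitespace rather than \b, because \b only marks a transition between word and non-word characters and so cannot delimit a token made up entirely of punctuation.

```shell
# Hypothetical helper: reads text on stdin, writes cleaned text to stdout.
strip_garbage() {
  # The loop (:a ... ta) repeats the substitution until nothing matches,
  # so neighbouring garbage tokens that share a separating space are all caught.
  sed -E -e ':a' \
    -e 's/(^|[[:space:]])[^[:alnum:][:space:]]{3,}([[:space:]]|$)/\1\2/' \
    -e 'ta'
}

# Only the standalone block in the last line is removed.
printf '%s\n' 'constipated' 'con5tipated' 'con%^|pated' '&*^%%pated' '^&' 'keep &^*^ this' |
  strip_garbage
```

The surrounding whitespace is kept, so a removed token leaves its two neighbouring spaces behind; squeeze them afterwards if that matters for the search.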
Because there's fuzzy searching on keywords against the page to happen....
So you may be able to get a 'hit' on %&^%pated, which will allow you to see manually if it's a match, whereas ^*&^ (or any block of consecutive non-alpha characters) is always a dud.
It's the results of the fuzzy search I'm really interested in so I don't want to delete too much data that may 'hit' but I do want to delete as much 'garbage' as possible to speed up search times.
Hope that makes it clearer?
Scrutinizer
Thanks for the code but it's not working the way I expected...for example
My sed is not deleting all of the special characters and leaving what is necessary.
grep connections_per a|sed -e 's/\<\!\-\-//g'
INPUT:
<!-- <connections_per_instance>1</connections_per_instance> -->
<method>HALF</method>
<!--... (10 Replies)
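A likely cause, assuming GNU sed: there \< is a start-of-word anchor rather than a literal <, so the pattern never matches the text <!--. None of <, ! or - needs escaping in a basic regular expression. A small sketch, assuming the goal is to strip the comment markers and keep the commented-out text (uncomment is a hypothetical helper name):

```shell
# Hypothetical helper: removes the "<!--" and "-->" markers, plus the spaces
# next to them, from each input line.
uncomment() {
  sed -e 's/<!-- *//g' -e 's/ *-->//g'
}

printf '%s\n' '<!-- <connections_per_instance>1</connections_per_instance> -->' | uncomment
# prints: <connections_per_instance>1</connections_per_instance>
```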
I need to delete the last 11 characters from each number and they are all in the same line (each is in a different column):
-6.89080901827020800000 3.49348891708562325136 1.47988367839905286876 -2.29707635413510400000 -3.49342364708562325136 -4.43758473239905286876 -2.29707635413510400000... (14 Replies)
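For the column case above, one possible approach is to let awk walk the fields and cut each one with substr. This is only a sketch; trim_fields is a hypothetical name, and it assumes the columns are separated by whitespace.

```shell
# Hypothetical helper: drops the last 11 characters of every
# whitespace-separated field. Assigning to a field makes awk rebuild the
# line with single spaces between fields.
trim_fields() {
  awk '{ for (i = 1; i <= NF; i++) $i = substr($i, 1, length($i) - 11); print }'
}

printf '%s\n' '-6.89080901827020800000 3.49348891708562325136' | trim_fields
# prints: -6.890809018 3.493488917
```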
I've been googling and have tried sed and awk, but I still haven't found an exact formula.
I would like to know how to parse this:
From:
Compute Machin Appliance 3.2.9.10000 123456
To:
Compute Machin Appliance 3.2.9.123456 (5 Replies)
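Going by the From/To pair above, the last dotted component of the version is replaced by the trailing number. A sketch of that with sed -E, assuming the trailing number is always the last field on the line (merge_build is a hypothetical name):

```shell
# Hypothetical helper: swaps the last dotted component for the trailing
# number and drops the now-redundant final field.
merge_build() {
  sed -E 's/\.[0-9]+[[:space:]]+([0-9]+)$/.\1/'
}

printf '%s\n' 'Compute Machin Appliance 3.2.9.10000 123456' | merge_build
# prints: Compute Machin Appliance 3.2.9.123456
```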
hi,
./R1_970330_210505.sard
./R1_970403_223412.sard
./R1_970626_115235.sard
./R1_970626_214344.sard
./R1_970716_234214.sard
...
...
...
for these strings, i wanna remove the ./ for each line
how can i do that?
i know it could possibly be done by sed, but i really have no idea how... (4 Replies)
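A minimal sed sketch for this one: anchor at the start of the line and use | as the delimiter so the slash needs no escaping (strip_dot_slash is a hypothetical name).

```shell
# Hypothetical helper: removes a leading "./" from each line.
strip_dot_slash() {
  sed 's|^\./||'
}

printf '%s\n' './R1_970330_210505.sard' './R1_970403_223412.sard' | strip_dot_slash
# prints:
# R1_970330_210505.sard
# R1_970403_223412.sard
```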
Hello Everyone,
I need help deleting the first 10 characters from each filename in a directory
eg:
1234567890samplefile1.txt
1234567890samplefile2.txt
and so on..
need to get the output as
samplefile1.txt
Thanks in Advance!!!! (8 Replies)
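One possible way to do the rename with plain POSIX shell parameter expansion, sketched under the assumption that every file in the directory should lose its first 10 characters. strip_prefix10 is a hypothetical name, and the function refuses to overwrite an existing file.

```shell
# Hypothetical helper: renames every regular file in directory $1,
# dropping the first 10 characters of its name.
strip_prefix10() {
  for path in "$1"/*; do
    [ -f "$path" ] || continue
    name=${path##*/}          # file name without the directory part
    new=${name#??????????}    # remove the first 10 characters
    # Skip names of 10 characters or fewer, and never clobber a file.
    [ -n "$new" ] && [ "$new" != "$name" ] && [ ! -e "$1/$new" ] || continue
    mv -- "$path" "$1/$new"
  done
}
```

Called as strip_prefix10 /some/dir; try it on a copy of the directory first.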
Hi,
I have a file that has data in the following manner,
tt_0.00001.dat 123.000
tt_0.00002.dat 124.000
tt_0.00002.dat 125.000
This is consistent for all the entries in the file. I want to delete the 'tt_' and '.dat' from each line. Could anyone please guide me how to do this using awk or... (2 Replies)
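For the sample lines above, two sed substitutions are probably enough. A sketch, assuming tt_ only ever appears at the start of the line (strip_tt_dat is a hypothetical name):

```shell
# Hypothetical helper: removes the leading "tt_" and the first ".dat".
strip_tt_dat() {
  sed -e 's/^tt_//' -e 's/\.dat//'
}

printf '%s\n' 'tt_0.00001.dat 123.000' 'tt_0.00002.dat 124.000' | strip_tt_dat
# prints:
# 0.00001 123.000
# 0.00002 124.000
```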
Hi All,
I have a configuration file (file.cfg) in which data will be like this
;
,
_
+
a to z
A to Z
Now I have to read a text file (file.txt) and check whether it contains any character that does not exist in file.cfg.
If other characters are present... (4 Replies)
Hi All,
I wanted to delete all the unwanted characters in the string. ie, to delete all the characters which are not alpha numeric values.
var1="a./bc"
var2='abc/\."123'
like to get the output as
print var1
abc
print var2
abc123
Could you guys help me out pls.
Your help is... (3 Replies)
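A short sketch using tr, which can delete the complement of a character class. alnum_only is a hypothetical name; LC_ALL=C restricts "alphanumeric" to ASCII letters and digits, which is an assumption about what is wanted here.

```shell
# Hypothetical helper: prints its argument with every character that is not
# an ASCII letter or digit removed (-c complements the set, -d deletes).
alnum_only() {
  printf '%s' "$1" | LC_ALL=C tr -cd '[:alnum:]'
}

var1="a./bc"
var2='abc/\."123'
alnum_only "$var1"; echo    # prints: abc
alnum_only "$var2"; echo    # prints: abc123
```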
I am receiving a file with 'M-^M' characters...how do I get rid of these characters.
I tried tr -d '\015' and sed '/^M//g', but they did not work.
Appreciate if someone can help me with this (1 Reply)
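If the characters really show up as M-^M in cat -v output, that notation stands for the single byte 0x8D (octal 215; the M- prefix means the high bit is set on ^M). That is not a plain carriage return, which would explain why tr -d '\015' had no effect. The sed attempt is also missing the leading s of the substitute command. A sketch assuming that reading of the problem (strip_meta_cr is a hypothetical name):

```shell
# Hypothetical helper: deletes every 0x8D byte (shown as M-^M by cat -v).
# LC_ALL=C makes tr treat the input as raw bytes.
strip_meta_cr() {
  LC_ALL=C tr -d '\215'
}

printf 'line one\215\nline two\215\n' | strip_meta_cr
# prints:
# line one
# line two
```

If cat -v shows a bare ^M instead, the byte is an ordinary carriage return and tr -d '\015' is the right tool.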
Hi every1
Well i have a list of numbers e.g
12304
13450
01234
00123
14567
what i want is a command to check if the number is starting from 0 and then delete the 0 without doing anything else!!!!
any help wud b appreciated!!!!!!!!:( (4 Replies)
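A sed sketch for the list above, assuming exactly one leading 0 should go and everything else stays untouched (strip_leading_zero is a hypothetical name):

```shell
# Hypothetical helper: removes a single 0 at the start of each line.
strip_leading_zero() {
  sed 's/^0//'
}

printf '%s\n' 12304 13450 01234 00123 14567 | strip_leading_zero
# prints:
# 12304
# 13450
# 1234
# 0123
# 14567
```

Changing the pattern to ^0* would strip every leading zero instead, turning 00123 into 123.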