03-24-2019
Remove Special Characters and Numbers From a Wordlist
I sux at this type of stuff. I have a huge wordlist and I want to get rid of everything in each word except the letters, so numbers and all special characters should go. Since this list was created using cewl, I somehow picked up some Latin characters and would like to remove them as well. If there is a way to do this and someone gives me the command string to use, could you also explain how that string works? I would love to learn how to do things like this myself.
Thanks in advance.
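One possible approach, as a minimal sketch: it assumes the wordlist has one word per line, that "letters" means the plain ASCII letters A-Z and a-z, and that a POSIX shell with tr, sed and sort is available. The function name letters_only and the file names are placeholders, not anything from cewl.

```shell
# Hypothetical helper: reads a wordlist on stdin, writes the cleaned list to stdout.
#
# How the pipeline works:
#   LC_ALL=C          process the text byte by byte, so the bytes of accented
#                     (multi-byte) characters are not treated as letters
#   tr -cd 'A-Za-z\n' -c takes the complement of the set (everything NOT listed),
#                     -d deletes it; the newline is kept so words stay one per line
#   sed '/^$/d'       drops lines that became empty (e.g. a line that was only digits)
#   sort -u           sorts and removes duplicates created by the stripping
letters_only() {
    LC_ALL=C tr -cd 'A-Za-z\n' |
        sed '/^$/d' |
        LC_ALL=C sort -u
}

# Usage (placeholder file names):
#   letters_only < wordlist.txt > clean.txt
```

Note that this deletes accented characters rather than converting them, so a word such as "café123" ends up as "caf". If the original order of the list matters, leave out the sort -u stage.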
10 More Discussions You Might Find Interesting
1. UNIX for Dummies Questions & Answers
Hi,
How do I remove the lines where special characters or Unicode characters appear?
The following query does work but I wonder if there is a better way.
cat test.txt | egrep -v '\)|#|,|&|-|\(|\\|\/|\.'
The following lines show that my query is incomplete.
Warning: The word "*Khan" is... (1 Reply)
Discussion started by: shantanuo
1 Reply
2. UNIX for Dummies Questions & Answers
Hi All,
I have a script written that creates a new directory within the shell program and if a parameter isn't passed in, it creates a strange directory name by mistake. So I have a directory like "-_12" and I am unable to remove it. I tried removing it using double quote and many others. I have... (12 Replies)
Discussion started by: datherriault
12 Replies
3. Shell Programming and Scripting
Hi there,
I'd like to write a script that removes any set of characters from any string. The first argument would be the string, the second argument would be the characters to remove. For example:
$ myscript "My name's Santiago. What's yours?" "atu"
My nme's Snigo. Wh's yors?
I wrote the... (11 Replies)
Discussion started by: chebarbudo
11 Replies
4. UNIX for Dummies Questions & Answers
Dear Members,
We have a file which contains some special characters. I need to replace each of these special characters with a newline character (\n).
The Special character is \x85.
I am not sure what this character means and how we can remove it.
Any inputs are greatly appreciated.
Thanks... (5 Replies)
Discussion started by: sandeep_1105
5 Replies
5. UNIX for Dummies Questions & Answers
Hi,
I have a directory that has a file which contained special characters in the filename. Can someone please advise how to remove the file, preferably with a rm -i ?
Thanks in advance.
Listing is as below:
{oracle}> ls -1b
bplog.bkup.001
bplog.bkup.002
bplog.bkup.003
bplog.bkup.004... (1 Reply)
Discussion started by: newbie_01
1 Reply
6. Shell Programming and Scripting
hello all
I am writing Perl code and I wish to remove the special characters from text.
I wish to remove all extended ASCII characters. If the list of special characters is huge, how can I do this using the substitute command
s/specialcharacters/null/g
I really want to code like... (3 Replies)
Discussion started by: vasuarjula
3 Replies
7. Shell Programming and Scripting
Hi All,
I have a variable like
AVAIL="\
BACK:bkpstg:testdb3.iad.expertcity.com:backtest|\
#AUTH:authstg:testdb3.iad.expertcity.com:authiapd|\
TEST:authstg:testdb3.iad.expertcity.com:authiapd|\
"
What I want to do here is that if I find a # before any entry, remove the entire string... (5 Replies)
Discussion started by: engineermayur
5 Replies
8. Shell Programming and Scripting
Hi,
In the source data, a few of the columns contain special characters (like *), and because of this I am not able to write the data to a flat file. It displays some junk data in the flat file.
Source data example:
Address1="XDERFTG * HYJUYTG"
How do I remove the special characters in a string? (2 Replies)
Discussion started by: koti_rama
2 Replies
9. Shell Programming and Scripting
Hi,
I have a file with this line, it's always in the first line:
I want to remove these special characters: ´╗┐
file1
´╗┐\\bar\c$\test2\;3.348.118 Bytes;160 ;3
\\bar\c$\test\;35 Bytes;2 ;1
I want the same file to be only
\\bar\c$\test2\;3.348.118 Bytes;160 ;3
\\bar\c$\test\;35... (4 Replies)
Discussion started by: nakaedu
4 Replies
10. Shell Programming and Scripting
Hi Gurus,
I have a file which contains some Unicode characters like "ü". I want to replace them with some other characters. I searched the internet and found the command sed "s/ü/-/g", but I don't know how to type ü on the Unix command line.
Please help me for this one.
Thanks in advance (7 Replies)
Discussion started by: ken6503
7 Replies
WORD-LIST-COMPRESS(1) Aspell Abbreviated User's Manual WORD-LIST-COMPRESS(1)
NAME
word-list-compress - word list compressor/decompressor for GNU Aspell
SYNOPSIS
word-list-compress c[ompress] | d[ecompress]
DESCRIPTION
word-list-compress compresses or decompresses sorted word lists for use with the GNU Aspell spell checker.
COMMANDS
-c, c, compress
compress the plain text word list read from standard input.
-d, d, decompress
decompress the compressed word list read from standard input.
EXAMPLES
Here are a few examples of how you can use word-list-compress:
word-list-compress d <wordlist.cwl >wordlist.txt
Decompress file wordlist.cwl to text file wordlist.txt
word-list-compress c <wordlist.wl >wordlist.cwl 2>errors.txt
Compress wordlist.wl to wordlist.cwl and send any error messages to a text file named errors.txt
LC_COLLATE=C sort -u <wordlist.txt | word-list-compress c >wordlist.cwl
Sort a word list, then pipe it to word-list-compress to create a compressed binary wordlist.cwl file.
word-list-compress d <words.cwl | aspell create master ./words.rws
Decompress a wordlist, then pipe it to aspell(1) to create a spelling list. Please check the aspell(1) info manual for proper usage
and options.
TIPS
Word-list-compress is best used with sorted word list type files. It is not a general purpose compression program since the resulting
files may actually increase in size.
Word-list-compress accepts up to 255 text characters in the range of {0x21...0xFF}. If your word list requires a larger character set for
certain languages or longer length for multi-word, scientific, medical, technical or other use, then it is recommended that you compress
your word list using prezip-bin(1).
DIAGNOSTICS
Word-list-compress normally exits with a return code of 0. If it encounters an error, a message is sent to standard error output (stderr),
and word-list-compress exits with a non-zero return value. Error messages are listed below:
(display help/usage message)
Unknown command given on the command line so word-list-compress displays a usage message to standard error output.
Corrupt Input
This is only for the decompression command d. The input file is of an unknown format or the input file/stream is corrupted. You
may have some valid output, but word-list-compress could not complete the process. If the input file is a compressed wordlist but
you have no output file, then it may be a newer prezip-bin(1) version of compressed file; if so, try decompressing the file with
prezip-bin(1) instead.
Output Data Error
The output is full, write protected, or has an error and can no longer be written to.
SEE ALSO
aspell(1), aspell-import(1), prezip-bin(1), run-with-aspell(1)
Aspell is fully documented in its Texinfo manual. See the `aspell' entry in info for more complete documentation.
REPORTING BUGS
For help, see the Aspell homepage at <http://aspell.net> and send bug reports/comments to the Aspell user list at the above address.
AUTHOR
This manual page was written by Aaron Lehmann <aaronl@vitelus.com>, Brian Nelson <pyro@debian.org> and Jose Da Silva <digital@joescat.com>.
GNU
2005-09-05 WORD-LIST-COMPRESS(1)