02-02-2009
Quote:
Originally Posted by
Qwerty123
Hi,
My file is sometimes very large (around 1GB). Instead of removing non-ASCII characters from the whole file, can they be removed only from this string, i.e. $var?
Please suggest.
Unfortunately, you really can't avoid the cost of fixing this file.
It's going to take some time and disk space.
What's the next step, loading it into a database?
If that's the case, you'll be able to do the read,
the weird-character removal and the insert into the database
all in Perl, which would be fairly cost-effective.
If you're interested in that solution, let me know.
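For reference, the character-removal step on its own is small. Here is a minimal sketch in plain shell, assuming "non-ASCII" means any byte outside printable ASCII plus tab, newline and carriage return. The function name strip_non_ascii and the sample value are made up for illustration; they are not from the original post.

```shell
#!/bin/sh
# Hypothetical helper: print its argument with every byte outside
# printable ASCII (plus tab, LF and CR) deleted.
strip_non_ascii() {
    # LC_ALL=C makes tr work on raw bytes instead of multibyte characters.
    # -c complements the listed set, -d deletes what is left.
    printf '%s' "$1" | LC_ALL=C tr -cd '\11\12\15\40-\176'
}

# Sample value (an assumption): UTF-8 "e acute" (0xC3 0xA9) and a bullet (0xE2 0x80 0xA2).
var=$(printf 'caf\303\251 \342\200\242 ok')

# Clean only the string held in $var, not the file it came from.
var=$(strip_non_ascii "$var")
printf '%s\n' "$var"
```

Note that calling this once per line of a 1GB file would start a tr process for every line, so for bulk work a single pass over the whole file is likely still the cheaper route.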
ao_string_tokenize(3) Programmer's Manual ao_string_tokenize(3)
NAME
ao_string_tokenize - tokenize an input string
SYNOPSIS
#include <your-opts.h>
cc [...] -o outfile infile.c -lopts [...]
token_list_t* ao_string_tokenize(char const* string);
DESCRIPTION
This function will convert one input string into a list of strings. The list of strings is derived by separating the input based on white
space separation. However, if the input contains either single or double quote characters, then the text after that character up to a
matching quote will become the string in the list.
The returned pointer should be deallocated with free(3C) when you are done using the data. The data are placed in a single block of allocated
memory. Do not deallocate individual token/strings.
The structure pointed to will contain at least these two fields:
tkn_ct The number of tokens found in the input string.
tkn_list An array of tkn_ct + 1 pointers to substring tokens, with the last pointer set to NULL.
There are two types of quoted strings: single quoted (') and double quoted ("). Singly quoted strings are fairly raw in that escape
characters (\) are simply another character, except when preceding the following characters:
\\ double backslashes reduce to one
\' incorporates the single quote into the string
\<newline> suppresses both the backslash and newline character
Double quote strings are formed according to the rules of string constants in ANSI-C programs.
string    string to be tokenized
RETURN VALUE
pointer to a structure that lists each token
ERRORS
NULL is returned and errno will be set to indicate the problem:
EINVAL - There was an unterminated quoted string.
ENOENT - The input string was empty.
ENOMEM - There is not enough memory.
EXAMPLES
#include <stdlib.h>
int ix;
token_list_t* ptl = ao_string_tokenize( some_string );
/* NULL means failure; errno is set as listed under ERRORS */
if (ptl != NULL) {
    for (ix = 0; ix < ptl->tkn_ct; ix++)
        do_something_with_tkn( ptl->tkn_list[ix] );
    free( ptl );
}
Note that everything is freed with the one call to free(3C).
SEE ALSO
The info documentation for the -lopts library.
configFileLoad(3), optionFileLoad(3), optionFindNextValue(3), optionFindValue(3), optionFree(3), optionGetValue(3), optionLoadLine(3),
optionNextValue(3), optionOnlyUsage(3), optionProcess(3), optionRestore(3), optionSaveFile(3), optionSaveState(3), optionUnloadNested(3),
optionVersion(3), pathfind(3), strequate(3), streqvcmp(3), streqvmap(3), strneqvcmp(3), strtransform(3)
2010-07-05 ao_string_tokenize(3)