I have a strange problem writing umlauts (such as ä and ü) to a file that has to be ISO-8859-1 encoded.
My shell script reads a file whose encoding varies: sometimes US-ASCII, sometimes UTF-8, sometimes ISO-8859-1. I then have to replace every "{" with an "ä".
I read the file line by line and run sed on each line. Then I write the corrected line to a new file with echo.
When the file is finished, I can see in a hex editor that the "ä" is represented as "c3 a4", which is the UTF-8 encoding. What I need is the ISO-8859-1 encoding, "e4".
That's my code:
My env-variables are as follows:
LC_ALL=en_US.UTF-8
LANG=en_US.UTF-8
Is it possible to force the script to write an ISO-8859-1 encoded file?
How would you handle the differently encoded input files? Should I convert them to ISO-8859-1 with "iconv" first?
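One possible way to do this is sketched below. It is a sketch under stated assumptions, not a definitive answer. The function name to_latin1_with_umlaut is made up for illustration. The encoding check is only a heuristic: input that validates as UTF-8 (which includes plain US-ASCII) is converted with iconv, and anything else is assumed to be ISO-8859-1 already. The "ä" is produced with printf '\344' (byte e4) so that the result does not depend on the encoding the script file is saved in, and sed runs under LC_ALL=C so that it works on single bytes despite the UTF-8 locale.

```shell
#!/bin/sh
# Sketch: write an ISO-8859-1 copy of a file with every "{" replaced by "ä".
# to_latin1_with_umlaut is a hypothetical helper name, not from the original post.
to_latin1_with_umlaut() {
    in=$1
    out=$2

    # Byte 0xE4 is "ä" in ISO-8859-1. The octal escape keeps the script
    # independent of the encoding of the script file itself.
    a_uml=$(printf '\344')

    # Heuristic (an assumption): if the input validates as UTF-8, convert it;
    # otherwise treat it as ISO-8859-1 and pass it through unchanged.
    if iconv -f UTF-8 -t UTF-8 "$in" >/dev/null 2>&1; then
        iconv -f UTF-8 -t ISO-8859-1 "$in"
    else
        cat "$in"
    fi | LC_ALL=C sed "s/{/$a_uml/g" > "$out"
}

# Example call (file names are placeholders):
# to_latin1_with_umlaut input.txt output.txt
```

Note that the iconv conversion fails for UTF-8 input containing characters that have no ISO-8859-1 equivalent, and because of the pipe that failure is not reflected in the function's exit status. Processing the whole file in one sed call also avoids the line-by-line loop with echo.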
Hi All.
I have the following simple shell program.
It reads a number from the "/user/amit/bldno";
for example: file "bldno" contains value "100"
After execution of the program the content should change to 101.
---------
#!/usr/bin/tcsh
V= `cat /user/amit/bldno`
echo $V
`rm -rf ... (1 Reply)
if test -z "$1"
then echo "you must give a filename or filepath"
else path=`dirname $1`
f_name =`basename $1`
if path="."
then path=`pwd`
fi
fi
cat $f_name $path >> index.txt
The only problem I am encountering with this is writing $path to index.txt
Keeps going gaga:
cat:... (1 Reply)
Hi All
I am new to C and trying to write a code to get a file as an output.
My text file should look like:
<var1>tab<var2>tab<var3>...upto the elements in an array
<varb1>tab<varb2>tab<varb3>...upto the elements in an array
Can someone please guide me how to write the code or a sample... (3 Replies)
Need to develop a unix shell script for the below requirement and I need your assistance:
1) search for file.log and file.bad file in a directory and read them
2) pull out "Load_Start_Time", "Data_File_Name", "Error_Type" from log file
4) concatenate each row from bad file as... (3 Replies)
Help needed...
Can you tell me how to compare the last two pairs of entries in a file and print the result to a new file?
I have one file
Check1.txt
\abc1 12345
\abc2 12327
\abc1 12345
\abc2 12330
I want to compare the entries in Check1 and write to... (1 Reply)
I am looking to do an ls on a folder and have the output structured as modification date, then file name, with the date in a format that is compatible with MySQL. I am trying to build a table that stores the last modification date of certain files so I can display it on some web... (4 Replies)
Hi
I am trying to extract information out of a file but keep getting grep cant open errors
the code is below:
#bash
#extract orders with blank address details
#
# obtain the current date
# set today to the current date ccyymmdd format
today=`date +%c%m%d | cut -c24-31`
echo... (8 Replies)
Hi All,
We have a Unix program in Oracle. When we run it, it connects to the specified FTP server and fetches the file to the local server.
The problem is that when the file is still being written, the program fetches the incomplete file.
Could anyone please help me... (2 Replies)
Hi,
I have 1000 files names data1.txt through data1000.txt inside a folder. I want to write a script that will take each first line from the files and write them as output into a new file. How do I go about doing that? Thanks! (2 Replies)
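A minimal sketch of one way to do this, assuming the files are named data1.txt through dataN.txt as described. The function name collect_first_lines and its three arguments (directory, number of files, output file) are made up for illustration. A counting loop is used instead of a glob so that the lines come out in numeric order (data2 before data10).

```shell
#!/bin/sh
# Sketch: collect the first line of data1.txt ... dataN.txt in numeric order.
# collect_first_lines is a hypothetical helper name.
collect_first_lines() {
    dir=$1      # folder containing the data files
    count=$2    # highest file number, e.g. 1000
    out=$3      # output file to create

    : > "$out"  # create or truncate the output file
    i=1
    while [ "$i" -le "$count" ]; do
        f="$dir/data$i.txt"
        # Skip missing files instead of aborting.
        [ -f "$f" ] && head -n 1 "$f" >> "$out"
        i=$((i + 1))
    done
}

# Example call (paths are placeholders):
# collect_first_lines /path/to/folder 1000 first_lines.txt
```

If a file consists of a single line without a trailing newline, its line will run into the next one in the output.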
Hello ,
I have a comma-delimited file with over 20 fields that I need to do some validations on. I have to check if certain fields are null and then write the line containing the null field into a new file and then delete the line from the current file.
Can someone tell me how i could go... (2 Replies)
Discussion started by: goddevil
2 Replies
Lingua::StopWords(3pm) User Contributed Perl Documentation Lingua::StopWords(3pm)
NAME
Lingua::StopWords - Stop words for several languages.
SYNOPSIS
use Lingua::StopWords qw( getStopWords );
my $stopwords = getStopWords('en');
my @words = qw( i am the walrus goo goo g'joob );
# prints "walrus goo goo g'joob"
print join ' ', grep { !$stopwords->{$_} } @words;
DESCRIPTION
In keyword search, it is common practice to suppress a collection of "stopwords": words such as "the", "and", "maybe", etc. which exist
in a large number of documents and do not tell you anything important about any document which contains them. This module provides such
"stoplists" in several languages.
Supported Languages
|-----------------------------------------------------------|
| Language | ISO code | default encoding | also available |
|-----------------------------------------------------------|
| Danish | da | ISO-8859-1 | UTF-8 |
| Dutch | nl | ISO-8859-1 | UTF-8 |
| English | en | ISO-8859-1 | UTF-8 |
| Finnish | fi | ISO-8859-1 | UTF-8 |
| French | fr | ISO-8859-1 | UTF-8 |
| German | de | ISO-8859-1 | UTF-8 |
| Hungarian | hu | ISO-8859-1 | UTF-8 |
| Italian | it | ISO-8859-1 | UTF-8 |
| Norwegian | no | ISO-8859-1 | UTF-8 |
| Portuguese | pt | ISO-8859-1 | UTF-8 |
| Spanish | es | ISO-8859-1 | UTF-8 |
| Swedish | sv | ISO-8859-1 | UTF-8 |
| Russian | ru | KOI8-R | UTF-8 |
|-----------------------------------------------------------|
FUNCTIONS
getStopWords
my $stoplist = getStopWords('en');
my $utf8_stoplist = getStopWords('en', 'UTF-8');
Retrieve a stoplist in the form of a hashref where the keys are all stopwords and the values are all 1.
$stoplist = {
and => 1,
if => 1,
# ...
};
getStopWords() expects 1-2 arguments. The first, which is required, is an ISO code representing a supported language. If the ISO code
cannot be found, getStopWords returns undef.
The second argument should be 'UTF-8' if you want the stopwords encoded in UTF-8. The UTF-8 flag will be turned on, so make sure you
understand all the implications of that.
SEE ALSO
The stoplists supplied by this module were created as part of the Snowball project (see <http://snowball.tartarus.org>,
Lingua::Stem::Snowball).
Lingua::EN::StopWords provides a different stoplist for English.
AUTHOR
Maintained by Marvin Humphrey <marvin at rectangular dot com>. Original author Fabien Potencier, <fabpot at cpan dot org>.
COPYRIGHT AND LICENSE
Copyright 2004-2008 Fabien Potencier, Marvin Humphrey
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.3 or,
at your option, any later version of Perl 5 you may have available.
perl v5.10.0 2009-02-23 Lingua::StopWords(3pm)