I have a text file downloaded from the web. I want to count the unique words used in the file, and also measure how long each person speaks in a conversation by counting the words between the opening and closing quotation marks, which are not the standard ASCII quote characters. I also found that the file contains some odd blank characters that are invisible on stdout: they are the entries with 118391 and 6380 occurrences in the example.
Judging by the single/double quotes, I guess the file was processed on a Mac, but I am not sure. Here is the output of my Ubuntu terminal:
1) How do I find the invisible "blank/empty" characters in the file so that I can remove them before counting the words?
2) How do I measure how long a person speaks in a conversation, using the opening/closing double quotation mark pairs? What I tried is:
This regex is too greedy and sometimes combines adjacent dialogues into a single one.
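One common way to stop a pattern from merging adjacent dialogues is to forbid the closing quote inside the match, so each match ends at the nearest closing quote. Below is a minimal Python sketch of that idea, not the regex from the post. It assumes the file uses the curly quotes U+201C and U+201D and that quotations are not nested; the function name dialogue_word_counts is made up for illustration.

```python
import re

# Assumption: dialogue is delimited by the curly quotes U+201C (opening)
# and U+201D (closing), and quotations are not nested.
OPEN_QUOTE = "\u201c"
CLOSE_QUOTE = "\u201d"

# The negated class cannot run past a closing quote, so each match stops
# at the nearest one instead of swallowing the following dialogue.
DIALOGUE_RE = re.compile(OPEN_QUOTE + "([^" + CLOSE_QUOTE + "]*)" + CLOSE_QUOTE)


def dialogue_word_counts(text):
    """Return the number of whitespace-separated words in each quoted span."""
    return [len(span.split()) for span in DIALOGUE_RE.findall(text)]


if __name__ == "__main__":
    sample = "\u201cHello there,\u201d she said. \u201cHow are you today?\u201d"
    print(dialogue_word_counts(sample))
```

A non-greedy quantifier (.*? with re.DOTALL) would behave similarly here. The negated class is used because it also works in regex dialects that lack non-greedy matching.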
Thanks!
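For the first question, one possible approach is to tally every character that is either outside ASCII or non-printing, and print its code point and Unicode name so the invisible ones can be identified. The sketch below is in Python and assumes the file can be decoded as UTF-8 text. The names suspicious_characters and strip_characters are hypothetical helpers, not part of any library.

```python
import unicodedata
from collections import Counter


def suspicious_characters(text):
    """Count characters that are non-ASCII or non-printing, most common first."""
    counts = Counter(
        ch for ch in text
        if ord(ch) > 127 or (not ch.isprintable() and ch not in "\n\t")
    )
    # Control characters have no Unicode name, hence the "<unnamed>" fallback.
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), n)
        for ch, n in counts.most_common()
    ]


def strip_characters(text, unwanted):
    """Replace each unwanted character with a plain ASCII space."""
    return text.translate({ord(ch): " " for ch in unwanted})


if __name__ == "__main__":
    sample = "a\u00a0b\u00a0c\u200bd"
    for code, name, count in suspicious_characters(sample):
        print(code, name, count)
```

To run it on a real file, the text could be read with open(path, encoding="utf-8").read(). Replacing with a space rather than deleting is a design choice: it keeps words separated when the invisible character was acting as a blank, but it would split a word that contains a zero-width character.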
Hello,
I read a file whose lines contain data fields, and between these fields there are blank characters (10, 12, 5, or 1 ...).
So when I use the command while read line in the script (see below), only one character remains between the fields and the other blank characters are gone.
... (3 Replies)
Hi,
I am trying to do two things in my script. I would really appreciate any help in this regard.
Is there a way to delete the last line from a pipe-delimited flat file if the last line is blank? If the line is not blank, then do nothing.
Is there a way to count words that are starting... (4 Replies)
Does any of you know how to turn off color and weird characters in the bash shell when using the command "script"? Every time users on my server used that command to record their script, they either couldn't print it because lp kept giving the "unknown format character" messages or the print paper... (1 Reply)
Hi.
I have files in my OS that have weird file names with unconventional ASCII characters.
I would like to run them, but I can't refer to them.
I know the ASCII codes of the problematic characters.
I can't change their names since they belong to a 3rd-party program, but I want to run them.
Is there... (2 Replies)
Hi everyone,
I'm trying to write a shell script that processes a log file. The log format is generally:
(8 digit hex of unix time),(system ID),(state)\n
My shell script gets the file from the web, saves it in a local text directory. I then want to change the hex to decimal, convert from unix time... (7 Replies)
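The hex-to-decimal step described above can be sketched as follows. This is a Python illustration rather than the poster's shell script, and it assumes each line has exactly the three comma-separated fields given in the format line. The target time format is cut off in the post, so the UTC ISO 8601 output and the function name parse_log_line are assumptions made only for this example.

```python
from datetime import datetime, timezone


def parse_log_line(line):
    """Split 'HEXTIME,SYSTEM_ID,STATE' and decode the hex Unix timestamp."""
    hex_time, system_id, state = line.rstrip("\n").split(",", 2)
    seconds = int(hex_time, 16)  # hex string -> decimal seconds since the epoch
    # Assumption: the timestamp is reported in UTC.
    when = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return seconds, when.isoformat(), system_id, state


if __name__ == "__main__":
    print(parse_log_line("0000003C,sys01,UP\n"))
```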
I just finished the shell script.
This script can replace weird characters (such as #$%^@!'"...) in file or directory names with "_".
I spent a long time on replacing the apostrophe in file/directory names.
added: 2012-03-14
the 124th line (/usr/bin/perl -i -e "s#\'#\\'#g" /tmp/rpdir_level$i.tmp) is... (5 Replies)
Hi,
I was trying to remove the blanks from the beginning of a line.
When I try:
sed 's/^ +//' filename
it does not work,
but when I try
sed 's/^ *//' filename
it works.
But I think the first command should also have replaced any line with one or more blanks.
Kindly help me in understanding... (5 Replies)
Dear all,
I have the files: xaa xab xac
and I try to paste them using: $ paste -d, xaa xab xac
I see:
output
3e-130
,6e-78
,5e-74
6e-124
,0,007
,0,026
2e-119
When I type: $ paste -d, xaa xab xac |less
I see:
output
3e-130^M,6e-78^M,5e-74
6e-124^M,0,007^M,0,026 (2 Replies)
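The ^M that less displays is a carriage return, which usually means the input files have CRLF (DOS-style) line endings. That would also explain why the plain terminal output appears broken across lines. A minimal Python sketch of joining the columns after dropping the carriage returns is shown below. The function name paste_columns is hypothetical, and the sample values are the ones from the post.

```python
def paste_columns(columns, delimiter=","):
    """Join parallel lists of lines, dropping trailing CR/LF characters first."""
    cleaned = [[line.rstrip("\r\n") for line in col] for col in columns]
    # zip stops at the shortest column, whereas paste pads short files
    # with empty fields; that difference is ignored in this sketch.
    return [delimiter.join(row) for row in zip(*cleaned)]


if __name__ == "__main__":
    xaa = ["3e-130\r\n", "6e-124\r\n"]
    xab = ["6e-78\r\n", "0,007\r\n"]
    xac = ["5e-74\n", "0,026\n"]
    for row in paste_columns([xaa, xab, xac]):
        print(row)
```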
I am using Korn shell on a Linux 2.6x platform, and I am using the following code to capture the lines which contain CONTROL CHARACTERS in my file:
awk '/]/ {print NR}' EROLLMENT_INPUT.txt
The problem is that this code shows the file has control characters when the file is in folder A,... (2 Replies)
Hi All
Need Help
I have a file with the below format (ABC.TXT) :
®¿¿ABCDHEJJSJJ|XCBJSKK01|M|7348974982790
HDFLJDKJSKJ|KJALKSD02|M|7378439274898
KJHSAJKHHJJ|LJDSAJKK03|F|9898982039999
(cont......)
I need to write a script where it will check for: blank lines (between rows, before... (6 Replies)
Discussion started by: chatwithsaurav