Hi everyone,
I'm trying to write a shell script that processes a log file. The log format is generally:
(8 digit hex of unix time),(system ID),(state)\n
My shell script downloads the file from the web and saves it as a text file in a local directory. I then want to convert the hex timestamp to decimal, convert that Unix time to a day/month/year date in MST, and write the result out.
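A minimal sketch of the conversion step, assuming GNU `date` (for `-d @SECONDS`; BSD/macOS `date` uses `-r` instead) and assuming "MST" means a fixed UTC-7 offset with no daylight saving. `hex_to_mst` and `convert_log` are hypothetical names, and `convert_log` assumes the ID field has already been cleaned so it contains no commas.

```shell
# hex_to_mst: hypothetical helper. Converts one 8-digit hex Unix timestamp
# into "day/month/year HH:MM:SS MST".
# Assumes GNU date (-d @SECONDS); BSD/macOS date would need -r SECONDS.
hex_to_mst() {
    dec=$(printf '%d' "0x$1")   # printf reads a 0x-prefixed argument as hex
    # MST7 is a POSIX TZ string: fixed UTC-7, no daylight saving.
    TZ=MST7 date -d "@$dec" '+%d/%m/%Y %H:%M:%S %Z'
}

# convert_log: hypothetical helper. Reads "hex,id,state" lines on stdin and
# writes "date,id,state" lines. Assumes the ID field contains no commas.
convert_log() {
    # The "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
    while IFS=, read -r hex id state || [ -n "$hex" ]; do
        printf '%s,%s,%s\n' "$(hex_to_mst "$hex")" "$id" "$state"
    done
}
```

For example, `hex_to_mst 3B6A7227` prints `03/08/2001 02:43:03 MST` (0x3B6A7227 is 996831783). A whole file can be converted with `convert_log < raw.log > converted.log`, where the file names are placeholders.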
I have something that *mostly* works, by downloading the file, opening it with cat, piping the result to sed, using sed to get all the hex values and looping through them.
Unfortunately, there's a bug in the software that produces the log: for some systems the ID isn't defined (someone probably forgot to initialize that variable), so it writes a line that looks like: 3B6A7227,››ù√剃,0
When I open this file with cat, the output for lines like that usually just contains a lot of question marks. This is the line I'm using to isolate the hex values:
Originally I just had the second "sed"; I added the first one in an attempt to remove all the "weird" characters. Unfortunately, when I run this, it comes out as a list of hex numbers EXCEPT for the weird entries. These entries now have their hex number, a comma, then a number of question marks (and sometimes a decimal number), then another comma and the state.
How can I get rid of these? I realize the bug in the logging code needs to be fixed, but I don't have control over that; I'm just trying to clean up the log file.
I find "cat -vt" pretty handy for making invisible and odd characters visible and better behaved. If the garbage ever contains a ^J (newline) or a comma, though, you are a goner.
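A quick illustration of what `cat -vt` does to such a line. The sample line is made up (an ID consisting of the bytes 0x01 and 0xFF) and assumes a `cat` that supports `-v` and `-t`, as GNU and BSD `cat` do.

```shell
# -v prints control bytes as ^X and bytes >= 128 with an M- prefix;
# -t additionally shows tabs as ^I. Newlines are NOT made visible, which is
# why a stray ^J inside the ID would still split the record in two.
printf '3B6A7227,\001\377,0\n' | cat -vt
```

This prints `3B6A7227,^AM-^?,0`: every byte is now printable, but a comma hidden in the garbage would still look like a field separator to `sed` or `cut`.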
If you write a C or Perl program, you can ignore a ^J that shows up before a line has its two commas, and when a line has more than two commas, parse it from both ends and push the user ID back to a fixed value. Maybe sometimes they cannot determine the user name from the ID number, and write the binary ID number instead?
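The suggestion above is for C or Perl; the sketch below swaps in awk so it stays inside a shell script. It assumes the time and state fields never contain commas or newlines, so only the ID can be corrupted, and it replaces any ID that is empty or not purely alphanumeric with a placeholder (`UNKNOWN` by default). `repair_log` and the placeholder value are hypothetical.

```shell
# repair_log [PLACEHOLDER] [FILE]: hypothetical helper (awk instead of C/Perl).
# Assumes only the middle (ID) field can contain garbage, including stray
# commas and newlines; the time and state fields are assumed to be clean.
repair_log() {
    LC_ALL=C awk -v placeholder="${1:-UNKNOWN}" '
    {
        # A stray ^J in the ID splits a record across physical lines, so
        # keep gluing lines together until at least 2 commas have been seen.
        buf = buf $0
        if (gsub(/,/, ",", buf) < 2) next

        first = index(buf, ",")            # comma ending the time field
        last  = match(buf, /,[^,]*$/)      # comma starting the state field
        ts    = substr(buf, 1, first - 1)
        state = substr(buf, last + 1)
        id    = substr(buf, first + 1, last - first - 1)

        # Extra commas or any non-alphanumeric byte mean the ID is garbage:
        # push it back to a fixed value.
        if (id == "" || id ~ /[^0-9A-Za-z]/) id = placeholder
        print ts "," id "," state
        buf = ""
    }' "${2:--}"
}
```

Usage: `repair_log UNKNOWN raw.log > repaired.log`, or pipe into it with no arguments. Parsing from both ends is what makes extra commas harmless; note that an incomplete record at the very end of the file (fewer than two commas) is silently dropped.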
Here is an untested idea: clean the file with the Unix "tr" command, removing every character except 0-9, A-Z, a-z, commas and newlines.
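A minimal version of that `tr` idea, wrapped in a hypothetical `clean_log` function that filters stdin to stdout. `-c` complements the character set and `-d` deletes, so everything outside digits, letters, commas and newlines is dropped.

```shell
# clean_log: hypothetical wrapper. Deletes (-d) every byte in the complement
# (-c) of the allowed set: 0-9, A-Z, a-z, comma and newline.
# LC_ALL=C makes tr treat the input as raw bytes, so locale-aware code does
# not reject invalid multibyte sequences in the corrupted ID field.
clean_log() {
    LC_ALL=C tr -cd '0-9A-Za-z,\n'
}
```

Usage: `clean_log < raw.log > clean.log`. Two limits of this approach: an ID made only of odd bytes ends up empty (`3B6A7227,,0`), and any commas or newlines inside the garbage survive the filter.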
If this does not solve your problem, please post sample data which shows a couple of good lines and a couple of bad lines when displayed by the unix "od" command (which will show exactly what characters are in the file). We don't need the whole file.
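One way to produce such a dump, wrapped in a hypothetical `dump_lines` helper that sends a range of lines (the first four by default) through `od -c`. It assumes a `sed` that passes NUL bytes through unchanged, as GNU sed does.

```shell
# dump_lines FILE [FIRST] [LAST]: hypothetical helper. Sends lines
# FIRST..LAST (default 1..4) of FILE through od -c, which prints printable
# bytes as-is, escapes such as \n and \0 symbolically, and any other byte
# as a 3-digit octal number.
dump_lines() {
    sed -n "${2:-1},${3:-4}p" "$1" | od -c
}
```

Usage: `dump_lines raw.log 1 6`, choosing a range that covers a couple of good lines and a couple of bad ones.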
I can't reproduce your "tr" error. Please post the exact command you typed and the complete error message.
There are certainly some weird characters in the second comma-delimited field of certain records in this sample data. There is also a weird trailing null character at the end of the first record.
What Operating System and version are you running?
What Shell do you use?
My first guess would be that some locale-aware code in the underlying C library that tr is using does not approve of certain byte sequences. You could try running tr in the C/POSIX locale: LC_ALL=C tr ...
I have a text file downloaded from the web. I want to count the unique words used in the file, and measure a person's speaking length during a conversation by counting the words between the opening and closing quotation marks, which differ from the standard ASCII ones. Also I found out the file contains some... (2 Replies)
I am using the Korn shell on a Linux 2.6.x platform, and I am using the following code to capture the lines which contain CONTROL CHARACTERS in my file:
awk '/]/ {print NR}' EROLLMENT_INPUT.txt
The problem is that this code shows the file has control characters when the file is in folder A ,... (2 Replies)
Dear all,
I have the files: xaa xab xac
and I try to paste them using: $ paste -d, xaa xab xac
I see:
output
3e-130
,6e-78
,5e-74
6e-124
,0,007
,0,026
2e-119
When I type: $ paste -d, xaa xab xac |less
I see:
output
3e-130^M,6e-78^M,5e-74
6e-124^M,0,007^M,0,026 (2 Replies)
I just finished the shell script.
This script replaces weird characters (such as #$%^@!'"...) in file or directory names with "_".
I spent a long time on replacing apostrophes in file/directory names.
added: 2012-03-14
the 124th line (/usr/bin/perl -i -e "s#\'#\\'#g" /tmp/rpdir_level$i.tmp) is... (5 Replies)
Hi,
I am using Cygwin. I created a new file and typed into it using cat > newfile. When I open it in the vi editor, it contains loads of extra control characters.
What's happening? (1 Reply)
Hello guys,
I have a list of files. For example:
/disk1/mediator_home/tmp/ntest/TSFILE00.8256.GGG1-U.0908250009.unp.20090824P8.is
/disk1/mediator_home/tmp/ntest/TSFILE00.8257.GGG1-U.0908250013.unp.20090825P1.is... (2 Replies)
hello
I am trying to run the following script to get my-program's PID:
#!/bin/ksh
tt=`/usr/ucb/ps| grep -i $1| grep -v grep | awk '{print $2}'`
echo $tt
When I run the script, I get more than one PID:
$./test.sh my-program
12033 15033 15034
Actually my-program's PID is 12033....I... (6 Replies)
I have a file called merge2.t:
Hi
Hello how are you.
</Endtag> <New> I am fine.</New>
This is a test.
freelong
how
Here is the SED:
sed -n ' /<\/Endtag>/ !{
H
}
/<\/Endtag>/ {
x
p
} (4 Replies)
Hi.
I have files on my OS that have weird file names with unconventional ASCII characters.
I would like to run them, but I can't refer to them.
I know the ASCII codes of the problematic characters.
I can't change their name since it belongs to a 3rd party program... but I want to run it.
is there... (2 Replies)
Does anyone know how to turn off color and weird characters in the bash shell when using the "script" command? Every time users on my server used that command to record their script, they either couldn't print it because lp kept giving "unknown format character" messages or the print paper... (1 Reply)