Hi everyone,
I'm trying to write a shell script that processes a log file. Each line of the log generally has this format:
(8-digit hex Unix time),(system ID),(state)\n
My shell script downloads the file from the web and saves it as a text file in a local directory. I then want to convert the hex to decimal, convert that Unix time to a day/month/year date in MST, and write the result out.
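Here is a minimal sketch of the conversion step described above. It assumes "MST" means a fixed UTC-7 offset with no daylight saving (POSIX TZ string `MST7`), and that either GNU date (`-d @SECONDS`, Linux) or BSD date (`-r SECONDS`, macOS) is available. The function name `hex_to_mst` is hypothetical.

```shell
# Hypothetical helper: hex_to_mst HEX
# Converts an 8-digit hex Unix timestamp to "dd/mm/yyyy HH:MM:SS" in MST.
# Assumption: MST = fixed UTC-7, expressed as the POSIX TZ string "MST7".
hex_to_mst() {
    secs=$((0x$1))                      # shell arithmetic accepts 0x... constants
    if date --version >/dev/null 2>&1; then
        # GNU date (Linux): -d @SECONDS
        TZ=MST7 date -d "@$secs" '+%d/%m/%Y %H:%M:%S'
    else
        # BSD date (macOS): -r SECONDS
        TZ=MST7 date -r "$secs" '+%d/%m/%Y %H:%M:%S'
    fi
}
```

For example, `hex_to_mst 3B6A7227` prints `03/08/2001 02:43:03`. If you want Mountain time with daylight saving instead, use a zone name such as `TZ=America/Denver`.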
I have something that *mostly* works: it downloads the file, reads it with cat, pipes the result to sed, uses sed to extract all the hex values, and loops through them.
Unfortunately, there's a bug in the software that produces the log: for some systems the ID isn't defined (someone probably forgot to initialize that variable), so it produces a line that looks like this: 3B6A7227,››ù√剃,0
When I view the file with cat, lines like that usually show up as a string of question marks. This is the command I'm using to isolate the hex values:
Code:
cat ~/Downloads/log.txt | sed 's/[^0-9A-Za-z,\n]//g' | sed 's/,.*,[0,1]$//'
Originally I had only the second sed; I added the first one to try to remove all the "weird" characters. Unfortunately, when I run this, the output is a list of hex numbers EXCEPT for the weird entries: these now show their hex number, a comma, a run of question marks (sometimes with a decimal number), another comma, and the state.
How can I get rid of these? I realize the bug in the logging code needs to be fixed, but I don't have control over that; I'm just trying to clean up the log file.
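If you write a C or Perl program, you can treat a ^J (newline) that arrives before a record has two commas as part of the ID rather than the end of the record, ignore the extra commas when a record has more than two, and push the user ID back to a fixed value. Maybe the logger sometimes cannot determine the user name from the ID number and writes the binary ID number instead?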
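The same comma-counting idea can be sketched with awk instead of C or Perl, so it stays in the shell. This sketch assumes every record is TIMESTAMP,ID,STATE with STATE being 0 or 1, and that only the ID is ever corrupted. The function name `clean_log` and the placeholder `UNKNOWN` are hypothetical. It is a heuristic: a garbage ID that happens to contain ",0" or ",1" followed by a newline byte would still fool it.

```shell
# Hypothetical helper: clean_log FILE
# Rebuilds records split by newline bytes inside a corrupted ID and
# replaces any corrupted ID with a fixed placeholder.
clean_log() {
    LC_ALL=C tr -d '\000' < "$1" |      # drop stray NUL bytes first
    LC_ALL=C awk -v bad_id=UNKNOWN '
    {
        # A newline byte inside the ID splits one record over several
        # lines, so keep joining lines until the record ends in ,0 or ,1.
        rec = (rec == "") ? $0 : rec "\n" $0
        n = split(rec, f, ",")
        if (n < 3 || f[n] !~ /^[01]$/) next
        # Keep a clean ID; anything else (extra commas, binary bytes)
        # becomes the placeholder. f[1] is the timestamp, f[n] the state.
        id = (n == 3 && f[2] ~ /^[A-Za-z0-9]+$/) ? f[2] : bad_id
        print f[1] "," id "," f[n]
        rec = ""
    }'
}
```

Usage: `clean_log ~/Downloads/log.txt > clean.txt`, after which the original sed pipeline only ever sees well-formed lines.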
Here is an untested idea: clean the file by removing every character except 0-9, A-Z, a-z, commas and newlines, using the Unix tr command.
Code:
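# Keep only digits, letters, commas and newlines (-c complements the set, -d deletes).
# No [ ] around the ranges: tr would treat them as literal characters to keep.
tr -dc '0-9A-Za-z,\n' < ~/Downloads/log.txt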
If this does not solve your problem, please post sample data showing a couple of good lines and a couple of bad lines as displayed by the Unix od command (which shows exactly what characters are in the file). We don't need the whole file.
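To pick out which lines to post, a sketch like the following prints, with line numbers, only the lines that contain a byte outside printable ASCII (space through tilde), so just those can be passed to od. The function name `show_bad_lines` is hypothetical, and it assumes awk running in the C locale; some awk implementations mishandle NUL bytes, so a line whose only problem is a NUL may not be flagged.

```shell
# Hypothetical helper: show_bad_lines FILE
# Prints "LINENO:line" for every line containing a non-printable byte.
show_bad_lines() {
    # In the C locale, [^ -~] means "any byte outside 0x20..0x7E".
    LC_ALL=C awk '/[^ -~]/ { print NR ":" $0 }' "$1"
}
```

Usage: `show_bad_lines ~/Downloads/log.txt | head -n 3 | od -c`.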
$ cat ~/Downloads/log.txt | od -xc
0000000 4233 3936 3938 3036 652c 5753 7550 706d
3 B 6 9 8 9 6 0 , e S W P u m p
0000020 302c 000a 4233 3936 4338 3633 732c 6548
, 0 \n \0 3 B 6 9 8 C 3 6 , s H e
0000040 7461 7250 2c65 0a31 4233 3936 4338 3633
a t P r e , 1 \n 3 B 6 9 8 C 3 6
0000060 652c 4850 7550 706d 312c 330a 3642 3839
, e P H P u m p , 1 \n 3 B 6 9 8
0000100 3343 2c36 5365 5057 6d75 2c70 0a31 4233
C 3 6 , e S W P u m p , 1 \n 3 B
0000120 3936 4338 3442 ef2c fe21 c3ff ec38 2cef
6 9 8 C B 4 , 357 ! 376 377 303 8 354 357 ,
0000140 0a31 4233 3936 4338 3442 652c 5748 5052
1 \n 3 B 6 9 8 C B 4 , e H W R P
0000160 6d75 2c70 0a31 4233 3936 4338 3442 dd2c
u m p , 1 \n 3 B 6 9 8 C B 4 , 032
0000200 dd1a c39d e48c 2cc4 0a31 4233 3936 4338
032 335 235 Ì ** 344 304 , 1 \n 3 B 6 9 8 C
0000220 3442 652c 6f44 6e77 5248 2c56 0a31 4233
B 4 , e D o w n H R V , 1 \n 3 B
0000240 3936 4338 3442 652c 7055 5248 2c56 0a31
6 9 8 C B 4 , e U p H R V , 1 \n
0000260 4233 3936 4538 3139 732c 6548 7461 7250
3 B 6 9 8 E 9 1 , s H e a t P r
0000300 2c65 0a30 4233 3936 4538 3139 652c 4850
e , 0 \n 3 B 6 9 8 E 9 1 , e P H
0000320 7550 706d 302c 330a 3642 3839 3945 2c31
P u m p , 0 \n 3 B 6 9 8 E 9 1 ,
0000340 5365 5057 6d75 2c70 0030
e S W P u m p , 0
0000351
thanks!
Last edited by bencpeters; 08-03-2011 at 09:49 PM.
Reason: wrong option on od.
I can't reproduce your tr error. Please post the command you typed and the complete error message.
There are certainly some weird characters in the second comma-delimited field of certain records in this sample data. There is also a stray null byte right after the newline that ends the first record.
What Operating System and version are you running?
What Shell do you use?
My first guess would be that locale-aware code in the C library that tr uses rejects byte sequences that are not valid in your current locale (probably UTF-8). You could try running tr in the C/POSIX locale: LC_ALL=C tr ...
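The same locale trick applies to sed. In the C locale every byte counts as one character, so `.` matches the garbage bytes and the original "keep only the timestamp" idea works. This is a sketch assuming every good record starts with exactly 8 hex digits and ends in ,0 or ,1; the function name `extract_hex` is hypothetical. Records whose garbage ID contains a newline byte are dropped rather than repaired.

```shell
# Hypothetical helper: extract_hex FILE
# Prints the 8-digit hex timestamp of every record, one per line.
extract_hex() {
    LC_ALL=C tr -d '\000' < "$1" |      # remove stray NUL bytes
    # C locale: "." matches any byte, including invalid UTF-8.
    LC_ALL=C sed -n 's/^\([0-9A-Fa-f]\{8\}\),.*,[01]$/\1/p'
}
```

Usage: `extract_hex ~/Downloads/log.txt | while IFS= read -r hex; do ...; done` to loop over the timestamps as before.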