Junk character appearing after downloading the file from windows server


 
# 8  
09-07-2014
Hi Rudie,

The command you provided converts the file into some other format; it seems the file I am downloading is not UTF-16.

When I check the file type:
Code:
# file filename
filename: data or International Language text

Here is the list of conversions iconv supports. I am not sure which one to use for this type of file; please advise.

Output on my local box:
Code:
od -x -N 10 filename
0000000  fffe 2200 4500 7600 6500
0000012



Code:
iconv -l
ASCII-GR
BIG5-HKSCS
CNS11643.1986-1
CNS11643.1986-2
GB18030
GBK
IBM-1006
IBM-1046
IBM-1124
IBM-1129
IBM-1251
IBM-1252
IBM-1363
IBM-850
IBM-856
IBM-921
IBM-922
IBM-932
IBM-943
IBM-eucCN
IBM-eucJP
IBM-eucKR
IBM-eucTW
IBM-sbdTW
IBM-udcJP
IBM-udcTW
ISCII.1991
ISO8859-1
ISO8859-1-GL
ISO8859-1-GR
ISO8859-15
ISO8859-15-GL
ISO8859-15-GR
ISO8859-2
ISO8859-2-GL
ISO8859-2-GR
ISO8859-3
ISO8859-3-GL
ISO8859-3-GR
ISO8859-4
ISO8859-4-GL
ISO8859-4-GR
ISO8859-5
ISO8859-5-GL
ISO8859-5-GR
ISO8859-6
ISO8859-6-GL
ISO8859-6-GR
ISO8859-7
ISO8859-7-GL
ISO8859-7-GR
ISO8859-8
ISO8859-8-GL
ISO8859-8-GR
ISO8859-9
ISO8859-9-GL
ISO8859-9-GR
JISX0201.1976-0
JISX0208.1983-0
KSC5601.1987-0
KSC5601.1987-1
TIS-620
UCS-2
UNICODE-2
UTF-16
UTF-16le
UTF-32
UTF-8
big5
ct
fold7
fold8
uucode
IBM-037
IBM-1006
IBM-1025
IBM-1026
IBM-1027
IBM-1046
IBM-1047
IBM-1051
IBM-1097
IBM-1098
IBM-1112
IBM-1122
IBM-1123
IBM-1124
IBM-1125
IBM-1129
IBM-1130
IBM-1131
IBM-1132
IBM-1133
IBM-1140
IBM-1141
IBM-1142
IBM-1143
IBM-1144
IBM-1145
IBM-1146
IBM-1147
IBM-1148
IBM-1149
IBM-1250
IBM-1251
IBM-1252
IBM-1253
IBM-1254
IBM-1255
IBM-1256
IBM-1257
IBM-1258
IBM-12712
IBM-1275
IBM-1280
IBM-1281
IBM-1282
IBM-1283
IBM-1284
IBM-1285
IBM-273
IBM-275
IBM-277
IBM-278
IBM-280
IBM-284
IBM-285
IBM-290
IBM-297
IBM-420
IBM-424
IBM-437
IBM-4899
IBM-4909
IBM-4971
IBM-500
IBM-5346
IBM-5347
IBM-5348
IBM-5349
IBM-5350
IBM-5351
IBM-5352
IBM-5353
IBM-5354
IBM-737
IBM-803
IBM-838
IBM-850
IBM-850-GL
IBM-850-GR
IBM-852
IBM-855
IBM-856
IBM-857
IBM-858
IBM-860
IBM-861
IBM-862
IBM-863
IBM-864
IBM-865
IBM-866
IBM-867
IBM-868
IBM-869
IBM-870
IBM-871
IBM-874
IBM-875
IBM-880
IBM-897
IBM-9048
IBM-9061
IBM-918
IBM-921
IBM-922
IBM-924
ISO8859-1
ISO8859-1-GL
ISO8859-1-GR
ISO8859-15
ISO8859-2
ISO8859-5
ISO8859-6
ISO8859-7
ISO8859-8
ISO8859-

# 9  
09-07-2014
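Ignoring the first two bytes in your file (ff fe, the byte order mark for little-endian UTF-16), it looks like the UTF-16 encoding of the four characters "Eve (a double-quote character followed by Eve). You haven't answered the question about what locale you're using.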
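Checking those leading bytes can be scripted. This is a minimal sketch, assuming an od that supports the POSIX -A, -t and -N options; detect_bom is a hypothetical helper name, not a standard command. Note that a UTF-32LE BOM (ff fe 00 00) would also be reported as UTF-16LE here.

```shell
#!/bin/sh
# detect_bom FILE: print the encoding implied by the byte order mark
# (BOM) at the start of FILE, or "none" if there is no known BOM.
detect_bom() {
    # -An: no offset column, -tx1: one hex value per byte, -N3: 3 bytes only
    bytes=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    case $bytes in
        efbbbf*) echo "UTF-8" ;;
        fffe*)   echo "UTF-16LE" ;;   # little-endian, as in the file above
        feff*)   echo "UTF-16BE" ;;
        *)       echo "none" ;;
    esac
}
```

For example, detect_bom filename should print UTF-16LE for a file that starts with ff fe.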

What is the output from the commands:
Code:
uname -a
locale

If, instead of using:
Code:
od -x filename

you use:
Code:
od -cb filename

does the output show the characters you're expecting, with NUL bytes (displayed as \0) between them? That is exactly what we'd expect if your locale is based on a superset of the ASCII code set and the data you're viewing is encoded in UTF-16.
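The same "NUL between every character" observation can be turned into a rough test. This is only a heuristic sketch: looks_like_utf16 is a hypothetical name, and the 30% threshold is an arbitrary assumption that suits mostly-ASCII text (such text in UTF-16 has a NUL in roughly every other byte, ordinary single-byte text has none).

```shell
#!/bin/sh
# looks_like_utf16 FILE: succeed (exit 0) if at least 30% of FILE's
# bytes are NUL bytes; fail for empty files or ordinary text.
looks_like_utf16() {
    total=$(wc -c < "$1" | tr -d ' ')
    [ "$total" -gt 0 ] || return 1
    # tr -cd '\000' deletes every byte except NUL; wc -c counts the rest.
    # LC_ALL=C makes tr treat the input as raw bytes, not characters.
    nuls=$(LC_ALL=C tr -cd '\000' < "$1" | wc -c | tr -d ' ')
    [ $((nuls * 100 / total)) -ge 30 ]
}
```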

What application is creating this file on Windows?

Last edited by Don Cragun; 09-07-2014 at 02:35 PM.. Reason: fix typo.
# 10  
09-07-2014
Quote:
Originally Posted by Riverstone
Hi Rudie,

Provided command is converting the file in some other format, seems the file which iam downloading is not UTF-16.

when checking the file type
#file filename
#filename: data or International Language text

here is the list for iconv, for the above type of file which conversion has to be used not sure , please advise.

My local box Format..
Code:
od -x -N 10 filename
0000000  fffe 2200 4500 7600 6500
0000012

. . .
Look at what it yields on my system:
Code:
hd file
00000000  ff fe 22 00 45 00 76 00  65 00                    |..".E.v.e.|
file file
file: Little-endian UTF-16 Unicode text, with no line terminators
iconv -futf-16 -tutf-8 file
"Eve

At least for the snippet you posted, it seems quite convincing to me that the file should be treated as UTF-16.
# 11  
09-07-2014
This is a little tongue-in-cheek and, in this particular case, relies entirely on a single, common junk character.
OSX 10.7.5, default bash terminal.
I have no idea whether it will work on extremely large files, as the whole file is read into a variable rather than streamed. However...
It uses IFS to your advantage and is written to be as readable as possible:
Code:
#!/bin/bash
# junk1.sh
# Store the default IFS.
ifs_str="$IFS"
> /tmp/ascii
> /tmp/filename
# Generate a file with these common junk characters...
echo '▒▒1) "nmdbfnmdsfsdf"
       nmdsbnmfmdsf
        nmsbmfdsbnmfbds
nmbdsnmfbsnmdbfnmds
        sdmnbfnmbdsf▒▒"nmdbfnmdsfsdf"
       nmdsbnmfmdsf
        nmsbmfdsbnmfbds
nmbdsnmfbsnmdbfnmds
        sdmnbfnmbdsf▒▒"nmdbfnmdsfsdf"
       nmdsbnmfmdsf
        nmsbmfdsbnmfbds
nmbdsnmfbsnmdbfnmds
        sdmnbfnmbdsf▒▒"nmdbfnmdsfsdf"
       nmdsbnmfmdsf
        nmsbmfdsbnmfbds
nmbdsnmfbsnmdbfnmds
        sdmnbfnmbdsf▒▒"nmdbfnmdsfsdf"
       nmdsbnmfmdsf
        nmsbmfdsbnmfbds
nmbdsnmfbsnmdbfnmds
21)        sdmnbfnmbdsf' > /tmp/filename
# Now read file into a variable...
textfile=$(cat < /tmp/filename)
# Check it is correct.
echo "$textfile"
# Now use IFS trick to get rid of them... ;o)
IFS="${textfile:0:1}"
# Make the variable an array.
textfile=($textfile)
# Now remove the junk characters.
n=0
while [ $n -lt ${#textfile[@]} ]
do
	printf "${textfile[$n]}" >> /tmp/ascii
	n=$((n+1))
done
# Ensure stripped newline is re-inserted...
echo "" >> /tmp/ascii
# Check it has been done.
cat < /tmp/ascii
# Reset IFS back to default.
IFS="$ifs_str"
exit 0

Results:
Code:
Last login: Sun Sep  7 12:02:23 on ttys000
AMIGA:barrywalker~> ./junk1.sh
▒▒1) "nmdbfnmdsfsdf"
       nmdsbnmfmdsf
        nmsbmfdsbnmfbds
nmbdsnmfbsnmdbfnmds
        sdmnbfnmbdsf▒▒"nmdbfnmdsfsdf"
       nmdsbnmfmdsf
        nmsbmfdsbnmfbds
nmbdsnmfbsnmdbfnmds
        sdmnbfnmbdsf▒▒"nmdbfnmdsfsdf"
       nmdsbnmfmdsf
        nmsbmfdsbnmfbds
nmbdsnmfbsnmdbfnmds
        sdmnbfnmbdsf▒▒"nmdbfnmdsfsdf"
       nmdsbnmfmdsf
        nmsbmfdsbnmfbds
nmbdsnmfbsnmdbfnmds
        sdmnbfnmbdsf▒▒"nmdbfnmdsfsdf"
       nmdsbnmfmdsf
        nmsbmfdsbnmfbds
nmbdsnmfbsnmdbfnmds
21)        sdmnbfnmbdsf
1) "nmdbfnmdsfsdf"
       nmdsbnmfmdsf
        nmsbmfdsbnmfbds
nmbdsnmfbsnmdbfnmds
        sdmnbfnmbdsf"nmdbfnmdsfsdf"
       nmdsbnmfmdsf
        nmsbmfdsbnmfbds
nmbdsnmfbsnmdbfnmds
        sdmnbfnmbdsf"nmdbfnmdsfsdf"
       nmdsbnmfmdsf
        nmsbmfdsbnmfbds
nmbdsnmfbsnmdbfnmds
        sdmnbfnmbdsf"nmdbfnmdsfsdf"
       nmdsbnmfmdsf
        nmsbmfdsbnmfbds
nmbdsnmfbsnmdbfnmds
        sdmnbfnmbdsf"nmdbfnmdsfsdf"
       nmdsbnmfmdsf
        nmsbmfdsbnmfbds
nmbdsnmfbsnmdbfnmds
21)        sdmnbfnmbdsf
AMIGA:barrywalker~> _

# 12  
09-07-2014
I know it is a little tongue-in-cheek, but a couple of remarks:
---
Code:
textfile=$(cat < /tmp/filename)

The redirection is unnecessary:
Code:
textfile=$(cat /tmp/filename)

---
Code:
textfile=($textfile)

Using the same name for a variable and an array in the same assignment is a bit *erm*...

Why not do it all in one go:
Code:
textfile=( $(cat /tmp/filename ) )

or since you are using bash:
Code:
textfile=( $(< /tmp/filename ) )


One thing to be aware of is that command substitution removes not just one trailing newline but any number of them.
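If the trailing newlines matter, a common workaround is to append a sentinel character inside the command substitution and strip it afterwards. A minimal bash sketch: slurp is a hypothetical helper name, and printf -v needs bash 3.1 or later. It still cannot help with raw UTF-16 data, because bash variables cannot hold NUL bytes; such a file has to go through iconv first.

```shell
#!/bin/bash
# slurp VARNAME FILE: read FILE into the variable named VARNAME,
# keeping any trailing newlines that $(...) would otherwise remove.
slurp() {
    local __slurp_data
    # With the sentinel "x" last, the output no longer ends in newlines,
    # so nothing is stripped; the sentinel is removed again below.
    __slurp_data=$(cat -- "$2"; printf x)
    printf -v "$1" '%s' "${__slurp_data%x}"
}

# Compare with a plain command substitution.
tmp=$(mktemp)
printf 'a\n\n\n' > "$tmp"
plain=$(cat "$tmp")      # only "a": all three newlines are gone
slurp kept "$tmp"        # "a" followed by the three newlines
rm -f "$tmp"
```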
---
Code:
IFS="${textfile:0:1}"

The quotes have no function here: the right-hand side of an assignment is not subject to word splitting or pathname expansion.
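A quick way to convince yourself, using a value that would be mangled if it were split or globbed (the variable names are only for illustration):

```shell
#!/bin/bash
# Assignments undergo neither word splitting nor pathname expansion,
# so the quoted and unquoted forms store exactly the same string.
textfile='*  a  b'
unquoted=${textfile:0:4}
quoted="${textfile:0:4}"
# Quotes still matter where the value is used as a command argument:
printf '[%s]\n' "$unquoted"
```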

--
Code:
n=$((n+1))

Since you are using bash, you may also use:
Code:
((n++))
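One caveat: ((n++)) evaluates to the value of n before the increment, and (( )) returns a non-zero exit status when its expression is 0. Under set -e, the first ((n++)) on a counter that starts at 0 would therefore end the script; ((++n)) or n=$((n+1)) avoid this. A small demonstration (the if statements only capture the status safely):

```shell
#!/bin/bash
n=0
if ((n++)); then post=0; else post=1; fi   # n++ yields the old value, 0
m=0
if ((++m)); then pre=0; else pre=1; fi     # ++m yields the new value, 1
echo "n=$n post-increment status=$post"
echo "m=$m pre-increment status=$pre"
```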

--
Code:
printf "${textfile[$n]}" >> /tmp/ascii

Do not leave out the format string. Leaving it out may bring all kinds of surprises, because any % or backslash in the data is then interpreted by printf. Use:
Code:
printf "%s" "..."
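To see the kind of surprise meant here, feed printf some data that happens to contain printf syntax (the string is just an example):

```shell
#!/bin/bash
# Data that happens to contain a "%" and a backslash sequence.
data='100% done\n'
# Wrong: the data becomes the format string, so "% d" is parsed as a
# conversion and "\n" as an escape sequence.
bad=$(printf "$data")
# Right: a fixed format string, with the data passed as an argument.
good=$(printf '%s' "$data")
printf 'bad:  [%s]\ngood: [%s]\n' "$bad" "$good"
```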

--
Code:
n=0
while [ $n -lt ${#textfile[@]} ]
do
	printf "${textfile[$n]}" >> /tmp/ascii
	n=$((n+1))
done

Since you are using arrays, you can also replace the loop with:
Code:
printf "%s" "${textfile[@]}" > /tmp/ascii
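Putting these remarks together, the core of the script could be condensed into a function. This is a sketch, not the posters' own code: strip_junk is a hypothetical name, it still assumes the first character of the file is the junk character, and the set -f (which stops a * or ? in the data from being glob-expanded during the unquoted split) is an addition not mentioned above. Declaring IFS with local also replaces the manual save and restore of IFS.

```shell
#!/bin/bash
# strip_junk FILE: print FILE with every occurrence of its first
# character removed (assumption: that character is the junk character).
strip_junk() {
    local IFS content
    local -a parts
    content=$(< "$1")          # note: all trailing newlines are dropped
    IFS=${content:0:1}         # split on the junk character only
    set -f                     # no globbing during the unquoted expansion
    parts=( $content )
    set +f
    printf '%s' "${parts[@]}"  # rejoin the pieces with nothing in between
    printf '\n'                # re-add one final newline
}
```

Usage: strip_junk /tmp/filename > /tmp/ascii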


Last edited by Scrutinizer; 09-07-2014 at 07:11 PM..
# 13  
09-07-2014
Hi Don,
Here is the locale setting

Code:
#uname -a
AIX PTR07t 1 6 00C29FB64C00

The file I am downloading from the Windows box is the Task Scheduler log file:

Code:
C:\windows\tasks\SchedLgu.txt

Hi wisecracker,

Do you mean I should rework my existing code, or do I need to add all of this new, long code?
# 14  
09-08-2014
Quote:
Originally Posted by Riverstone
Hi Don,
Here is the locale setting

Code:
#uname -a
AIX PTR07t 1 6 00C29FB64C00

The file Which iam downloading from windows box is Scheduler log file.

Code:
C:\windows\tasks\SchedLgu.txt

No. I asked you to run two commands: one to determine what OS you're using (which we now know is AIX) and one to determine what locale you're using (which you have not yet shown us). But it is now obvious, at least to RudiC and me, that despite your statement that the file you downloaded from Windows is not encoded in UTF-16, it is indeed encoded in UTF-16. If you tell us what locale you're using, we can tell you which option-argument to give to the iconv -t option to produce a file you can process in your locale.

I would expect that you will either want:
Code:
iconv -f UTF-16 -t UTF-8 filename > utf8.txt

or:
Code:
iconv -f UTF-16 -t ISO8859-1 filename > 8859.txt

where filename is the name of the file you downloaded from Windows.
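Once the locale is known, the conversion and the clean-up of Windows line endings can be done in one step. A sketch under these assumptions: win_log_to_unix is a hypothetical helper name; the log uses CRLF line endings (typical for Windows text files, but not confirmed in this thread); and locale charmap, which asks the POSIX locale utility for the current code set, supplies the default target. With -f UTF-16, iconv uses the byte order mark to pick the byte order; for a file without a BOM, name the byte order explicitly (the AIX list above includes UTF-16le).

```shell
#!/bin/sh
# win_log_to_unix SRC DST [TARGET_CHARSET]
# Convert a UTF-16 Windows text file to the local code set and drop
# the carriage returns of CRLF line endings.
win_log_to_unix() {
    src=$1
    dst=$2
    # Default target: the code set of the current locale.
    target=${3:-$(locale charmap)}
    # The pipeline's exit status is tr's, so an iconv failure is not
    # reported by the status alone; check iconv's messages on stderr.
    iconv -f UTF-16 -t "$target" "$src" | LC_ALL=C tr -d '\r' > "$dst"
}
```

For example: win_log_to_unix filename converted.txt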

Please don't post the results (the file obviously contains sensitive information), but do look at the output from the command:
Code:
od -bc filename

as I suggested before. Doesn't the output from od contain the text you're looking for, with null bytes between the characters you want?