finding files with unicode chars in the filename


 
# 1  
Old 11-20-2009

I'm trying to check a repository in to svn, but the import is failing because some files way down deep in a graphics-library folder have Unicode characters in their file names. ls masks those characters with a ? when it writes to the terminal, but they show up when the output is piped to more:

[root@dev-www-02 emoticons]# ls -l 1914*
-rwxrwxr-x 1 apache apache 1398 Dec 9 2008 1914OdiN_Presenta_-_o_?.bmp
[root@dev-www-02 emoticons]# ls -l 1914* | more
-rwxrwxr-x 1 apache apache 1398 Dec 9 2008 1914OdiN_Presenta_-_o_ò.bmp

Optimally, I'd like to be able to search and quarantine these files into a directory out of the repository tree, but I'm brickwalling trying to figure out the search string...

I've tried variants of grep '^[A-Za-z0-9]' but can't turn up the right combination.

tia...
# 2  
Old 11-20-2009
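Unicode file names contain non-ASCII bytes (values above 127). This is not perfect, but it should find most files with wacky characters. The \xNN escapes are only understood by Perl-compatible regexes, so this assumes GNU grep with -P; LC_ALL=C makes grep compare single bytes rather than multibyte characters.

Code:
find /path/to/directory -print | LC_ALL=C grep -P '[^\x00-\x7F]'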
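
A variant of the same idea that does not depend on grep's regex dialect is to let find hand the names to a small shell loop. The sketch below assumes a POSIX sh and a find that supports `-exec ... {} +`; the function name list_non_ascii is made up for illustration. It checks only the last path component and flags any byte outside printable ASCII, so it also reports control characters such as tabs and newlines.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print every path under the
# directory given as $1 whose last component contains a byte outside
# printable ASCII (space through tilde).
list_non_ascii() {
    find "$1" -exec sh -c '
        # C locale: the pattern below is matched one byte at a time
        LC_ALL=C; export LC_ALL
        for path do
            case ${path##*/} in
                *[![:print:]]*) printf "%s\n" "$path" ;;
            esac
        done' sh {} +
}
```

Calling `list_non_ascii /path/to/directory` prints one flagged path per line; nothing is moved or changed.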

# 3  
Old 11-20-2009

Quote:
Originally Posted by mshallop
...
Optimally, I'd like to be able to search and quarantine these files into a directory out of the repository tree, but I'm brickwalling trying to figure out the search string...
...
The following Perl program walks the root directory given on the command line (or the current directory if none is given) recursively and moves every file whose path contains special/non-printable characters to the /tmp directory. For this particular case, a special/non-printable character is anything other than a word character ("\w", i.e. letters, digits and underscore), ".", "-" and the path separator "/".

Code:
$
$ cat -n processfiles.pl
     1  #!/usr/bin/perl -w
     2  # Usage: perl processfiles.pl "<full_path_till_root_directory>"
     3
     4  use File::Find;
     5  @ARGV = qw(.) unless @ARGV;
     6  find sub { $x = $File::Find::name;
     7             $x=~s/[\w.\/-]//g;
     8             if ($x ne "") {
     9               print "File: ",$File::Find::name," will be quarantined.\n" if $x ne "";
    10               `mv "$File::Find::name" /tmp`;
    11  #             `zip -gmT "$ARGV[0]/badlynamedfiles" "$File::Find::name" 1>/dev/null 2>&1`;
    12               print "Done...\n================================\n";
    13             }
    14           }, @ARGV;
    15
$
$

If you comment out line 10 and uncomment line 11, the program instead uses the native zip utility to add all such files to a zip file called "badlynamedfiles.zip", which is created in the root directory. Because of the -m option, the files are moved into the archive, that is, added and then deleted from disk, leaving only the good ones behind.

With move (mv), the full paths of the moved files are not preserved, so if two files in different directories share a name, the one moved last overwrites the earlier one in /tmp.
With zip, the full paths are preserved in the zip archive.
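
One way around the overwrite problem with mv is to recreate each file's relative path under the quarantine directory. The following is a minimal sketch in POSIX sh, not a drop-in replacement for the Perl program: quarantine_tree is a hypothetical name, the whitelist (ASCII letters, digits, "_", "." and "-") only approximates the "\w", "." and "-" rule, only regular files are checked (by file name, not by full path), and the destination is assumed to lie outside the source tree.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): quarantine_tree SRC DEST
# Moves regular files under SRC whose names contain anything other than
# ASCII letters, digits, "_", "." or "-" into DEST, keeping each path
# relative to SRC so that same-named files from different directories
# do not overwrite each other.
quarantine_tree() {
    find "${1%/}" -type f -exec sh -c '
        # C locale: every non-ASCII byte falls outside the whitelist
        LC_ALL=C; export LC_ALL
        src=$1 dest=$2; shift 2
        for path do
            case ${path##*/} in
                *[!A-Za-z0-9_.-]*)
                    rel=${path#"$src"/}          # path relative to SRC
                    case $rel in
                        */*) reldir=${rel%/*} ;; # file sits in a subdirectory
                        *)   reldir=. ;;         # file sits directly in SRC
                    esac
                    mkdir -p -- "$dest/$reldir" &&
                    mv -- "$path" "$dest/$rel" &&
                    printf "quarantined: %s\n" "$rel"
                    ;;
            esac
        done' sh "${1%/}" "$2" {} +
}
```

For example, `quarantine_tree /path/to/repository /path/to/quarantine` would leave d1/name.bmp and d2/name.bmp as two separate files under the quarantine directory instead of one overwriting the other.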

Testing for mv:
Code:
$ pwd
/home/r2d2/data/unixstuff/d02
$
$ perl processfiles.pl "/home/r2d2/data/unixstuff/d02"
File: /home/r2d2/data/unixstuff/d02/1914OdiN_Presenta_-_o_�.bmp will be quarantined.
Done...
================================
File: /home/r2d2/data/unixstuff/d02/1914OdiN_Presenta_-_o_�.bmp will be quarantined.
Done...
================================
File: /home/r2d2/data/unixstuff/d02/d2/1914OdiN_Presenta_-_o_�.bmp will be quarantined.
Done...
================================
File: /home/r2d2/data/unixstuff/d02/d2/1914OdiN_Presenta_-_o_�.bmp will be quarantined.
Done...
================================
File: /home/r2d2/data/unixstuff/d02/d1/1914OdiN_Presenta_-_o_�.bmp will be quarantined.
Done...
================================
File: /home/r2d2/data/unixstuff/d02/d1/1914OdiN_Presenta_-_o_�.bmp will be quarantined.
Done...
================================
$
$ ls -1 /tmp/*.bmp
/tmp/1914OdiN_Presenta_-_o_?.bmp
/tmp/1914OdiN_Presenta_-_o_?.bmp
/tmp/1914OdiN_Presenta_-_o_?.bmp
/tmp/1914OdiN_Presenta_-_o_?.bmp
/tmp/1914OdiN_Presenta_-_o_?.bmp
/tmp/1914OdiN_Presenta_-_o_?.bmp
$
$

Testing for zip:
Code:
$ 
$ cat -n processfiles.pl
     1  #!/usr/bin/perl -w
     2  # Usage: perl processfiles.pl "<full_path_till_root_directory>"
     3                                                                 
     4  use File::Find;                                                
     5  @ARGV = qw(.) unless @ARGV;                                    
     6  find sub { $x = $File::Find::name;                             
     7             $x=~s/[\w.\/-]//g;                                  
     8             if ($x ne "") {                                     
     9               print "File: ",$File::Find::name," will be quarantined.\n" if $x ne "";
    10  #             `mv "$File::Find::name" /tmp`;                                        
    11               `zip -gmT "$ARGV[0]/badlynamedfiles" "$File::Find::name" 1>/dev/null 2>&1`;
    12               print "Done...\n================================\n";                       
    13             }                                                                            
    14           }, @ARGV;
    15
$
$ pwd
/home/r2d2/data/unixstuff/d02
$
$ perl processfiles.pl "/home/r2d2/data/unixstuff/d02"
File: /home/r2d2/data/unixstuff/d02/1914OdiN_Presenta_-_o_�.bmp will be quarantined.
Done...
================================
File: /home/r2d2/data/unixstuff/d02/1914OdiN_Presenta_-_o_�.bmp will be quarantined.
Done...
================================
File: /home/r2d2/data/unixstuff/d02/d2/1914OdiN_Presenta_-_o_�.bmp will be quarantined.
Done...
================================
File: /home/r2d2/data/unixstuff/d02/d2/1914OdiN_Presenta_-_o_�.bmp will be quarantined.
Done...
================================
File: /home/r2d2/data/unixstuff/d02/d1/1914OdiN_Presenta_-_o_�.bmp will be quarantined.
Done...
================================
File: /home/r2d2/data/unixstuff/d02/d1/1914OdiN_Presenta_-_o_�.bmp will be quarantined.
Done...
================================
$
$ zip -T badlynamedfiles.zip
test of badlynamedfiles.zip OK
$
$ unzip -l *.zip
Archive:  badlynamedfiles.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
        0  11-20-09 21:55   home/r2d2/data/unixstuff/d02/1914OdiN_Presenta_-_o_�.bmp
        0  11-20-09 21:55   home/r2d2/data/unixstuff/d02/1914OdiN_Presenta_-_o_�.bmp
        0  11-20-09 21:55   home/r2d2/data/unixstuff/d02/d2/1914OdiN_Presenta_-_o_�.bmp
        0  11-20-09 21:55   home/r2d2/data/unixstuff/d02/d2/1914OdiN_Presenta_-_o_�.bmp
        0  11-20-09 21:55   home/r2d2/data/unixstuff/d02/d1/1914OdiN_Presenta_-_o_�.bmp
        0  11-20-09 21:55   home/r2d2/data/unixstuff/d02/d1/1914OdiN_Presenta_-_o_�.bmp
 --------                   -------
        0                   6 files
$
$

Hope that helps,
tyler_durden