Perl script to sort an Excel file


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Perl script to sort an Excel file
# 8  
Old 01-14-2010
Hey, degoor:

In the octal dump that follows, you can clearly tell the difference between the bulgarian and the english. The bulgarian is rendered as mostly octal byte values, and as expected there are plenty of pairs beginning with 320 and 321.

Code:
$ od -c text_sample.txt 
0000000    T   e   x   t   I   d  \t   T   e   x   t   D   e   s   c   r
0000020    i   p   t   i   o   n  \t   E   x   p   l   a   n   a   t   i
0000040    o   n  \t   S   i   g   n   a   t   u   r   e  \t   T   r   a
0000060    n   s   l   a   t   o   r       Q   u   e   s   t   i   o   n
0000100   \t   C   u   s   t   o   m   e   r       A   n   s   w   e   r
0000120   \t   F   a   k   e  \r  \n   3  \t 320 232 320 276 320 274 321
0000140  203 320 275 320 270 320 272 320 260 321 206 320 270 320 276 320
0000160  275 320 265 320 275     320 264 320 265 321 204 320 265 320 272
0000200  321 202   !  \t  \t  \t  \t  \t   1  \r  \n   1   0  \t 320 237
0000220  321 200 320 276 320 262 320 265 321 200 320 272 320 260 321 202
0000240  320 260     320 265     320 277 321 200 320 270 320 272 320 273
0000260  321 216 321 207 320 270 320 273 320 260   ,     321 201 320 270
0000300  321 201 321 202 320 265 320 274 320 260 321 202 320 260     320
0000320  265     320 236 320 232  \t  \t  \t  \t  \t   1  \r  \n   1   4
0000340   \t 320 230 321 201 320 272 320 260 321 202 320 265     320 273
0000360  320 270     320 264 320 260     320 270 320 267 321 202 321 200
0000400  320 270 320 265 321 202 320 265   ?      \t  \t  \t  \t  \t   1
0000420   \r  \n   4   6   5   5  \t   A   d   d       k   e   y  \t  \t
0000440    M   a   J  \t  \t  \t   1  \r  \n   4   6   5   6  \t   M   o
0000460    d   u   l   e  \t  \t   M   a   J  \t  \t  \t   1  \r  \n   4
0000500    6   5   7  \t   A   v   a   i   l   a   b   l   e  \t  \t   M
0000520    a   J  \t  \t  \t   1  \r  \n                                
0000530

I did however notice a little "problem" when looking at an octal dump of your file: your data uses \r\n line endings (which is expected from a DOS/Windows machine). Under the cygwin environment, when reading a line of input, \r\n pairs are consumed. By default, on output, AWK will separate records using a sole \n. Your file is being converted from DOS/Windows format to UNIX. Perhaps that is causing your postprocessing to fail?

I tested the following under Cygwin using a WinXP SP2 system and it works as expected (an octal dump of the script's output confirmed that the \r\n terminators are intact):

Code:
$ awk '{print $0 "\t" (/[\320-\323]/ ? "bg" : "en")}' ORS='\r\n' text_sample.txt 
TextId  TextDescription Explanation     Signature       Translator Question     Customer Answer Fake    en
3       Комуникационен дефект!                                  1       bg
10      Проверката е приключила, системата е ОК                                 1       bg
14      Искате ли да изтриете?                                  1       bg
4655    Add key         MaJ                     1       en
4656    Module          MaJ                     1       en
4657    Available               MaJ                     1       en

NOTE: The AWK above will NOT give correct results on a UNIX/Linux system. It will insert a tab followed by either "en" or "bg" and a "\r" between incoming \r\n pairs.

Hopefully, we've arrived at the end of this odyssey Smilie

Regards,
alister


P.S.

In case you aren't familiar with C string escape sequences I used above, and the differing ways that unix and windows terminate a line in a text file:

\r = carriage return
\n = line feed
\t = tab

windows uses \r\n to terminate lines.
unix uses \n
# 9  
Old 01-14-2010
Debian

alister:

Thank you very much for your effort and persistence.

Your code from the very beginning was quite all right.

The problem was the cygwin locale setting. Once I set it to bg_BG.CP1251 it worked OK.

(Technically that is done through adding a line to C:\cygwin\Cygwin.bat so that it looks like this:
Code:
@echo off

C:
chdir C:\cygwin\bin
set LANG=bg_BG.CP1251
bash --login -i

)

When I used your second command line
Code:
$ awk '{print $0 "\t" (/[\320-\323]/ ? "bg" : "en")}' ORS='\r\n' text_sample.txt

I got an extra empty line after every original line so I used your initial command.

Thanks again! Great job, alister!
# 10  
Old 01-14-2010
You're very welcome
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

How to sort a column in excel/csv file?

I have to sort the 4th column of an excel/csv file. I tried the following command sort -u --field-separator=, --numeric-sort -k 2 -n dinesh.csv > test.csv But, it's not working. Moreover, I have to do the same for more than 30 excel/csv file. So please help me to do the same. (6 Replies)
Discussion started by: dineshkumarsrk
6 Replies

2. Shell Programming and Scripting

Perl script to fill the entire row of Excel file with color based on pattern match

Hi All , I have to write one Perl script in which I need to read one pre-existing xls and based on pattern match for one word in some cells of the XLS , I need to fill the entire row with one color of that matched cell and write the content to another excel Please find the below stated... (2 Replies)
Discussion started by: kshitij
2 Replies

3. Shell Programming and Scripting

Tabbed multiple csv files into one single excel file with using shell script not perl

Hi Experts, I am querying backup status results for multiple databases and getting each and every database result in one csv file. so i need to combine all csv files in one excel file with separate tabs. I am not familiar with perl script so i am using shell script. Could anyone please... (4 Replies)
Discussion started by: ramakrk2
4 Replies

4. Shell Programming and Scripting

Script to generate Excel file or to SQL output data to Excel format/tabular format

Hi , i am generating some data by firing sql query with connecting to the database by my solaris box. The below one should be the header line of my excel ,here its coming in separate row. TO_CHAR(C. CURR_EMP_NO ---------- --------------- LST_NM... (6 Replies)
Discussion started by: dani1234
6 Replies

5. Shell Programming and Scripting

Perl script to Merge contents of 2 different excel files in a single excel file

All, I have an excel sheet Excel1.xls that has some entries. I have one more excel sheet Excel2.xls that has entries only in those cells which are blank in Excel1.xls These may be in different workbooks. They are totally independent made by 2 different users. I have placed them in a... (1 Reply)
Discussion started by: Anamika08
1 Replies

6. Shell Programming and Scripting

Writing excel file using perl : Excel file formatting changed

I am trying to create a program where user can input data in certain excel cells using user interface on internet....the programming is on perl and server is unix But when i parse data into excel the formatting of sheets is turned to default and all macro coding removed. What to do...Please... (7 Replies)
Discussion started by: mud_born
7 Replies

7. UNIX and Linux Applications

Perl Script to read an excel file into an array and search in the UNIX directories

Hi, I want the Perl script with versions 5.8.2 and 5.8.5 starting with #!/usr/bin/perl The Perl program should read the excel file or text file line by line and taking into an array and search in the UNIX directories for reference file of .jsp or .js or .xsl with path .The Object names... (2 Replies)
Discussion started by: pasam
2 Replies

8. Shell Programming and Scripting

Sort Excel File

Hi, I am new to scripting. I have a requirement 1.convert excel 2007 format to excel 2003 fromat 2.sort the excel file -( all columns in descending order). This needs to be done progarmmatically. Please let me know what would be the possible steps that I would need to take inorder to... (2 Replies)
Discussion started by: eva13
2 Replies

9. Shell Programming and Scripting

How to sort columns in excel(csv) file

i want sort columns with headers based on another file headers file1 eg: i'm having an empty file with only coumn names like lastname firstname title expirydate stlcno status etc... another file with same column names and some other as well but in different order... file2 eg:firstname... (2 Replies)
Discussion started by: Man83Nagesh
2 Replies

10. Shell Programming and Scripting

PERL: Split Excel Workbook to Indiv Excel files

Hi, I am trying to find a way to read an excel work book with multiple worksheets. And write each worksheet into a new excel file using perl. My environment is Unix. For example: I have an excel workbook TEST.xls and it has Sheet1, Sheet2, Sheet3 worksheets. I would like to create... (2 Replies)
Discussion started by: sandeep78
2 Replies
Login or Register to Ask a Question