how to read strings from a txt and put them into a spreadsheet?


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting how to read strings from a txt and put them into a spreadsheet?
# 1  
Old 02-27-2012
how to read strings from a txt and put them into a spreadsheet?

i have hundreds of thousands of txt files as below,

Code:
         RADARSAT 1 SCENE DESCRIPTION
                    
SCENE_ID           c0005098
MDA ORDER NUMBER   
GEOGRAPHICAL AREA  CIS ScanSar Canada
SCENE START TIME   APR 02 1997 23:05:10.222
SCENE STOP TIME    APR 02 1997 23:02:49.695
ORBIT              7366 ASCENDING
ORBIT DATA TYPE    PREDICTED
APPLICATION LUT    ICE
BEAM MODE          (A) W1 W2 W3 S7
PRODUCT TYPE       SCANSAR WIDE (SCW)
FORMAT             RADARSAT CEOS
# OF IMAGE LINES   9280
# OF IMAGE PIXELS  5504
PIXEL SPACING      100.000 m
SCENE CENTRE       63°17'N   82°32'W
CORNER COORDINATES:
                    
       58°39'34.57" N      59°35'47.62" N
       85°27'10.00" W      76°00'25.94" W
  
       66°42'13.92" N      67°47'07.99" N
       90°46'18.26" W      78°18'44.20" W
  
************************* IMPORTANT *********************
ALL DESCENDING SCANSAR PRODUCTS ARE EAST-WEST INVERTED
For information on RADARSAT CEOS format see README.TXT
(c) Canadian Space Agency 1997 Agence spatiale canadienne. Processed and distributed
by MDA Geospatial Services Inc.


I'm trying to read information from these txts, such as, scene_id, coordinates, formats, etc. and put these information into a spreadsheet.

The problem is these information are in different formats. for example, scene_id is strings, MDA ORDER NUMBER is Null, start time and stop time is date and time format, etc. The most troubling problem is how to read the coordinates and keep them as coordinates in the spreadsheet without changing them into strings, especially the degree simpols.

The spreadsheet would look like this:
Code:
SCENE_ID	MDA ORDER NUMBER	GEOGRAPHICAL AREA	START DATE	START TIME	STOP TIME	ORBIT	etc......
c0005098	Null			CIS ScanSar Canada	1997-04-02	23:05:10	23:02:50	7366 ASCENDING	etc....

Thanks so much!

Last edited by sunnydanniel; 02-27-2012 at 06:22 PM..
# 2  
Old 02-27-2012
Hi sunnydanniel,

Put here the spreadsheet you want as output. It will be helpful for anyone who would like to help you.

Regards,
Birei
This User Gave Thanks to birei For This Post:
# 3  
Old 02-27-2012
That's a pretty ugly input file, unless it was originally tab-separated and ruined by copy-pasting it into a web postbox, but:

Code:
$ cat radar.awk

BEGIN {
        OFS=","
        # excel expects \r\n terminators for csv, etc
        ORS="\r\n"
        A[1]="SCENE_ID"
        A[2]="GEOGRAPHICAL AREA"
        A[3]="SCENE START TIME"
        A[4]="SCENE STOP TIME"
        A[5]="ORBIT DATA TYPE"
        A[6]="ORBIT"
        A[7]="APPLICATION LUT"
        A[8]="BEAM MODE"
        A[9]="PRODUCT TYPE"
        A[10]="FORMAT"
        A[11]="# OF IMAGE LINES"
        A[12]="# OF IMAGE PIXELS"
        A[13]="PIXEL SPACING"
        A[14]="SCENE CENTRE"
}

{       for(N=1; A[N]; N++)
        {
                R="^" A[N];
                if($0 ~ R)
                {
                        sub(R, "");
                        sub(/^[ \t]*/, "");
                        D[N]=$0
                        break
                }
        }
}

/^CORNER COORDINATES:/ {

        for(M=1; M<=2; M++)
        {
                getline

                for(N=1; N<=2; N++)
                {
                        getline
                        B[++L]=$1" "$2;
                        B[++L]=$3" "$4;
                }
        }
}

END {
        for(N=1; A[N]; N++)     $N=A[N];
        for(M=1; M<=L; M++)     $(N++)="Corner "M

        print;

        for(N=1; A[N]; N++)
        {
                if(!D[N]) D[N]="Null";
                $N=D[N];
        }

        for(M=1; M<=L; M++)     $(N++)=B[M];
        print
}

$ awk -f radar.awk data
SCENE_ID,GEOGRAPHICAL AREA,SCENE START TIME,SCENE STOP TIME,ORBIT DATA TYPE,ORBIT,APPLICATION LUT,BEAM MODE,PRODUCT TYPE,FORMAT,# OF IMAGE LINES,# OF IMAGE PIXELS,PIXEL SPACING,SCENE CENTRE,Corner 1,Corner 2,Corner 3,Corner 4,Corner 5,Corner 6,Corner 7,Corner 8
c0005098,CIS ScanSar Canada,APR 02 1997 23:05:10.222,APR 02 1997 23:02:49.695,PREDICTED,7366 ASCENDING,ICE,(A) W1 W2 W3 S7,SCANSAR WIDE (SCW),RADARSAT CEOS,9280,5504,100.000 m,63°17'N   82°32'W,58°39'34.57" N,59°35'47.62" N,85°27'10.00" W,76°00'25.94" W,66°42'13.92" N,67°47'07.99" N,90°46'18.26" W,78°18'44.20" W

$

It was able to import into Excel with degree symbols intact. Adjust OFS to taste.
This User Gave Thanks to Corona688 For This Post:
# 4  
Old 02-28-2012
Quote:
Originally Posted by Corona688
That's a pretty ugly input file, unless it was originally tab-separated and ruined by copy-pasting it into a web postbox, but:

Code:
$ cat radar.awk

BEGIN {
        OFS=","
        # excel expects \r\n terminators for csv, etc
        ORS="\r\n"
        A[1]="SCENE_ID"
        A[2]="GEOGRAPHICAL AREA"
        A[3]="SCENE START TIME"
        A[4]="SCENE STOP TIME"
        A[5]="ORBIT DATA TYPE"
        A[6]="ORBIT"
        A[7]="APPLICATION LUT"
        A[8]="BEAM MODE"
        A[9]="PRODUCT TYPE"
        A[10]="FORMAT"
        A[11]="# OF IMAGE LINES"
        A[12]="# OF IMAGE PIXELS"
        A[13]="PIXEL SPACING"
        A[14]="SCENE CENTRE"
}

{       for(N=1; A[N]; N++)
        {
                R="^" A[N];
                if($0 ~ R)
                {
                        sub(R, "");
                        sub(/^[ \t]*/, "");
                        D[N]=$0
                        break
                }
        }
}

/^CORNER COORDINATES:/ {

        for(M=1; M<=2; M++)
        {
                getline

                for(N=1; N<=2; N++)
                {
                        getline
                        B[++L]=$1" "$2;
                        B[++L]=$3" "$4;
                }
        }
}

END {
        for(N=1; A[N]; N++)     $N=A[N];
        for(M=1; M<=L; M++)     $(N++)="Corner "M

        print;

        for(N=1; A[N]; N++)
        {
                if(!D[N]) D[N]="Null";
                $N=D[N];
        }

        for(M=1; M<=L; M++)     $(N++)=B[M];
        print
}

$ awk -f radar.awk data
SCENE_ID,GEOGRAPHICAL AREA,SCENE START TIME,SCENE STOP TIME,ORBIT DATA TYPE,ORBIT,APPLICATION LUT,BEAM MODE,PRODUCT TYPE,FORMAT,# OF IMAGE LINES,# OF IMAGE PIXELS,PIXEL SPACING,SCENE CENTRE,Corner 1,Corner 2,Corner 3,Corner 4,Corner 5,Corner 6,Corner 7,Corner 8
c0005098,CIS ScanSar Canada,APR 02 1997 23:05:10.222,APR 02 1997 23:02:49.695,PREDICTED,7366 ASCENDING,ICE,(A) W1 W2 W3 S7,SCANSAR WIDE (SCW),RADARSAT CEOS,9280,5504,100.000 m,63°17'N   82°32'W,58°39'34.57" N,59°35'47.62" N,85°27'10.00" W,76°00'25.94" W,66°42'13.92" N,67°47'07.99" N,90°46'18.26" W,78°18'44.20" W

$

It was able to import into Excel with degree symbols intact. Adjust OFS to taste.
Thank you so much for your answer!
But i have some questions about your script.
The first line,
Code:
cat radar.awk

do you save the file as radar.awk or it has something to do with the awk command?
after that, i can understand your loop, but is it in perl? i tried copy paste and pops a number of errors.
and how the data is stored into a new file or only a change to the original text file?

thanks again!
# 5  
Old 02-28-2012
Like shown, it's a textfile named radar.awk

It's not perl, it's awk.

You run it as shown,
Code:
awk -f radar.awk filename

The data is printed to the screen, as shown.

Code:
awk -f radar.awk filename

would print
Code:
SCENE_ID,GEOGRAPHICAL AREA,SCENE START TIME,SCENE STOP TIME,ORBIT DATA TYPE,ORBIT,APPLICATION LUT,BEAM MODE,PRODUCT TYPE,FORMAT,# OF IMAGE LINES,# OF IMAGE PIXELS,PIXEL SPACING,SCENE CENTRE,Corner 1,Corner 2,Corner 3,Corner 4,Corner 5,Corner 6,Corner 7,Corner 8
c0005098,CIS ScanSar Canada,APR 02 1997 23:05:10.222,APR 02 1997 23:02:49.695,PREDICTED,7366 ASCENDING,ICE,(A) W1 W2 W3 S7,SCANSAR WIDE (SCW),RADARSAT CEOS,9280,5504,100.000 m,63°17'N   82°32'W,58°39'34.57" N,59°35'47.62" N,85°27'10.00" W,76°00'25.94" W,66°42'13.92" N,67°47'07.99" N,90°46'18.26" W,78°18'44.20" W

---------- Post updated at 01:07 PM ---------- Previous update was at 01:03 PM ----------

Since you say you have millions of files, I made an improved version:

Code:
# radar2.awk

BEGIN {
        OFS=","
        # excel expects \r\n terminators for csv, etc
        ORS="\r\n"
        A[1]="SCENE_ID"
        A[2]="MDA ORDER NUMBER"
        A[3]="GEOGRAPHICAL AREA"
        A[4]="SCENE START TIME"
        A[5]="SCENE STOP TIME"
        A[6]="ORBIT"
        A[7]="ORBIT DATA TYPE"
        A[8]="APPLICATION LUT"
        A[9]="BEAM MODE"
        A[10]="PRODUCT TYPE"
        A[11]="FORMAT"
        A[12]="# OF IMAGE LINES"
        A[13]="# OF IMAGE PIXELS"
        A[14]="PIXEL SPACING"
        A[15]="SCENE CENTRE"
}

function printline()
{
        OLD=$0

        for(N=1; A[N]; N++)
        {
                if(!D[N]) D[N]="Null";
                $N=D[N];
        }

        for(M=1; M<=L; M++)     $(N++)=B[M];
        print

        $0=OLD
}

# Clear out old data whe files change
(FNR == 1) && (NR != 1) {
        printline()
        for(N=1; A[N]; N++) D[N]=""
}

/^CORNER COORDINATES:/ {
        L=0
        for(M=1; M<=2; M++)
        {
                getline

                if(FNR==1) next;

                for(N=1; N<=2; N++)
                {
                        getline
                        if(FNR==1) next;
                        B[++L]=$1" "$2;
                        B[++L]=$3" "$4;
                }
        }
}

# Check lines for data
(FNR>2) {
        N=FNR-2
        R="^" A[N];
        if($0 ~ R)
        {
                sub(R, "");
                sub(/^[ \t]*/, "");
                D[N]=$0
        }
        else print "!!!FAIL!!!", FNR, R, $0;
}

END {
        if(NR < 1)
        {
                for(N=1; A[N]; N++)     $N=A[N];
                for(M=1; M<=8; M++)     $(N++)="Corner "M;
                print;
        }
        else
                printline();
}

Code:
awk -f radar2.awk file1 file2 ...

would print

Code:
c0005098,CIS ScanSar Canada,APR 02 1997 23:05:10.222,APR 02 1997 23:02:49.695,PREDICTED,7366 ASCENDING,ICE,(A) W1 W2 W3 S7,SCANSAR WIDE (SCW),RADARSAT CEOS,9280,5504,100.000 m,63°17'N   82°32'W,58°39'34.57" N,59°35'47.62" N,85°27'10.00" W,76°00'25.94" W,66°42'13.92" N,67°47'07.99" N,90°46'18.26" W,78°18'44.20" W
line from file2
line from file3
...

...all this assuming the lines come in the exact same order in every sat file. If this isn't true, it may break down.
This User Gave Thanks to Corona688 For This Post:
# 6  
Old 02-28-2012
it works great!!!

thank you!
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Please Help! Need to put the lines of a txt to one line

Hi all, I'm quite newbie in shell scripting but I found a problem what I cant solve. I have a .txt file which looks like this: /22/ /23/ /24/ and so on and I'd need to make it look like this: /22/|/23/|/24/|...and so on. these numbers are growing and has lines like this /2a/ as well.... (15 Replies)
Discussion started by: gergo235
15 Replies

2. Shell Programming and Scripting

How to find lines in a .txt contains the strings I want

I have a .txt contains a lot of lines. Now I want to write a shell script to find out all the lines which contain the strings I want, and print these lines. For example: A.txt when you post any code you can easily do this highlighting your code and then click you should do a Google... (6 Replies)
Discussion started by: Henryyy
6 Replies

3. UNIX for Advanced & Expert Users

Find and replace txt between two strings in flat file

Hi There... I need to serach and replace strngs in a text file. My file has; books.amazon='Let me read' news.bestseller='xyz' expected output is books.amazon=NONFOUND news.bestseller=NONFOUND Can I first find the text between string1= books.amazon=' and string2= ' (locate the text... (1 Reply)
Discussion started by: Hiano
1 Replies

4. Shell Programming and Scripting

Comparing Strings in 2 .csv/txt files?

EDIT: My problems have been solved thanks to the help of bartus11 and pravin27 This code is just to help me learn. It serves no purpose other than that. Here's a sample csv that I'm working with - #listofpeeps.csv Jackie Chan,1954,M Chuck Norris,1930,M Bruce Lee,1940,M This code is... (13 Replies)
Discussion started by: chickeneaterguy
13 Replies

5. Shell Programming and Scripting

sed to read x.txt and grep from y.txt

How would I write a command(s) to read from a file (list) that looks like this: 29847374384 and grep from a second file (list) that looks like this: 29847374384, jkdfkjdf,3833,ddd:confused: (1 Reply)
Discussion started by: smellylizzard
1 Replies

6. Shell Programming and Scripting

Grep strings from file and put in Column

Dear Experts, My file contains below- GET:SUB:ISI,432350414557432; RESP:0:MD,019352020633:ISI,432350414557432:T11,1:T21,1:T22,1:B16,1:T62,1:BAIC,0:BAOC,1:BOIC,0:BIRO,0:BORO,0:PAID,1; GET:SUB:ISI,432350414581060;... (2 Replies)
Discussion started by: thepurple
2 Replies

7. UNIX for Advanced & Expert Users

Put strings around the file name

I have a file(old file) with names in it like Jolly Mirani Han Lee stuart Lloyd I want string to be added to these names . New file should look like 'jolly mirani' 'han lee' 'stuart lloyd' I tried using sed command but doesn't work for string but it do work for paranthesis. Can some one... (3 Replies)
Discussion started by: Jolly
3 Replies

8. Shell Programming and Scripting

How to read a line and put it into 3 variables

Hi All, I'll get a file whose 2nd line contains 3 fields: filename(variable length), file size char(10), and record count int(10). How do I cut it and put it into 3 variables? eg: abcd.csv01234567891111111111 now I want: $one = abcd.csv, $two = 0123456789, $three = 1111111111. I also... (8 Replies)
Discussion started by: Mandab
8 Replies

9. Shell Programming and Scripting

How to input .txt file into .xls spreadsheet

I need to take the totals from my script and input them into a excel spreadsheet. Right now, I just copy and paste. Is there an easier way? 3906 is the total jobs in ABEND state 4005 is the total jobs in SUCC state 1050 is the total jobs in HOLD state (1 Reply)
Discussion started by: wereyou
1 Replies

10. Shell Programming and Scripting

read a part of information from txt and put into the script

Hi, I have a name.txt which is stored: APPLE ORANGE RED BLUE GREEN and my script is: $name=`cat name.txt for file_number in `ls 1 /appl/CH_DATA/archive/CACHE/CDBACKUP$name*.archived however, my script cannot read name.txt and put into my scrip line, I would like the output is to... (18 Replies)
Discussion started by: happyv
18 Replies
Login or Register to Ask a Question