XML to TXT or CSV


 
# 1  

Hi all,

I am new to Unix and even newer to XML :)

I have a dataset that I need to work on and extract data from, but I can't even view it properly. It's an XML file that I need to analyse, returning the results as XML as well, but I first need to filter some of the records the way I would in an Excel file, and I'm not sure what the right approach is.

Can you please help me?

My data looks like this:

Code:
<?xml version="1.0" encoding="UTF-8" ?>
<archives>
  <archive id="ffghgsddes">
    <file line="1">
      <author>953b</author>
      <time>18:03</time>
      <text>this is an evidence regarding ...</text>
    </file>
    <file line="2">
      <author>04bfa</author>
      <time>18:03</time>
      <text>we have seen those documents before </text>
    </file>
    .
    .
    .
  </archive>
  <archive>
    .
    .
    .
  </archive>
</archives>

The output can look like anything, really.
I don't need the data to be normalized.
It can be separated by archive, or simply have everything on one line.

Thank you in advance for your help

A-V
# 2  
I'll give you what you asked for then, though it doesn't seem very useful to me:

Code:
$ sed 's/<[^>]*>//g' xml |  tr -d '\n' ; echo
            953b      18:03      this is an evidence regarding ...              04bfa      18:03      we have seen those documents before          .      .      .   . . .

$

I think you'll have to be more specific about what you need your output to look like.
# 3  
Wow, thanks for the help.

I would like to have it in a file that I can convert to Excel, where I can work with the columns.

The file would have one row per file line:

Archive id    Line  Author  Time   Text
Ffghgsddes    1     953b    18:03  this is an evidence regarding ...
              2     04bfa   18:03  we have seen those documents before
Fggsdfjrrrcf  1     Ggfv5   22:43  Will you consider such an offer?
              2

Is there a way to save it to another file instead of showing it on the screen?
# 4  
Any noninteractive output that prints to a terminal can be saved to a file. Use command > filename after a single command, or ( group; of; commands ) > filename to capture the output of several.
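
As a rough sketch of those redirection forms (grouped.txt and the printf commands are placeholders for illustration only):

```shell
# Save the output of a single command (overwrites output.txt if it exists)
printf 'one\n' > output.txt

# Append to the file instead of overwriting it
printf 'two\n' >> output.txt

# Capture several commands at once by grouping them in a subshell
( printf 'header\n'; printf 'row\n' ) > grouped.txt
```

Note that > truncates the target file first, so use >> when earlier contents should be kept.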

Now that you've posted what you actually want, I am working on it.
# 5  
Handling arbitrary XML isn't trivial. Hopefully this is flexible enough to mold itself to your input data, since it discovers columns as it goes and tries to preserve their order. It decides where a 'row' ends by looking for two close tags in a row.

If it doesn't work, try nawk. If it still doesn't work, post some of your actual, unmodified input data.

Code:
$ cat xmlg.awk

BEGIN { RS="<";         FS=">"; ORS="\r\n"  }

# Skip weird XML specification lines or blank records
/^\?/ || /^$/   {       next    }

# Handle close tags
/^[/]/  {
        N=D;    while((N>0) && ("/"STACK[N] != $1))     N--;

        if("/"STACK[N] == $1)   D=(N-1);
        POP++;

        if(POP == 2)
        {
                if(!HEADER++)
                {
                        split(ARG[1], Z, SUBSEP);
                        printf("%s %s", Z[2], Z[3]);
                        for(N=2; N<=ARG_; N++)
                        {
                                split(ARG[N], Z, SUBSEP);
                                printf("|%s %s", Z[2], Z[3]);
                        }

                        printf("%s", ORS);      # printf does not append ORS, so emit it explicitly
                }

                printf("%s", DATA[ARG[1]]);
                for(N=2; N<=ARG_; N++)
                        printf("|%s", DATA[ARG[N]]);
                printf("%s", ORS);
        }
        next
}

# Handle open tags
{
        gsub(/^[ \r\n\t]*/, "", $2);    # Whitespace isn't data
        gsub(/[ \r\n\t]*$/, "", $2);

        # Reset parameters
        POP=0;

        M=split($1, A, " ");
        STACK[++D]=A[1];

        # Handle parameters
        Q=split(A[2], B, " ");
        for(N=1; N<=Q; N++)
        {
                split(B[N], C, "=");
                gsub(/['"]/,"", C[2]);
#               PARAM[C[1]]=C[2];
#               print C[1], "=", PARAM[C[1]];

                I=D SUBSEP STACK[D] SUBSEP C[1];
                if(!SEEN[I]++)
                        ARG[++ARG_]=I;

                DATA[I]=C[2];
        }

        if($2)
        {
                I=D SUBSEP STACK[D] SUBSEP "CDATA";
                if(!SEEN[I]++)
                        ARG[++ARG_]=I;

                DATA[I]=$2;
        }
}

$ awk -f xmlg.awk file.xml

archive id|file line|author CDATA|time CDATA|text CDATA
ffghgsddes|1|953b|18:03|this is an evidence regarding ...
ffghgsddes|2|04bfa|18:03|we have seen those documents before
jhljkhlasdf|1|953b|18:03|this is an evidence regarding ...
jhljkhlasdf|2|04bfa|18:03|we have seen those documents before

$ awk -f xmlg.awk file.xml > output.txt

ORS="\r\n" should make it more easily importable into excel or what have you.
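
To see how the script tokenizes its input, the RS="<" / FS=">" splitting can be tried on its own. This is only a minimal sketch of that one step: the function name tokenize and the three-line sample are made up for illustration, and it does none of the column tracking that xmlg.awk does.

```shell
# Split XML-ish input on "<" (records) and ">" (fields), as xmlg.awk does.
# $1 is then the tag with its attributes, $2 the text that follows the tag.
tokenize() {
    awk 'BEGIN { RS="<"; FS=">" }
         /^$/ { next }                     # skip the empty record before the first tag
         { gsub(/[ \r\n\t]+$/, "", $2)     # trailing whitespace is not data
           printf("tag=[%s] text=[%s]\n", $1, $2) }'
}

printf '<file line="1">\n  <author>953b</author>\n</file>\n' | tokenize
```

For that sample this should print:

tag=[file line="1"] text=[]
tag=[author] text=[953b]
tag=[/author] text=[]
tag=[/file] text=[]

Close tags show up as records starting with "/", which is what the script's close-tag rule matches on.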
# 6  
I am giving you a standing ovation for this masterpiece :)

After wandering around Saxon, XSLT, sed and millions of other unfamiliar things, it was hard to believe that this would be possible.

Thank you very much, this was a massive help to me. And the comments are so clear that a non-programmer like me can understand it.

Really appreciate your help


---------- Post updated 05-16-12 at 11:05 AM ---------- Previous update was 05-15-12 at 06:28 PM ----------

Hello again ...

one more question ...

I want to filter my document so that it will not include x or y or z.
I am using grep -v but haven't found a way to combine the patterns.

Can you help me with this as well?
# 7  
egrep -v "x|y|z" ?

If that's not sufficient, please post a new thread with details about the input you have and output you want.
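
A small sketch of that filter, assuming pipe-separated rows like the ones the awk script produces (rows.txt and its contents are invented for the example). GNU grep treats egrep as obsolescent; grep -E is the equivalent POSIX spelling.

```shell
# Sample rows; only the second one contains none of x, y or z
printf '%s\n' 'a1|1|x' 'a1|2|keep me' 'a2|1|y' 'a2|2|z' > rows.txt

# One extended regex with alternation: drop lines matching x or y or z
grep -E -v 'x|y|z' rows.txt

# Same result with one -e option per pattern
grep -v -e 'x' -e 'y' -e 'z' rows.txt
```

Both commands print only a1|2|keep me. The patterns match anywhere in a line, so real filter strings should be specific enough not to hit unrelated text; adding -F treats them as fixed strings rather than regular expressions.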