Field delimited data to XML


 
# 1  
11-10-2013

Hi,

We need to produce an XML file from a record/field-delimited data file. At this point we could just script something, but I'd like to ask the community which programming language would be the best choice for this, both in terms of execution performance and in terms of implementation complexity.

We can do this using any method, but I'm looking for the best way of handling it.

Thanks.
# 2  
11-26-2013
Delimited by what?

Writing XML is easy; pick your favorite fast language. I like C.
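Writing XML is indeed simple, provided the field values are escaped: a literal `&`, `<` or `>` in the data would otherwise produce malformed output. A minimal sketch in Python (chosen only for illustration, not as a claim that it is the fastest option), using the standard library's `xml.sax.saxutils.escape`; the helper name `element` is hypothetical:

```python
from xml.sax.saxutils import escape


def element(tag: str, value: str, indent: int = 0) -> str:
    """Return one XML element on a single line, with its text escaped.

    escape() replaces &, < and > with entity references. The tag name is
    assumed to already be a valid XML name (for example, no spaces).
    """
    return " " * indent + f"<{tag}>{escape(value)}</{tag}>"


if __name__ == "__main__":
    print(element("Properties_1", "A & B <x>", indent=6))
```

For example, `element("Properties_1", "A & B", 6)` gives `      <Properties_1>A &amp; B</Properties_1>`.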
# 3  
11-27-2013
As usual, the "best" way depends on what you want. Everyone says XML is easy, but there are usually a few dozen "oh, by the way" clauses that get tacked on, since XML is so flexible and plastic.

Asking which is the 'best' language is more a religious discussion than anything.

So, please post the input you have and the output you want. Don't leave out anything important, like "oh, by the way, the number of columns is variable".
# 4  
11-27-2013
Delimited implies delimiting characters? That argues for awk or shell.
# 5  
12-09-2013
All fields are delimited by a pipe character.
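In the sample posted later in this thread, each record also ends with a trailing pipe and pads fields with spaces, so a plain split on `|` leaves an empty last field. A minimal sketch, assuming exactly that layout (the function name `parse_record` is hypothetical):

```python
from typing import List, Tuple


def parse_record(line: str) -> Tuple[str, List[str]]:
    """Split one pipe-delimited record into (record_key, field_values).

    Assumes fields separated by '|', optional spaces around each field,
    and a trailing '|' at the end of the line. The caller is expected to
    skip blank lines.
    """
    parts = [p.strip() for p in line.rstrip("\n").split("|")]
    # A trailing '|' produces one empty element at the end; drop it.
    if parts and parts[-1] == "":
        parts.pop()
    return parts[0], parts[1:]


if __name__ == "__main__":
    print(parse_record("Record_key_2 | 4| 5 |"))
```

Here `parse_record("Record_key_2 | 4| 5 |")` returns `('Record_key_2', ['4', '5'])`; stripping each piece also absorbs the uneven spacing around the delimiters.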

C/C++ should be fastest; however, I was considering Perl or a similar language, given the data structure and the fact that we're on a Red Hat machine.
# 6  
12-09-2013
Again, how to do it depends on the output you want. XML is flexible enough for the answer to be anything from trivial to hellish, depending on your requirements.
# 7  
12-10-2013
OK, I'll post an example to illustrate the sort of data we're dealing with.
Each occurrence of Record_key_1 identifies a new object; all subsequent records refer to that object until a new object is defined.

Sample input:
Record_key_1 | A | B | C |
Record_key_2 | 0 | 1 |
Record_key_2 | 2 | 3 |
Record_key_2 | 4 | 5 |
Record_key_3 | a | b | c | d |
Record_key_3 | e | f | g | h |
Record_key_1 | D | E | F |
Record_key_2 | 6 | 7 |
Record_key_2 | 8 | 9 |
Record_key_3 | i | j | k | l |

= 2 objects total

Sample output:
<Object>
   <Object Properties>
      <Properties_1>A</Properties_1>
      <Properties_2>B</Properties_2>
      <Properties_3>C</Properties_3>
   </Object Properties>
   <Sub-object>
      <Sub-object characteristics_1>0</Sub-object characteristics_1>
      <Sub-object characteristics_2>1</Sub-object characteristics_2>
   </Sub-object>
   <Sub-object>
      <Sub-object characteristics_1>2</Sub-object characteristics_1>
      <Sub-object characteristics_2>3</Sub-object characteristics_2>
   </Sub-object>
   <Sub-object>
      <Sub-object characteristics_1>4</Sub-object characteristics_1>
      <Sub-object characteristics_2>5</Sub-object characteristics_2>
   </Sub-object>
   <Object user>
      <User details_1>a</User details_1>
      <User details_2>b</User details_2>
      <User details_3>c</User details_3>
      <User details_4>d</User details_4>
   </Object user>
   <Object user>
      <User details_1>e</User details_1>
      <User details_2>f</User details_2>
      <User details_3>g</User details_3>
      <User details_4>h</User details_4>
   </Object user>
</Object>

<Object>
   <Object Properties>
      <Properties_1>D</Properties_1>
      <Properties_2>E</Properties_2>
      <Properties_3>F</Properties_3>
   </Object Properties>
   <Sub-object>
      <Sub-object characteristics_1>6</Sub-object characteristics_1>
      <Sub-object characteristics_2>7</Sub-object characteristics_2>
   </Sub-object>
   <Sub-object>
      <Sub-object characteristics_1>8</Sub-object characteristics_1>
      <Sub-object characteristics_2>9</Sub-object characteristics_2>
   </Sub-object>
   <Object user>
      <User details_1>i</User details_1>
      <User details_2>j</User details_2>
      <User details_3>k</User details_3>
      <User details_4>l</User details_4>
   </Object user>
</Object>
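The grouping rule above (a new object at each `Record_key_1`, with every following record attached to it) can be sketched as a generator that buffers only the current object, so memory use does not grow with the size of the file. A minimal sketch, assuming the sample layout; `group_objects` is a hypothetical name, and records appearing before the first `Record_key_1` are simply ignored here, which is an assumption:

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, List[str]]  # (record_key, field_values)


def group_objects(lines: Iterable[str],
                  start_key: str = "Record_key_1") -> Iterator[List[Record]]:
    """Yield one list of records per object.

    A new object begins at each record whose key equals start_key; all
    following records belong to it until the next start_key record.
    """
    current: List[Record] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        parts = [p.strip() for p in line.split("|")]
        if parts[-1] == "":
            parts.pop()  # drop the empty field left by the trailing '|'
        key, fields = parts[0], parts[1:]
        if key == start_key:
            if current:
                yield current  # previous object is complete
            current = [(key, fields)]
        elif current:
            current.append((key, fields))
        # else: record before the first object, ignored (assumption)
    if current:
        yield current
```

Fed the ten sample input lines above, this yields two objects, of six and four records respectively.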

I'm wondering where to store the mapping configuration so that the program can pick it up, rather than relying on a large number of conditional statements on the record key and field ID to deduce which tags to use.
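One way to avoid a chain of conditionals is to keep the mapping as data: a table from record key to container tag and field-tag prefix, which could live in a small JSON file next to the script and be loaded at startup. A minimal sketch; all tag names below are hypothetical, and they use underscores because XML element names cannot contain spaces (a tag such as `<Object Properties>` from the sample output would not be well-formed):

```python
import json
from xml.sax.saxutils import escape

# Hypothetical mapping. In practice this text could be read from a separate
# config file, so tags can change without touching the code.
MAPPING_JSON = """
{
  "Record_key_1": {"container": "Object_Properties", "field": "Properties"},
  "Record_key_2": {"container": "Sub-object", "field": "Sub-object_characteristics"},
  "Record_key_3": {"container": "Object_user", "field": "User_details"}
}
"""
MAPPING = json.loads(MAPPING_JSON)


def record_to_xml(key, fields, indent=3):
    """Render one record as XML, looking up its tags in MAPPING.

    Field tags are numbered from 1, as in the sample output
    (Properties_1, Properties_2, ...). An unmapped key raises KeyError.
    """
    spec = MAPPING[key]
    pad = " " * indent
    lines = [f"{pad}<{spec['container']}>"]
    for i, value in enumerate(fields, start=1):
        tag = f"{spec['field']}_{i}"
        lines.append(f"{pad}   <{tag}>{escape(value)}</{tag}>")
    lines.append(f"{pad}</{spec['container']}>")
    return "\n".join(lines)
```

With this table, `record_to_xml("Record_key_2", ["0", "1"])` produces a four-line `<Sub-object>` block with the same indentation as the sample output; supporting a new record type means adding one entry to the table rather than another branch.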

My initial approach was to break down the original input structure into a file with one line per field, consisting of the Record/Field ID and the value, then replace each Record/Field ID with its corresponding XML tag, and finally wrap it all up together. At that point awk should be able to deal with it. The question is how long it takes to process hundreds of thousands of "objects".
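As an alternative to the multi-pass approach, the whole conversion can be done in a single streaming pass: each input line is read once and its XML is emitted immediately, closing the previous `<Object>` when the next `Record_key_1` arrives. Runtime is then linear in the input size and memory stays flat. A minimal sketch, not a benchmarked implementation; the tag table is hypothetical and uses underscores because XML names cannot contain spaces:

```python
from typing import Iterable, Iterator
from xml.sax.saxutils import escape

# Hypothetical mapping: record key -> (container tag, field tag prefix).
MAPPING = {
    "Record_key_1": ("Object_Properties", "Properties"),
    "Record_key_2": ("Sub-object", "Sub-object_characteristics"),
    "Record_key_3": ("Object_user", "User_details"),
}
START_KEY = "Record_key_1"  # a record with this key opens a new <Object>


def convert(lines: Iterable[str]) -> Iterator[str]:
    """Yield XML text, one line at a time, for pipe-delimited input lines."""
    in_object = False
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        parts = [p.strip() for p in line.split("|")]
        if parts[-1] == "":
            parts.pop()  # drop the empty field after the trailing '|'
        key, fields = parts[0], parts[1:]
        if key == START_KEY:
            if in_object:
                yield "</Object>\n"  # close the previous object
            yield "<Object>\n"
            in_object = True
        elif not in_object:
            continue  # assumption: records before the first object are skipped
        container, prefix = MAPPING[key]  # KeyError for unmapped keys
        yield f"   <{container}>\n"
        for i, value in enumerate(fields, start=1):
            tag = f"{prefix}_{i}"
            yield f"      <{tag}>{escape(value)}</{tag}>\n"
        yield f"   </{container}>\n"
    if in_object:
        yield "</Object>\n"
```

In a script this could be driven as `sys.stdout.writelines(convert(sys.stdin))` and run as, for example, `python3 convert.py < input.txt > output.xml` (the script name is hypothetical). Like the sample output, this emits several top-level `<Object>` elements; a well-formed XML document needs a single root element, so a wrapper element around them may be required by whatever consumes the file.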

As for the comment that picking a language comes down to personal preference, I can't disagree with that, but when it comes to performance, some languages will handle this requirement faster than others. If it means spending 50% more coding time to get there, that's a price I can realistically pay for the benefits it gives us.
This User Gave Thanks to Indalecio For This Post: