Need an advanced version of this script


 
# 1
07-21-2010

Hi there, I am completely new to this great forum and to Linux. I have been working with Ubuntu Lucid Lynx for 3 days and I love the power of the shell.
This script helped me a lot in understanding the process.
Now to my problem. I have this tab-delimited data set:

Code:
'Spalte_1'	'Spalte_2'	'Spalte_3'	'Spalte_4'
Datensatz 1 Spalte 1	Datensatz 1 Spalte 2	Datensatz 1 Spalte 3	Datensatz 1 Spalte 4
Datensatz 2 Spalte 1	Datensatz 2 Spalte 2	Datensatz 2 Spalte 3	Datensatz 2 Spalte 4
Datensatz 3 Spalte 1	Datensatz 3 Spalte 2	Datensatz 3 Spalte 3	Datensatz 3 Spalte 4
Datensatz 4 Spalte 1	Datensatz 4 Spalte 2	Datensatz 4 Spalte 3	Datensatz 4 Spalte 4
Datensatz 5 Spalte 1	Datensatz 5 Spalte 2	Datensatz 5 Spalte 3	Datensatz 5 Spalte 4

I have the following script:
Code:
#!/bin/bash
L=label
{
read line
eval H=($line)
while read line
do
  eval P=($line)
  printf "<$L>\n"
  for ((i=0;i<${#H[@]};i++))
  do
    printf "  %s\n" "<${H[i]}>${P[i]}</${H[i]}>"
  done
  printf "</$L>\n"
done
} < inputfile.csv

I start the process by:
Code:
# ./feedhelper-emptyfields.sh > outputfile.xml

The output is this:

Code:
<label>
  <Spalte_1>Datensatz</Spalte_1>
  <Spalte_2>1</Spalte_2>
  <Spalte_3>Spalte</Spalte_3>
  <Spalte_4>1</Spalte_4>
</label>
<label>
  <Spalte_1>Datensatz</Spalte_1>
  <Spalte_2>2</Spalte_2>
  <Spalte_3>Spalte</Spalte_3>
  <Spalte_4>1</Spalte_4>
</label>
<label>
  <Spalte_1>Datensatz</Spalte_1>
  <Spalte_2>3</Spalte_2>
  <Spalte_3>Spalte</Spalte_3>
  <Spalte_4>1</Spalte_4>
</label>
<label>
  <Spalte_1>Datensatz</Spalte_1>
  <Spalte_2>4</Spalte_2>
  <Spalte_3>Spalte</Spalte_3>
  <Spalte_4>1</Spalte_4>
</label>
<label>
  <Spalte_1>Datensatz</Spalte_1>
  <Spalte_2>5</Spalte_2>
  <Spalte_3>Spalte</Spalte_3>
  <Spalte_4>1</Spalte_4>
</label>

As you can see, there's a problem with the whitespace in the file. How can I get everything between two tabs as one field? I have a data set of 60K lines, sometimes with whole sentences between the tabs, i.e. text containing commas, periods and so on. Is there a way to get this right? Should I change the delimiter or switch from .csv to something else, or would I be better off working with Perl? Would it be possible to generate a <sublabel></sublabel> element within the <label></label> block? I would appreciate some help, just to get started somewhere. Thanks a lot.
# 2  
07-21-2010
Hi, you could use this:

Code:
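#!/bin/bash
# Works for this sample only: every field in it is exactly 4 words long,
# so field i is rebuilt from the words P[4*i] .. P[4*i+3]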
L=label
{
read line
eval H=($line)
while read line
do
  eval P=($line)
  printf "<$L>\n"
  for ((i=0;i<${#H[@]};i++))
  do
    printf "  %s\n" "<${H[i]}>${P[$((4*i))]} ${P[$((4*i+1))]} ${P[$((4*i+2))]} ${P[$((4*i+3))]}</${H[i]}>"
  done
  printf "</$L>\n"
done
} < infile

But you could also use this, using the tab character as the field separator (entered as Ctrl-V TAB in vi):

Code:
#!/bin/bash
L=label
{
read line
eval H=($line)
while IFS='	' read -r P[0] P[1] P[2] P[3]
do
  printf "<$L>\n"
  for ((i=0;i<${#H[@]};i++))
  do
    printf "  %s\n" "<${H[i]}>${P[i]}</${H[i]}>"
  done
  printf "</$L>\n"
done
} < infile

In bash you can replace the read line with:
Code:
while IFS='	' read -r -a P

In ksh:
Code:
while IFS='	' read -r -A P

S.
This User Gave Thanks to Scrutinizer For This Post:
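A caveat about the `IFS` versions above, as far as I know: tab is an "IFS whitespace" character, so `read` treats a run of tabs as a single separator. An empty field therefore disappears and the following columns shift left, which matters if the file can contain empty fields. The sketch below (bash only; the function `split_tabs` and the array `F` are made-up names, not from this thread) splits on every single tab instead and shows the difference:

```bash
#!/bin/bash
# Hypothetical helper: split $1 on every tab into the global array F,
# keeping empty fields (an IFS-based read would merge consecutive tabs).
split_tabs() {
  local rest=$1 tab=$'\t'
  F=()
  while [[ $rest == *"$tab"* ]]; do
    F+=("${rest%%"$tab"*}")   # text up to the first tab
    rest=${rest#*"$tab"}      # drop that field and its tab
  done
  F+=("$rest")                # last field (possibly empty)
}

# Demonstration on a record whose 2nd field is empty.
demo=$'a b\t\tc d'
IFS=$'\t' read -r -a P <<< "$demo"
echo "read -a: ${#P[@]} fields"       # prints "read -a: 2 fields"
split_tabs "$demo"
echo "split_tabs: ${#F[@]} fields"    # prints "split_tabs: 3 fields"
```

In the loop above one could read each record with `IFS= read -r line`, call `split_tabs "$line"` and then use `F` where the script uses `P`.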
# 3  
07-21-2010
Please post your expected output here first.

I have a feeling it can be done with an awk one-liner.
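An awk program along those lines might look like the sketch below (my own guess at such a solution, not code posted in this thread; `tsv_to_xml` is a made-up name). It takes the tag names from the header line, removes their single quotes, and splits each record on single tabs only, so spaces inside a field stay intact and empty fields are kept:

```bash
#!/bin/bash
# Sketch: tab-delimited data on stdin -> <label> blocks on stdout.
# Tag names come from the header line; tsv_to_xml is a hypothetical name.
tsv_to_xml() {
  awk -v q="'" '
    BEGIN { FS = "\t" }                   # every single tab separates fields
    NR == 1 {                             # header: remember the column names
      for (i = 1; i <= NF; i++) { h[i] = $i; gsub(q, "", h[i]) }
      n = NF
      next
    }
    {
      print "<label>"
      for (i = 1; i <= n; i++) printf "  <%s>%s</%s>\n", h[i], $i, h[i]
      print "</label>"
    }'
}
```

Usage would be something like `tsv_to_xml < inputfile.csv > outputfile.xml`. For a 60K-line file, awk is probably a good deal faster than a bash `read` loop.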
# 4  
07-22-2010
Thanks, Scrutinizer. Both versions of the script work like magic. The expected output format is this:

Code:
<label>
  <Spalte_1>Datensatz 1 Spalte 1</Spalte_1>
  <Spalte_2>Datensatz 1 Spalte 2</Spalte_2>
  <Spalte_3>Datensatz 1 Spalte 3</Spalte_3>
  <Spalte_4>Datensatz 1 Spalte 4</Spalte_4>
</label>
<label>
  <Spalte_1>Datensatz 2 Spalte 1</Spalte_1>
  <Spalte_2>Datensatz 2 Spalte 2</Spalte_2>
  <Spalte_3>Datensatz 2 Spalte 3</Spalte_3>
  <Spalte_4>Datensatz 2 Spalte 4</Spalte_4>
</label>
<label>
  <Spalte_1>Datensatz 3 Spalte 1</Spalte_1>
  <Spalte_2>Datensatz 3 Spalte 2</Spalte_2>
  <Spalte_3>Datensatz 3 Spalte 3</Spalte_3>
  <Spalte_4>Datensatz 3 Spalte 4</Spalte_4>
</label>
<label>
  <Spalte_1>Datensatz 4 Spalte 1</Spalte_1>
  <Spalte_2>Datensatz 4 Spalte 2</Spalte_2>
  <Spalte_3>Datensatz 4 Spalte 3</Spalte_3>
  <Spalte_4>Datensatz 4 Spalte 4</Spalte_4>
</label>
<label>
  <Spalte_1>Datensatz 5 Spalte 1</Spalte_1>
  <Spalte_2>Datensatz 5 Spalte 2</Spalte_2>
  <Spalte_3>Datensatz 5 Spalte 3</Spalte_3>
  <Spalte_4>Datensatz 5 Spalte 4</Spalte_4>
</label>

There are only 2 things left for me to do right now. The first is to implement a sublabel, as my real data set looks like this:

Code:
'Spalte_1'	'Spalte_2'	'Spalte_3'	'Spalte_4'	'Employee'	'Date'	'Time'
Datensatz 1 Spalte 1	Datensatz 1 Spalte 2	Datensatz 1 Spalte 3	Datensatz 1 Spalte 4	Tommy Foo	14.05.2010	30 Min
Datensatz 2 Spalte 1	Datensatz 2 Spalte 2	Datensatz 2 Spalte 3	Datensatz 2 Spalte 4	Steve Lee	15.05.2010	45 Min
Datensatz 3 Spalte 1	Datensatz 3 Spalte 2	Datensatz 3 Spalte 3	Datensatz 3 Spalte 4	Joe Average	15.05.2010	90 Min
Datensatz 4 Spalte 1	Datensatz 4 Spalte 2	Datensatz 4 Spalte 3	Datensatz 4 Spalte 4	Jimmy Choo	16.05.2010	80 Min
Datensatz 5 Spalte 1	Datensatz 5 Spalte 2	Datensatz 5 Spalte 3	Datensatz 5 Spalte 4	Mary Haha	18.05.2010	130 Min

The expected output would look like this:
Code:
<label>
  <Spalte_1>Datensatz 1 Spalte 1</Spalte_1>
  <Spalte_2>Datensatz 1 Spalte 2</Spalte_2>
  <Spalte_3>Datensatz 1 Spalte 3</Spalte_3>
  <Spalte_4>Datensatz 1 Spalte 4</Spalte_4>
	<sublabel>
	  <Employee>Tommy Foo</Employee>
	  <Date>14.05.2010</Date>
	  <Time>30 Min</Time>
	</sublabel>
</label>

I think I have an idea how to do that: I'll run the script twice and merge the two output files later (one containing the label, the other the sublabel).

The other plan is to buy some books on shell scripting. :)

You helped me a lot already, thanks for that.

@rdcwayx: awk seems to be a very powerful tool; I have a lot to learn.
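Running the script twice and merging the outputs should not be needed. Below is a rough single-pass sketch in bash, under my own assumptions: the first 4 columns go into `<label>`, every remaining column goes into a nested `<sublabel>`, and the names `NLABEL`, `split_tabs`, `xml_escape` and `tsv_to_xml_sub` are all made up. Since the real data contains whole sentences, it also escapes `&`, `<` and `>`, which would otherwise produce invalid XML:

```bash
#!/bin/bash
# Sketch: one pass, no merge step. Assumes the first NLABEL columns go into
# <label> and the rest into a nested <sublabel>. All names are hypothetical.
NLABEL=4

# Split $1 on every tab into array F (empty fields are kept).
split_tabs() {
  local rest=$1 tab=$'\t'
  F=()
  while [[ $rest == *"$tab"* ]]; do
    F+=("${rest%%"$tab"*}")
    rest=${rest#*"$tab"}
  done
  F+=("$rest")
}

# Escape &, < and > for XML text; the result goes into X (no subshell per field).
xml_escape() {
  X=${1//'&'/'&amp;'}
  X=${X//'<'/'&lt;'}
  X=${X//'>'/'&gt;'}
}

# Reads tab-delimited data on stdin, writes XML on stdout.
tsv_to_xml_sub() {
  local line i q="'"
  IFS= read -r line || return
  split_tabs "$line"
  local -a H=("${F[@]//$q/}")             # column names without the quotes
  # "|| [[ -n $line ]]" also handles a last line without a trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    split_tabs "$line"
    printf '<label>\n'
    for ((i = 0; i < ${#H[@]}; i++)); do
      xml_escape "${F[i]}"
      if ((i < NLABEL)); then
        printf '  <%s>%s</%s>\n' "${H[i]}" "$X" "${H[i]}"
      else
        ((i == NLABEL)) && printf '\t<sublabel>\n'
        printf '\t  <%s>%s</%s>\n' "${H[i]}" "$X" "${H[i]}"
      fi
    done
    ((${#H[@]} > NLABEL)) && printf '\t</sublabel>\n'
    printf '</label>\n'
  done
}
```

After sourcing it, something like `tsv_to_xml_sub < inputfile.csv > outputfile.xml` should give the label/sublabel layout described above. For 60K lines a pure-bash loop will probably be noticeably slower than awk.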
# 5  
07-22-2010
Here is a lazy way, which lets you adjust your XML template easily in the future.

Create a template first:

Code:
$ cat template
<label>
  <Spalte_1>Datensatz 1 Spalte 1</Spalte_1>
  <Spalte_2>Datensatz 1 Spalte 2</Spalte_2>
  <Spalte_3>Datensatz 1 Spalte 3</Spalte_3>
  <Spalte_4>Datensatz 1 Spalte 4</Spalte_4>
        <sublabel>
          <Employee>Tommy Foo</Employee>
          <Date>14.05.2010</Date>
          <Time>30 Min</Time>
        </sublabel>
</label>

Prepare the input file (tab-delimited):

Code:
$ cat input
'Spalte_1'	'Spalte_2'	'Spalte_3'	'Spalte_4'	'Employee'	'Date'	'Time'
Datensatz 1 Spalte 1	Datensatz 1 Spalte 2	Datensatz 1 Spalte 3	Datensatz 1 Spalte 4	Tommy Foo	14.05.2010	30 Min
Datensatz 2 Spalte 1	Datensatz 2 Spalte 2	Datensatz 2 Spalte 3	Datensatz 2 Spalte 4	Steve Lee	15.05.2010	45 Min
Datensatz 3 Spalte 1	Datensatz 3 Spalte 2	Datensatz 3 Spalte 3	Datensatz 3 Spalte 4	Joe Average	15.05.2010	90 Min
Datensatz 4 Spalte 1	Datensatz 4 Spalte 2	Datensatz 4 Spalte 3	Datensatz 4 Spalte 4	Jimmy Choo	16.05.2010	80 Min
Datensatz 5 Spalte 1	Datensatz 5 Spalte 2	Datensatz 5 Spalte 3	Datensatz 5 Spalte 4	Mary Haha	18.05.2010	130 Min

Then run the command below:

Code:
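awk -F "\t" '
  # 1st file (template): keep its 11 lines in a[1]..a[11]
  NR==FNR {a[++i]=$0; next}
  # Put v between the first ">" and the last "<" of template line t.
  # Unlike gsub(), this does not turn "&" in the data into the matched text.
  function fill(t, v) {match(t, />.*</); return substr(t, 1, RSTART) v substr(t, RSTART+RLENGTH-1)}
  # 2nd file (data): skip the header, print one filled-in block per record
  FNR>1 {print a[1]
         for (j=1;j<=4;j++) print fill(a[j+1], $j)
         print a[6]
         for (j=5;j<=7;j++) print fill(a[j+2], $j)
         print a[10]
         print a[11]}' template input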
