How to split a field into two fields?


 
# 1  
Old 02-20-2008

Hi,

I have a comma delimited text file where character fields (as opposed to numeric and date fields) are always enclosed with double quotes. Records are separated by the newline character. In a shell script I would like to split a particular field into two separate fields (enclosed with double quotes). The field I would like to split always begins with <description> and ends with </description> and is always the 5th field in a record.

e.g. I would like to convert this:

18,"A",2008-02-11,"Y","<description> some long text </description>","N",1

to this:

18,"A",2008-02-11,"Y","<description> some lo","ng text </description>","N",1

I'm not bothered where in the field the split occurs - somewhere in the middle is optimal.

Really grateful for any help on this one.

Thanks, Vicky
# 2  
Old 02-20-2008
try awk:
Code:
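awk -F, '{
    # Copy every field except the last, splitting the one that holds the description.
    for (i = 1; i < NF; i++) {
        if ($i ~ /<description>/) {
            # Integer midpoint, so the result does not depend on how substr() rounds fractions.
            half = int(length($i) / 2)
            printf("%s\",\"%s,", substr($i, 1, half), substr($i, half + 1))
        }
        else {
            printf("%s,", $i)
        }
    }
    print $NF    # the last field ends the record, with no trailing comma
}' oldfile > newfile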

# 3  
Old 02-20-2008
While I was working on this, it seems Jim had already solved it.
I'm new to scripting too, but I had a go at it myself.
Hope it helps.
Quote:
awk -F"," '{print $1,",",$2,",",$3,",",$4,",",substr($5,1,length($5)/2),"\",\"",substr($5,(length($5)/2)+1),",",$6,",",$7,",",$8}' file1
But my script gives extra spaces at the delimiters.
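
The extra spaces come from the commas in the print statement: print puts the output field separator (a single space by default) between its comma-separated arguments. One way around that is printf with an explicit format, sketched below. The sample line is the one from post #1, the variable names line and out are only for illustration, and the sketch assumes, as in post #1, that the description is the 5th of seven fields and that no field contains a comma.

```shell
line='18,"A",2008-02-11,"Y","<description> some long text </description>","N",1'

out=$(printf '%s\n' "$line" | awk -F, '{
    half = int(length($5) / 2)               # integer midpoint of field 5
    # printf inserts nothing between arguments, so the delimiters stay tight.
    printf("%s,%s,%s,%s,%s\",\"%s,%s,%s\n",
           $1, $2, $3, $4,
           substr($5, 1, half), substr($5, half + 1),
           $6, $7)
}')
printf '%s\n' "$out"
```

Setting OFS to an empty string in the original print version would also remove the spaces; the sample record has only seven fields, so $8 is empty there.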
# 4  
Old 02-21-2008
Thanks a lot for your help.

I should have said in my initial post that the text between the double quotes may itself contain double-quoted strings, and those may contain commas,

e.g. 18,"<description><job_title value="some text, more text" /></description>",2008-02-19,"N"

I think this makes it a lot more complicated?

I'm also having to use nawk (I'm on Solaris) as each record is likely to be more than 3000 characters (max for awk), but I think the syntax is the same/similar to awk.

Any ideas?

Thanks again
Vicky
# 5  
Old 02-21-2008
Follow up:

I think I have managed to get what I want with only a minor modification to user "jim mcnamara"'s solution:

I used nawk and set </description> as the field separator instead of ,

Solution:
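nawk -F "</description>" '{
    for (i = 1; i < NF; i++) {
        if ($i ~ /<description>/) {
            half = int(length($i) / 2)
            # Field splitting consumes the separator, so the closing tag is printed back explicitly.
            printf("%s\",\"%s</description>", substr($i, 1, half), substr($i, half + 1))
        }
        else {
            printf("%s,", $i)
        }
    }
    print $NF
}' oldfile > newfile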
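
For records like the one in post #4, where the description contains its own quotes and commas, another option is to skip field splitting altogether. The following is only a sketch, not something posted in this thread: it assumes at most one description element per record, uses match() (which nawk provides) to locate it, and splits at the midpoint of the element itself, so the fields in front of it do not affect where the split lands. The function name split_description and the variables out1 and out2 are made up for illustration.

```shell
# Hypothetical helper: reads records on stdin, writes the split records to stdout.
# On Solaris, replace awk with nawk.
split_description() {
    awk '{
        # Find the description element anywhere in the record.
        # Assumes at most one per record, because .* is greedy.
        if (match($0, /<description>.*<\/description>/)) {
            mid = RSTART + int(RLENGTH / 2)   # first character of the second half
            print substr($0, 1, mid - 1) "\",\"" substr($0, mid)
        } else {
            print                             # no description: leave the record alone
        }
    }'
}

# The two sample records from posts #1 and #4.
out1=$(printf '%s\n' '18,"A",2008-02-11,"Y","<description> some long text </description>","N",1' | split_description)
out2=$(printf '%s\n' '18,"<description><job_title value="some text, more text" /></description>",2008-02-19,"N"' | split_description)
printf '%s\n' "$out1" "$out2"
```

On real data this would be run as split_description < oldfile > newfile. The split can still land inside an embedded quoted value, as it does for the second sample, so whatever reads the output has to tolerate that.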

Thanks so much for your help.

Vicky