Fixing corrupted vcard files.


 
# 1  
Old 10-13-2008

KDE's Kontact PIM breaks quoted-printable vCard files because it
inserts line breaks in the middle of a word. Take this text for example:
Code:
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A
 8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=
 A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n

The whole thing should be on one line, and the spaces at the beginning
of each line shouldn't be there at all. I have a directory with 422
files corrupted like this.

Can a shell script go through a directory of files and replace each instance
of "newline-space" with nothing? The system is Ubuntu 8.04 with KDE if
it matters. Thanks.
# 2  
Old 10-13-2008
Try this:

Code:
perl -pi.bak -e 'BEGIN { $/=""; } s/\n //gm' *.vcard

It should save backups of the files as filename.vcard.bak.
# 3  
Old 10-14-2008
Thanks. I am trying to see what happens here:
perl: this is obvious
-pi.bak: simply copy the current file to its name + .bak?
-e: there is no mention of this in man perl.
'BEGIN { $/=""; } s/\n //gm': the actual regex. I don't quite get it
*.vcard: go through all these files?

I actually need to change the regex so that it not only removes the space at the beginning of a line, but removes the newline character as well. The only newline characters that should remain are those not followed by a space. In PHP that would be str_replace("\n ", "", $string); however, I cannot figure out how to modify the perl regex to do that. And regexes are hard to google for!

I do appreciate the code example, but I am also trying to learn a bit (unusual, I know). I very much appreciate your assistance and patience.
# 4  
Old 10-14-2008
As far as I can see my solution does what you describe:

Code:
$ cat testfile.vcard
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A
 8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=
 A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A
 8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=
 A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A
 8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=
 A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n
$ perl -pi.bak -e 'BEGIN { $/=""; } s/\n //gm' *.vcard
$ cat testfile.vcard
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n

There aren't any hidden/funny characters in the input files, are there? Check with cat -vet.
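As a rough illustration of what that check shows (the two sample lines here are made up): cat -vet prints a $ at the end of every line, ^I for each tab, and other control characters in caret notation, so a Windows-style carriage return shows up as ^M just before the $.

```shell
#!/bin/sh
# Hypothetical sample: the first line ends in CRLF, the second in plain LF.
visible=$(printf 'crlf line\r\nlf line\n' | cat -vet)

# Print what cat -vet reported; a CRLF line ends in ^M$, an LF line in $.
printf '%s\n' "$visible"
```

A file that shows ^M$ at the end of its lines has CRLF line endings, which matters for any regex that assumes a bare \n.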

man perlrun is the page you really need to look at for the command-line options. -p and -i are separate options that I combined for the sake of brevity.

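-p wraps the code in a loop that reads each input record, runs the code on it and prints the result, much like sed or awk; perl also supports an awk-style BEGIN block that runs before any input is processed. In that block I've set the input record separator $/ to an empty string... this puts perl in paragraph mode, so it reads the input in chunks separated by blank lines rather than line-by-line (to "slurp" the entire file in one go you would undefine $/ or use -0777 instead). Either way each record spans several lines, which allows us to do regex matches against multiple lines. The BEGIN block is separate from the actual s/// command that does the search and replace.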

s/// is documented on the man perlop page.

I'm glad to see you don't just want spoonfeeding (all too common around here!).
# 5  
Old 10-15-2008
Thanks. I'm going through the docs as we speak. Perl is _complicated_! That does not seem to be just my own opinion, either. Googling some examples leads me to lots of frustrated people!

In any case, I probably should have posted the entire vcard file. Here it is, along with the results of the code:
Code:
hardy2@hardy2-laptop:~/test$ cat test.vcf
BEGIN:VCARD
FN:First Last
N:Last;First;;;
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:First Line.\nThe Second Line i
 s long so that it will wrap. Long\, long\, and wrapping!=\n\nThird Line.\n
UID:frh74xvYZ9
VERSION:2.1
END:VCARD

BEGIN:VCARD
FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=90=D7=90=D7=A4=D7=A8=D7=98=D
 7=99 =D7=9E=D7=A9=D7=A4=D7=97=D7=94
N;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=9E=D7=A9=D7=A4=D7=97=D7=94;=D
 7=90=D7=90=D7=A4=D7=A8=D7=98=D7=99;;;
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A
 8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=A9=D7=95=D7=A8=D7=94 =D7=A9=D7=A0=D7=
 99=D7=94 =D7=94=D7=99=D7=90 =D7=\n=90=D7=A8=D7=95=D7=9B=D7=94\, =D7=9B=D7=9
 3=D7=99 =D7=A9=D7=A0=D7=A8=D7=90=\n =D7=90=D7=95=D7=AA=D7=94 =D7=92=D7=95=D
 7=9C=D7=A9=D7=AA. =D7=90=D7=A8=D7=\n=95=D7=9B=D7=94\, =D7=90=D7=A8=D7=95=D7
 =9B=D7=94\, =D7=95=D7=92=D7=95=D7=9C=\n=D7=A9=D7=AA!\n=D7=A9=D7=95=D7=A8=D7
 =94 =D7=A9=D7=9C=D7=99=D7=A9=D7=99=D7=AA.\n
UID:KqbQKbfBaF
VERSION:2.1
END:VCARD

hardy2@hardy2-laptop:~/test$ perl -pi.bak -e 'BEGIN { $/=""; } s/\n //gm' *.vcf
hardy2@hardy2-laptop:~/test$ cat test.vcf
BEGIN:VCARD
FN:First Last
N:Last;First;;;
s long so that it will wrap. Long\, long\, and wrapping!=\n\nThird Line.\ni
UID:frh74xvYZ9
VERSION:2.1
END:VCARD

BEGIN:VCARD
7=99 =D7=9E=D7=A9=D7=A4=D7=97=D7=94INTABLE:=D7=90=D7=90=D7=A4=D7=A8=D7=98=D
7=90=D7=90=D7=A4=D7=A8=D7=98=D7=99;;;ABLE:=D7=9E=D7=A9=D7=A4=D7=97=D7=94;=D
=94 =D7=A9=D7=9C=D7=99=D7=A9=D7=99=D7=AA.\nA9=D7=AA!\n=D7=A9=D7=95=D7=A8=D7
UID:KqbQKbfBaF
VERSION:2.1
END:VCARD

hardy2@hardy2-laptop:~/test$

As can be easily seen, the lines still wrap, and worse, critical parts of the file are destroyed. I have been playing around with the line of code, but it is slow going and I could really use a hand with this. I do appreciate your patience and willingness to teach a noob.
# 6  
Old 10-15-2008
I suspect there are some funny line terminators in this file. Can you post the output of cat -vet test.vcf?

I agree about perl, it looks pretty horrible and I was a very slow adopter; but its brevity, power and ubiquity make it difficult to live without. I generally use awk when I can, but perl is ideal for this problem due to its convenient handling of multi-line regex.
# 7  
Old 10-16-2008
Code:
hardy2@hardy2-laptop:~$ cat -vet test.vcf
BEGIN:VCARD^M$
FN:First Last^M$
N:Last;First;;;^M$
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:First Line.\nThe Second Line i^M$
 s long so that it will wrap. Long\, long\, and wrapping!=\n\nThird Line.\n^M$
UID:frh74xvYZ9^M$
VERSION:2.1^M$
END:VCARD^M$
^M$
BEGIN:VCARD^M$
FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=90=D7=90=D7=A4=D7=A8=D7=98=D^M$
 7=99 =D7=9E=D7=A9=D7=A4=D7=97=D7=94^M$
N;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=9E=D7=A9=D7=A4=D7=97=D7=94;=D^M$
 7=90=D7=90=D7=A4=D7=A8=D7=98=D7=99;;;^M$
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A^M$
 8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=A9=D7=95=D7=A8=D7=94 =D7=A9=D7=A0=D7=^M$
 99=D7=94 =D7=94=D7=99=D7=90 =D7=\n=90=D7=A8=D7=95=D7=9B=D7=94\, =D7=9B=D7=9^M$
 3=D7=99 =D7=A9=D7=A0=D7=A8=D7=90=\n =D7=90=D7=95=D7=AA=D7=94 =D7=92=D7=95=D^M$
 7=9C=D7=A9=D7=AA. =D7=90=D7=A8=D7=\n=95=D7=9B=D7=94\, =D7=90=D7=A8=D7=95=D7^M$
 =9B=D7=94\, =D7=95=D7=92=D7=95=D7=9C=\n=D7=A9=D7=AA!\n=D7=A9=D7=95=D7=A8=D7^M$
 =94 =D7=A9=D7=9C=D7=99=D7=A9=D7=99=D7=AA.\n^M$
UID:KqbQKbfBaF^M$
VERSION:2.1^M$
END:VCARD^M$
^M$
hardy2@hardy2-laptop:~$
