The UNIX and Linux Forums  





Shell Programming and Scripting


#1 dotancohen, 10-13-2008
Fixing corrupted vcard files.

KDE's Kontact PIM breaks quoted-printable vCard files because it inserts
line breaks in the middle of a word. Take this text, for example:

Code:
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A
 8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=
 A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n

The whole thing should be on one line, and the spaces at the beginning
of each line shouldn't be there at all. I have a directory with 422
files corrupted like this.

Can a shell script go through a directory of files and replace each instance
of "newline followed by a space" with nothing? The system is Ubuntu 8.04 with
KDE, if it matters. Thanks.
#2 Annihilannic, 10-13-2008
Try this:


Code:
perl -pi.bak -e 'BEGIN { $/=""; } s/\n //gm' *.vcard

It should save backups of the files as filename.vcard.bak.
#3 dotancohen, 10-14-2008
Thanks. I am trying to understand what each part does:
perl: this is obvious.
-pi.bak: simply copy the current file to its name + .bak?
-e: there is no mention of this in man perl.
'BEGIN { $/=""; } s/\n //gm': the actual regex. I don't quite get it.
*.vcard: go through all these files?

I actually need to change the regex so that it removes not only the space at the beginning of a line but the newline character as well. The only newline characters that should remain are those not followed by a space. In PHP that would be str_replace("\n ", "", $string); however, I cannot figure out how to write the equivalent Perl regex. And regexes are hard to google for!

I do appreciate the code example, but I am also trying to learn a bit (unusual, I know). I very much appreciate your assistance and patience.
#4 Annihilannic, 10-14-2008
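As far as I can see, my solution already does what you describe: the pattern "\n " matches the newline together with the space after it, so both are removed: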


Code:
$ cat testfile.vcard
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A
 8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=
 A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A
 8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=
 A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A
 8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=
 A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n
$ perl -pi.bak -e 'BEGIN { $/=""; } s/\n //gm' *.vcard
$ cat testfile.vcard
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=94=D7=A9=D7=95=D7=A8=D7=94 =D7=94=D7=A9=D7=A0=D7=99=D7=94 =D7=9B=D7=\n

There aren't any hidden or unusual characters in the input files, are there? Check with cat -vet.
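
A quick sketch of what to look for, using two hypothetical lines piped straight into cat: -v prints a carriage return as ^M, -e marks every line end with $, and -t would show tabs as ^I. A Unix line ends in a bare $, a DOS/Windows line in ^M$:

```shell
#!/bin/sh
# Sketch: spot DOS/Windows line endings with cat -vet.
#   -v  show non-printing characters (carriage return -> ^M)
#   -e  mark each end of line with $
#   -t  show tabs as ^I
# Two hypothetical lines: the first ends in LF, the second in CR LF.
shown=$(printf 'unix line\ndos line\r\n' | cat -vet)
printf '%s\n' "$shown"
```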

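man perlrun is the page you need for the command-line options, including -e, which takes the program text directly on the command line. -p and -i are separate options that I combined for brevity: -i.bak edits each file in place and keeps a copy of the original with .bak appended to its name.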

-p wraps your code in a loop that reads the input one record at a time and prints each record afterwards, which makes perl behave much like awk; a BEGIN block runs once, before any input is read. In that block I've set the input record separator ($/) to an empty string, which puts perl into "paragraph mode": it reads the input in chunks separated by blank lines rather than line by line, so a single record can span several lines and a regex can match across them. The BEGIN block is separate from the actual s/// command that does the search and replace.

s/// is documented on the man perlop page.

I'm glad to see you don't just want spoonfeeding (all too common around here!).
#5 dotancohen, 10-15-2008
Thanks. I'm going through the docs as we speak. Perl is _complicated_! And I'm not the only one who thinks so: googling for examples leads me to lots of frustrated people!

In any case, I probably should have posted the entire vCard file. Here it is, along with the result of running the code:

Code:
hardy2@hardy2-laptop:~/test$ cat test.vcf
BEGIN:VCARD
FN:First Last
N:Last;First;;;
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:First Line.\nThe Second Line i
 s long so that it will wrap. Long\, long\, and wrapping!=\n\nThird Line.\n
UID:frh74xvYZ9
VERSION:2.1
END:VCARD

BEGIN:VCARD
FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=90=D7=90=D7=A4=D7=A8=D7=98=D
 7=99 =D7=9E=D7=A9=D7=A4=D7=97=D7=94
N;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=9E=D7=A9=D7=A4=D7=97=D7=94;=D
 7=90=D7=90=D7=A4=D7=A8=D7=98=D7=99;;;
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A
 8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=A9=D7=95=D7=A8=D7=94 =D7=A9=D7=A0=D7=
 99=D7=94 =D7=94=D7=99=D7=90 =D7=\n=90=D7=A8=D7=95=D7=9B=D7=94\, =D7=9B=D7=9
 3=D7=99 =D7=A9=D7=A0=D7=A8=D7=90=\n =D7=90=D7=95=D7=AA=D7=94 =D7=92=D7=95=D
 7=9C=D7=A9=D7=AA. =D7=90=D7=A8=D7=\n=95=D7=9B=D7=94\, =D7=90=D7=A8=D7=95=D7
 =9B=D7=94\, =D7=95=D7=92=D7=95=D7=9C=\n=D7=A9=D7=AA!\n=D7=A9=D7=95=D7=A8=D7
 =94 =D7=A9=D7=9C=D7=99=D7=A9=D7=99=D7=AA.\n
UID:KqbQKbfBaF
VERSION:2.1
END:VCARD

hardy2@hardy2-laptop:~/test$ perl -pi.bak -e 'BEGIN { $/=""; } s/\n //gm' *.vcf
hardy2@hardy2-laptop:~/test$ cat test.vcf
BEGIN:VCARD
FN:First Last
N:Last;First;;;
s long so that it will wrap. Long\, long\, and wrapping!=\n\nThird Line.\ni
UID:frh74xvYZ9
VERSION:2.1
END:VCARD

BEGIN:VCARD
7=99 =D7=9E=D7=A9=D7=A4=D7=97=D7=94INTABLE:=D7=90=D7=90=D7=A4=D7=A8=D7=98=D
7=90=D7=90=D7=A4=D7=A8=D7=98=D7=99;;;ABLE:=D7=9E=D7=A9=D7=A4=D7=97=D7=94;=D
=94 =D7=A9=D7=9C=D7=99=D7=A9=D7=99=D7=AA.\nA9=D7=AA!\n=D7=A9=D7=95=D7=A8=D7
UID:KqbQKbfBaF
VERSION:2.1
END:VCARD

hardy2@hardy2-laptop:~/test$

As you can see, the lines still wrap, and worse, critical parts of the file are destroyed. I have been experimenting with the one-liner, but it is slow going and I could really use a hand. I do appreciate your patience and willingness to teach a noob.
#6 Annihilannic, 10-15-2008
I suspect there are some unusual line terminators in this file. Can you post the output of cat -vet test.vcf?

I agree about Perl: it looks pretty horrible, and I was a very slow adopter, but its brevity, power and ubiquity make it hard to live without. I generally use awk when I can, but Perl is ideal for this problem because of its convenient handling of multi-line regexes.
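
For comparison, an awk sketch that unfolds line by line instead of with a multi-line regex: a line starting with a space is appended, minus that space, to the line held from before. The helper name unfold is hypothetical, and writing CRLF endings is an assumption based on vCard 2.1 using CR LF as its line break; drop the \r from the two printf formats for plain Unix endings:

```shell
#!/bin/sh
# Sketch: unfold continuation lines with POSIX awk instead of a
# multi-line regex. "unfold" is a hypothetical helper name.
# Assumption: output should use CRLF line endings, as vCard 2.1 expects;
# remove the \r from the two printf formats for plain LF output.
unfold() {
    awk '
        { sub(/\r$/, "") }                       # drop a trailing CR, if any
        /^ / && NR > 1 {                         # continuation line:
            held = held substr($0, 2)            #   append it minus the space
            next
        }
        NR > 1 { printf "%s\r\n", held }         # flush the previous logical line
        { held = $0 }                            # start a new logical line
        END { if (NR > 0) printf "%s\r\n", held }
    ' "$@"
}

out=$(printf 'NOTE:The Second Line i\r\n s long\r\nUID:abc\r\n' | unfold | cat -v)
printf '%s\n' "$out"
```

Used on a real file, something like unfold test.vcf > fixed.vcf writes the result to a new file and leaves the original alone.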
#7 dotancohen, 10-16-2008

Code:
hardy2@hardy2-laptop:~$ cat -vet test.vcf
BEGIN:VCARD^M$
FN:First Last^M$
N:Last;First;;;^M$
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:First Line.\nThe Second Line i^M$
 s long so that it will wrap. Long\, long\, and wrapping!=\n\nThird Line.\n^M$
UID:frh74xvYZ9^M$
VERSION:2.1^M$
END:VCARD^M$
^M$
BEGIN:VCARD^M$
FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=90=D7=90=D7=A4=D7=A8=D7=98=D^M$
 7=99 =D7=9E=D7=A9=D7=A4=D7=97=D7=94^M$
N;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=9E=D7=A9=D7=A4=D7=97=D7=94;=D^M$
 7=90=D7=90=D7=A4=D7=A8=D7=98=D7=99;;;^M$
NOTE;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D7=A9=D7=95=D7=A8=D7=94 =D7=A^M$
 8=D7=90=D7=A9=D7=95=D7=A0=D7=94.\n=D7=A9=D7=95=D7=A8=D7=94 =D7=A9=D7=A0=D7=^M$
 99=D7=94 =D7=94=D7=99=D7=90 =D7=\n=90=D7=A8=D7=95=D7=9B=D7=94\, =D7=9B=D7=9^M$
 3=D7=99 =D7=A9=D7=A0=D7=A8=D7=90=\n =D7=90=D7=95=D7=AA=D7=94 =D7=92=D7=95=D^M$
 7=9C=D7=A9=D7=AA. =D7=90=D7=A8=D7=\n=95=D7=9B=D7=94\, =D7=90=D7=A8=D7=95=D7^M$
 =9B=D7=94\, =D7=95=D7=92=D7=95=D7=9C=\n=D7=A9=D7=AA!\n=D7=A9=D7=95=D7=A8=D7^M$
 =94 =D7=A9=D7=9C=D7=99=D7=A9=D7=99=D7=AA.\n^M$
UID:KqbQKbfBaF^M$
VERSION:2.1^M$
END:VCARD^M$
^M$
hardy2@hardy2-laptop:~$
