Removing characters from end of line (length unknown)


 
# 1  
Old 01-05-2012

Hi

I have a file containing malformed XML: there are garbage characters at the end of each line that I want to get rid of. Example:

<request type="product" ><attributes><pair><name>q</name><value><![CDATA[LOL]]></value></pair><pair><name>start</name><value>1</value></pair></attributes></request>�J I�i�Y�Y��'z�3�u�J�5��}���#Q/k;!�ˑ�9Q){_������ŐF
<request type="product"><attributes><pair><name>q</name><value><![CDATA[LOL2]]></value></pair><pair><name>start</name><value>1</value></pair></attributes></request>4/lIT�l��'�c�Oֲ�{�;��_?��(>͏Y�mP��

How can I remove the garbage characters after </request>? In other words, how do I remove the string between </request> and the next <request>?

Please note that everything from <request> to </request> is on a single line, so

Code:
awk '/<request t/ , /<\/request>/' test.txt

does not work.


My goal is to extract the value where the name is "q" (LOL and LOL2 in this case). If that can be done easily, I don't need to remove the junk characters at all.


Thank you for your time.
# 2  
Old 01-05-2012
Code:
perl -e ' while(<>){print "$1\n" if (/name>q<\/name><value><(?:!\[CDATA\[)?([^\]]+)\]\]><\/value/);}' test.txt

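For the literal question (dropping whatever follows </request>), a sed sketch along these lines might also do. It assumes each line holds one request and that the text </request> never shows up inside the garbage; the function name strip_garbage is made up for illustration.

```shell
# strip_garbage: hypothetical helper name, not from the thread.
# Deletes everything that follows the last </request> on each input line.
# Lines without </request> pass through unchanged.
# LC_ALL=C makes sed treat the input as raw bytes, so invalid multibyte
# sequences in the garbage cannot stop ".*" from matching.
strip_garbage() {
    LC_ALL=C sed 's|\(.*</request>\).*|\1|'
}

# Usage: strip_garbage < test.txt > clean.txt
```

The | delimiter avoids having to escape the slash in </request>.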

Last edited by Franklin52; 01-06-2012 at 05:50 AM.. Reason: Code tags
This User Gave Thanks to Skrynesaver For This Post:
# 3  
Old 01-05-2012
You Sir, are awesome. 1000 internets to you.
# 4  
Old 01-05-2012
Sorry, my bad.. Didn't read the question completely. Deleted my erroneous solution.

Last edited by balajesuri; 01-05-2012 at 08:30 AM..
# 5  
Old 01-06-2012
Hi

Just trying to understand your solution, some questions:

1) Why did you use "?:" before !\[CDATA?

2) What is the reason for putting "(?:!\[CDATA\[)" in parentheses, i.e. "(" and ")"?

3) What does the "?" in the middle do?

4) What does ([^\]]+) do?

Sorry, I am still learning regular expressions. Someday I want to be as good as you. Please help.

I have made the characters in bold for your convenience.

Thank you.

if (/name>q<\/name><value><(?:!\[CDATA\[)?([^\]]+)\]\]><\/value/);
# 6  
Old 01-06-2012
Code:
/name>q<\/name><value><  # literal string
(?:                      # non-capturing parenthesis
!\[CDATA\[)?             # this block is optional (allows for cases where the data isn't CDATA escaped)
(                        # begin capture
[^\]]+                   # one or more characters which aren't a ] (the match is greedy, so it captures as many as possible)
)                        # end of capture
(?:\]\])?                # what I should have said ;) to make the CDATA wrapper genuinely optional
><\/value                # literal string
/x                       # allow comments in regexes so the maintainer doesn't hunt you down and kill you

The contents matched by the capturing parentheses are then available as $1.
This User Gave Thanks to Skrynesaver For This Post:
# 7  
Old 01-06-2012
Thanks for taking the time to explain things. So $1 is not (?:!\[CDATA\[), even though it is in parentheses, because it is followed by "?"?

In other words, why is $1 not set to (?:!\[CDATA\[) even though that is the first expression inside parentheses?