Hi All, I have seen a lot of code samples that show how to remove junk data from an XML file. I need code for Unix that removes the junk characters as well as any valid characters that are not inside XML tags. For example, my XML is as follows:
I want to remove records such as "abcd" and "pqrs $$ faf»", which are not in tags, from the XML on Unix.
Is that possible?
For the sample data you've shown us, try:
seems to do what you want. It will not, however, remove all of the occurrences of 07-04-2014 00:15:04 which is also data that is not in any tag.
thanks Don...
07-04-2014 00:15:04 is still valid data, as it sits between the valid timestamp tags that start before it and end after it.
But data like "abcd" and "pqrs $$ faf»" is not enclosed in tags, which keeps the file from being valid XML.
You know that because you know which tags are valid and what data is valid between certain tags. You can build that knowledge into a script, but a generic script would need a lot of metadata defining the valid tags, the valid nesting of tags, the formats allowed for data between certain tags, and so on. You aren't going to get a parser like that from a forum like this.
If you can state some clear requirements to simplify the general problem to a more specific issue, we might be able to help.
Do you just want to throw away stuff (other than a single <newline> character after a </spd> flag) that does not start with a <?xml version="1.0" encoding="IBM037"?> tag and end with the next </spd> tag?
Most XML code I've seen would have a </xml> tag for each of the <?xml...> tags. Why aren't there any in your XML code?
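This is not a general XML parser, but if the simplified rule asked about above is acceptable (keep everything from a `<?xml` declaration through the next `</spd>` and drop the rest), an awk sketch along these lines might do. The function name strip_outside_records and the sample tag `<ts>` are made up for illustration. It assumes the file is already in an ASCII-compatible encoding, that neither marker is split across lines, and that `<?xml` and `</spd>` never appear inside record data.

```shell
# Hypothetical helper: keep only text from "<?xml" through the next "</spd>".
strip_outside_records() {
  awk '
    {
      line = $0
      while (line != "") {
        if (!inrec) {
          i = index(line, "<?xml")
          if (i == 0) break            # only junk left on this line
          line = substr(line, i)       # drop junk before the declaration
          inrec = 1
        }
        j = index(line, "</spd>")
        if (j == 0) { print line; break }   # record continues on next line
        print substr(line, 1, j + 5)   # up to and including "</spd>"
        line = substr(line, j + 6)     # whatever follows is checked again
        inrec = 0
      }
    }
  '
}
```

Usage would be something like `strip_outside_records < input.xml > cleaned.xml`, with both file names as placeholders. Junk that sits between valid tags inside a record is not touched.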
Hi All,
I have an issue: we are getting junk characters from the source, and I am not able to load those records into the database.
Line breakers
Junk Characters (Â and different every time)
Japanese Characters
Every time I am using grep command and awk -F "\007" to find them and delete that... (1 Reply)
I would like to remove all characters starting with "%" and ending with ")" in the 4th field - please help!!
1412007819.864 /device/services/heartbeatxx 204 0.547%!i(int=0) 0.434 0.112
1412007819.866 /device/services/heartbeatxx 204 0.547%!i(int=1) 0.423 0.123... (10 Replies)
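For lines shaped like the two samples, an awk `sub()` restricted to field 4 may be enough. This is a sketch, assuming whitespace-separated fields and at most one `%...)` run in that field; strip_pct is a made-up name.

```shell
# Hypothetical helper: delete the first "%" through the next ")" in field 4.
strip_pct() {
  awk '{ sub(/%[^)]*\)/, "", $4); print }'
}
```

Assigning to `$4` makes awk rebuild the line with single spaces between fields, which matches the sample but would collapse tabs or runs of spaces in other input.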
I am using a flat file, and in that flat file we are getting junk chars:
1)I21001f<82>^Me<85>!h49 Service Charge
2) I21001f‚
e...!h49 Service Charge
please tell me how to remove all junk chars in unix scripts. (1 Reply)
I want to remove the junk characters in my CSV.
Input file format:
"17","9986782190","0","D","2"
"17","9900918331","0","D","2"
"13","9986782194","0","A","2"
Output file format
9986782190
9900918331
9986782194
And one more thing all the time "13"," this will be different Ex: . (2 Replies)
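For input shaped like the sample, splitting each line on the double quote may be enough, since the second quoted value then lands in field 4 whatever the first value is. This is a sketch that assumes no quote characters occur inside the values; second_value is a made-up name.

```shell
# Hypothetical helper: print the second double-quoted value of each line.
second_value() {
  cut -d'"' -f4
}
```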
Hi
I have to remove the junk characters from my file. Please help..
File content :
CURITY_CODE_GSD) FROM� DL_CB_SOD_EOD_VALUATION WHERE� ASOF (1 Reply)
Hello sir,
I have generated XML file from VS 2005. It works well in windows but it shows some junk characters in unix.
Can anyone help me with this problem?
Thank you in advance.
Hema (6 Replies)
Guys,
can you help me in removing the junk character "^S" from the below line using perl
Reference Data Not Recognised ^S Where a value is provided by the consuming system, which is not reco
Thanks,
M.Mohan (1 Reply)
Hi Team,
I have a file larger than 1 GB. What I want to do is check whether it contains any junk character (i.e. any special character that cannot be typed on the keyboard). This file has 532 columns, separated with ^~^.
I have found some solution from the file, but it is for a... (4 Replies)
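If "not on the keyboard" is taken to mean any byte outside printable ASCII and ordinary whitespace (that reading is an assumption), grep can scan a large file in one streaming pass. A minimal sketch, with count_junk_lines as a made-up name:

```shell
# Hypothetical helper: print how many lines of file "$1" contain a byte
# that is neither printable ASCII nor whitespace (C locale, byte-wise).
count_junk_lines() {
  LC_ALL=C grep -c '[^[:print:][:space:]]' "$1"
}
```

Swapping `-c` for `-n` would list the offending lines with their line numbers instead of counting them.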
Hello friendz,
dfl;g435hkd.fg
..this is what I am getting. I want to print strings without junk chars.
I want it exactly like this... dflg435hkd.fg ...and I need some specific characters kept as well; for example, dot or comma should be allowed.
plz help me out ya!
~Balan (1 Reply)
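One way to get from dfl;g435hkd.fg to dflg435hkd.fg is to delete everything that is not on an allow-list. A sketch with `tr`, assuming the allowed set is ASCII letters, digits, dot, and comma (plus newline so lines stay separate); keep_basic is a made-up name.

```shell
# Hypothetical helper: delete every byte except letters, digits, ".", ","
# and newline. -c complements the set, -d deletes what matches.
keep_basic() {
  LC_ALL=C tr -cd '[:alnum:].,\n'
}
```

Other characters can be allowed by adding them to the quoted set.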
Hi All,
I have been trying to FTP some data files from a Windows directory to a UNIX server. The txt file on Windows contains the following data:
"111~XYZ~1~Contact person’s phone number~COMMENTS~~~~"
but the same line is appearing as
"111~XYZ~1~Contact person^Òs phone number~COMMENTS~~~~"... (8 Replies)