Identify lines with wrong format in a file and fix
Gurus,
I have a data file with a fixed number of columns, say 101. One description column contains foreign characters, and sometimes those special characters are translated into a newline character, which causes the process to fail.
I am using the following awk command to identify the number of columns
If the file has foreign characters, running the above command displays something like this.
I have written a small piece of code that compares the field count of each line with that of the next and displays the line number. I have not done extensive Unix scripting, and the performance is poor: it takes forever.
Is there a better way to identify the problem record and either replace the foreign character with a space or remove it altogether?
Here is the issue record:
bad: good:
When the above bad record is read by the Ab Initio ETL tool, it sees a newline character, so the record gets scrambled, the data shifts into the next column, and the process fails.
Any help or quick command will be helpful. Thanks.
Last edited by Don Cragun; 10-30-2015 at 05:35 PM..
Reason: Add CODE tags.
Why not do the entire loop in awk? Try
to find the field count of each line, together with its line number.
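As a rough sketch of that idea (not the command originally posted), the whole check can run in a single awk pass. It assumes the file is pipe-delimited, as mentioned later in the thread; the function name `report_bad_lines` is made up for illustration.

```shell
# Print "line-number: field-count" for every line whose pipe-delimited
# field count differs from the expected value (101 in this thread).
# report_bad_lines is a hypothetical helper name, not from the thread.
report_bad_lines() {
    # $1 = expected field count; the data is read from stdin
    awk -F'|' -v want="$1" 'NF != want { print NR ": " NF }'
}

# Small demonstration with 3 expected fields instead of 101:
# only the second line is short, so only that line is reported.
printf 'a|b|c\nd|e\nf|g|h\n' | report_bad_lines 3
```

On the real file this would be called as `report_bad_lines 101 < datafile`. Because awk reads the file once and does the comparison itself, it avoids the cost of a shell loop that starts a new process per line.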
The interpretation of non-ASCII characters is locale-dependent. This
would do as you requested.
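A minimal sketch of such a replacement, assuming that "foreign character" means any byte outside printable ASCII and tab. Forcing the C locale makes awk work byte by byte, so a multi-byte UTF-8 character becomes one space per byte, and a carriage return is blanked as well. The name `blank_non_ascii` is hypothetical.

```shell
# Replace every byte that is not a tab or printable ASCII (0x20-0x7E)
# with a space. LC_ALL=C forces byte-wise matching regardless of the
# locale of the calling shell. blank_non_ascii is a made-up name.
blank_non_ascii() {
    LC_ALL=C awk '{ gsub(/[^\t -~]/, " ") } 1'
}

# Demonstration: the two-byte UTF-8 sequence for e-acute (octal 303 251)
# is turned into two spaces; the pipe separator is left untouched.
printf 'caf\303\251|x\n' | blank_non_ascii
```

To delete the characters instead of blanking them, the replacement string `" "` can be changed to `""`.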
---------- Post updated at 22:06 ---------- Previous update was at 22:01 ----------
Actually, I can't see how those non-ASCII chars would modify the field count. Try also
to compare the lines before and after the replacement.
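One way to make that comparison, sketched under the same assumptions as before (pipe delimiter, "non-ASCII" meaning anything outside tab and printable ASCII), is to print the field count of each line before and after the substitution. The helper name `compare_counts` is invented for this example.

```shell
# For each line print: line number, field count before the replacement,
# field count after it. gsub() on $0 makes awk re-split the record,
# so NF reflects the modified line. compare_counts is a made-up name.
compare_counts() {
    LC_ALL=C awk -F'|' '{
        before = NF
        gsub(/[^\t -~]/, " ")
        print NR, before, NF
    }'
}

# Demonstration: the non-ASCII bytes sit inside the first field,
# so the count is 2 both before and after.
printf 'caf\303\251|x\n' | compare_counts
```

If the two counts are identical for every line, the non-ASCII characters are not what changes the number of fields, and the cause has to be looked for elsewhere, for example in a stray newline inside the record.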
Thanks for your suggestion. After searching other threads in this forum, I have modified the code as follows. It identifies the problem records, which have more than 101 columns.
But to fix it is still a manual step. I am getting the output as shown below.
Is there a way to identify the record which is modified to
in advance and fix it.
I don't understand your request. The red pipe char is the normal ASCII 0x7C used as a field separator in your file. Are you implying that it is an artefact? Did you try the gsub from post#1?
Last edited by RudiC; 11-04-2015 at 01:20 PM..
Reason: included "red"