finding the incorrect record

# 1  
11-04-2011

Hi,

I have a scenario where I have to find the incorrect records in a file.
In our comma-delimited file, the following has to be taken care of:


1) If a record contains a newline character, we have to capture the current line and the next line as an error record.

Example: in a comma-delimited file with 4 fields:
Code:
a,b,c,d
1,2,3\n
4
 
Expected output:
1,2,3\n
4

2) If a field contains a comma, the whole field is enclosed in double quotes to show that the comma is part of the value. Such commas must be ignored rather than treated as field separators.

Code:
a,"h93,23",c,d
a,"h94,24",c\n
d
 
Expected output:
a,"h94,24",c\n
d
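A minimal sketch of rule 2 in portable awk (not from the thread; the file name rule2.txt is hypothetical). A plain `awk -F, 'NF < 4'` would count `a,"h93,23",c,d` as 5 fields, so this version first deletes every double-quoted section, assuming quoted values never contain embedded double quotes, and only then counts the remaining commas.

```shell
#!/bin/sh
# Sample data from rule 2 (hypothetical file name rule2.txt)
printf '%s\n' 'a,"h93,23",c,d' 'a,"h94,24",c' 'd' > rule2.txt

# Quote-aware field count: remove each "..." section so its commas are
# not counted, split what is left on commas, and print the physical line
# if it has fewer than 4 fields.
awk '{
    s = $0
    gsub(/"[^"]*"/, "", s)
    if (split(s, f, ",") < 4) print
}' rule2.txt
```

On the sample this prints `a,"h94,24",c` and `d`. Note that a completely empty line splits into 0 fields, so blank lines are reported as errors too.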

Can you please help me find the error records with a piece of code?
# 2  
11-04-2011
Is "\n" the ASCII characters '\' and 'n', or an actual linefeed (0x0A)? If the latter, what terminates a non-error line?
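One way to check this on the real file, sketched here with a hypothetical sample named literal.txt: count the lines that end in a literal backslash followed by `n`, and dump the raw bytes with `od -c`, which shows a real linefeed as `\n`.

```shell
#!/bin/sh
# Hypothetical sample: the first line ends with the two characters '\' and 'n'
# (printf's %s prints its argument without interpreting backslash escapes).
printf '%s\n' '1,2,3\n' '4' > literal.txt

# Count lines ending in a literal backslash-n (prints 0 if there are none)
grep -c '\\n$' literal.txt

# Byte dump: a literal backslash-n is two separate bytes, while a real
# linefeed is shown by od -c as \n
od -c literal.txt
```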
# 3  
11-04-2011
It's basically the newline character.
# 4  
11-04-2011
Which is the same as at the end of every line (on Unix), so how are you differentiating errored lines?

Are there always 3 fields in an errored line?
# 5  
11-05-2011
The total number of fields is 4.
If a record has fewer fields, then it is an error record.
The exception is that a comma can be part of a field's value, which is acceptable. In this case the field value is enclosed in double quotes.
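A sketch of rule 1 as stated in the first post: when a line has fewer than 4 fields (commas inside double quotes not counted), report it together with the line that follows it as one error record. The files records.txt and pairs.awk are hypothetical, and the sketch assumes quoted values contain no embedded double quotes and that a broken record spans exactly two physical lines, as in the examples above.

```shell
#!/bin/sh
# Hypothetical sample built from the examples in this thread
printf '%s\n' 'a,b,c,d' '1,2,3' '4' 'a,"h93,23",c,d' \
              'a,"h94,24",c' 'd' 'a,"h95,23",c,d' > records.txt

cat > pairs.awk <<'EOF'
# Number of comma-separated fields, not counting commas inside "..."
function nfields(line,    s, f) {
    s = line
    gsub(/"[^"]*"/, "", s)
    return split(s, f, ",")
}

nfields($0) < 4 {
    print "error record at line " NR ":"
    print                               # the short line itself
    if ((getline nxt) > 0) print nxt    # consume and show the next line too
}
EOF

awk -f pairs.awk records.txt
```

On the sample this reports the records starting at lines 2 and 5. Because `getline` consumes the continuation line, it is not checked again on its own. A record broken across more than two lines would need the lines to be accumulated until 4 fields have been seen instead.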
# 6  
11-05-2011
deleted

--ahamed
# 7  
11-06-2011
This may be more complex than you need, but it's the code I usually use for CSV parsing (it isn't mine, but I forget where I originally got it from).

Code:
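$ cat csv.awk
# Split the current record into csvFields[1..csvNumFields], keeping
# commas that appear inside double quotes as part of the value.
# Note: gensub() is a GNU awk (gawk) extension.
function csv2array(    i, n, ch, out, inString)
{
        # Replace any DELIM characters already in the data with REPL
        gsub(DELIM, REPL)
        # Turn an escaped quote ("") that follows a non-comma into an apostrophe
        $0 = gensub(/([^,])\"\"/, "\\1'", "g")
        out = ""
        inString = 0                    # reset the quote state for every record
        n = length($0)
        for (i = 1;  i <= n;  i++) {
                if ((ch = substr($0, i, 1)) == "\"") {
                        inString = !inString    # toggle on each double quote
                }
                # Unquoted commas become DELIM; everything else is copied as-is
                out = out ((ch == "," && ! inString) ? DELIM : ch)
        }
        csvNumFields = split(out, csvFields, DELIM)
}

BEGIN {
        if (DELIM == "") DELIM = "\t"
        if (REPL == "") REPL = "~"
}
{
        csv2array()

        # Fewer than 4 fields: print the record, re-joining fields with commas
        if (csvNumFields < 4) {
                for (i = 1; i <= csvNumFields; i++) {
                        if (i > 1) {
                                printf (",")
                        }
                        printf ("%s", csvFields[i])
                }
                printf ("\n")
        }
}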
$ cat 1.txt
a,b,c,d
1,2,3
4
a,"h93,23",c,d
a,"h94,24",c
d
a,"h96,24",c
d
a,"h95,23",c,d

$ awk -f csv.awk 1.txt
1,2,3
4
a,"h94,24",c
d
a,"h96,24",c
d


Last edited by CarloM; 11-06-2011 at 09:13 AM.