Correct incomplete fields separated by new lines


 
# 1  
Old 05-29-2015

Hello Friends,

I have an issue with a comma-separated CSV file. Every record should have exactly 5 fields, and the record delimiter is \r\n. In a few records, however, the address field itself contains \r\n, which breaks the record across two or more lines.

Please see the example below. Each record should be read as follows:
firstname,lastname,address,city,state

Example data:
Code:
david,smith,123 Lindsay Street,columbus,oh
john,bush,5434A 
Cresent Drive, Cleveland,oh
Micheal,Slater,34E Lobson
Street NE
Apt 3,Burbank,43017
Bill,thompson,1298 Bread Street,Cincinnati,oh

How can I convert the above file to the following?
Code:
david,smith,123 Lindsay Street,columbus,oh
john,bush,5434A Cresent Drive, Cleveland,oh
Micheal,Slater,34E Lobson Street NE Apt 3,Burbank,43017
Bill,thompson,1298 Bread Street,Cincinnati,oh

Please let me know if I need to provide further details.

Last edited by Scrutinizer; 05-29-2015 at 02:00 AM.. Reason: CODE tags
# 2  
Old 05-29-2015
Try something like:
Code:
tr -d '\r' < file | awk -F, '{while (NF<5 && (getline n)>0) $0=$0 n}1'

or
Code:
awk -F, '{while (NF<5 && (getline n)>0) $0=$0 n; gsub(/\r/,x)}1' file

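Either should work as long as the stray \r\n never appears in field 5. A record broken inside the last field already has 5 fields on its first line, so the remainder would be treated as the start of a new record.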

---
On Solaris use /usr/xpg4/bin/awk rather than awk
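
One thing to watch: both one-liners glue the continuation line directly onto the previous one. That is fine when the broken line ends in a space (as `5434A ` does in the sample), but `34E Lobson` followed by `Street NE` would come out as `LobsonStreet`. The following is only a sketch of one way around that. It assumes a single space should replace each stray line break and that squeezing runs of spaces anywhere in a record is acceptable. `join_records` and `sample.csv` are made-up names for illustration.

```shell
# join_records is a hypothetical helper name, not something from the thread
join_records() {
  tr -d '\r' < "$1" |
  awk -F, '
    {
      # keep appending physical lines until the record has 5 fields
      while (NF < 5 && (getline n) > 0)
        $0 = $0 " " n      # assumption: one space replaces the stray line break
      gsub(/  +/, " ")     # assumption: squeezing repeated spaces is acceptable
    }
    1'                     # print every (possibly rebuilt) record
}

# Recreate the sample data from post #1 with CRLF line endings
printf '%s\r\n' \
  'david,smith,123 Lindsay Street,columbus,oh' \
  'john,bush,5434A ' \
  'Cresent Drive, Cleveland,oh' \
  'Micheal,Slater,34E Lobson' \
  'Street NE' \
  'Apt 3,Burbank,43017' \
  'Bill,thompson,1298 Bread Street,Cincinnati,oh' > sample.csv

join_records sample.csv
```

On the sample data this prints the four records requested in post #1. The same caveat about field 5 still applies.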

Last edited by Scrutinizer; 05-29-2015 at 04:15 AM..
# 3  
Old 05-29-2015
Hi Scrutinizer,

I am trying to understand this code for learning purposes. Could you please explain it, if possible?

Code:
awk -F, '{while (NF<5 && (getline n)>0) $0=$0 n}1'

Why is it (getline n) > 0?
What I know so far is that getline reads the next line, the trailing 1 is always true, and the loop applies to records with fewer than 5 fields.

Please correct me if I am wrong.

Thanks

venky
# 4  
Old 05-29-2015
man awk:
Quote:
Getline returns 0 on end-of-file, -1 on error, otherwise 1.
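
So `(getline n) > 0` is true only when a line was actually read into `n`. A bare `getline n` used as the condition would also count -1 (error) as true, since any non-zero value is true in awk. Here is a small sketch to watch the status change. `getline_demo`, the variable `rc`, the 3-field limit and the two-line input are all invented for illustration.

```shell
# getline_demo is a hypothetical name; it expects 3 fields instead of 5
getline_demo() {
  awk -F, '
    {
      # rc holds the getline status: 1 = line read, 0 = end of input, -1 = error
      while (NF < 3 && (rc = (getline n)) > 0)
        $0 = $0 n
      print $0 " [last getline status: " rc "]"
    }'
}

# The first line has 2 fields, so the loop pulls in "c", then hits end of input
printf 'a,b\nc\n' | getline_demo
# prints: a,bc [last getline status: 0]
```

The record still has only 2 fields after the append, so the loop tries once more, gets 0 at end of input, and stops instead of spinning.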