Have problem with extra EOLs in my CSV - need help cleaning out


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Have problem with extra EOLs in my CSV - need help cleaning out
# 1  
Old 06-16-2014
Have problem with extra EOLs in my CSV - need help cleaning out

Hi Everyone,

Searching the forum, I came across another closed thread, that appears to be either the same problem, or very close to what I'm experiencing. Closed thread for reference is at: Removing cr,lf till number of fields are full

I have a report which generates 7 fields, using a comma as FS:
-Page
-EmplEvName
-Date
-Number of Views (no pct)
-Number of Visits (no pct)
-Number of Visitors (no pct)
-Empl Page Views

However, recently some rows have been generated with extra end-of-line characters (Vim output shown, with $). This breaks any of our other scripts which run against the output (expecting to see 7 fields for every row).

The closed thread offered some solutions using awk/sed, but they didn't seem to work for me, probably because I didn't completely understand them.

All of our scripts are written using BASH, and ideally, I need to stick to a solution that can be implemented in a BASH shell script.

What would be the cleanest way to eliminate the extra "$" shown on the last four lines of the sample below? The data in the last four fields are always integers, but the first three fields vary in length and format.

Code:
"Page","EmplEvName","Date","Number of Views (no pct)","Number of Visits (no pct)","Number of Visitors (no pct)","Empl Page Views"$
"Empltvdemo.edgesuite.net/software/analytics/cognos/videos/htvs/cm102","connect_to_a_server",6/14/2014,16,16,15,0$
"Empltvdemo.edgesuite.net/software/analytics/cognos/videos/htvs/cm102","connect_to_a_server",6/11/2014,14,14,14,0$
"Empltvdemo.edgesuite.net/software/analytics/cognos/videos/htvs/cm102","connect_to_a_server",6/7/2014,13,13,13,0$
"Empltvdemo.edgesuite.net/software/analytics/cognos/videos/htvs/cm102","connect_to_a_server",6/1/2014,7,7,6,0$
"Empltvdemo.edgesuite.net/software/analytics/cognos/videos/htvs/cm102","connect_to_a_server",6/8/2014,6,5,5,0$
"Empltvdemo.edgesuite.net/software/analytics/cognos/videos/htvs/cm102","$
Access reports$
",6/12/2014,5,5,4,0$
"Empltvdemo.edgesuite.net/software/analytics/cognos/videos/htvs/cm102","$
Access reports$
",6/5/2014,5,5,5,0$
"Empltvdemo.edgesuite.net/software/analytics/cognos/videos/htvs/cm102","$
Access reports$
",6/13/2014,4,4,4,0$
"Empltvdemo.edgesuite.net/software/analytics/cognos/videos/htvs/cm102","$
Access reports$
",6/2/2014,4,4,4,1$

Logically, it seems fair to remove ALL EOLs, and then insert new ones after every 7th field. This would include instances when field 2 is empty (but still a field).

Thanks,

Rich

Moderator's Comments:
Mod Comment Please use [code], not [icode]

Last edited by Corona688; 06-16-2014 at 04:42 PM..
# 2  
Old 06-16-2014
Logically, you should fix the thing which is producing your broken CSV's, not kludge them back together after, but:

Code:
awk -F, -v OFS="," 'NR==1 { MINREC=NF; next }
{
        if(PREV) {      $0=PREV $0;     $1=$1   }
        if(NF >= MINREC) { print; PREV=""; }
        else    PREV=$0
}' < inputfile > outputfile

This will fail if , appears inside your quoted sections.
This User Gave Thanks to Corona688 For This Post:
# 3  
Old 06-16-2014
Thanks Corona688 - works like a charm. Agree that fixing the root cause is what should be done, however, at our company it's handled by a different group, so we must wait for them to implement fixes.. Smilie

Could I bother your for a brief explanation of what is going on, especially on lines 3,4?

Cheers,

R
This User Gave Thanks to richardsantink For This Post:
# 4  
Old 06-16-2014
$0 is a special variable in awk which means "the entire line".

NF is a special variable which means "the number of fields", i.e. columns.

So we set the entire line with its current contents, plus the PREV contents, and if there's enough fields, print it. Otherwise, save it all in PREV for next time.
This User Gave Thanks to Corona688 For This Post:
# 5  
Old 06-16-2014
Quote:
Originally Posted by Corona688
$0 is a special variable in awk which means "the entire line".

NF is a special variable which means "the number of fields", i.e. columns.

So we set the entire line with its current contents, plus the PREV contents, and if there's enough fields, print it. Otherwise, save it all in PREV for next time.
Nice! Thank you again!
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Need to Insert three extra columns in csv file

Hello Experts, I got a requirement i have a input file which am getting from different source,Now i want to add extra 3 columns to this file like BASE,ACTUAL and DATE. Input File Looks like QUAL CHGE TYP LAW COM1 COM2 A 1 X SED HO ASE B 3 Z CDE SE ... (5 Replies)
Discussion started by: ahmed.vaghar
5 Replies

2. Shell Programming and Scripting

Randomly inserting extra columns into csv file

Hi Tech Guru, I have a test file as below , which needs some more fields to be populated randomly : dks3243;12;20130823;1420;25m;0;syt dks3243;rocy;10 dks3243;kiop;18 sde21p4;77;20151210;8479;7py;9;vfr sde21p4;temp;67 sfq6i01;12;20120123;3412;4rd;7;jui sfq6i01;uymk;90 sfq6i01;kiop;51 ... (8 Replies)
Discussion started by: Lokesha
8 Replies

3. UNIX for Dummies Questions & Answers

To Add extra commas to a CSV file.

Hi All, I got this requirement to process a complex CSV file. Eg File. Line 1: Name:,XYz Line 2: Age:,15 Line 3: Grade:,7 Line 4: Line 5: English, Maths, Science,Spanish Line 6:10,11,13,14 As you can see the maximum column is 4 . The file i need to make is Line 1: Name:,XYz,,... (12 Replies)
Discussion started by: chillblue
12 Replies

4. UNIX for Dummies Questions & Answers

To Add extra commas to a CSV file using 2 files...

Hi , Based on my previous requirement the code works fine for comma as delimiter. Now my Req is widened up a bit .. There will be two set of files .. one with comma as delimiter and other with semi-colon ; as delimiter. Second Sample file. With Double Quotes (Semi-Colon... (1 Reply)
Discussion started by: chillblue
1 Replies

5. Shell Programming and Scripting

Adding Extra Commas to a CSV file

Trying in this forum. Not sure if it is permitted.... but in need of help. Please find the requirements in the below link. https://www.unix.com/unix-dummies-questions-answers/191503-add-extra-commas-csv-file-2.html#post302665179 Thanks in Advance. (1 Reply)
Discussion started by: chillblue
1 Replies

6. Shell Programming and Scripting

Ascii Mode appending extra records to csv file

I am relatively new to this forum and Unix scripting. ksh script: part 1 :will call a PL\SQL program will create 3 CSV file at the unix directory. part 2 : will sftp the files to the EFT server. Once the EFT server receives these file , it will transfer them to a shared windows folders. ... (3 Replies)
Discussion started by: developerpa
3 Replies

7. Shell Programming and Scripting

shell script to remove extra commas from CSV outp file

Name,,,,,,,,,,,,,,,,,,,,Domain,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Contact,Phone,Email,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Location -----------------------,------------------------------------------------,-------,-----,---------------------------------,------------------------------------ ----... (1 Reply)
Discussion started by: sreenath1037
1 Replies

8. UNIX for Dummies Questions & Answers

cleaning up spaces from fixed width file while converting to csv file

Open to a sed/awk/or perl alternative so that i can stick command into my bash script. This is a problem I resolve using a combination of cut commands - but that is getting convoluted. So would really appreciate it if someone could provide a better solution which basically replaces all... (3 Replies)
Discussion started by: svn
3 Replies

9. Shell Programming and Scripting

SED help - cleaning up code, extra spaces won't go away

Hello, W/in the script I'm working on, I have a need to take a column from a file, and format it so I can have a variable that will egrep for & invert the regex from another file. My solution is this: VAR=`awk -F, '{print $2}' $FAIL | sed 's/-i/\|/g'` VAR2=`echo $VAR | sed 's/... (5 Replies)
Discussion started by: Matthias03
5 Replies

10. Shell Programming and Scripting

manipulating csv to add extra row

hi how do i manipulate .csv file to add an extra row after each row using shell script? I need a blank line added for each 1000 records in my file? I will then need to copy and paste some data in the blank row created. thanks 4 ur support neil (3 Replies)
Discussion started by: neil546
3 Replies
Login or Register to Ask a Question