Text file to CSV with field data separated by blank lines


 
# 1  
Old 03-18-2012
Text file to CSV with field data separated by blank lines

Hello,

I have some data in a text file where fields are separated by blank lines. There are only 6 fields, but some fields span several lines, as I will explain. The data in a particular field is not always the same size, but it always ends at a blank line. The first field starts with the number 12, so I can use that as a start/stop point I guess, but it has been a while and I have forgotten much of my shell scripting. Here is an example and what I need to do:
Code:
12 ABC5

01/01/2012 NameOfCity

Street Address

Person 1
Person 1 continued
Person 1 extra data

Person 2

12 ABC6

01/01/2012 NameOfCity

Street Address
Street Address continued
Street Address extra

Person 1

Person 2
Person 2 extra data

12 ABC7

01/01/2012 NameOfCity

.........and so on

What I want it to look like delimited for CSV use is:
Code:
12 ABC5;01/01/2012 NameOfCity;Street Address;Person 1,Person 1 continued,Person 1 extra data;Person 2
12 ABC6;01/01/2012 NameOfCity;Street Address,Street Address continued,Street Address extra;Person 1;Person 2,Person 2 extra data
12 ABC7;....and so on........to the end of file.

In a perfect world I would also like to separate the date from the city name, like so:
Code:
12 ABC5;01/01/2012;NameOfCity;Street Address;Person 1,Person 1 continued,Person 1 extra data;Person 2
12 ABC6;01/01/2012;NameOfCity;Street Address,Street Address continued,Street Address extra;Person 1;Person 2,Person 2 extra data
12 ABC7;....and so on........to the end of file.

So you see the problem: Street Address, Person 1, and Person 2 may be a single line or many lines (up to the next blank line), and it is not the same for every field every time. I just need to get each record onto a single line so I can import it into Excel or MySQL, etc.

Preferably I would like to do this with sed and a loop in an ordinary bash script, since the script that got me this far is in bash. I'll take anything at this point, though, even if I have to call another program.
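
To give an idea of the direction I was thinking of, something like the read loop below (rough, untested sketch; "input.txt" is just a placeholder). It only glues the lines together with ";" and doesn't yet handle the commas for multi-line fields or splitting the date from the city:
Code:
row=""
while IFS= read -r line; do
    if [[ $line == "12 "* ]]; then              # a line starting with "12 " begins a new record
        [[ -n $row ]] && printf '%s\n' "$row"   # flush the previous record
        row=$line
    elif [[ -n $line ]]; then                   # non-blank line: tack it onto the current record
        row="$row;$line"
    fi
done < input.txt
[[ -n $row ]] && printf '%s\n' "$row"           # flush the last record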

Thanks in advance for any replies!


Art

Last edited by Franklin52; 03-18-2012 at 10:52 AM.. Reason: Please use code tags for data and code samples, thank you
# 2  
Old 03-18-2012
Try this:
Code:
awk '/^12 /{                            # a line starting with "12 " begins a new record
  if(s){print s}                        # print the previous record, if any
  s=$0                                  # start collecting the new one
  next
}
NF{                                     # non-blank line: part of the current record
  if($0 ~ /..\/..\//){sub(" ",";")}     # on the date line, split date and city with ";"
  s=s";"$0                              # append the line with a ";" separator
}
END{print s}                            # print the last record
' file
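
If I read it right, with your sample data the first record should come out like this (every non-blank line joined with ";"):
Code:
12 ABC5;01/01/2012;NameOfCity;Street Address;Person 1;Person 1 continued;Person 1 extra data;Person 2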

# 3  
Old 03-18-2012
Franklin52,

Thanks so much for that. What you sent is nearly perfect. The problem is that some fields, like "Person 1" or "Person 2" (as well as others), can have many more than 2 or 3 lines. What you sent is awesome, but some fields ended up with a ";" in them where there should be a comma (the fields that had more than 2 or 3 lines to begin with). I am just not familiar with awk, so I don't know how to adjust this.

Thanks so much for what you sent. I will try to hack at it unless you follow up.

Art

---------- Post updated at 12:24 PM ---------- Previous update was at 12:16 PM ----------

I also noticed that, farther down in the file I am processing, the ";" delimiter is dropped after the date. Not sure if this is due to extra lines per field in some cases.

Art
# 4  
Old 03-18-2012
Try:
Code:
awk 'END{print "\n"} $1~"/../"{sub(" ",";")} NR>1 && NF{$1=$1;print /^12/?"\n":";"}NF' FS="\n" OFS=, RS= ORS= infile

slightly more readable form:

Code:
awk '
END{
  print "\n"                  # final newline at the very end of the output
}
$1~"/../"{                    # record whose first line contains a date
  sub(" ",";")                # split the date from the city with ";"
}
NR>1 && NF{                   # every non-empty record after the first
  $1=$1                       # rebuild the record: its lines get joined with OFS (",")
  print /^12/?"\n":";"        # "12 ..." starts a new output row, anything else gets a ";" first
}
NF                            # print the (rebuilt) record itself
' FS="\n" OFS=, RS= ORS= infile
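
The main trick is awk's paragraph mode: with RS= each blank-line-separated block is read as one record, and with FS="\n" each of its lines becomes a field, so $1=$1 together with OFS=, rejoins multi-line fields with commas. If you want to keep it in your bash script, you can also save the program to a file and redirect the output, for example (the file names here are only examples):
Code:
awk -f blocks2csv.awk FS="\n" OFS=, RS= ORS= infile > records.csv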


Last edited by Scrutinizer; 03-18-2012 at 01:41 PM..
# 5  
Old 03-18-2012
That is it

Scrutinizer,

Thanks so much, that is it! Exactly what I was trying to do. Now I should be able to import this into MySQL or Excel as CSV data.

Also many thanks to Franklin52 for what you sent!

You made my day!


Art
# 6  
Old 03-20-2012
Need to remove comma from description in comma-separated CSV file

Hi All,

I have a CSV file which is comma-separated, with data like this:

CSV FILE:

121,11852,2011 Q4,0201, ,TRADITIONAL,Internal Fraud,6/14/2010,"100,000,000","Unauthorized Trading (e.g. Violating guidelines, Hiding losses, Securities Valuation)",1,

121,11853,2011 Q4,0201, ,TRADITIONAL,Internal Fraud,6/14/2010,"100,000,000",Fraud and Collusion,2,

121,11854,2011 Q4,0201, ,TRADITIONAL,Internal Fraud,6/14/2010,"100,000,000",Embezzlement,3,

In this file, for a few columns the actual data contains commas, e.g. columns 9 and 10 in the first row ("100,000,000" and "Unauthorized Trading (e.g. Violating guidelines, Hiding losses, Securities Valuation)").

Columns whose actual data contains commas are enclosed in double quotes in the CSV (" ").

I want to replace the commas that appear between double quotes with semicolons (in short, wherever the actual column data contains a comma, I want it replaced with a semicolon), and those double quotes should then be deleted afterwards.

The output should look like this, where the comma is the only field delimiter:


121,11852,2011 Q4,0201, ,TRADITIONAL,Internal Fraud,6/14/2010,100;000;000,Unauthorized Trading (e.g. Violating guidelines; Hiding losses; Securities Valuation),1,

121,11853,2011 Q4,0201, ,TRADITIONAL,Internal Fraud,6/14/2010,100;000;000,Fraud and Collusion,2,

121,11854,2011 Q4,0201, ,TRADITIONAL,Internal Fraud,6/14/2010,100;000;000,Embezzlement,3,

Please reply.
# 7  
Old 03-20-2012
Hi ops_team, please start a new thread for this. Don't forget to use code tags.
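
As a quick pointer in the meantime, one way is to walk each line character by character and toggle an "inside quotes" flag, along these lines (untested sketch; "file.csv" is just a placeholder, and it does not handle escaped quotes inside fields):
Code:
awk '{
  out=""; inq=0
  for (i=1; i<=length($0); i++) {
    c=substr($0,i,1)
    if (c=="\"") { inq=!inq; continue }   # drop the quote and flip the in-quotes state
    if (inq && c==",") c=";"              # a comma inside quotes becomes a semicolon
    out=out c
  }
  print out
}' file.csv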