UNIX for Dummies Questions & Answers: Help with a project. convert a txt to csv. Post 302594186 by turk451 on Monday 30th of January 2012 07:12:22 PM
For more complex text processing, I like to use Perl. If you have it installed in your environment, the following script worked for me on your initial data file:
Code:
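#!/usr/bin/perl
use strict;
use warnings;

# Usage: ./filter.pl datafile
open(my $fh, '<', $ARGV[0]) or die "Could not open data file: $!";

my ($bdate, $edate) = ('', '');
while (my $line = <$fh>) {
        $line =~ s/\s+$//;              # strip trailing whitespace and the newline
        next unless length $line;       # skip blank lines
        if ( $line =~ /^Study Week:/ ) {
                # fields 1 and 2 of the "*" / "-" split are the begin and end dates
                my @fields = split /\*|-/, $line;
                ($bdate, $edate) = @fields[1, 2];
                s/^\s+|\s+$//g for $bdate, $edate;
        } elsif ( $line =~ /^Day:/ ) {
                # the second "/"-separated field becomes the new begin date
                $bdate = (split m{/}, $line)[1];
                $bdate =~ s/^\s+|\s+$//g;
        } elsif ( $line =~ /^ ?[0-9]/ ) {
                # fixed-width columns of a timetable row
                print join(',', $bdate, $edate, substr($line,0,5), substr($line,7,5),
                        substr($line,35,10), substr($line,46,25), substr($line,72)), "\n";
        }
}
close $fh;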
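
The date extraction for the "Study Week:" line can be checked on its own. This is only a small sketch: the header string is the part of the line I quote in this post, cut at the second "*" because I have not reproduced the rest, and the variable names are just for illustration.

```perl
use strict;
use warnings;

# Header text as quoted in this post, cut at the second "*" (the remainder is not shown).
my $header = 'Study Week: 1 * 03.10.11 - 08.10.11 *';

# Split on "*" or "-"; fields 1 and 2 hold the two dates, still padded with blanks.
my @parts = split /\*|-/, $header;
my ($begin, $end) = @parts[1, 2];
s/^\s+|\s+$//g for $begin, $end;    # trim both values in place

print "$begin,$end\n";              # prints 03.10.11,08.10.11
```

Note that the split only works because the dates use dots; a date format containing "-" would need a different separator pattern.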

I am assuming that the begin date and end date both come from that first line ("Study Week: 1 * 03.10.11 - 08.10.11 * ..."), and that the begin date is then updated each time a line beginning with "Day" is encountered, while the end date stays the same. Here is the program running on the text file provided:
Code:
./filter.pl ../dat.dat
03.10.11,08.10.11, 8:00,10:15,seminar   ,MCK16 Reproduct/Develop  ,821 Gynaecology
03.10.11,08.10.11, 8:00,10:15,seminar   ,MCC8 Dis.RenalFunc./Oedem,824 1stInternal Med
03.10.11,08.10.11,10:30,13:00,practice  ,MCK16 Reproduct/Develop  ,821 Gynaecology
03.10.11,08.10.11,10:30,13:00,seminar   ,MCC8 Dis.RenalFunc./Oedem,221 Seminar Room
03.10.11,08.10.11,14:30,17:00,elect.cour,Urgent proc. in burned   ,804 Burns Surgery
04.10.11,08.10.11, 8:00, 9:30,seminar   ,MCK16 Reproduct/Develop  ,821 Gynaecology
04.10.11,08.10.11, 8:00,10:15,seminar   ,MCC8 Dis.RenalFunc./Oedem,422 Seminar Room
04.10.11,08.10.11, 9:35,10:20,individ.st,MCK16 Reproduct/Develop  ,104 Individ.study
04.10.11,08.10.11,10:30,12:45,individ.st,MCK16 Reproduct/Develop  ,104 Individ.study
04.10.11,08.10.11,10:30,13:00,seminar   ,MCC8 Dis.RenalFunc./Oedem,422 Seminar Room
04.10.11,08.10.11,14:00,15:30,lecture   ,Psychology/pathopsychol. ,220 Seminar Room
04.10.11,08.10.11,15:45,17:15,practice  ,Psychology/pathopsychol. ,220 Seminar Room
05.10.11,08.10.11, 7:45,10:15,seminar   ,MCK16 Reproduct/Develop  ,821 Gynaecology
05.10.11,08.10.11, 8:00,10:15,seminar   ,MCC8 Dis.RenalFunc./Oedem,423 Seminar Room
05.10.11,08.10.11,10:30,13:00,practice  ,MCK16 Reproduct/Develop  ,821 Gynaecology
05.10.11,08.10.11,10:30,12:00,seminar   ,MCC8 Dis.RenalFunc./Oedem,423 Seminar Room
05.10.11,08.10.11,12:15,13:00,seminar   ,MCC8 Dis.RenalFunc./Oedem,423 Seminar Room
05.10.11,08.10.11,14:00,15:30,elect.cour,Advanced Czech communic. ,960 FNKV/pav.X/lang
06.10.11,08.10.11, 8:00,10:15,practice  ,MCK16 Reproduct/Develop  ,821 Gynaecology
06.10.11,08.10.11, 8:00,13:00,practice  ,MCC8 Dis.RenalFunc./Oedem,824 1stInternal Med
06.10.11,08.10.11,10:45,13:15,practice  ,MCK16 Reproduct/Develop  ,821 Gynaecology
06.10.11,08.10.11,14:30,16:00,elect.cour,Endoscopic/robotic urol. ,909 Thomayer Hosp.
06.10.11,08.10.11,14:30,15:15,elect.cour,Methods Nuclear Cardiol. ,838 Urolog/NuclMed.
07.10.11,08.10.11, 8:00,10:15,seminar   ,MCK16 Reproduct/Develop  ,906 ÚPMD
07.10.11,08.10.11, 8:00, 9:45,seminar   ,MCC8 Dis.RenalFunc./Oedem,422 Seminar Room
07.10.11,08.10.11,10:00,13:15,seminar   ,MCC8 Dis.RenalFunc./Oedem,422 Seminar Room
07.10.11,08.10.11,11:00,13:30,practice  ,MCK16 Reproduct/Develop  ,220 Seminar Room
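
The output above keeps the column padding (for example "seminar   " and " 8:00"). If that is not wanted, each field can be trimmed before joining. The following is a minimal sketch, assuming the same fixed-width offsets as the script; `row_to_csv` is a hypothetical helper name, and the sample line is built with `sprintf` to match those offsets rather than copied from the real data file.

```perl
use strict;
use warnings;

# Hypothetical helper: cut the fixed-width columns used by the script,
# trim the padding from every field, and join the result with commas.
sub row_to_csv {
    my ($bdate, $edate, $line) = @_;
    my @fields = (
        $bdate, $edate,
        substr($line, 0, 5),    # start time
        substr($line, 7, 5),    # end time
        substr($line, 35, 10),  # class type
        substr($line, 46, 25),  # course
        substr($line, 72),      # room
    );
    s/^\s+|\s+$//g for @fields; # strip leading/trailing blanks in place
    return join(',', @fields);
}

# Sample line constructed to match the assumed offsets (not real input data).
my $line = sprintf('%5s%2s%5s%23s%-10s %-25s %s',
    '8:00', '-', '10:15', '', 'seminar',
    'MCK16 Reproduct/Develop', '821 Gynaecology');

print row_to_csv('03.10.11', '08.10.11', $line), "\n";
```

For the constructed line this prints `03.10.11,08.10.11,8:00,10:15,seminar,MCK16 Reproduct/Develop,821 Gynaecology`. None of the fields in the sample output contain a comma, so the sketch does no quoting; if a field could contain one, it would need to be quoted.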

 
