Help with datafile parsing and creating spreadsheet


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Help with datafile parsing and creating spreadsheet
# 1  
Old 04-28-2012
Help with datafile parsing and creating spreadsheet

I have a datafile containing data in the following format

Code:
name1,employee_number1,cell1,home1,fax1
name2,employee_number2,cell2,home2,fax2
name3,employee_number3,cell3,home3,fax3
name4,employee_number4,cell4,home4,fax4
name5,employee_number5,cell5,home5,fax5
...
...
....

I would like to convert this file into a columnar format, so data for each employee shows up as a single column in excel. Something like the following :

Code:
name1,name2,name3,name4,name5 .........
<blank_line1>
<blank_line2>
<blank_line3>
<blank_line4>
<blank_line5>
employee_number1, employee_number2, employee_number3, employee_number4, employee_number5.....
cell1,cell2,cell3,cell4,cell5....
home1,home2,home3,home4,home5.......
fax1,fax2,fax3,fax4,fax5.....

thanks as always!

Last edited by Scrutinizer; 04-29-2012 at 04:36 AM.. Reason: code tags
# 2  
Old 04-28-2012
there is answer on this recently.

This is the transposition of matrix. Check the post of acp, an awesome script there.
www.unix.com/shell-programming-scripting/18897-row-column-transpose.html

Last edited by yifangt; 04-29-2012 at 12:11 AM.. Reason: give details
This User Gave Thanks to yifangt For This Post:
# 3  
Old 04-29-2012
That was really helpful, however my requirement is slightly more complex :

Input Data file :

Code:
TEMP, STUDENT1 COURSE1,COURSE1_DETAILS,COURSE1_SCORE
TEMP, STUDENT1 COURSE1,COURSE1_MORE_DETAILS,COURSE1_COMMENTS
TEMP, STUDENT1 COURSE1,COURSE1_MORE_DETAILS1,COURSE1_COMMENTS1
TEMP, STUDENT1 COURSE1,COURSE1_MORE_DETAILS2,COURSE1_COMMENTS2
TEMP, STUDENT1, COURSE2,COURSE2_DETAILS,COURSE2_SCORE
TEMP, STUDENT1 COURSE2,COURSE2_MORE_DETAILS,COURSE2_COMMENTS
TEMP, STUDENT1 COURSE2,COURSE2_MORE_DETAILS1,COURSE2_COMMENTS1
TEMP, STUDENT1 COURSE2,COURSE2_MORE_DETAILS2,COURSE_COMMENTS2
..
..

TEMP, STUDENT2, COURSE1,COURSE1_DETAILS,COURSE1_SCORE
TEMP, STUDENT2, COURSE1,COURSE1_MORE_DETAILS,COURSE1_COMMENTS
..
TEMP, STUDENT2, COURSE2,COURSE2_DETAILS,COURSE2_SCORE
..
TEMP, STUDENT2, COURSE3,COURSE3_DETAILS,COURSE3_SCORE
..
..
PERM, STUDENT1 COURSE1,COURSE1_DETAILS,COURSE1_SCORE
PERM, STUDENT1 COURSE1,COURSE1_MORE_DETAILS,COURSE1_COMMENTS
PERM, STUDENT1 COURSE1,COURSE1_MORE_DETAILS1,COURSE1_COMMENTS1
PERM, STUDENT1 COURSE1,COURSE1_MORE_DETAILS2,COURSE1_COMMENTS2
PERM, STUDENT1, COURSE2,COURSE2_DETAILS,COURSE2_SCORE
PERM, STUDENT1 COURSE2,COURSE2_MORE_DETAILS,COURSE2_COMMENTS
PERM, STUDENT1 COURSE2,COURSE2_MORE_DETAILS1,COURSE2_COMMENTS1
PERM, STUDENT1 COURSE2,COURSE2_MORE_DETAILS2,COURSE_COMMENTS2
..
..

PERM, STUDENT2, COURSE1,COURSE1_DETAILS,COURSE1_SCORE
PERM, STUDENT2, COURSE1,COURSE1_MORE_DETAILS,COURSE1_COMMENTS
..
PERM, STUDENT2, COURSE2,COURSE2_DETAILS,COURSE2_SCORE
..
PERM, STUDENT2, COURSE3,COURSE3_DETAILS,COURSE3_SCORE
..
..

So there are 2 types of students, TEMP and PERM.
Each Student will have a variable number of courses. For each course, there are anywhere between 2-4 rows in the input file

I would like to create two output files one for TEMP students and another one for PERM students, with data for each student and all his/her various course details in a columnar format


Code:
  STUDENT1, STUDENT2, STUDENT3 ......STUDENTn
  COURSE1:COURSE1_DETAILS, COURSE1:COURSE1_DETAILS....
  COURSE1:COURSE1_MORE_DETAILS, COURSE1:COURSE1_MORE_DETAILS ...
  ..
  COURSE2:COURSE2_DETAILS, COURSE2:COURSE2_DETAILS....
  COURSE2:COURSE2_MORE_DETAILS, COURSE2:COURSE2_MORE_DETAILS ...
  ..
  COURSEn:COURSEn_DETAILS, COURSEn:COURSEn_DETAILS....
  COURSEn:COURSEn_MORE_DETAILS, COURSEn:COURSEn_MORE_DETAILS ...

So the goal is to have one column in Excel for each student, with details about each of his courses listed on various rows for that particular student's column in the spreadsheet

Moderator's Comments:
Mod Comment Welcome to the UNIX and Linux Forums. Please use [code][/code] tags. Video tutorial on how to use them

Last edited by Scrutinizer; 05-03-2012 at 03:14 AM..
# 4  
Old 04-29-2012
Re: transpose the matrix

Then this is not a story to transpose the matrix and more complicated as the course number for the students may be different. And your output wants to combine every two original cols tagged with the course name.
Code:
TEMP, STUDENT1 A,something,Morething
TEMP, STUDENT1 B,anything,Morestuff
TEMP, STUDENT1 C,goodthing,Morebla
TEMP, STUDENT1 D,goodthing,Morebla2
TEMP, STUDENT2 A,something,Morething
TEMP, STUDENT2 B,anything,Morestuff
TEMP, STUDENT2 D,thing2,Morebla2

It seems to me you transfer the normalised table to an irregular one. Right?
Code:
      STUDENT1                              STUDENT2
A:something   A:Morething              A:something   A:Morething
B:anything    B:Morestuff              B:anything    B:Morestuff
C:goodthing   C:Morebla                D:thing2      D:Morebla2
D:goodthing2  D:Morebla2

Not a good way for your next processing, if you have. Probably you need think over what you really want.

Last edited by yifangt; 04-29-2012 at 05:39 PM.. Reason: Prune to make it better.
# 5  
Old 04-29-2012
First of all, appreciate your time in helping me out.

Apologies for not making it clear. Let me start all over again and to make it easy, lets make prune some additional details and lay down some hard restrictions :

1. Let's assume there is no more the student_type (TEMP or PERM) data point
2. Let's assume each student takes precisely 8 courses (which have same course names) and for each course there are there are 6 variable pieces of information separated by commas on a single row. So this means for each student, there will be precisely 8 rows in the input file, containing student_name, course_name followed by 6 pieces of variable information for each student.

Code:
STUDENT1, COURSE1, COURSE1_VAR1, COURSE1_VAR2,..COURSE1_VAR6
STUDENT1, COURSE2, COURSE2_VAR1, COURSE2_VAR2,..COURSE2_VAR6
..
..
STUDENT1, COURSE8, COURSE8_VAR1, COURSE8_VAR2,..COURSE8_VAR6
..
..
STUDENTn, COURSE1,COURSE1_VAR1, COURSE1_VAR2,..COURSE1_VAR6
STUDENTn, COURSE2,COURSE2_VAR1, COURSE2_VAR2,..COURSE2_VAR6
..
..
STUDENTn, COURSE8,COURSE8_VAR1, COURSE8_VAR2,..COURSE8_VAR6

In the output, ALL data for each student should be represented in a single column in Excel.


Code:
STUDENT1, STUDENT2, STUDENT3 ......STUDENTn
COURSE1_VAR1, COURSE1_VAR1,.. COURSE1_VAR1..
COURSE1_VAR2, COURSE1_VAR2,.. COURSE1_VAR2...
..
COURSE1_VAR6, COURSE1_VAR6,.. COURSE1_VAR6
COURSE2_VAR1, COURSE2_VAR1,.. COURSE2_VAR1..
COURSE2_VAR2, COURSE2_VAR2,.. COURSE2_VAR2...
..
COURSE2_VAR6, COURSE2_VAR6,.. COURSE2_VAR6
..
COURSE8_VAR1, COURSE8_VAR1,.. COURSE8_VAR1..
COURSE8_VAR2, COURSE8_VAR2,.. COURSE8_VAR2...
..
COURSE8_VAR6, COURSE8_VAR6,.. COURSE8_VAR6

So as you can see the COURSE1 thru COURSE8 have now become the row labels and STUDENT names have become the column labels. So it is kind of transpose of matrix, in some ways.

Once again I appreciate your time!

Last edited by Scrutinizer; 05-03-2012 at 03:16 AM.. Reason: code tags
# 6  
Old 04-30-2012
RE:

Good! It's clearer than last time, and the situation is simpler than what I thought .For simplicity I changed the description to string. e.g. CS111 = Student 1 + Course1 + variable1 and so on.
Code:
ST1,CS1,CS111,CS112,CS113,CS114,CS115,CS116
ST1,CS2,CS121,CS122,CS123,CS124,CS125,CS126
...
ST4,CS7,CS471,CS472,CS473,CS474,CS475,CS476
ST4,CS8,CS481,CS482,CS483,CS484,CS485,CS486

By copying radoulov's code
Code:
awk -F"," 'END { for (A in _) print A"," _[A] }
{ _[$1] = $1 in _ ? _[$1] FS $0 : $0 }' input.data > tmp.data

Then by acp's code as transpose.awk:
Code:
awk -f transpose.awk tmp.data > output.data

Code:
    ST1    ST2    ST3    ST4    
    ST1    ST2    ST3    ST4    
    CS1    CS1    CS1    CS1    
    CS111    CS211    CS311    CS411    
    CS112    CS212    CS312    CS412    
    ......   
    CS116    CS216    CS316    CS416    
    ST1    ST2    ST3    ST4    
    CS2    CS2    CS2    CS2    
    CS121    CS221    CS321    CS421  
......

Left the headers for each course and student for clarity, which can be removed easily in Excel.
This User Gave Thanks to yifangt For This Post:
# 7  
Old 05-03-2012
simply awesome!! you-da-man! works like a charm.

Many thanks for your time and help.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Parsing a log file and creating a report script

The log file is huge and lot of information, i would like to parse and make a report . below is the log file looks like: REPORT DATE: Mon Aug 10 04:16:17 CDT 2017 SYSTEN VER: v1.3.0.9 TERMINAL TYPE: prod SYSTEM: nb11cu51 UPTIME: 04:16AM up 182 days 57 mins min MODEL, TYPE, and SN:... (8 Replies)
Discussion started by: amir07
8 Replies

2. Shell Programming and Scripting

Validating a datafile with the datatypes

I have two input files 1)datafile 2)metadata file. I have a metadata file like: field1datatypeformat1number2string3dateyy-mm-dd I have a data file like: 1234abc12-8-16 xyz234512-9-163456acd14-08-12 In the first row there is no correction as everything is inline with the metadata.... (3 Replies)
Discussion started by: bikky6
3 Replies

3. Shell Programming and Scripting

Parsing with Name value pair and creating a normalized file

I have url string as follows and I need to parse the name value pair into fields /rows event_id date time payload 1329130951 20120214 22.30.40... (1 Reply)
Discussion started by: smee
1 Replies

4. Shell Programming and Scripting

sorting the datafile in an order given in second datafile

Hi, I have two files: first input file is having 7-8 columns, and second data file is like I want to arrange my datafile1 in the order given in second data file, by comparing the seconddatafile with the second column of first file and print the entire line....also if any... (2 Replies)
Discussion started by: CAch
2 Replies

5. UNIX for Advanced & Expert Users

How do we know which processis creating a datafile

Hi, Is there any way we can find out which process is creating a partucular datafile.I know the user and group but i am just curios to know is there any way to find the process. Thanks (7 Replies)
Discussion started by: ukatru
7 Replies

6. Shell Programming and Scripting

Combine a datafile with Master datafile, emergent!

Hi guys, my supervisor has asked me to solve the problem in 7 days, I've taken 3 days to think about it but couldn't figure out any idea. Please give me some thoughts with the following problem, I have index.database that has only index date: 1994 1995 1996 1997 1998 1999 I have... (6 Replies)
Discussion started by: onthetopo
6 Replies

7. UNIX for Dummies Questions & Answers

Append filename to datafile

I am working on an shell script which checks for all the file starting with abc*.* and if file found then the filelines need to append the file name in begining can some one help with the filename appending... for i in `ls $filename*.csv` do echo $i --- NEED to append file name befor... (3 Replies)
Discussion started by: Satyagiri
3 Replies

8. Shell Programming and Scripting

selective positions from a datafile

Hi dear friends, Im writing a shell script which has to select the strings based on the position. but the problem is there is no field seperator. Normally a datafile contains 2000 records (lines) and each line is of size 500 charecters. I want to select the fields from all the lines which... (10 Replies)
Discussion started by: ganapati
10 Replies

9. Solaris

oracle datafile *dbf

Hi ,,,, I have move an oracle db from old server to a new server ( solaris 5.9 is the operating system ) my problem is that to new server the datafile ( *.dbf ) are in a different path ..... example old : /export/home/data/blobs ........... new /oracle/data/blobs....... how i can... (3 Replies)
Discussion started by: tt155
3 Replies

10. Shell Programming and Scripting

replace one section in a datafile

Hi: First, this is not a homework problem. I just need enough of a hint to get this going... My datafile (dataf.in) is made up of 10 sections. Each section begins with & and with && So it looks like this:------------------------------------- &section1 ...etc... && &section2 ...etc...... (4 Replies)
Discussion started by: Paprika
4 Replies
Login or Register to Ask a Question