How join works and the specific parameters to my problem?

 
Thread Tools Search this Thread
Homework and Emergencies Homework & Coursework Questions How join works and the specific parameters to my problem?
# 1  
Old 04-13-2010
How join works and the specific parameters to my problem?

1. The problem statement, all variables and given/known data:
I have two files created from extracting data off of two CSV files, one containing class enrollment on a specific quarter and the other containing grades for that specific quarter. The Enrollment file generated contains course name, instructor and catalog number and a few other columns. The Grade file generated contains GPA and catalog number and a few other columns.

Both files have over 5k rows. I need to join the two files using the catalog number as the key field.

The Enrollment CSV file had some entries with Catalog No. missing, these were, in fact, subtotal entries and they were filtered out after the file was created.

2. Relevant commands, code, scripts, algorithms:
awk was used to generated the files using "|" as the field separator.

Example:
Course file
...
Catalog No. | ART 101 | instructor | Intro to Art
...

Grade file
...
Catalog No. | ART 101 | GPA
...


I need to use join to create a merged file of the two based on the Catalog No.

Example:
Merge file
...
Catalog No. | ART 101 | Instructor | Intro to Art | GPA
...


The original CSV files have the courses grouped by department (i.e. ART, CRIM, ENGR, CS, etc.)

3. The attempts at a solution (include all code and scripts):
I've attempted two approaches, one where both files are sorted based on Catalog No. and one without sorting.

Every relevant join parameters were used in an attempt to merge the two files but each time, I get either an empty merged file or it just contains one or the other file in it.

join -t"|" -o'1.1 1.2 1.3 2.1 2.3' Courses.txt Grades.txt > Merged.txt

empty

join -t"|" -o'1.1 1.2 1.3 2.1 2.3' Courses.txt Grades.txt > Merged.txt

Gave me: 10000|GEOL|150|EARTH REVEALED|Professor||||

The |||| were where the GPA and other stuff from Grades are supposed to be.

join -t"|" -1 1 -2 1 Courses.txt Grades.txt > Merged.txt
empty

join -t"|" -a1 Courses.txt Grades.txt > Merged.txt
Just Courses entry, none from Grades. Same thing with -a2.

-j was also used, as well as.

Keep in mind I sorted the files so that the catalog numbers would match line to line.

All the examples I've seen regarding how join works has the key fields in both files in order and correspond almost line-to-line so this sort of suggest that join does not search the entire file for matches.

California State University, Los Angeles, California, USA. Albert Cervantes, CS 345

You guys got strange requirements. Gee, where's Los Angeles? In Canada?[COLOR="#738fbf"]

Last edited by Lechnology; 04-14-2010 at 03:32 AM..
# 2  
Old 04-14-2010
Should I give up now or is your lack of response suggesting that even the experts here are trumped by my problem?
# 3  
Old 04-14-2010
Could you post some small representative sample input (not only the column headers) and the output you'd like to get given that input.
# 4  
Old 04-14-2010
Okay, is this what you mean:

Courses.txt
Code:
Cata#|Department|Crs#|Class name|Instructor
10015|GEOL    |158|NATURAL DISASTERS|Rezaie Boroon Mohammad Hassan|
10016|GEOL    |158|NATURAL DISASTERS|Hamane Angelique C.|
10020|ART     | 101A     |WORLD ART|Anderson Paul A|
10021|ART     | 101B     |WORLD ART|Aguilar-Moreno Jose Manuel|

from Grades.txt
Code:
Cata#|Dept|Crs#|GPA|A|B|C|D|E|F...etc.
10015|GEOL|158|1.64|2|2|1|7|2|11|26|9|17|12|5|13|||1|
10016|GEOL|158|1.93|13||3|19||3|40|1||9||22|||1|2
10020|ART|101A|1.19|8|5|1|16|7|7|11|8|3|13|6|77||||13
10021|ART|101B|2.59|38|19|15|17|9|8|16|4|4|9|5|12||||3

Using join, we are to generate:
Code:
10015|GEOL    |158|NATURAL DISASTERS|Rezaie Boroon Mohammad Hassan|1.64
10016|GEOL    |158|NATURAL DISASTERS|Hamane Angelique C.|1.93
10020|ART     | 101A     |WORLD ART|Anderson Paul A|1.19
10021|ART     | 101B     |WORLD ART|Aguilar-Moreno Jose Manuel|2.59

You'd think that join -1 1 -2 1 Courses.txt Grades would match field 1 of Courses and field 1 of Grades easily, but I keep getting an empty file when I output it.

To clarify, we're trying to get the professor's name and the average GPA they had for the class they taught.

Last edited by radoulov; 04-14-2010 at 08:55 AM.. Reason: Code tags, please!
# 5  
Old 04-14-2010
You were very nearly there.

The input files are not sorted on the first field "Catalogue number". If they were sorted, the heading line would be at the bottom.

You need to either remove the heading line before trying the join or sort the files with unix "sort" which will force the heading line to the bottom.

After sorting the files and twiddling with the output fields.

Code:
join -t"|" -o 1.1 1.2 1.3 1.4 1.5 2.4 Courses2.txt Grades2.txt

10015|GEOL    |158|NATURAL DISASTERS|Rezaie Boroon Mohammad Hassan|1.64
10016|GEOL    |158|NATURAL DISASTERS|Hamane Angelique C.|1.93
10020|ART     | 101A     |WORLD ART|Anderson Paul A|1.19
10021|ART     | 101B     |WORLD ART|Aguilar-Moreno Jose Manuel|2.59
Cata#|Department|Crs#|Class name|Instructor|GPA

# 6  
Old 04-14-2010
Actually, there are no heading lines. I just added them here for clarification what each field was for (then radoulov included them in the code tags).

I'll try you solution and get back to you.

Update: Nope, merge file shows this
Code:
10015|GEOL    |158|NATURAL DISASTERS|Rezaie Boroon Mohammad Hassan|
10016|GEOL    |158|NATURAL DISASTERS|Hamane Angelique C.|
10020|ART     | 101A     |WORLD ART|Anderson Paul A|
10021|ART     | 101B     |WORLD ART|Aguilar-Moreno Jose Manuel|

Which is just the Courses.txt's content, nothing from Grades.txt.

Perhaps I should backtrack to when I generated the file? It shouldn't matter whether I used awk, sed or something else to create the files and fields, right?

Last edited by Lechnology; 04-14-2010 at 12:55 PM..
# 7  
Old 04-14-2010
Do your command output parameters match mine?

Code:
-o 1.1 1.2 1.3 1.4 1.5 2.4

The grade field is 2.4 (i.e. file 2 field 4). Your original didn't have 2.4 .
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Red Hat

How to join Linux pc to active directory in specific ou with authconfig?

I use authconfig command to join linux computers to AD How do I specify which OU they go in? I llook at authconfig --help but not see antyhing\ Can I use --ldapbasedn=<dn> to select what OU it should go iN (0 Replies)
Discussion started by: red888
0 Replies

2. Shell Programming and Scripting

Replacing whole string starting with specific works

Hi guys, So what I am trying to accomplish is to replace a whole string starting with some designated string. eg: When even I find a string starting with : eai.endpoint.url= replace the entire line with: eai.endpoint.url=http://www.endpoint.com/API Righ now I am trying to... (4 Replies)
Discussion started by: Junaid Subhani
4 Replies

3. Solaris

How to Limit ftp access parameters for specific users?

Dear friends, :) I create new user useradd -g other -d /export/home/sltftp -m -s /bin/bash -c "SLT user account for TMA ftp backup" sltftp now i need do restrict thees chmod delete overwrite rename from this user:(for all the files in the server ,sltftp user can only able to download... (4 Replies)
Discussion started by: darakas
4 Replies

4. UNIX for Dummies Questions & Answers

Join Lines at a specific point

Hi I'm a beginner, and i've been having trouble joining two lines. I need to convert this file 1097ALABAMA Mobile County METHOMYL INSE CTICIDES 6 1.6200000E+00 1.8000001E+00 1003ALABAMA Baldwin County ... (5 Replies)
Discussion started by: kf_1434
5 Replies

5. Shell Programming and Scripting

how to retrieve specific parameters using a xml tag

Hi, I have the following code in my xml file: <aaaRule loginIdPattern=".*" orgIdPattern=".*" deny="false" /> <aaaRuleGroup name="dpaas"> <aaaRule loginIdPattern=".*" orgIdPattern=".*" deny="false" /> I want to retrieve orgIdPattern and loginIdPattern parameter value based on... (2 Replies)
Discussion started by: mjavalkar
2 Replies

6. UNIX for Dummies Questions & Answers

how to join two files using "Join" command with one common field in this problem?

file1: Toronto:12439755:1076359:July 1, 1867:6 Quebec City:7560592:1542056:July 1, 1867:5 Halifax:938134:55284:July 1, 1867:4 Fredericton:751400:72908:July 1, 1867:3 Winnipeg:1170300:647797:July 15, 1870:7 Victoria:4168123:944735:July 20, 1871:10 Charlottetown:137900:5660:July 1, 1873:2... (2 Replies)
Discussion started by: mindfreak
2 Replies

7. UNIX for Dummies Questions & Answers

problem with join

So I want to join two files that have a lot of rows The file named gen1 has 2 columns: head gen1 1008567 0.4026931012 1119535 0.7088912314 1120590 0.7093805634 1145994 0.7287952590 1148140 0.7313924434 1155173 0.7359550430 1188481 0.7598914553 1201155 0.7663406553 1206921... (2 Replies)
Discussion started by: peanuts48
2 Replies

8. Programming

Problem with Pipes => Only works first pipe

Hi! I'm having problems with pipes... I need comunnications with childs processes and parents, but only one child can comunnicate with parent (first child), others childs can't. A brief of code: if(pipe(client1r)<0){ perror("pipe"); } ... (1 Reply)
Discussion started by: serpens11
1 Replies

9. Shell Programming and Scripting

A join problem?

Hi everybody, I am hoping somebody here will be either be able to solve my troubles or at least give me a push in the right direction :) . I am developing a shell script to read in 4 different files worth of data that each contain a list of: username firstname secondname group score I... (2 Replies)
Discussion started by: jamjamjammie
2 Replies

10. UNIX for Dummies Questions & Answers

Plink problem....only works if passwd is in the script

I have a one line bat script run off a XP machine that tar's and compresses some files from a Sol 8 box. It goes something like this (a bit simplified).... plink -pw <passwd> user@host "tar -cvf - -C / tmp/a_file | compress " > a_file.tar.Z So this works....and it's worked many times. But now... (3 Replies)
Discussion started by: Yinzer955i
3 Replies
Login or Register to Ask a Question