splitting a column into rows
UNIX for Dummies Questions & Answers, post 302128397 by spindoctor, Tuesday 24 July 2007, 02:51:26 PM
Ultimately, this data comes from a series of files, each containing a single news story.
A typical news story looks like this:

Document 4 of 6

Ours is a manufacturing province:[Final Edition]
Edmonton Journal. Edmonton, Alta.:Jan 2, 2002. p. A12

Document types: Business; Editorial

Section: Opinion

Publication title: Edmonton Journal. Edmonton, Alta.: Jan 2, 2002. pg. A.12

Source type: Newspaper

ProQuest document: 221806441


Text Word Count 559

Document URL: http://proquest.umi.com/pqdweb?did=221806441&Fmt=3&clientId=14119&RQT=309&VName=PQD

Abstract (Document Summary)

The tax load on manufacturers in Edmonton is the lowest of all cities in North
America, according to a comparison by ICF Economic Consulting Group of San
Francisco, in a study for Economic Development Edmonton.

The EDE survey discovered that 80 per cent of advanced manufacturing companies
in Edmonton were founded here. So our city's economic growth may depend more
upon encouraging local entrepreneurs than upon attracting businesses from
elsewhere.

Allan Scott, EDE's incoming president, promises to pursue venture capital and
has suggested that small amounts of provincial or municipal government funds
might reasonably go into high-risk, high-return venture portfolios.

Full Text (559 words)

Copyright Southam Publications Inc. Jan 2, 2002

Premier Ralph Klein has accurately recognized the importance of manufacturing
to Alberta's economy.

Too often, we assume that our province depends only on energy prices, inviting
complacency when they are high, and gloom when they are low -- as they are now.

If prices stay low, "the only way we can make up the difference is if there is
a strong movement in the manufacturing sector," Klein said in a year-end
interview.

Fortunately, that sector has grown steadily over the past three decades.
Alberta manufacturing shipments have risen from $1.9 billion to $32.8 billion
from 1970 to 1998.

*************************

I'm at the stage where I'm extracting information into files to import into an Excel spreadsheet. Someone in another thread helped me, and I settled on using egrep. I would run egrep over every file to pull out the line starting with "ProQuest Document ID" and write those lines to one file. Then I would egrep again for every line starting with, say, "Publication title" and write those to a different file. Finally, I would import both files into Excel and line the two columns up so that each ProQuest Document ID matched the *corresponding* publication title in the next column.

That actually worked pretty well for most of the fields I'm interested in. However, some of them, Section and Document types in particular, do not appear in every file. So egrepping each field separately breaks down: a file that lacks the field contributes no line at all, and from that point on the two columns no longer line up.

I compromised and developed an awk command:

awk ' BEGIN { FS = ":" } ; /^Document.types|^ProQuest document/ { print $2 } ' * >> ~/documents/dissertation/con/prime/newfile.txt
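A minimal sketch of one way to keep the extracted lines traceable, assuming every field line starts with its label followed by a colon, exactly as in the sample story above: have awk print the file name next to each match, so every line stays tied to its source file even when a field is missing. It also strips only the text up to the first colon, so values containing colons of their own are not truncated the way `print $2` with `FS = ":"` truncates them. The function name `extract_fields` is hypothetical.

```shell
# Hypothetical helper: for each matching line print
#   file name <TAB> label <TAB> value
# so every extracted line stays tied to the file it came from.
# Assumes each label starts the line and is followed by a colon.
extract_fields() {
    awk '
    /^(Document types|Section|ProQuest document):/ {
        label = $0
        sub(/:.*/, "", label)             # keep the text before the first colon
        value = $0
        sub(/^[^:]*:[ \t]*/, "", value)   # keep the text after it, minus leading blanks
        print FILENAME "\t" label "\t" value
    }' "$@"
}
```

Usage: `extract_fields * > ~/fields.tsv`. Writing the output outside the directory being read keeps `*` from picking up the output file itself. The result has three tab-separated columns (file, field, value) that Excel can sort and filter.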

That is how I got the column of data above. I was hoping to then split that column into three columns so the data would line up neatly. However, since nothing in the column marks where one file ends and the next begins, I'm starting to see that this might be difficult.

Any suggestion that works with either this column of data or the original news stories would be welcome. In short, I'd like to extract the fields ProQuest Document ID, Document types and Section, and print them in rows, one row per file, rather than in a single column.
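A sketch that goes straight from the original stories to one row per file, assuming the label spellings in the sample story (`ProQuest document:`, `Document types:`, `Section:`) and one story per file. It collects the three fields while reading a file, prints that file's row when the next file starts (`FNR == 1`) and once more at the end, and leaves a cell empty when a field is missing, so the columns always line up. It sticks to POSIX awk rather than gawk-only features such as `ENDFILE`; the function name `fields_to_rows` is hypothetical.

```shell
# Hypothetical helper: print one tab-separated row per story file with the
# columns File, ProQuest document, Document types, Section.
# A field missing from a file becomes an empty cell, so columns stay aligned.
# Assumes labels start the line and are followed by a colon, as in the sample.
# Note: a completely empty file produces no row.
fields_to_rows() {
    awk '
    # Text after the first colon, with leading blanks trimmed.
    function value(   v) {
        v = $0
        sub(/^[^:]*:[ \t]*/, "", v)
        return v
    }

    # Print the row collected for the previous file, if there is one.
    function emit_row() {
        if (file != "") print file, id, types, section
    }

    BEGIN {
        OFS = "\t"
        print "File", "ProQuest document", "Document types", "Section"
    }

    # First line of a new file: finish the old row, start a fresh one.
    FNR == 1 { emit_row(); file = FILENAME; id = types = section = "" }

    /^ProQuest document:/ { id = value() }
    /^Document types:/    { types = value() }
    /^Section:/           { section = value() }

    # The last file has no following file to trigger its row, so flush here.
    END { emit_row() }
    ' "$@"
}
```

For example, `fields_to_rows * > ~/stories.tsv` produces a header line followed by one tab-separated row per story, which Excel opens as a table. A story with no Section line simply gets an empty Section cell instead of shifting the rows below it.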

By the way, I'm aware that some people may be frustrated with me, as I have posted a number of times about the same project. Please understand that this is my very first foray into programming and, for whatever it's worth, I have learned a shitload about Unix and I'm getting much more independent at it. But I'm not ready to take the training wheels off just yet. Simple, annotated, or explained scripts are welcome!
 
