Full Discussion: Transpose Messy Data
UNIX for Advanced & Expert Users, Post 302943019 by Don Cragun on Monday 4th of May 2015 04:15:18 PM
Quote:
Originally Posted by 91674io
I have a messy, pipe-delimited ("|") input dataset.

I would like to create a file containing the ID plus each component of field 4 (the components are delimited by ";"), one component per line, in a long, skinny shape for easier processing.

A couple of complications are that field 4 may contain both commas and linefeed characters from the source.

Sample data looks like:

Code:
ID1|VAR2|VAR3|VAR4|VAR5
ID2|VAR2|VAR3|PART1;PART2|1;2
ID3|VAR2|VAR3|A, B, C;PART2;BEFORE LF\nAFTER LF|1;2;3
ID4|VAR2|VAR3|1;2;3,;4|1;2;3;4

I would like output data like:

Code:
ID1|VAR4
ID2|PART1
ID2|PART2
ID3|A, B, C
ID3|PART2
ID3|BEFORE LF  AFTER LF
ID4|1
ID4|2
ID4|3
ID4|4

Is there an elegant way to do this at the command line?

Thanks!
What have you tried to solve this problem?

I don't see anything in your description that explains why the transformations shown in red above happened. What input characters are supposed to be changed to spaces in the output? (The string "\n" is not a linefeed character, but it can be used in a format string to cause some programs to print a linefeed character.) What input characters are supposed to be deleted from the output?
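
Until those questions are answered, the following is only a sketch of one possible reading, not a definitive solution. It assumes that the two-character string \n shown in the sample is literally what appears in the file and that each of those two characters becomes a space, that a trailing comma on a component is dropped while other commas are kept, and that every record sits on a single physical line. The function name to_long is made up for illustration. Under those assumptions, awk can split field 4 on ";" and print one ID|component line per piece:

```shell
# Hypothetical helper: print "ID|component" for each ";"-separated piece of field 4.
# Assumptions (not confirmed by the original poster):
#   - "\n" in the data is a literal backslash followed by "n", replaced by two spaces
#   - a trailing comma on a component is deleted; other commas are kept
#   - each record is on one physical line
to_long() {
  awk -F'|' '
    NF >= 4 {
      n = split($4, part, ";")       # break field 4 into its components
      for (i = 1; i <= n; i++) {
        p = part[i]
        gsub(/\\n/, "  ", p)         # literal backslash-n becomes two spaces
        sub(/,$/, "", p)             # drop a single trailing comma
        print $1 "|" p
      }
    }' "$@"
}
```

It could be run as `to_long input.txt > long.txt` (both file names are placeholders). If field 4 can contain real linefeed characters, the broken records would have to be rejoined first, for example by accumulating lines until five fields have been seen; this sketch does not attempt that.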