Transpose columns to Rows : Big data Post: 302443633

Shell Programming and Scripting: post 302443633 by genehunter on Monday 9th of August 2010 04:44:05 PM

Hi,
I read a few posts on the subject and tried out a few of the solutions, but none of them solved my problem.
https://www.unix.com/302121568-post11.html
https://www.unix.com/shell-programmin...ows-etc-4.html

Please help. My problem is very similar to that of the poster in the second link, but the input format is slightly different. The field separator is a space. The actual data matrix is a file with 2000 rows and 600,000 columns.
Input style:

Code:

IID    PAT    MAT    SEX    PHENOTYPE    rs15286_1    rs319_1    rs80300_1    rs40777_1    rs8597_1    rs5136_1    rs60595_1    rs64968_1    rs4405_1    rs1554_1
TD-MIKV    0 0 2 1 1 0 0 1 0 1 0 1 1 0
TD-HA4Q 0 0 2 1 1 0 0 0 0 0 0 0 0 0
TD-H9ZG 0 0 2 2 0 0 0 1 0 0 0 0 0 0
TD-HAQX 0 0 2 1 0 0 0 2 0 0 0 0 0 0
TD-HA5E 0 0 2 2 0 1 1 1 0 0 0 1 1 0
TD-MGFV 0 0 2 2 1 0 0 0 0 NA 0 0 0 1
TD-HB4V 0 0 2 1 0 0 1 0 1 NA 0 1 1 0
TD-MIPE 0 0 2 2 0 0 0 1 0 0 0 0 0 0
TD-MINR 0 0 2 2 0 0 0 0 0 2 0 1 1 0

Output style:

Code:

IID TD-MIKV TD-HA4Q TD-H9ZG TD-HAQX TD-HA5E TD-MGFV TD-HB4V TD-MIPE TD-MINR
PAT 0 0 0 0 0 0 0 0 0
MAT 0 0 0 0 0 0 0 0 0
SEX 2 2 2 2 2 2 2 2 2
PHENOTYPE 1 1 2 1 2 2 1 2 2
rs15286_1 1 1 0 0 0 1 0 0 0
rs319_1 0 0 0 0 1 0 0 0 0
rs80300_1 0 0 0 0 1 0 1 0 0
rs40777_1 1 0 1 2 1 0 0 1 0
rs8597_1 0 0 0 0 0 0 1 0 0
rs5136_1 1 0 0 0 0 NA NA 0 2
rs60595_1 0 0 0 0 0 0 0 0 0
rs64968_1 1 0 0 0 1 0 1 0 1
rs4405_1 1 0 0 0 1 0 1 0 1
rs1554_1 0 0 0 0 0 1 0 0 0

awk or python preferable, since I understand them a teeny weeny bit.
Thanks in advance,
Regards
~GH
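
For reference, here is a minimal Python sketch of one way the transformation described above could be done. It is not taken from the linked threads. It assumes the file is whitespace-delimited and that every row has the same number of fields as the header. Because a 2000 x 600,000 matrix held as Python lists may not fit in memory, the sketch re-reads the input once per block of columns instead of loading everything at once. The names `transpose_file` and `chunk_cols` are made up for this example.

```python
def transpose_file(src_path, dst_path, chunk_cols=50_000):
    """Transpose a whitespace-delimited matrix, chunk_cols input columns per pass.

    Hypothetical helper: each pass re-reads the whole input and keeps only
    the current block of columns in memory.
    """
    # Use the header line to find out how many columns there are.
    with open(src_path) as src:
        n_cols = len(src.readline().split())

    with open(dst_path, "w") as dst:
        for start in range(0, n_cols, chunk_cols):
            stop = min(start + chunk_cols, n_cols)
            # One list per output row (that is, per input column in this block).
            out_rows = [[] for _ in range(stop - start)]

            with open(src_path) as src:
                for line_no, line in enumerate(src, 1):
                    fields = line.split()  # splits on any run of spaces or tabs
                    if not fields:
                        continue  # ignore blank lines
                    if len(fields) != n_cols:
                        raise ValueError(
                            f"line {line_no}: expected {n_cols} fields, "
                            f"found {len(fields)}"
                        )
                    for out_row, value in zip(out_rows, fields[start:stop]):
                        out_row.append(value)

            # Each collected column becomes one space-separated output line.
            for out_row in out_rows:
                dst.write(" ".join(out_row) + "\n")
```

Calling `transpose_file("input.txt", "output.txt")` (both file names are placeholders) would write one output line per input column, starting with the `IID` line. A smaller `chunk_cols` lowers memory use at the cost of more passes over the input; a value at least as large as the column count does everything in a single pass.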
