Extract columns based on the first line of each column


 
# 1  
Old 11-12-2014

Sorry to bother you guys again. I have a file1 with multiple columns like this:
Code:
gga_miR_100	gga_miR_300	gga_miR_3500	gga_miR_4600	gga_miR_5600	gga_miR_30	gga_miR_500
kj	rwg	ghhh	jy	jyu	we	vf
5g	5hg	h6	56h	i8	45t	44r4
4bg	4r546	9lgtr	(fer)	4fr	f433	
	3feev	f4	bf4	35g		
		vfr		ge		
		2rr

I also have a file2 with multiple lines like this:
Code:
gga_miR_3500
gga_miR_4600
gga_miR_500
gga_miR_30

How can I extract the columns of file1 whose first line exactly matches one of the lines in file2? I am expecting an output file like this:
Code:
gga_miR_3500	gga_miR_4600	gga_miR_30	gga_miR_500
ghhh	jy	we	vf
h6	56h	45t	44r4
9lgtr	(fer)	f433	
f4	bf4		
vfr			
2rr

# 2  
Old 11-12-2014
No attempts from your side? Pity...

Try
Code:
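awk     '# file2: store the wanted headers in the order listed
         FNR==NR        {H[NR]=$1; MX=NR; next}
         # file1, header line: record the field number of each wanted header
         FNR==1         {for (j=1; j<=MX; j++) for (i=1; i<=NF; i++) if (H[j]==$i) C[j]=i}
         # every line of file1: print the selected fields, tab-separated, no trailing tab
                        {for (i=1; i<=MX; i++) printf "%s%s", $C[i], (i<MX ? "\t" : "\n")}
        ' file2 FS="\t" file1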
gga_miR_3500    gga_miR_4600    gga_miR_500     gga_miR_30
ghhh    jy      vf      we
h6      56h     44r4    45t
9lgtr   (fer)           f433
f4      bf4
vfr
2rr

Please note that the columns in your expected output are NOT in the sequence in which they appear in file2! The output above follows the file2 order.
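
To try this on the sample from post #1, here is a minimal sketch. It recreates the first three lines of file1 and all of file2 in a scratch directory, then selects the columns in file2 order. The function name extract_cols and the array names want and col are my own choices for illustration; the logic follows the same idea as the script above and assumes that every name in file2 occurs in the header of file1.

```shell
cd "$(mktemp -d)"   # scratch directory, so real files are not overwritten

# Sample data from the question: tab-separated file1, one header per line in file2.
printf 'gga_miR_100\tgga_miR_300\tgga_miR_3500\tgga_miR_4600\tgga_miR_5600\tgga_miR_30\tgga_miR_500\n' > file1
printf 'kj\trwg\tghhh\tjy\tjyu\twe\tvf\n' >> file1
printf '5g\t5hg\th6\t56h\ti8\t45t\t44r4\n' >> file1
printf 'gga_miR_3500\ngga_miR_4600\ngga_miR_500\ngga_miR_30\n' > file2

# extract_cols LIST TABLE (hypothetical helper): print the columns of TABLE
# whose header appears in LIST, in the order given in LIST.
extract_cols() {
    awk '
        FNR == NR { want[NR] = $1; n = NR; next }   # LIST: wanted headers, in order
        FNR == 1  {                                 # TABLE header: locate each wanted column
            for (j = 1; j <= n; j++)
                for (i = 1; i <= NF; i++)
                    if ($i == want[j]) col[j] = i
        }
        {                                           # every TABLE line: print selected fields
            for (j = 1; j <= n; j++)
                printf "%s%s", $(col[j]), (j < n ? "\t" : "\n")
        }
    ' "$1" FS='\t' "$2"
}

extract_cols file2 file1
```

On this sample the header line comes out as gga_miR_3500, gga_miR_4600, gga_miR_500, gga_miR_30, that is, in file2 order rather than file1 order.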
# 3  
Old 11-12-2014
Quote:
Originally Posted by RudiC
No attempts from your side? Pity...

Thank you very much, RudiC. Actually, I am an embryology biologist and I don't have any computer science background, and I am a total rookie. I recently had to deal with some very large data sets from my experimental work and did not know how to cope with them. I learned that some Unix commands could help, so I tried a few very simple ones such as sed and awk for basic tasks, and the results really amazed me. However, I am not yet able to deal with more complicated problems, even ones that seem naive to you computer guys. That is why I have posted so many questions on this forum recently. I wish I could figure out some of these problems myself, but I really don't have the ability right now. I am studying programming on my own, and I hope I can help other people like me in the future. I really appreciate everyone who has helped me.
# 4  
Old 11-12-2014
A nice brain teaser!
The following keeps the order of columns in file1:
Code:
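awk '
BEGIN {FS="\t"}
# file2: remember each wanted header as an array index
FNR==NR {Ctext[$1]; next}
# file1, header line: mark the field numbers whose header is wanted
FNR==1 {
  for (i=NF;i>=1;i--) if ($i in Ctext) {C[i]}
}
# every line of file1: print the marked fields in their original order
{
  for (i=1;i<=NF;i++) if (i in C) {printf "%s", sep$i; sep=FS}
  print sep=""    # end the line and reset the separator
}
' file2 file1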

While the following remaps the order according to file2, like Rudi's solution:
Code:
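awk '
BEGIN {FS="\t"}
# file2: map each wanted header to its line number, i.e. its output position
FNR==NR {Ctext[$1]=NR; next}
# file1, header line: C[output position] = field number in file1
FNR==1 {
  for (i=NF;i>=1;i--) if ($i in Ctext) {C[Ctext[$i]]=i}
}
# every line of file1: print the fields in file2 order
{
  for (i=1;(i in C);i++) {printf "%s", sep$C[i]; sep=FS}
  print sep=""    # end the line and reset the separator
}
' file2 file1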
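
One case the scripts above do not report is a name in file2 that has no matching column in file1; as far as I can tell, the file2-ordered loop simply stops at the first such gap. A hedged sketch of a variant that keeps the file1 order and writes a warning to standard error for each missing name follows. The tiny test files, the array names (want, keep, found) and the output file names out.txt and warn.txt are assumptions made up for this illustration.

```shell
cd "$(mktemp -d)"   # scratch directory, so real files are not overwritten

# Tiny made-up test data: header "x" in file2 has no matching column in file1.
printf 'a\tb\tc\n1\t2\t3\n' > file1
printf 'c\nx\na\n' > file2

awk '
    BEGIN { FS = OFS = "\t" }
    FNR == NR { want[$1]; next }            # file2: set of wanted headers
    FNR == 1 {                              # file1 header line
        for (i = 1; i <= NF; i++)           # keep wanted fields in file1 order
            if ($i in want) { keep[++n] = i; found[$i] }
        for (h in want)                     # report wanted headers that never matched
            if (!(h in found)) print "warning: no column named " h > "/dev/stderr"
    }
    {                                       # every line of file1: join the kept fields
        line = ""
        for (k = 1; k <= n; k++) line = line (k > 1 ? OFS : "") $(keep[k])
        print line
    }
' file2 file1 > out.txt 2> warn.txt

cat out.txt
```

Here out.txt holds the columns a and c, and warn.txt holds the warning about x. Building the whole line before printing avoids a trailing separator.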


Last edited by MadeInGermany; 11-12-2014 at 05:21 PM.. Reason: simplified solution1
# 5  
Old 11-12-2014
Quote:
Originally Posted by yuejian
Thank you very much RudiC. Actually I am an embryology biologist and I don't have any computer science background [...]
This is all well, and we try to help as much as we can. The reason most of us insist on a genuine effort to solve the problem oneself is that we are trying to help you help yourself. Most of us, me included, are professionals, and we get paid to do for customers what we help each other achieve here. The big difference between writing here and doing work for a customer is this: I do not expect my customer to spend any effort; I expect him to pay me for doing his work instead. Here it is the other way round. I do not expect to get paid, obviously not in the narrower sense, but not even in a virtual sense, like "being paid by recognition" or using what I do here to advertise my skills. On the other hand, I expect my vis-a-vis to spend some effort to digest what I try to explain to him and to try to learn from it.

Otherwise I have to ask myself why I charge my customers money for my work when I do it for free here.

But there is, or at least should be, another reason for you to invest effort in solving your problems: whenever you work with a certain tool there is a dialectical connection between the work and the tool. The tool is shaped by the work you expect it to do, but your work is also shaped by the tools you use. You surely know the saying that if you only have a hammer, everything starts to look like a nail. This is the tool shaping the work and the way the work is done. So, when you do your work using a computer as a tool, you will not only do it better as you become more competent with it, but chances are you will change the way you work, and you may find new applications for the tool or ways of applying it that you are not aware of right now.

After so much philosophical contemplation, here is a practical suggestion for you: most of your problems seem to involve "data massaging" of one sort or another. Most (perhaps all) of what you do can be done using awk and/or sed. There is a fantastic book about exactly these two tools: it is written by Dale Dougherty and I can wholeheartedly recommend it. It is a good and diverting read while being the best and most complete single source on the topic I have ever read.

bakunin
# 6  
Old 11-13-2014
Hi.

Perhaps this will be useful:
Quote:
Beginning Perl for Bioinformatics
An Introduction to Perl for Biologists
By James Tisdall
Publisher: O'Reilly Media
Final Release Date: October 2001
Pages: 386
With its highly developed capacity to detect patterns in data, Perl has become one of the most popular languages for biological data analysis. But if you're a biologist with little or no programming experience, starting out in Perl can be a challenge. Many biologists have a difficult time learning how to apply the language to bioinformatics. The most popular Perl programming books are often too theoretical and too focused on computer science for a non-programming biologist who needs to solve very specific problems. Beginning Perl for Bioinformatics is designed to get you quickly over the Perl language barrier by approaching programming as an important new laboratory skill, revealing Perl programs and techniques that are immediately useful in the lab ...
More at Beginning Perl for Bioinformatics - O'Reilly Media. However, it looks a bit old to me ... cheers, drl
 