Merging strings that have identical rownames in a dataframe


 
# 1  
Old 01-08-2014

Hi

I have a data frame with repeated names in column 1 and different descriptors in column 2. I want to concatenate the descriptors that share the same entry in column 1 into a single row, using any separator.

Example for input:

Code:
Cvel_1        KOG0155
Cvel_1        KOG0306
Cvel_1        KOG3259
Cvel_1        KOG0931
Cvel_1        KOG3638
Cvel_1        KOG0956

Example for desired output:

Code:
Cvel_1 KOG0155, KOG0306, KOG3259, KOG0931, KOG3638, KOG0956
Thanks a lot
Alyaa
# 2  
Old 01-08-2014
Try:

Code:
$ cat file
Cvel_1        KOG0155
Cvel_1        KOG0306
Cvel_1        KOG3259
Cvel_1        KOG0931
Cvel_1        KOG3638
Cvel_1        KOG0956

$ awk '{A[$1]=A[$1]?A[$1] ", " $NF :$1 " "$NF}END{for (i in A)print A[i]}' file
Cvel_1 KOG0155, KOG0306, KOG3259, KOG0931, KOG3638, KOG0956

# 3  
Old 01-08-2014
Thank you very much, Pamu, it works just fine.

However, when I try the same command on something like this:
Code:
"Cvel_1"    " Transcription factor CA150 "
"Cvel_1"    " WD40-repeat-containing subunit of the 18S rRNA processing complex "
"Cvel_1"    " Peptidyl-prolyl cis-trans isomerase "
"Cvel_1"    " Predicted guanine nucleotide exchange factor, contains Sec7 domain "

the output is:
Code:
"Cvel_1"  ";";";"

Your help and prompt response are much appreciated.
Thank you
# 4  
Old 01-08-2014
Try:
Code:
awk -F "\t" '{A[$1]=A[$1]?A[$1] ", " $NF :$1 " "$NF}END{for (i in A)print A[i]}' file
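
A note on why this should help, as far as I can tell: by default awk splits fields on any run of whitespace, so with the quoted descriptions $NF is only the last whitespace-separated token on the line (here the lone closing quote), which would explain an output made of bare quote characters. With -F "\t" the whole description is a single field, assuming the two columns really are separated by a tab. Also note that for (i in A) does not guarantee any particular order once there is more than one name in column 1. If input order matters, a sketch along these lines might do; merge_by_key is just a name made up for this example, not something from the thread:

```shell
# Hypothetical helper (not from the thread): merges tab-separated rows by column 1,
# keeping the keys in the order they first appear in the input.
merge_by_key() {
    awk -F '\t' '
        # First time a key is seen: record its position and start its output line.
        !($1 in merged) { order[++n] = $1; merged[$1] = $1 " " $NF; next }
        # Later rows with the same key: append the last field after ", ".
        { merged[$1] = merged[$1] ", " $NF }
        # Print the merged lines in first-seen order.
        END { for (i = 1; i <= n; i++) print merged[order[i]] }
    ' "$@"
}
```

Call it as merge_by_key file, or pipe data into it. Change the ", " in the second rule if you want a different separator.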

# 5  
Old 01-08-2014
Thank you VERY much
Perfectly fine