Post 303037774 by anjaliANJALI on Tuesday 13th of August 2019 08:34:25 AM
Merging rows based on same ID in First column.

Hello,

I have a tab-delimited file with 3 columns:


Code:
BINPACKER.13259.1.p2    SSF48239    
BINPACKER.13259.1.p2    PF13243    
BINPACKER.13259.1.p2    G3DSA:1.50.10.20
BINPACKER.13259.2.p2    SSF48239    
BINPACKER.13259.2.p2    PF13243    
BINPACKER.13259.2.p2    G3DSA:1.50.10.20
BINPACKER.31705.4.p1    PF00176    GO:0005524
BINPACKER.31705.4.p1    SM00490    
BINPACKER.31705.4.p1    SSF52540    
BINPACKER.31705.4.p1    G3DSA:3.40.50.300
BINPACKER.31705.4.p1    mobidb-lite
BINPACKER.31705.4.p1    SM00487    
BINPACKER.31705.4.p1    PS51194    
BINPACKER.31705.4.p1    cd00079    
BINPACKER.31705.4.p1    PF00271    
BINPACKER.31705.4.p1    PS51192    
BINPACKER.31705.4.p1    cd00046
BINPACKER.31705.4.p1    G3DSA:3.40.50.10810    
BINPACKER.31705.4.p1    SSF52540    
BINPACKER.9719.7.p1    PF00443    GO:0016579|GO:0036459
BINPACKER.9719.7.p1    SSF57850
BINPACKER.9719.7.p1    PS50235    
BINPACKER.9719.7.p1    mobidb-lite
BINPACKER.9719.7.p1    PF02148    GO:0008270
BINPACKER.9719.7.p1    SSF54001    
BINPACKER.9719.7.p1    mobidb-lite
BINPACKER.9719.7.p1    cd02669    GO:0000245|GO:0006397
BINPACKER.9719.7.p1    PS50271    GO:0008270
BINPACKER.9719.7.p1    SM00290    GO:0008270
BINPACKER.9719.7.p1    mobidb-lite
BINPACKER.9719.7.p1    mobidb-lite
BINPACKER.9719.7.p1    G3DSA:3.30.40.10    
BINPACKER.9719.7.p1    G3DSA:3.90.70.10
BINPACKER.937.4.p1    PS51032    GO:0003700|GO:0006355
BINPACKER.937.4.p1    PIRSF038123    GO:0003700
BINPACKER.937.4.p1    cd00018    GO:0003700|GO:0006355
BINPACKER.937.4.p1    SSF54171    GO:0003677
BINPACKER.937.4.p1    G3DSA:3.30.730.10    GO:0003700|GO:0006355
BINPACKER.937.4.p1    PR00367    GO:0003700|GO:0006355

I want to merge the rows that share the same ID in the first column. In column 2, I want only the IDs starting with PF, and in column 3 I want to concatenate all GO terms, separated by commas. In each column there should be no duplicates, and a column with nothing left should show NA, e.g.:


Code:
BINPACKER.13259.1.p2    PF13243    NA
BINPACKER.13259.2.p2    PF13243    NA
BINPACKER.31705.4.p1    PF00176,PF00271    GO:0005524
BINPACKER.9719.7.p1    PF00443,PF02148    GO:0016579,GO:0036459,GO:0008270,GO:0000245,GO:0006397
BINPACKER.937.4.p1    NA    GO:0003700,GO:0006355,GO:0003677

Thank you.

Last edited by anjaliANJALI; 08-13-2019 at 01:43 PM..
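
One possible way to do the merge requested above is sketched below in Python. It is only a sketch under a few assumptions: fields contain no internal whitespace (so splitting on any whitespace handles both tabs and the spaces shown in the post), GO terms within one cell are separated by `|`, and the output should follow first-seen order. The function name `merge_rows` and the three sample rows in the demo are illustrative choices, not part of the original post.

```python
def merge_rows(lines):
    """Merge rows sharing column 1; keep unique PF IDs and unique GO terms."""
    merged = {}  # ID -> (PF IDs, GO terms); dicts keep first-seen order
    for line in lines:
        fields = line.split()  # assumption: no whitespace inside a field
        if not fields:
            continue  # skip blank lines
        pf_ids, go_terms = merged.setdefault(fields[0], ({}, {}))
        # Column 2: keep only IDs that start with "PF".
        if len(fields) > 1 and fields[1].startswith("PF"):
            pf_ids[fields[1]] = None
        # Column 3 (may be missing): split "GO:a|GO:b" into single terms.
        if len(fields) > 2:
            for term in fields[2].split("|"):
                if term:
                    go_terms[term] = None
    # Joining a dict joins its keys; an empty join falls back to "NA".
    return [
        "\t".join([key, ",".join(pf) or "NA", ",".join(go) or "NA"])
        for key, (pf, go) in merged.items()
    ]


if __name__ == "__main__":
    # Small demo using three rows taken from the sample data in the post.
    sample = [
        "BINPACKER.9719.7.p1\tPF00443\tGO:0016579|GO:0036459",
        "BINPACKER.9719.7.p1\tSSF57850\t",
        "BINPACKER.9719.7.p1\tPF02148\tGO:0008270",
    ]
    for row in merge_rows(sample):
        print(row)
```

To run it on a real file, pass an open file object instead of the sample list, for example `merge_rows(open("input.tsv"))`, where `input.tsv` is a placeholder file name. Plain dicts are used as ordered sets so that duplicates are dropped while the order of first appearance is kept, which matches the ordering in the expected output above.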
 
