Perl join two files by "common" column
Post 302494159 by z1dane on Saturday 5th of February 2011 08:34:05 PM
Hello!

A common bioinformatics problem, joining two tables :P I wonder why you posted your question to the "Web Development" section, but it happens to be the only forum I subscribe to :)

Some comments about your code:

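Code:
#!/usr/bin/perl -w
use strict;

I would write this as:

Code:
#!/usr/bin/perl
use strict;
use warnings;

The -w switch has been superseded by the use warnings pragma, which is lexically scoped and can be turned off for individual blocks.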

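Code:
while (<F2>) { 
    s/\r?\n//;                 #remove carriage return and newline at the end of each line

The usual idiom for this is:

Code:
chomp;

The chomp() function removes the trailing newline (strictly, the current value of $/). It does not remove a carriage return, so if your files may have Windows (CRLF) line endings, keep the substitution but anchor it to the end of the line: s/\r?\n\z//;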
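One caveat worth checking on your own data: on Unix, chomp strips only the trailing "\n", so a line read from a CRLF file keeps its "\r". A minimal sketch of the difference follows; the sample line is made up for illustration and is not taken from your files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical line, as it would be read on Unix from a file with CRLF endings.
my $crlf_line = "contig1\tvalue\r\n";

# chomp removes only the trailing input record separator ($/, "\n" by default).
my $chomped = $crlf_line;
chomp $chomped;

# An anchored substitution removes an optional "\r" together with the final "\n".
(my $stripped = $crlf_line) =~ s/\r?\n\z//;

printf "after chomp:        %d characters\n", length $chomped;
printf "after substitution: %d characters\n", length $stripped;
```

If the two counts differ for a line from your input, the file has CRLF endings and the leftover "\r" would end up in the last column.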

The strategy I would use is to capture just the assembly information from the ID, and then use it as the key under which the whole line is stored:

Code:
#note untested code
my @F=split /\t/, $_;
#use informative name for first column
my $id_seq = $F[0];
#remove anything after the first pipe
$id_seq =~ s/\|.*//;
#declare new variable for assembly information
my $assembly = '';
#store only assembly information
if ($id_seq =~ /.*(mira_.*)/){
   $assembly = $1;
} else {
   die "Unexpected notation on line $. for $id_seq";
}
#store the line information into a hash using $assembly as the key
$line2{$assembly} = $_;

Then read file2 as you did before and pull the required information from your %line2 hash, using the assembly name as the key.
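Putting the pieces together, the whole join could look roughly like the sketch below. Everything about the data is an assumption, since the real files are not shown here: the sample rows, the ID format (a prefix, then "mira_...", then a pipe), and the idea that the other table's first column is the bare assembly name are all invented for illustration. In-memory filehandles stand in for the real files so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data; the contents and ID format are assumptions.
my $table_a = "mira_c1\tkinase\nmira_c2\ttransporter\nmira_c9\tunknown\n";
my $table_b = "lib1_mira_c1|extra\t42\nlib1_mira_c2|extra\t17\n";

# Pass 1: index one table by the assembly part of its first column.
my %line2;
open my $fh_b, '<', \$table_b or die "Cannot open in-memory table: $!";
while (my $line = <$fh_b>) {
    $line =~ s/\r?\n\z//;              # strip LF or CRLF line ending
    my @F = split /\t/, $line;
    my $id_seq = $F[0];
    $id_seq =~ s/\|.*//;               # drop everything after the first pipe
    if ($id_seq =~ /(mira_.*)/) {
        $line2{$1} = $line;            # whole line, keyed by assembly name
    } else {
        die "Unexpected notation on line $. for $id_seq";
    }
}
close $fh_b;

# Pass 2: read the other table and append the matching line, if there is one.
my @joined;
open my $fh_a, '<', \$table_a or die "Cannot open in-memory table: $!";
while (my $line = <$fh_a>) {
    $line =~ s/\r?\n\z//;
    my ($assembly) = split /\t/, $line;   # first column only
    next unless exists $line2{$assembly}; # skip rows without a partner
    push @joined, join("\t", $line, $line2{$assembly});
}
close $fh_a;

print "$_\n" for @joined;
```

Rows without a match (mira_c9 in the sample) are skipped here; if you want to keep them, print the line with empty columns instead of calling next.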

Hope that works and helps,

Dave
 
