Hello!
I am writing a program to run through two large lists of data (~300,000 rows), find where rows in one file match another, and combine them based on matching fields. Due to the large file sizes, I'm guessing AWK will be the most efficient way to do this. Overall, the input and output I'm... (5 Replies)
Hi folks,
Lets say I have the following text file:
name, lastname, 1234, name.lastname@test.com
name1, lastname1, name2.lastname2@test.com, 2345
name, 3456, lastname, name3.lastname3@test.com
4567, name, lastname, name4.lastname4@test.com
I now need the following output:
1234... (5 Replies)
Hi experts,
I need to print the first field first then last two fields should come next and then i need to print rest of the fields.
Input :
a1,abc,jsd,fhf,fkk,b1,b2
a2,acb,dfg,ghj,b3,c4
a3,djf,wdjg,fkg,dff,ggk,d4,d5
Expected output:
a1,b1,b2,abc,jsd,fhf,fkk... (6 Replies)
Dear AWK-experts!
I did get stuck in the task of combining files after matching fields, so I'm still awkward with learning AWK.
There are 2 files: one containing 3 columns with ID, coding status, and score for long noncoding RNAs:
file1 (1.txt) (>5000 lines)
... (12 Replies)
Hi all,
I want to extract rows with the pattern ALPHANUMERIC/ALPHANUMNERIC in the 2nd column.
I dont wan rows with more than 1 slash or without any slash in 2nd column.
a a/b
b a/b/c
c a/b//c
d t/y
e r
f /f
I came up with the regex
grep '\/$' file
a a/b
b a/b/c
d t/y (3 Replies)
Hi
I have a file as below
<field1> <field2> <field3> ... <field_num1> <field_num2>
Trying to sort based on difference of <field_num1> and <field_num2> in desceding order and print all fields.
I tried this and it doesn't sort on the difference field .. Appreciate your help.
cat... (9 Replies)
In the below I am trying to use awk to match all the $13 values in input, which is tab-delimited,
that are in $1 of gene which is just a single column of text.
However only the line with the greatest $9 value in input needs to be printed.
So in the example below all the MECP2 and LTBP1... (0 Replies)
Hi everyone,
Given two files (test1 and test2) with the following contents:
test1:
80263760,I71
80267369,M44
80274628,L77
80276793,I32
80277390,K05
80277391,I06
80279206,I43
80279859,K37
80279866,K35
80279867,J16
80280346,I14and test2:
80263760,PT18
80279867,PT01I need to do some... (3 Replies)
Hi,
I have 2 tab-delimited input files as follows.
file1.tab:
green A apple
red B apple
file2.tab:
apple - A;Z
Objective:
Return $1 of file1 if,
. $1 of file2 matches $3 of file1 and,
. any single element (separated by ";") in $3 of file2 is present in $2 of file1
In order to... (3 Replies)
Trying to use awk to match the contents of each line in file1 with $5 in file2. Both files are tab-delimited and there may be a space or special character in the name being matched in file2, for example in file1 the name is BRCA1 but in file2 the name is BRCA 1 or in file1 name is BCR but in file2... (6 Replies)
Discussion started by: cmccabe
6 Replies
LEARN ABOUT MOJAVE
english
English(3pm) Perl Programmers Reference Guide English(3pm)NAME
English - use nice English (or awk) names for ugly punctuation variables
SYNOPSIS
use English;
use English qw( -no_match_vars ) ; # Avoids regex performance penalty
# in perl 5.16 and earlier
...
if ($ERRNO =~ /denied/) { ... }
DESCRIPTION
This module provides aliases for the built-in variables whose names no one seems to like to read. Variables with side-effects which get
triggered just by accessing them (like $0) will still be affected.
For those variables that have an awk version, both long and short English alternatives are provided. For example, the $/ variable can be
referred to either $RS or $INPUT_RECORD_SEPARATOR if you are using the English module.
See perlvar for a complete list of these.
PERFORMANCE
NOTE: This was fixed in perl 5.20. Mentioning these three variables no longer makes a speed difference. This section still applies if
your code is to run on perl 5.18 or earlier.
This module can provoke sizeable inefficiencies for regular expressions, due to unfortunate implementation details. If performance matters
in your application and you don't need $PREMATCH, $MATCH, or $POSTMATCH, try doing
use English qw( -no_match_vars ) ;
. It is especially important to do this in modules to avoid penalizing all applications which use them.
perl v5.18.2 2014-01-06 English(3pm)