Sponsored Content
Top Forums Shell Programming and Scripting Removing file lines that each match to a different patterns Post 302412980 by Jo_puzzled on Wednesday 14th of April 2010 07:01:10 AM
Old 04-14-2010
Removing file lines that each match to a different patterns

I have a very large file (10,000,000 lines), that contains a sample id and a property of that sample. I have another file that contains around 1,000,000 lines with sample ids that I want to remove from the original file (create a new file without these lines).
I know how to do this in Perl, but it is too time consuming to run. I am aware of sed and awk as commands that should be able to complete this task in a much faster time. I have tried to implement codes that I thought would work, even after consulting previous posts, none seem to quite cover it. I also find it hard to debug as the server I'm working on is French so I don't understand the error messages of my command.

Please could anyone suggest a quick way of achieving this ?

Here are examples of the files I'm dealing with.

Here is a tab delineated sample id and property.
Code:
HELIUM:1:2:3      ABCDEF
HELIUM:1:2:4      ADEFBC
HELIUM:1:2:5      BDFACE
HELIUM:1:2:6      BEBACG
HELIUM:1:2:7      ABCDEF
HELIUM:1:2:8      ADEFBC
HELIUM:1:2:9      BDFACE
HELIUM:1:3:0      BEBACG

Here is a list of ids (The common prefix is missing) I wish to remove:
Code:
:1:2:3
:1:2:5
:1:2:6
:1:2:9

Many thanks in advance for any help you can provide.

Last edited by Franklin52; 04-15-2010 at 08:47 AM.. Reason: Please use code tags!
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

sed/awk help to match list of patterns and remove from org file

Hi, From the pattern mentioned below remove lines based on pattern range. Conditions 1 Look For all lines starting with ALTER TABLE and Ending with ; and contains the word MOVE.I wanto to remove these lines from the file sample below. Note : The above pattern list could be found in... (1 Reply)
Discussion started by: rajan_san
1 Replies

2. Shell Programming and Scripting

Searching patterns in 1 file and deleting all lines with those patterns in 2nd file

Hi Gurus, I have a file say for ex. file1 which has 3500 lines in it which are different account numbers and another file (file2) which has 230000 lines in it. I want to read all the lines in file1 and delete all those lines from file2 which has that same pattern as in file1. I am not quite... (4 Replies)
Discussion started by: toms
4 Replies

3. Shell Programming and Scripting

print lines which match multiple patterns

Hi, I have a text file as follows: 11:38:11.054 run1_rdseq avg_2-5 999988.0000 1024.0000 11:50:52.053 run3_rdrand 999988.0000 1135.0 128.0417 11:53:18.050 run4_wrrand avg_2-5 999988.0000 8180.5833 11:55:42.051 run4_wrrand avg_2-5 999988.0000 213.8333 11:55:06.053... (2 Replies)
Discussion started by: annazpereira
2 Replies

4. Shell Programming and Scripting

Match multiple patterns in a file and then print their respective next line

Dear all, I need to search multiple patterns and then I need to print their respective next lines. For an example, in the below table, I will look for 3 different patterns : 1) # ATC_Codes: 2) # Generic_Name: 3) # Drug_Target_1_Gene_Name: #BEGIN_DRUGCARD DB00001 # AHFS_Codes:... (3 Replies)
Discussion started by: AshwaniSharma09
3 Replies

5. Shell Programming and Scripting

Retrieve lines that match any occurence in a list of patterns

I have two files. The first containing a header and six columns of data. Example file 1: Number SNP ID dbSNP RS ID Chromosome Result_Call Physical Position 787066 SNP_A-8575395 RS6650104 1 NOCALL 564477 786872 SNP_A-8575125 RS10458597 1 AA ... (13 Replies)
Discussion started by: Selftaught
13 Replies

6. Shell Programming and Scripting

Match 2 different patterns and print the lines

Hi, i have been trying to extract multiple lines based on two different patterns as below:- file1 @jkm|kdo|aas012|192.2.3.1 blablbalablablkabblablabla sjfdsakfjladfjefhaghfagfkafagkjsghfalhfk fhajkhfadjkhfalhflaffajkgfajkghfajkhgfkf jahfjkhflkhalfdhfwearhahfl @jkm|sdf|wud08q|168.2.1.3... (8 Replies)
Discussion started by: redse171
8 Replies

7. UNIX for Dummies Questions & Answers

Match patterns from another file and tag

Hi all, I have a file , which has 6 tab delimited fields, with $3 and $4 subfielded with spaces. I wamt to match cols $2,$3,$4 of tmp1 with tmp2, ..and then flag the 5th col if found. tmp1 1756 Xerm XermA XermB XermC XermD AA TT AA GG A 1 1763 Xerm XermA XermB XermC... (3 Replies)
Discussion started by: senhia83
3 Replies

8. Shell Programming and Scripting

Removing multiple lines from input file, if multiple lines match a pattern.

GM, I have an issue at work, which requires a simple solution. But, after multiple attempts, I have not been able to hit on the code needed. I am assuming that sed, awk or even perl could do what I need. I have an application that adds extra blank page feeds, for multiple reports, when... (7 Replies)
Discussion started by: jxfish2
7 Replies

9. Shell Programming and Scripting

awk to print match or non-match and select fields/patterns for non-matches

In the awk below I am trying to output those lines that Match between file1 and file2, those Missing in file1, and those missing in file2. Using each $1,$2,$4,$5 value as a key to match on, that is if those 4 fields are found in both files the match, but if those 4 fields are not found then missing... (0 Replies)
Discussion started by: cmccabe
0 Replies

10. Shell Programming and Scripting

Selecting text on multiple lines, then removing a beginning and end patterns

I have a file similar to the below. I am selecting only the paragraphs with @inlineifset. I am using the following command sed '/@inlineifset/,/^ *$/!d; s/@inlineifset{mrg, @btpar{@//' $flnm >> $ofln This produces @section Correlations between seismograms,,,,}} ... (5 Replies)
Discussion started by: Danette
5 Replies
Format(3pm)						User Contributed Perl Documentation					       Format(3pm)

NAME
Locale::Currency::Format - Perl functions for formatting monetary values SYNOPSIS
use Locale::Currency::Format; $amt = currency_format('USD', 1000); # => 1,000.00 USD $amt = currency_format('EUR', 1000, FMT_COMMON); # => EUR1.000,00 $amt = currency_format('USD', 1000, FMT_SYMBOL); # => $1,000.00 $sym = currency_symbol('USD'); # => $ $sym = currency_symbol('GBP', SYM_HTML); # => &#163; $decimals = decimal_precision('USD'); # => 2 $decimals = decimal_precision('BHD'); # => 3 $thou_sep = thousands_separator('USD'); # => , $thou_sep = thousands_separator('EUR'); # => . $dec_sep = decimal_separator('USD'); # => . $dec_sep = decimal_separator('EUR'); # => , currency_set('USD', '#.###,## $', FMT_COMMON); # => set custom format currency_format('USD', 1000, FMT_COMMON); # => 1.000,00 $ currency_set('USD'); # => reset default format The following example illustrates how to use Locale::Currency::Format with Mason. Skip it if you are not interested in Mason. A simple Mason component might look like this: Total: <% 123456789, 'eur' |c %> <%init> use Locale::Currency::Format; $m->interp->set_escape(c => &escape_currency); sub escape_currency { my ($amt, $code) = ${$_[0]} =~ /(.*?)([A-Za-z]{3})/; ${$_[0]} = currency_format($code, $amt, FMT_HTML); } </%init> DESCRIPTION
Locale::Currency::Format is a light-weight Perl module that enables Perl code to display monetary values in the formats recognized internationally and/or locally. "currency_format(CODE, AMOUNT [, FORMAT])" currency_format takes two mandatory parameters, namely currency code and amount respectively, and optionally a third parameter indicating which format is desired. Upon failure, it returns undef and an error message is stored in $Locale::Currency::Format::error. CODE A 3-letter currency code as specified in ISO 4217. Note that old code such as GBP, FRF and so on can also be valid. AMOUNT A numeric value. FORMAT There are five different format options FMT_STANDARD, FMT_COMMON, FMT_SYMBOL, FMT_HTML and FMT_NAME. If it is omitted, the default format is FMT_STANDARD. FMT_STANDARD Ex: 1,000.00 USD, 1.000.000,00 EUR FMT_SYMBOL Ex: $1,000.00 FMT_COMMON Ex: 1.000 Dong (Vietnam), BEF 1.000 (Belgium) FMT_HTML Ex: &#xA3;1,000.00 (pound-sign HTML escape) FMT_NAME Ex: 1,000.00 US Dollar NOTE: By default the trailing zeros after the decimal point will be added. To turn it off, do a bitwise C<or> of FMT_NOZEROS with one of the five options above. Ex: FMT_STANDARD | FMT_NOZEROS will give 1,000 USD "currency_symbol(CODE [, TYPE])" For conveniences, the function currency_symbol is provided for currency symbol lookup given a 3-letter currency code. Optionally, one can specify which format the symbol should be returned - Unicode-based character or HTML escape. Default is a Unicode-based character. Upon failure, it returns undef and an error message is stored in $Locale::Currency::Format::error. CODE A 3-letter currency code as specified in ISO 4217 TYPE There are two available types SYM_UTF and SYM_HTML SYM_UTF returns the symbol (if exists) as an Unicode character SYM_HTML returns the symbol (if exists) as a HTML escape "decimal_precision(CODE)" For conveniences, the function decimal_precision is provided to lookup the decimal precision for a given 3-letter currency code. Upon failure, it returns undef and an error message is stored in $Locale::Currency::Format::error. CODE A 3-letter currency code as specified in ISO 4217 "decimal_separator(CODE)" For conveniences, the function decimal_separator is provided to lookup the decimal separator for a given 3-letter currency code. Upon failure, it returns undef and an error message is stored in $Locale::Currency::Format::error. CODE A 3-letter currency code as specified in ISO 4217 "thousands_separator(CODE)" For conveniences, the function thousands_separator is provided to lookup the thousands separator for a given 3-letter currency code. Upon failure, it returns undef and an error message is stored in $Locale::Currency::Format::error. CODE A 3-letter currency code as specified in ISO 4217 "currency_set(CODE [, TEMPLATE, FORMAT])" currency_set can be used to set a custom format for a currency instead of the provided format. For example, in many non-English speaking countries, the US dollars might be displayed as 2.999,99 $ instead of the usual $2,999.99. In order to accomplish this, one will need to do as follows: use Locale::Currency::Format qw(:default $error); my $currency = 'USD'; my $template = '#.###,## $'; if (currency_set($currency, $template, FMT_COMMON)) { print currency_format($currency, 2999.99, FMT_COMMON), " "; } else { print "cannot set currency format for $currency: $error "; } The arguments to currency_set are: CODE A 3-letter currency code as specified in ISO 4217 TEMPLATE A template in the form #.###,##$, #.### kr, etc. If a unicode character is used, make sure that the template is double-quoted. Ex: currency_set('GBP', "x{00A3}#,###.##", FMT_SYMBOL) If an HTML symbol is wanted, escape its equivalent HTML code. Ex: currency_set('GBP', '&#x00A3;#,###.##', FMT_HTML) FORMAT This argument is required if TEMPLATE is present. The formats FMT_SYMBOL, FMT_COMMON, FMT_HTML are accepted. NOTE! FMT_STANDARD and FMT_NAME will always be in the form <amount><space><code|name> such as 1,925.95 AUD. Hence, currency_set returns an error if FMT_STANDARD or FMT_NAME is specified as FORMAT. With FMT_COMMON, you can always achieve what you would have done with FMT_STANDARD and FMT_NAME, as follows my $amt = 1950.95; currency_set('USD', 'USD #.###,##', FMT_COMMON); print currency_format('USD', $amt, FMT_COMMON); # USD 1,950.95 currency_set('USD', 'US Dollar #.###,##', FMT_COMMON); print currency_format('USD', $amt, FMT_COMMON); # US Dollar 1,950.95 Invoking currency_set with one argument will reset all formats to their original settings. For example currency_set('USD') will clear all previous custom settings for the US currency (ie. FMT_SYMBOL, FMT_HTML, FMT_COMMON). A WORD OF CAUTION Please be aware that some currencies might have missing common format. In that case, currency_format will fall back to FMT_STANDARD format. Also, be aware that some currencies do not have monetary symbol. As countries merge together or split into smaller ones, currencies can be added or removed by the ISO. Please help keep the list up to date by sending your feedback to the email address at the bottom. To see the error, examine $Locale::Currency::Format::error use Locale::Currency::Format; my $value = currency_format('USD', 1000); print $value ? $value : $Locale::Currency::Format::error OR use Locale::Currency::Format qw(:DEFAULT $error); my $value = currency_format('USD', 1000); print $value ? $value : $error Lastly, please refer to perluniintro and perlunicode for displaying Unicode characters if you intend to use FMT_SYMBOL and currency_symbol. Otherwise, it reads "No worries, mate!" SEE ALSO
Locale::Currency, Math::Currency, Number::Format, perluniintro, perlunicode BUGS
If you find any inaccurate or missing information, please send your comments to tnguyen@cpan.org. Your effort is certainly appreciated! CONTRIBUTOR(S) James Kiser <james.kiser@gmail.com> AUTHOR
Tan D Nguyen <tnguyen@cpan.org> COPYRIGHT
This library is free software. You may distribute under the terms of either the GNU General Public License or the Artistic License. perl v5.12.4 2011-07-11 Format(3pm)
All times are GMT -4. The time now is 09:10 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy