Removing file lines that each match to a different patterns
I have a very large file (10,000,000 lines), that contains a sample id and a property of that sample. I have another file that contains around 1,000,000 lines with sample ids that I want to remove from the original file (create a new file without these lines).
I know how to do this in Perl, but it is too time consuming to run. I am aware of sed and awk as commands that should be able to complete this task in a much faster time. I have tried to implement codes that I thought would work, even after consulting previous posts, none seem to quite cover it. I also find it hard to debug as the server I'm working on is French so I don't understand the error messages of my command.
Please could anyone suggest a quick way of achieving this ?
Here are examples of the files I'm dealing with.
Here is a tab delineated sample id and property.
Here is a list of ids (The common prefix is missing) I wish to remove:
Many thanks in advance for any help you can provide.
Last edited by Franklin52; 04-15-2010 at 08:47 AM..
Reason: Please use code tags!
Hi,
From the pattern mentioned below remove lines based on pattern range.
Conditions
1 Look For all lines starting with ALTER TABLE and Ending with ; and contains the word MOVE.I wanto to remove these lines from the file sample below.
Note : The above pattern list could be found in... (1 Reply)
Hi Gurus,
I have a file say for ex. file1 which has 3500 lines in it which are different account numbers and another file (file2) which has 230000 lines in it. I want to read all the lines in file1 and delete all those lines from file2 which has that same pattern as in file1. I am not quite... (4 Replies)
Dear all,
I need to search multiple patterns and then I need to print their respective next lines. For an example, in the below table, I will look for 3 different patterns :
1) # ATC_Codes:
2) # Generic_Name:
3) # Drug_Target_1_Gene_Name:
#BEGIN_DRUGCARD DB00001
# AHFS_Codes:... (3 Replies)
I have two files. The first containing a header and six columns of data.
Example file 1:
Number SNP ID dbSNP RS ID Chromosome Result_Call Physical Position
787066 SNP_A-8575395 RS6650104 1 NOCALL 564477
786872 SNP_A-8575125 RS10458597 1 AA ... (13 Replies)
Hi,
i have been trying to extract multiple lines based on two different patterns as below:-
file1
@jkm|kdo|aas012|192.2.3.1 blablbalablablkabblablabla
sjfdsakfjladfjefhaghfagfkafagkjsghfalhfk
fhajkhfadjkhfalhflaffajkgfajkghfajkhgfkf
jahfjkhflkhalfdhfwearhahfl
@jkm|sdf|wud08q|168.2.1.3... (8 Replies)
Hi all,
I have a file , which has 6 tab delimited fields, with $3 and $4 subfielded with spaces. I wamt to match cols $2,$3,$4 of tmp1 with tmp2, ..and then flag the 5th col if found.
tmp1
1756 Xerm XermA XermB XermC XermD AA TT AA GG A 1
1763 Xerm XermA XermB XermC... (3 Replies)
GM,
I have an issue at work, which requires a simple solution. But, after multiple attempts, I have not been able to hit on the code needed.
I am assuming that sed, awk or even perl could do what I need.
I have an application that adds extra blank page feeds, for multiple reports, when... (7 Replies)
In the awk below I am trying to output those lines that Match between file1 and file2, those Missing in file1, and those missing in file2. Using each $1,$2,$4,$5 value as a key to match on, that is if those 4 fields are found in both files the match, but if those 4 fields are not found then missing... (0 Replies)
I have a file similar to the below. I am selecting only the paragraphs with @inlineifset.
I am using the following command
sed '/@inlineifset/,/^ *$/!d;
s/@inlineifset{mrg, @btpar{@//' $flnm >> $ofln
This produces
@section Correlations between
seismograms,,,,}}
... (5 Replies)