Removing file lines that each match a different pattern
I have a very large file (10,000,000 lines) that contains a sample id and a property of that sample on each line. I have another file of around 1,000,000 lines with sample ids that I want to remove from the original file (that is, create a new file without these lines).
I know how to do this in Perl, but it takes too long to run. I am aware that sed and awk should be able to complete this task much faster. I have tried commands that I thought would work, but even after consulting previous posts, none seem to quite cover it. I also find it hard to debug because the server I'm working on is set to French, so I don't understand the error messages from my commands.
Could anyone please suggest a quick way of achieving this?
Here are examples of the files I'm dealing with.
Here is a tab-delimited sample id and property:
Here is a list of ids (the common prefix is missing) that I wish to remove:
Many thanks in advance for any help you can provide.
Last edited by Franklin52; 04-15-2010 at 08:47 AM..
Reason: Please use code tags!
If your second example 1:2:3 is representative of the actual file contents of the small file, i.e., it has no prefix and no suffixed data either
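A minimal sketch of one way to do this kind of filtering with awk, assuming the big file is tab-delimited with the id in the first field and the small file holds one bare id per line. The prefix `smp`, the sample data, the file names, and the `filter_ids` helper are all hypothetical, and this is not necessarily the command that was originally suggested here.

```shell
#!/bin/sh
# Hypothetical sample data: ids in data.tsv carry the assumed prefix "smp".
tmp=$(mktemp -d)
printf 'smp1:2:3\tred\nsmp7:8:9\tblue\nsmp4:5:6\tgreen\n' > "$tmp/data.tsv"
printf '1:2:3\n4:5:6\n' > "$tmp/ids.txt"

# filter_ids IDS_FILE DATA_FILE PREFIX
# Reads the ids once into an awk array, then prints only the data lines
# whose first tab-separated field is not in that array.
filter_ids() {
    awk -F '\t' -v p="$3" '
        NR == FNR { drop[p $1]; next }   # first file: remember prefix + id
        !($1 in drop)                    # second file: keep lines not listed
    ' "$1" "$2"
}

filter_ids "$tmp/ids.txt" "$tmp/data.tsv" "smp"
```

Each data line costs one array lookup, so the ids file is read only once no matter how long the data file is. The ids do have to fit in memory, which should be fine for about a million short strings on most machines.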
Also
may help your error message language problem.
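For the error-message language, one common approach (a sketch, not necessarily what was posted here) is to override the locale for a single command with `LC_ALL=C`, which makes most tools fall back to their untranslated English messages. The path below is a made-up one that is assumed not to exist, used only to provoke an error.

```shell
#!/bin/sh
# LC_ALL=C applies only to this one command; the rest of the session
# keeps its French locale. The path is hypothetical and assumed missing.
msg=$(LC_ALL=C ls /no/such/path/for/demo 2>&1)
printf '%s\n' "$msg"
```

Running `export LC_ALL=C` once would have the same effect for every later command in that shell session.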
Last edited by jim mcnamara; 04-14-2010 at 08:37 AM..
Great, thank you!
The grep works, but I was afraid it would also run too slowly. I just ran a test that searched 10,000 lines for 1,000 ids, and it finished in about 2 seconds. I'm rather happy with that. I just hope the large files don't add too much load.
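If grep speed becomes a concern on the full files, one thing worth trying is fixed-string matching, which skips regular-expression handling. The sketch below assumes a hypothetical prefix `smp`, made-up sample data, and made-up file names. It first rebuilds each pattern as prefix + id + tab, so that an id such as 1:2:3 cannot accidentally match a longer id such as 1:2:33.

```shell
#!/bin/sh
# Hypothetical sample data; the prefix "smp" and file names are assumptions.
tmp=$(mktemp -d)
printf 'smp1:2:3\tred\nsmp1:2:33\tblue\nsmp4:5:6\tgreen\n' > "$tmp/data.tsv"
printf '1:2:3\n4:5:6\n' > "$tmp/ids.txt"

# Turn each bare id into "<prefix><id><TAB>" to pin the match to the whole id.
awk '{ print "smp" $0 "\t" }' "$tmp/ids.txt" > "$tmp/patterns.txt"

# -F: patterns are fixed strings, -v: keep non-matching lines,
# -f: read the patterns from a file. LC_ALL=C avoids multibyte handling.
LC_ALL=C grep -F -v -f "$tmp/patterns.txt" "$tmp/data.tsv" > "$tmp/kept.tsv"
cat "$tmp/kept.tsv"
```

Fixed-string patterns match anywhere in a line, so this relies on the prefix + id + tab sequence not also appearing inside the property column.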
@jim mcnamara. The awk works wonderfully, but how do I get the output written to a new file rather than printed to the screen?
Thanks a lot for the help with the language problem. I'll definitely use that.
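On writing awk output to a new file instead of the screen: the usual way is shell output redirection with `>`. A minimal sketch, with made-up file names and a stand-in awk filter:

```shell
#!/bin/sh
# Hypothetical input file and filter, only to demonstrate the redirection.
tmp=$(mktemp -d)
printf 'a\t1\nb\t2\n' > "$tmp/in.tsv"

# ">" sends awk's standard output to out.tsv, creating or overwriting it.
awk -F '\t' '$1 != "a"' "$tmp/in.tsv" > "$tmp/out.tsv"
cat "$tmp/out.tsv"
```

The output file should be a different file from the input. Redirecting onto the input file itself truncates it before awk gets to read it.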