Hello,
I have two files.
File 1 is a list of the IDs of interest:
Code:
Ex1
Ex2
Ex3
File 2 is the original file, a gzip-compressed (.gz) file with over 8000 columns and 20 million rows:
Code:
Ex1 xx xx xx xx ....
Ex2 xx xx xx xx ....
Ex2 xx xx xx xx ....
Now I need to extract from File 2 the rows for all the IDs of interest listed in File 1. I have a script that should do that:
Code:
import argparse
import gzip

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--file', action='store', dest='file', help="FILE2")
    parser.add_argument('--IDs', action='store', dest='ids', help='FILE1')
    parser.add_argument('--header', action='store_true', dest='header', help='TRUE or FALSE')
    args = parser.parse_args()

    file = gzip.open(args.file, 'rb')
    idfile = open(args.ids, 'r')
    if args.header:
        next(idfile)  # skip the header line of the ID file
    # Store the IDs in a set so each lookup is fast
    ids = set([s.rstrip() for s in idfile])
    idfile.close()

    # Name the output after the input, minus its last 7 characters
    oname = args.file[:-7] + 'result.txt'
    o = open(oname, 'w')
    o.write(next(file))  # copy the first line of FILE2 as the header
    for l in file:
        # Only the first tab-separated field (the ID) is needed
        tmp = l.split('\t', 1)
        if tmp[0].rstrip() in ids:
            o.write(l)
    o.close()
But I get an error that I don't understand, since this script was used on the same file before and it worked. I'm not sure what is going on here. Can anyone help?
Code:
  File "extract.py", line 24, in <module>
    for l in file:
  File "/usr/lib64/python2.7/gzip.py", line 450, in readline
    c = self.read(readsize)
  File "/usr/lib64/python2.7/gzip.py", line 256, in read
    self._read(readsize)
  File "/usr/lib64/python2.7/gzip.py", line 307, in _read
    uncompress = self.decompress.decompress(buf)
zlib.error: Error -3 while decompressing: invalid block type
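The traceback shows the failure coming from zlib while the compressed data is being read, not from the ID matching. One possible explanation, which is an assumption and not something the post confirms, is that the .gz file itself is truncated or corrupted (for example after an incomplete copy). A minimal sketch for checking this is to read the whole file and see whether decompression fails. The function name first_bad_offset and the chunk size are made up for illustration:

```python
import gzip
import zlib


def first_bad_offset(path, chunk_size=1024 * 1024):
    """Read a .gz file to the end and report whether decompression fails.

    Returns None if the whole file decompresses cleanly. Otherwise returns
    the number of decompressed bytes read in complete chunks before the
    error, which is a lower bound on where the damage starts.
    """
    good_bytes = 0
    try:
        with gzip.open(path, 'rb') as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    return None  # reached the end without any error
                good_bytes += len(chunk)
    except (zlib.error, EOFError, IOError, OSError):
        # zlib.error: damaged compressed data; EOFError: truncated file;
        # IOError/OSError: bad gzip header or failed CRC check
        return good_bytes
```

Calling first_bad_offset on File 2 and getting a number instead of None would suggest the file needs to be copied or downloaded again. On the command line, gzip -t performs a similar integrity test.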
Prophet::Test(3pm) User Contributed Perl Documentation Prophet::Test(3pm)
set_editor($code)
Sets the subroutine that Prophet should use instead of "Prophet::CLI::Command::edit_text" (as this routine invokes an interactive editor)
to $code.
set_editor_script SCRIPT
Sets the editor that Proc::InvokeEditor uses.
This should be a non-interactive script found in t/scripts.
import_extra($class, $args)
in_gladiator($code)
Run the given code using Devel::Gladiator.
repo_path_for($username)
Returns a path on disk for where $username's replica is stored.
repo_uri_for($username)
Returns a file:// URI for $username's replica (with the correct replica type prefix).
replica_uuid
Returns the UUID of the test replica.
database_uuid
Returns the UUID of the test database.
replica_last_rev
Returns the sequence number of the last change in the test replica.
as_user($username, $coderef)
Run this code block as $username. This routine sets up the %ENV hash so that when we go looking for a repository, we get the user's repo.
replica_uuid_for($username)
Returns the UUID of the given user's test replica.
database_uuid_for($username)
Returns the UUID of the given user's test database.
ok_added_revisions( { CODE }, $numbers_of_new_revisions, $msg)
Checks that the given code block adds the given number of changes to the test replica. $msg is optional and will be printed with the test
if given.
serialize_conflict($conflict_obj)
Returns a simple, serialized version of a Prophet::Conflict object suitable for comparison in tests.
The serialized version is a hash reference containing the following keys:
meta => { original_source_uuid => 'source_replica_uuid' }
records => { 'record_uuid' =>
{ change_type => 'type',
props => { propchange_name => { source_old => 'old_val',
source_new => 'new_val',
target_old => 'target_val',
}
}
},
'another_record_uuid' =>
{ change_type => 'type',
props => { propchange_name => { source_old => 'old_val',
source_new => 'new_val',
target_old => 'target_val',
}
}
},
}
serialize_changeset($changeset_obj)
Returns a simple, serialized version of a Prophet::ChangeSet object suitable for comparison in tests (a hash).
run_command($command, @args)
Run the given command with (optionally) the given args using a new Prophet::CLI object. Returns the standard output of that command in
scalar form or, in array context, the STDOUT in scalar form *and* the STDERR in scalar form.
Examples:
run_command('create', '--type=Foo');
load_record($type, $uuid)
Loads and returns a record object for the record with the given type and uuid.
as_alice CODE, as_bob CODE, as_charlie CODE, as_david CODE
Runs CODE as alice, bob, charlie or david.
perl v5.10.1 2009-09-02 Prophet::Test(3pm)