Python script for extracting data using two files


 
# 1  
Old 10-28-2014

Hello,
I have two files.
File 1 is a list of the IDs of interest:
Code:
Ex1
Ex2
Ex3

File 2 is the original file, with over 8000 columns and 20 million rows. It is gzip-compressed (.gz):
Code:
Ex1 xx xx xx xx ....
Ex2 xx xx xx xx ....
Ex2 xx xx xx xx ....

Now I need to extract the rows of File 2 for all the IDs of interest listed in File 1. I have a script that should do that:
Code:
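# Targets Python 2.7, the version shown in the traceback below.
import argparse
import gzip

if __name__ == '__main__':
    # ArgumentParser must be instantiated before arguments can be added.
    parser = argparse.ArgumentParser()
    parser.add_argument('--file', action='store', dest='file', help='FILE2')
    parser.add_argument('--IDs', action='store', dest='ids', help='FILE1')
    parser.add_argument('--header', action='store_true', dest='header',
                        help='skip the first line of FILE1')
    args = parser.parse_args()

    infile = gzip.open(args.file, 'rb')
    idfile = open(args.ids, 'r')
    if args.header:
        next(idfile)  # skip the header line of the ID file
    # A set gives fast lookups; the name must match the test in the loop.
    ids = set(s.rstrip() for s in idfile)
    idfile.close()
    # Drop the last 7 characters of the input name (e.g. '.txt.gz').
    oname = args.file[:-7] + 'result.txt'
    o = open(oname, 'w')
    o.write(next(infile))  # copy the first line of FILE2 unchanged
    for l in infile:
        # Only the first tab-separated field (the ID) is needed.
        if l.split('\t', 1)[0].rstrip() in ids:
            o.write(l)
    o.close()
    infile.close()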

However, I get an error that I don't understand, because this script was used on the same file before and it worked. I'm not sure what is going on here. Can anyone help?

Code:
File "extract.py", line 24, in <module>
    for l in file:
  File "/usr/lib64/python2.7/gzip.py", line 450, in readline
    c = self.read(readsize)
  File "/usr/lib64/python2.7/gzip.py", line 256, in read
    self._read(readsize)
  File "/usr/lib64/python2.7/gzip.py", line 307, in _read
    uncompress = self.decompress.decompress(buf)
zlib.error: Error -3 while decompressing: invalid block type
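
The traceback ends inside the gzip reader, so the failure happens while decompressing, before the ID comparison is reached for that line. For reference, here is a minimal Python 3 sketch of the same filter that also reports how many lines were read before reading stopped. It assumes tab-separated rows with the ID in the first column; the function name `extract_rows` and its parameters are hypothetical and not taken from the script above.

```python
import gzip
import zlib


def extract_rows(gz_path, ids, out_path):
    """Copy the first line and every row whose first field is in `ids`.

    Returns (lines_read, rows_written). `extract_rows` is an illustrative
    name, not part of the original script.
    """
    lines_read = 0
    rows_written = 0
    with open(out_path, 'w') as dst:
        try:
            with gzip.open(gz_path, 'rt') as src:
                for line in src:
                    lines_read += 1
                    if lines_read == 1:
                        dst.write(line)  # first line is copied unchanged
                        continue
                    # Split once: only the first column is needed.
                    key = line.split('\t', 1)[0].rstrip()
                    if key in ids:
                        dst.write(line)
                        rows_written += 1
        except (OSError, EOFError, zlib.error) as exc:
            # Re-raise with the position reached, to help locate the damage.
            raise RuntimeError(
                'stopped after %d lines: %r' % (lines_read, exc)
            ) from exc
    return lines_read, rows_written
```

Calling `extract_rows('data.txt.gz', {'Ex1', 'Ex2', 'Ex3'}, 'result.txt')` (file names are placeholders) would either return the two counts or raise a `RuntimeError` that says how far the reader got.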

# 2  
Old 10-28-2014
Is it possible that your gzipped file is corrupt?
# 3  
Old 10-28-2014
I don't think so, as this worked before, but is there any way I could find out whether the file is corrupt?
# 4  
Old 10-28-2014
Try to gunzip it from the command line.
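
If you would rather not unpack the file, `gzip -t file.gz` tests its integrity without writing any output. The same check can be sketched in Python 3 by reading the whole stream and discarding the data. The helper name `gzip_is_intact` and the chunk size are assumptions made for illustration.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole file decompresses without error.

    `gzip_is_intact` is a hypothetical helper, not a standard-library call.
    """
    try:
        with gzip.open(path, 'rb') as fh:
            # Read and discard the data; only a raised error matters here.
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # Bad header or checksum, truncated stream, or invalid deflate data.
        return False
    return True
```

A `False` result only says that decompression failed somewhere; it does not say where or why, and a missing or unreadable file also yields `False` because that raises `OSError` too.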
# 5  
Old 10-28-2014
Yes, you were correct. I tried to gunzip it and it gave an error. The problem has been sorted out now. Thank you.