UNIX for Dummies Questions & Answers: post 302411785 by Boltzmann on Friday 9th of April 2010 12:18:30 PM
Extracting data from many compressed files

I have a large number (50,000) of fairly large bzip2-compressed files, and I need only certain lines of data from them: each relevant line contains a certain key word, and each file contains 300 such lines. The files are indexed by file number (file_name.1, file_name.2, ..., file_name.50000), each stored with a .bz2 extension.

So, for each file, I need to uncompress it, pull out the relevant lines, append them to combined_file_name, and compress the file again. I wrote the following bash script to do this:

Code:
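i=1
num=50000
while [ "$i" -le "$num" ]
  do
  # uncompress, append the matching lines, then compress again
  bunzip2 "file_name.$i.bz2"
  grep key_word "file_name.$i" >> combined_file_name
  bzip2 "file_name.$i"
  i=$((i + 1))   # $((...)) replaces the deprecated $[...] form
done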

Is there a better or faster way to accomplish this?

Any advice would be much appreciated.

Thank you

---------- Post updated at 12:18 PM ---------- Previous update was at 11:50 AM ----------

While reading another post on here, I just discovered the bzcat command.

I am guessing something like this would be faster, since it never writes the uncompressed file to disk and skips the recompression step:

Code:
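i=1
num=50000
while [ "$i" -le "$num" ]
  do
  # bzcat uncompresses to stdout and leaves the .bz2 file untouched
  bzcat "file_name.$i.bz2" | grep key_word >> combined_file_name
  i=$((i + 1))   # $((...)) replaces the deprecated $[...] form
done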

Is there something even better?

Thanks again in advance.
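
One possible refinement of the bzcat loop, offered only as a sketch: wrap it in a function, redirect the whole loop once so the output file is opened a single time rather than once per input file, and use grep -F so the key word is matched as a literal string. The function name extract_lines and its four arguments (key word, file prefix, file count, output file) are assumptions made for illustration and do not come from the original post.

```shell
# Sketch only: extract_lines is a hypothetical helper name, and its
# arguments (key word, file prefix, file count, output file) are assumptions.
# The body runs in a subshell, so its variables do not leak into the caller.
extract_lines() (
  key=$1 prefix=$2 num=$3 out=$4
  i=1
  while [ "$i" -le "$num" ]; do
    # bzcat writes the uncompressed data to stdout; the .bz2 file is untouched.
    # grep -F matches the key word as a fixed string rather than a regex.
    bzcat "$prefix.$i.bz2" | grep -F -- "$key"
    i=$((i + 1))
  done > "$out"   # the output file is opened once for the whole loop
)
```

With the file names from the post, the call would be extract_lines key_word file_name 50000 combined_file_name. Note that the redirection truncates the output file at the start, unlike the >> in the loops above, so an existing combined_file_name would be overwritten. Whether this is measurably faster depends mostly on decompression time, which dominates in either version.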
 
