Extracting data from many compressed files Post: 302411785

UNIX for Dummies Questions & Answers: Extracting data from many compressed files, post 302411785 by Boltzmann on Friday 9th of April 2010 12:18:30 PM

04-09-2010

Extracting data from many compressed files

I have a large number (50,000) of pretty large compressed files, and I need only certain lines of data from them (each relevant line contains a certain keyword). Each file contains 300 such lines. The compressed files are indexed by number (file_name.1.bz2, file_name.2.bz2, ..., file_name.50000.bz2).

So I need to uncompress each file, pull out the relevant lines, append them to combined_file_name, and compress the file again. I wrote the following bash script to do this:

Code:

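i=1
num=50000
while [ "$i" -le "$num" ]
do
  bunzip2 "file_name.$i.bz2"                          # decompress to disk
  grep key_word "file_name.$i" >> combined_file_name  # append matching lines
  bzip2 "file_name.$i"                                # recompress
  i=$((i + 1))   # $[ ] is obsolete bash syntax; $(( )) is the POSIX form
done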
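Since each file is expected to contribute 300 matching lines, a complete run should leave 50,000 × 300 = 15,000,000 lines in combined_file_name. A minimal sanity check, written as a hypothetical helper named check_total (the name and arguments are illustrative, not part of the original script), assuming the 300-lines-per-file figure holds exactly for every file:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the line count of the combined file with
# the expected total (number of files x matching lines per file).
# Assumes every file contributes exactly the same number of lines.
check_total() {
  local combined=$1 files=$2 per_file=$3
  local expected=$(( files * per_file ))
  local actual
  actual=$(wc -l < "$combined") || return 1
  actual=$(( actual ))   # drop the leading blanks some wc versions print
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $actual lines"
  else
    echo "MISMATCH: expected $expected, found $actual" >&2
    return 1
  fi
}
```

For example: check_total combined_file_name 50000 300. Both loops in this thread append with >>, so rerunning one adds to whatever is already in combined_file_name and the count will no longer match; emptying the file first (: > combined_file_name) avoids that.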

What I am wondering is whether there is a better/faster way to accomplish this.

Any advice would be much appreciated.

Thank you

---------- Post updated at 12:18 PM ---------- Previous update was at 11:50 AM ----------

While reading another post on here, I just discovered the bzcat command.

I am guessing something like this would be faster:

Code:

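i=1
num=50000
while [ "$i" -le "$num" ]
do
  # bzcat decompresses to stdout: the .bz2 files are left untouched and
  # no uncompressed copy is written to disk
  bzcat "file_name.$i.bz2" | grep key_word
  i=$((i + 1))
done >> combined_file_name   # output file is opened once, not once per file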
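The bzcat loop still starts two processes (bzcat and grep) per file, about 100,000 in total. bzcat accepts several files at once and writes their decompressed contents one after another, so handing the names to xargs in batches needs only a few bzcat processes and a single grep. A sketch as a hypothetical helper extract_batched, assuming the names follow the file_name.N.bz2 pattern, that the keyword is a literal string (hence grep -F), and that no index is missing. The names are generated in numeric order on purpose: a glob such as file_name.*.bz2 sorts lexicographically (file_name.10.bz2 before file_name.2.bz2).

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract_batched PREFIX NUM KEYWORD OUTFILE
# e.g. extract_batched file_name 50000 key_word combined_file_name
# Uses a handful of bzcat processes and a single grep for all files.
extract_batched() {
  local prefix=$1 num=$2 keyword=$3 out=$4
  local i
  # Print PREFIX.1.bz2 ... PREFIX.NUM.bz2 in numeric order, let xargs pack
  # many names into each bzcat call (bzcat writes the decompressed files
  # to stdout one after another), then filter everything with one grep.
  # grep -F treats the keyword as a fixed string, not a regular expression.
  # ">" overwrites OUTFILE, so rerunning does not duplicate lines.
  for ((i = 1; i <= num; i++)); do
    printf '%s.%d.bz2\n' "$prefix" "$i"
  done | xargs bzcat | grep -F -- "$keyword" > "$out"
}
```

To compress the combined result as well, the last stage could be changed to grep -F -- "$keyword" | bzip2 > "$out.bz2".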

Is there something even better?

Thanks again in advance.

Boltzmann
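On "something even better": bzip2 decompresses on a single CPU core, so on a multi-core machine several files can be processed at the same time. A sketch as a hypothetical helper extract_parallel, using the -P option of xargs (supported by GNU and BSD xargs, but not required by POSIX). The default of 4 jobs is an arbitrary assumption; whether parallelism helps depends on the number of cores and on disk speed, so timing it on a subset of the files first is worthwhile. Each job writes to its own temporary file, because output from concurrent pipelines sharing one file could interleave.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract_parallel PREFIX NUM KEYWORD OUTFILE [JOBS]
# Runs up to JOBS (default 4) "bzcat | grep" pipelines at the same time.
extract_parallel() {
  local prefix=$1 num=$2 keyword=$3 out=$4 jobs=${5:-4}
  local tmpdir
  tmpdir=$(mktemp -d) || return 1

  # One job per index: decompress PREFIX.i.bz2 to stdout and write its
  # matching lines to TMPDIR/i, so concurrent jobs never share a file.
  seq 1 "$num" | xargs -P "$jobs" -I {} \
    sh -c 'bzcat "$1.$2.bz2" | grep -F -- "$3" > "$4/$2"' \
    sh "$prefix" {} "$keyword" "$tmpdir"

  # Concatenate the per-file results in numeric order (the names are bare
  # numbers, so xargs can pass them to cat without quoting problems).
  (cd "$tmpdir" && seq 1 "$num" | xargs cat) > "$out"

  rm -rf "$tmpdir"
}
```

For example: extract_parallel file_name 50000 key_word combined_file_name 8. The per-file temporary files cost some extra disk I/O, which is the price of keeping the combined output in the same order as the sequential loop.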

LEARN ABOUT CENTOS

dh_compress

DH_COMPRESS(1)							     Debhelper							    DH_COMPRESS(1)

NAME

       dh_compress - compress files and fix symlinks in package build directories

SYNOPSIS

       dh_compress [debhelperoptions] [-Xitem] [-A] [file...]

DESCRIPTION

       dh_compress is a debhelper program that is responsible for compressing the files in package build directories, and makes sure that any
       symlinks that pointed to the files before they were compressed are updated to point to the new files.

       By default, dh_compress compresses files that Debian policy mandates should be compressed, namely all files in usr/share/info,
       usr/share/man, files in usr/share/doc that are larger than 4k in size (except the copyright file, .html and other web files, image files,
       and files that appear to be already compressed based on their extensions), and all changelog files, plus PCF fonts underneath
       usr/share/fonts/X11/.

FILES

       debian/package.compress
	   These files are deprecated.

	   If this file exists, the default files are not compressed. Instead, the file is run as a shell script, and all filenames that the shell
	   script outputs will be compressed. The shell script will be run from inside the package build directory. Note though that using -X is a
	   much better idea in general; you should only use a debian/package.compress file if you really need to.

OPTIONS

       -Xitem, --exclude=item
	   Exclude files that contain item anywhere in their filename from being compressed. For example, -X.tiff will exclude TIFF files from
	   compression.  You may use this option multiple times to build up a list of things to exclude.

       -A, --all
	   Compress all files specified by command line parameters in ALL packages acted on.

       file ...
	   Add these files to the list of files to compress.

CONFORMS TO

       Debian policy, version 3.0

SEE ALSO

       debhelper(7)

       This program is a part of debhelper.

AUTHOR

       Joey Hess <joeyh@debian.org>

11.1.6ubuntu2							    2018-05-10							    DH_COMPRESS(1)
