Forum: UNIX for Dummies Questions & Answers
Post 302411785 by Boltzmann on Friday 9th of April 2010 12:18:30 PM
Extracting data from many compressed files

I have a large number (50,000) of fairly large compressed files, and I need only certain lines of data from them (each relevant line contains a certain keyword). Each file contains 300 such lines. The individual file names are indexed by file number (file_name.1, file_name.2, ... , file_name.50000).

So I need to uncompress each file, pull out the relevant lines, append them to combined_file_name, and compress the file again. I wrote the following bash script to do this:

Code:
i=1
num=50000
while [ "$i" -le "$num" ]
  do
  # Decompress, append the matching lines, then recompress the original
  bunzip2 "file_name.$i.bz2"
  grep key_word "file_name.$i" >> combined_file_name
  bzip2 "file_name.$i"
  i=$((i + 1))
done

What I am wondering is whether there is a better/faster way to accomplish this.

Any advice would be much appreciated.

Thank you

---------- Post updated at 12:18 PM ---------- Previous update was at 11:50 AM ----------

While reading another post on here, I just discovered the bzcat command.

I am guessing something like this would be faster:

Code:
i=1
num=50000
while [ "$i" -le "$num" ]
  do
  # Stream the decompressed data into grep; the .bz2 file is never rewritten
  bzcat "file_name.$i.bz2" | grep key_word >> combined_file_name
  i=$((i + 1))
done

Is there something even better?

Thanks again in advance.
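
One possible refinement of the bzcat loop, offered as a sketch and not as an answer taken from this thread: pipe the whole loop into a single grep, so grep starts once instead of 50,000 times and the output file is opened once. The helper name extract_lines and its three arguments are hypothetical; they stand in for file_name, key_word and num above. It assumes every numbered .bz2 file exists. Decompression is likely to remain the dominant cost, so the gain may be modest.

```shell
# extract_lines PREFIX KEYWORD COUNT
# Hypothetical helper: decompress PREFIX.1.bz2 .. PREFIX.COUNT.bz2 in numeric
# order to stdout and filter the combined stream through one grep process.
# The body runs in a subshell, so its variables do not leak into the caller.
extract_lines() (
  prefix=$1
  keyword=$2
  num=$3
  i=1
  while [ "$i" -le "$num" ]; do
    # bzcat writes the decompressed data to stdout and leaves the archive alone
    bzcat "$prefix.$i.bz2"
    i=$((i + 1))
  done | grep -e "$keyword"
)
```

It could be called as `extract_lines file_name key_word 50000 > combined_file_name`. Note that `>` overwrites combined_file_name, whereas the loops above append to it with `>>`.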
 
