Post 302403613 by cliffyiu on Saturday 13th of March 2010 11:06:43 AM
How to extract a subset from a huge dataset

Hi all,

I have a huge file, about 450G in size. It is tab-delimited, in the format below:

Code:
x1 A 50020 1
x1 B 50021 8
x1 C 50022 9
x1 A 50023 10
x2 D 50024 5
x2 C 50025 7
x2 F 50026 8
x2 N 50027 1
:
:

Now I want to extract a subset from this file: the rows where column 1 is x10 and column 3 is between 600000 and 30000000. I wrote the following Perl script, but it doesn't work:

Code:
#!/usr/bin/perl

$file1 = $ARGV[0]; # Input file
$file2 = $ARGV[1]; # Output file

open (IN, $file1);
while ($line = <IN>)
{
  chomp($line);
  @array = split(/\t/,$line);

  if ($array[0] eq 'x10')
  {
    if (($array[2] >= 600000) && ($array[2] <= 26279795))
    {
      open (OUT, ">>$file2");
      print OUT "$line\n";
      close OUT;
    }
  }
}
close IN;
exit;

I guess the input and output files are both so big that my script can't handle them.

Does anyone know a good way to do this? Perl or shell scripts are preferred.

All your help will be appreciated!

Last edited by Franklin52; 03-13-2010 at 01:47 PM.. Reason: Please indent your code and use code tags!!
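
One possible approach, offered as a rough sketch and not as something tested on the 450G file: read the input once as a stream and write the matches through a single redirection, so the output file is opened only once. The script above reopens and closes the output for every matching line, which adds overhead, though that may not be the only reason it appears not to work. The function name `extract_subset` and the file names are made up for illustration. The bounds are the ones from the script (600000 to 26279795), not the 30000000 mentioned in the text, so adjust them as needed.

```shell
#!/bin/sh
# Hypothetical helper (name and arguments are assumptions, not from the post).
# Usage: extract_subset INPUT OUTPUT
extract_subset() {
    # -F '\t' splits each line on tabs.
    # chr, lo and hi are passed in as awk variables; lo/hi follow the
    # bounds used in the Perl script above (assumption: adjust if the
    # intended upper bound is 30000000).
    # A pattern with no action prints every matching line unchanged.
    awk -F '\t' -v chr="x10" -v lo=600000 -v hi=26279795 \
        '$1 == chr && $3 >= lo && $3 <= hi' "$1" > "$2"
}
```

It would be called as `extract_subset huge.tsv subset.tsv`. awk reads one line at a time, so memory use should not depend on the file size; a single pass over 450G will still take a while simply because of disk I/O.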