How to extract subset file from dataset?
Post 302850253 by Corona688, Wednesday 4th of September 2013, 10:59:12 AM
Code:
awk 'NR==1 { print > "M"; print > "F"; next }   # copy the header line into both output files
     { print > $4 }                              # append every data line to the file named in column 4
' inputfile
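(Assuming column 4 holds the values M and F, this splits the dataset into one file per value, with the header row duplicated into both output files.)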


10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Total file size of a subset list

Hello! I'm trying to find out the total file size of a subset of files in a directory. For example, I don't need the total size of all the files in a directory, just the total for, say, "ls -l *FEB08*". Is there an easy way of doing this? ... (3 Replies)
Discussion started by: tekster757
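A minimal sketch of one way to total just the matching files, summing the size field of the ls -l output the poster already had (du -ch *FEB08* | tail -n 1 is an alternative, though it reports disk usage rather than byte sizes):

Code:
# sum field 5 (byte size) of ls -l for the files matching the glob
ls -l *FEB08* | awk '{ total += $5 } END { print total, "bytes" }'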

2. Shell Programming and Scripting

How to extract a subset from a huge dataset

Hi, all. I have a huge file of 450 GB. Its tab-delimited format is as below:
x1 A 50020 1
x1 B 50021 8
x1 C 50022 9
x1 A 50023 10
x2 D 50024 5
x2 C 50025 7
x2 F 50026 8
x2 N 50027 1
: :
Now, I want to extract a subset from this file. In this subset, column 1 is x10, column 2 is... (3 Replies)
Discussion started by: cliffyiu
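The condition in the teaser is cut off, but assuming the goal is every row whose first column is x10, a single streaming pass with awk avoids loading the 450 GB file into memory (file names are placeholders):

Code:
# keep tab-delimited rows where column 1 equals "x10"
awk -F'\t' '$1 == "x10"' hugefile > subset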

3. Shell Programming and Scripting

Count the number of words in some subset of file and disregard others

Hi all, I have some 6000 text files in a directory, named 1.txt, 2.txt, 3.txt and so on up to 6000.txt. I want to count the "number of words" in only the first 3000 of them. Any suggestions? I know wc -w can count the number of words in a text file. I am using Red Hat Linux. (3 Replies)
Discussion started by: shoaibjameel123
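A sketch under the 1.txt ... 6000.txt naming described in the post: concatenate the first 3000 files into one stream and let wc -w count the whole thing.

Code:
# total word count across 1.txt through 3000.txt
for i in $(seq 1 3000); do cat "$i.txt"; done | wc -w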

4. Solaris

flarecreate for zfs root dataset and ignore multiple dataset

Hi all, I want to write a script to create flar images on multiple servers. On a non-ZFS filesystem I use the -X option to point at a file that excludes mounts on different servers, but on ZFS the -X option does not work. I want multiple mounts to be ignored on a ZFS-based system during flarecreate. I... (0 Replies)
Discussion started by: uxravi
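No answer was posted in that thread. For context, basic flash-archive creation looks roughly like the sketch below; the archive name and path are placeholders, and dataset-exclusion support varies by Solaris release, so check flarcreate(1M) on the target system.

Code:
# create a named, compressed flash archive of the running system
flarcreate -n mysystem -c /net/flarserver/archives/mysystem.flar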

5. Shell Programming and Scripting

How to remove a subset of data from a large dataset based on values on one line

Hello. I was wondering if anyone could help. I have a file containing a large table in the format:
marker1    marker2    marker3    marker4
position1  position2  position3  position4
genotype1  genotype2  genotype3  genotype4
with marker being a name, position a numeric... (2 Replies)
Discussion started by: davegen
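The teaser is truncated, but one common shape of this task is dropping every column whose row-1 marker appears in a removal list; an awk sketch under that assumption (droplist and bigtable are placeholder names):

Code:
# pass 1: read the markers to remove; pass 2: print only the kept columns
awk 'NR==FNR { drop[$1]; next }
     FNR==1  { for (i = 1; i <= NF; i++) keep[i] = !($i in drop) }
     { line = ""
       for (i = 1; i <= NF; i++)
           if (keep[i]) line = line (line ? OFS : "") $i
       print line }' droplist bigtable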

6. UNIX for Dummies Questions & Answers

how to get a subset of such a file

Dear all, I have a file like the below: number of rows = 420, number of letters in each row = 100000, with no spaces between the letters. What I want is the 75000th letter to the 85000th letter in each row. How do I do that? Thanks a lot! ... (2 Replies)
Discussion started by: forevertl
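cut selects by character position, which maps directly onto this request (file names are placeholders):

Code:
# keep characters 75000 through 85000 of every row
cut -c75000-85000 infile > outfile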

7. UNIX for Dummies Questions & Answers

Swapping the columns of a text file for a subset of rows

Hi, I'd like to swap columns 1 and 2 of a space-delimited text file, but only for the first 1000 rows. How do I go about doing that? Thanks! (1 Reply)
Discussion started by: evelibertine
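A short awk sketch: swap the first two fields while the row number is at most 1000, then print every row. Note that assigning to a field makes awk rebuild the record with single spaces, which is harmless for a space-delimited file (input.txt is a placeholder):

Code:
# swap $1 and $2 on rows 1-1000, pass everything else through
awk 'NR <= 1000 { t = $1; $1 = $2; $2 = t } { print }' input.txt > swapped.txt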

8. UNIX for Dummies Questions & Answers

Random selection of subset of sample from file

Hello, could you please help me find code that randomly selects 1224 lines from a file of 12240 and makes ten outputs of 1224 lines each? My input is a text file with 12240 lines like:
13474 999003507 0 0 2 -9
13475 999003508 0 0 2 -9
13476 999003509 0 0 1 -9
13477 999003510 0 0 1 -9
... (7 Replies)
Discussion started by: biopsy
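If GNU shuf is available, shuffling once and splitting into 1224-line chunks gives ten non-overlapping random subsets (12240 / 1224 = 10); for a single sample, shuf -n 1224 input.txt is enough. The subset_ prefix is a placeholder:

Code:
# randomize the line order, then cut the result into ten 1224-line files
shuf input.txt | split -l 1224 - subset_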

9. Shell Programming and Scripting

Creating subset of a file based on specific columns

Hello Unix experts, I need help creating a subset file. I know that with the cut command it's very easy to select many different columns or a threshold, but here I have a bit of a problem, as my data file is big and I don't want to identify the column numbers or names manually. I am trying to find any... (7 Replies)
Discussion started by: smitra
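One way to avoid hard-coding positions is to match the header row against a list of wanted column names; an awk sketch, with names.txt (one column name per line) and data.txt as placeholders:

Code:
# pass 1: read wanted names; pass 2: find them in the header, then print only those fields
awk 'NR==FNR { want[$1]; next }
     FNR==1  { for (i = 1; i <= NF; i++) if ($i in want) cols[++n] = i }
     { for (i = 1; i <= n; i++) printf "%s%s", $(cols[i]), (i < n ? OFS : ORS) }' names.txt data.txt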

10. Shell Programming and Scripting

awk to filter file using another working on smaller subset

In the awk below, if I use the attached file as the input, I get no results for TCF4. However, if I just copy that line from the attached file and use that as input, I get results for TCF4. Basically, the gene file is a 1-column list that is used to filter $8 of the attached file. When there is a... (9 Replies)
Discussion started by: cmccabe
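The usual two-file pattern for this filter is sketched below. When a gene matches as a pasted literal but not from the list file, Windows line endings in the list are a frequent culprit, so the sketch strips a trailing carriage return (genes.txt and bigfile.txt are placeholders):

Code:
# load the gene list, tolerating CRLF endings, then keep rows whose $8 is listed
awk 'NR==FNR { sub(/\r$/, ""); genes[$1]; next }
     $8 in genes' genes.txt bigfile.txt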
bup-margin(1)                    General Commands Manual                    bup-margin(1)

NAME
       bup-margin - figure out your deduplication safety margin

SYNOPSIS
       bup margin [options...]

DESCRIPTION
       bup margin iterates through all objects in your bup repository, calculating the
       largest number of prefix bits shared between any two entries. This number, n,
       identifies the longest subset of SHA-1 you could use and still encounter a
       collision between your object ids.

       For example, one system that was tested had a collection of 11 million objects
       (70 GB), and bup margin returned 45. That means a 46-bit hash would be
       sufficient to avoid all collisions among that set of objects; each object in
       that repository could be uniquely identified by its first 46 bits.

       The number of bits needed seems to increase by about 1 or 2 for every doubling
       of the number of objects. Since SHA-1 hashes have 160 bits, that leaves 115
       bits of margin. Of course, because SHA-1 hashes are essentially random, it's
       theoretically possible to use many more bits with far fewer objects.

       If you're paranoid about the possibility of SHA-1 collisions, you can monitor
       your repository by running bup margin occasionally to see if you're getting
       dangerously close to 160 bits.

OPTIONS
       --predict
              Guess the offset into each index file where a particular object will
              appear, and report the maximum deviation of the correct answer from the
              guess. This is potentially useful for tuning an interpolation search
              algorithm.

       --ignore-midx
              Don't use .midx files, use only .idx files. This is only really useful
              when used with --predict.

EXAMPLE
       $ bup margin
       Reading indexes: 100.00% (1612581/1612581), done.
       40
       40 matching prefix bits
       1.94 bits per doubling
       120 bits (61.86 doublings) remaining
       4.19338e+18 times larger is possible

       Everyone on earth could have 625878182 data sets like yours, all in one
       repository, and we would expect 1 object collision.

       $ bup margin --predict
       PackIdxList: using 1 index.
       Reading indexes: 100.00% (1612581/1612581), done.
       915 of 1612581 (0.057%)

SEE ALSO
       bup-midx(1), bup-save(1)

BUP
       Part of the bup(1) suite.

AUTHORS
       Avery Pennarun <apenwarr@gmail.com>.