How to extract subset file from dataset?


 
# 8  
09-04-2013
The output file was not included in my instructions because it would be empty; the script doesn't use it.

Check for the files 'M' and 'F' in the same directory; they will not be empty.
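A quick way to check the result is to compare line counts. This is a minimal sketch that assumes every input line carries either M or F, so the line counts of `M` and `F` should add up to the input's line count. `check_split` is a hypothetical helper name, not part of the original instructions:

```shell
#!/bin/sh
# check_split INPUT: hypothetical helper that compares the line count of
# INPUT with the combined line counts of the files M and F in the current
# directory. Assumes every input line belongs to exactly one of the two files.
check_split() {
    total=$(wc -l < "$1" | tr -d ' ')   # tr strips padding some wc versions add
    m=$(wc -l < M | tr -d ' ')
    f=$(wc -l < F | tr -d ' ')
    echo "input: $total  M: $m  F: $f"
    if [ $((m + f)) -eq "$total" ]; then
        echo "OK: every line was written to M or F"
    else
        echo "MISMATCH: some lines are missing or duplicated" >&2
        return 1
    fi
}
```

Running `check_split inputfile` after the split prints the three counts. A large mismatch usually means the script looked at the wrong column.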
# 9  
09-04-2013
When I ran the program I got the M and F files, but they contain just one line.
My dataset has many more lines than the example: about 2600 lines, each containing M or F for gender. What I want is to split the dataset into two files, one for gender M and one for gender F.
# 10  
09-04-2013
That is what my example does, yes. It writes to different file names depending on what the value of the fourth column is.

If the fourth column isn't what you showed it to be in your example data, it won't do what I expect. Check the contents of your folder with 'ls'; it may have created files with weird names.

Could you show a more complete example of your input data please?
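To find out which column actually holds the gender, it can help to tally the distinct values in each candidate column. This is a minimal sketch that assumes whitespace-separated fields. `show_column` is a hypothetical helper name:

```shell
#!/bin/sh
# show_column FILE N: hypothetical helper that counts how often each
# distinct value appears in whitespace-separated field N of FILE.
# A gender column should show only a couple of values, such as M and F.
show_column() {
    awk -v col="$2" '{ print $col }' "$1" | sort | uniq -c
}
```

For example, `show_column inputfile 4` and `show_column inputfile 5` can be compared side by side. Whichever one lists only M and F is the column to split on.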
# 11  
09-04-2013
Please see my dataset, which I want to split by gender (M and F) into two separate files.
# 12  
09-04-2013
The data you posted clearly shows M/F in the fifth column, not the fourth.

Also, unlike your original data, the data you posted has no header row. Knowing it isn't there, I can simplify my code a lot.
Code:
# Write each line to a file named after its 5th field (M or F)
awk '{ print > $5 }' inputfile
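For comparison, here is a sketch of what the split can look like when the data does have a header row. It copies the header to the top of each output file. It also skips lines whose fifth field is not exactly M or F, so a malformed line cannot create an oddly named file. The `split_gender` name and its optional second argument are assumptions for illustration, not the code from earlier in the thread:

```shell
#!/bin/sh
# split_gender FILE [HEADER]: hypothetical helper.
# Writes each line of FILE to a file named after its 5th field (M or F).
# If HEADER is 1, the first line is treated as a header and copied to the
# top of each output file. Lines whose 5th field is not M or F are skipped.
split_gender() {
    awk -v has_header="${2:-0}" '
        has_header == 1 && NR == 1 { header = $0; next }
        $5 == "M" || $5 == "F" {
            if (has_header == 1 && !($5 in started)) {
                print header > $5      # header goes first in each file
                started[$5] = 1
            }
            print > $5
        }
    ' "$1"
}
```

`split_gender inputfile 1` handles a file with a header, and `split_gender inputfile` handles one without. Without the header logic, the function reduces to the one-liner above plus the M/F filter.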

# 13  
09-08-2013
This is really bad, but it seems to work.
It assumes that M or F appears only once on each line
and is separated by whitespace.

Code:
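# Bash only ([[ ]] pattern matching); appends, so delete old M and F first
while IFS= read -r line
do
    if [[ $line == *M* ]]; then
        printf '%s\n' "$line" >> M   # append the line to file M
    fi
    if [[ $line == *F* ]]; then
        printf '%s\n' "$line" >> F   # append the line to file F
    fi
done < file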


Last edited by briandanielz; 09-08-2013 at 06:25 AM.
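Here is a variant of the loop above that checks the fifth field exactly instead of searching the whole line for the letter. With this change, a line is not sent to both files when, say, an ID happens to contain an M. It is a sketch that assumes whitespace-separated fields with the gender in field 5, as in the data discussed in post #12. It uses plain POSIX sh. `split_by_field` is a hypothetical name:

```shell
#!/bin/sh
# split_by_field FILE: hypothetical helper that writes each line of FILE
# to M or F according to its 5th whitespace-separated field.
split_by_field() {
    : > M; : > F                     # start from empty output files
    while IFS= read -r line
    do
        # Split a copy of the line into fields (default IFS collapses runs
        # of blanks); the 5th field lands in "gender", the rest is ignored.
        read -r _ _ _ _ gender _ <<EOF
$line
EOF
        case $gender in
            M) printf '%s\n' "$line" >> M ;;
            F) printf '%s\n' "$line" >> F ;;
        esac
    done < "$1"
}
```

Lines whose fifth field is neither M nor F (including lines with fewer than five fields) are silently skipped.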
# 14  
09-10-2013
The solution works.

---------- Post updated at 11:57 AM ---------- Previous update was at 11:52 AM ----------

Code:
grep M aa.txt > M
grep F aa.txt > F

This will get you what you need.
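Note that `grep M` matches an M anywhere on the line, so a line whose other fields contain that letter can end up in both files. If the gender sits in a known column, an anchored extended regex can restrict the match to that field. This is a sketch that assumes whitespace-separated fields with the gender in field 5. `grep_gender` is a hypothetical name:

```shell
#!/bin/sh
# grep_gender FILE LETTER: hypothetical helper that prints the lines of FILE
# whose 5th whitespace-separated field is exactly LETTER.
# The regex skips optional leading blanks, then four "field + separator"
# groups, then requires LETTER followed by a blank or the end of the line.
grep_gender() {
    grep -E "^[[:space:]]*([^[:space:]]+[[:space:]]+){4}$2([[:space:]]|\$)" "$1"
}
```

With this helper, the two commands become `grep_gender aa.txt M > M` and `grep_gender aa.txt F > F`.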

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk to filter file using another working on smaller subset

In the below awk if I use the attached file as the input, I get no results for TCF4. However, if I just copy that line from the attached file and use that as input I get results for TCF4. Basically the gene file is a 1 column list that is used to filter $8 of the attached file. When there is a... (9 Replies)
Discussion started by: cmccabe

2. Shell Programming and Scripting

Creating subset of a file based on specific columns

Hello Unix experts, I need help creating a subset file. I know that with the cut command it's very easy to select many different columns, or a threshold. But here I have a bit of a problem, as my data file is big, and I don't want to identify the column numbers or names manually. I am trying to find any... (7 Replies)
Discussion started by: smitra

3. UNIX for Dummies Questions & Answers

Random selection of subset of sample from file

Hello, could you please help me find code that can randomly select 1224 lines from a file of 12240 lines and make ten output files with 1224 lines each? My input is a txt file with 12240 lines like: 13474 999003507 0 0 2 -9 13475 999003508 0 0 2 -9 13476 999003509 0 0 1 -9 13477 999003510 0 0 1 -9 ... (7 Replies)
Discussion started by: biopsy

4. UNIX for Dummies Questions & Answers

Swapping the columns of a text file for a subset of rows

Hi, I'd like to swap the columns 1 and 2 of a space-delimited text file but only for the first 1000 rows. How do I go about doing that? Thanks! (1 Reply)
Discussion started by: evelibertine

5. UNIX for Dummies Questions & Answers

how to get a subset of such a file

Dear all, I have a file like the one below: number of rows = 420, number of letters in each row = 100000. There is no space between the letters. What I want is the 75000th to the 85000th letter of each row. How can I do that? Thanks a lot! ... (2 Replies)
Discussion started by: forevertl

6. Shell Programming and Scripting

How to remove a subset of data from a large dataset based on values on one line

Hello. I was wondering if anyone could help. I have a file containing a large table in the format: marker1 marker2 marker3 marker4 position1 position2 position3 position4 genotype1 genotype2 genotype3 genotype4 with marker being a name, position a numeric... (2 Replies)
Discussion started by: davegen

7. Solaris

flarecreate for zfs root dataset and ignore multiple dataset

Hi all, I want to write a script to create flar images on multiple servers. On non-ZFS filesystems I use the -X option to reference a file listing the mounts to exclude on different servers, but on ZFS the -X option is not working. I want multiple mounts to be ignored on a ZFS-based system during flarecreate. I... (0 Replies)
Discussion started by: uxravi

8. Shell Programming and Scripting

Count the number of words in some subset of file and disregard others

Hi All, I have some 6000 text files in a directory. My files are named like 1.txt, 2.txt 3.txt and so on until 6000.txt. I want to count the "number of words" in only first 3000 of them. Any suggestions? I know wc -w can count the number of words in a text file. I am using Red Hat Linux. (3 Replies)
Discussion started by: shoaibjameel123

9. Shell Programming and Scripting

How to extract a subset from a huge dataset

Hi, All I have a huge file which has 450G. Its tab-delimited format is as below x1 A 50020 1 x1 B 50021 8 x1 C 50022 9 x1 A 50023 10 x2 D 50024 5 x2 C 50025 7 x2 F 50026 8 x2 N 50027 1 : : Now, I want to extract a subset from this file. In this subset, column 1 is x10, column 2 is... (3 Replies)
Discussion started by: cliffyiu

10. UNIX for Dummies Questions & Answers

Total file size of a subset list

Hello! I'm trying to find out the total file size of a subset list in a directory. For example, I do not need to know the total file size of all the files in a directory, but I need to know what the total size is of say, "ls -l *FEB08*" in a directory. Is there any easy way of doing this? ... (3 Replies)
Discussion started by: tekster757