How to subset data?


 
# 1  
Old 04-04-2013

Hi. I have a large data file whose first column holds unique identifiers. I have approximately 5 of these files, and they have varying numbers of columns in their rows. I need to extract ~300 of the rows into a separate file. I'm not looking for something that would do all 5 files at once; something that lets me do each file separately would be great. A plus would be getting these rows transposed in the output file. Any assistance would be greatly appreciated.

Thanks
# 2  
Old 04-04-2013
Extract which rows?

Transpose what into what?

We can probably do what you want, once we know what that is.
# 3  
Old 04-04-2013
Please give us a representative sample of your input, show us how to choose the lines you want to extract, and explain the transpositions that you want to appear in your output file(s).
# 4  
Old 04-04-2013
I have a list of the unique identifiers I want to extract.

Then I was hoping to just transpose the entire dataset for the output file, something similar to copy/transpose in Excel. I do have a script that lets me do the transpose, so I could do it separately, but of course it would be more convenient if I could do it all at once.

Thanks

---------- Post updated at 10:14 AM ---------- Previous update was at 10:10 AM ----------

large dataset:

unique_identifier_1 entry1 entry2 entry3.....entryn
unique_identifier_2 entry1 entry2 entry3.....entryn
unique_identifier_3 entry1 entry2 entry3.....entryn
unique_identifier_4 entry1 entry2 entry3.....entryn
unique_identifier_n entry1 entry2 entry3.....entryn

I have a list of the unique identifiers I would like:

List:
unique_identifier_1
unique_identifier_3

output_desired
unique_identifier_1 unique_identifier_3
entry1 entry1
entry2 entry2
entry3 entry3
entryn entryn
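
The desired layout above amounts to a row/column transpose of the already-extracted rows. A minimal sketch of one way that could be done in awk follows. The file names `subset.txt` and `transposed.txt` and the sample values are hypothetical, and the approach assumes whitespace-separated fields and a subset small enough to hold in memory (about 300 rows should be fine).

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical extracted rows: identifier first, then its entries.
printf '%s\n' 'unique_identifier_1 a1 a2 a3' \
              'unique_identifier_3 c1 c2 c3' > subset.txt

awk '
{
    for (i = 1; i <= NF; i++) cell[NR, i] = $i   # store each field by (row, column)
    if (NF > maxnf) maxnf = NF                    # remember the widest row
}
END {
    for (i = 1; i <= maxnf; i++) {                # each input column becomes an output row
        line = cell[1, i]
        for (r = 2; r <= NR; r++) line = line " " cell[r, i]
        print line
    }
}' subset.txt > transposed.txt

cat transposed.txt
```

If some rows are shorter than others, the missing cells come out as empty strings, so the columns of the transposed file may no longer line up by position.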

Last edited by kadm; 04-04-2013 at 03:18 PM.. Reason: mistyped and had two entry3s instead of entry 2 and entry 3
# 5  
Old 04-04-2013
If you already have a transpose that works for you, I'll let you use it. Extracting the identifiers you want is straightforward enough:

Code:
awk 'NR==FNR { A[$1]++ ; next } $1 in A' listfile datafile > subsetfile
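
As a rough illustration of what that one-liner does, here is a small self-contained run. The file contents are made up for the example; `listfile`, `datafile` and `subsetfile` are just the placeholder names from the command above.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data: identifier in column 1, entries after it.
printf '%s\n' 'id_1 a1 a2 a3' 'id_2 b1 b2 b3' 'id_3 c1 c2 c3' > datafile
# Hypothetical list of wanted identifiers, one per line.
printf '%s\n' 'id_1' 'id_3' > listfile

# While reading the first file (NR==FNR), remember each identifier and
# skip to the next line. For the second file, print a line only if its
# first field was remembered.
awk 'NR==FNR { A[$1]++ ; next } $1 in A' listfile datafile > subsetfile

cat subsetfile
```

The match is exact on the first whitespace-separated field, and rows come out in the order they appear in the data file, not the list. For tab-delimited data, adding `-F'\t'` to the awk call would be the usual adjustment.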

# 6  
Old 04-04-2013
Thanks so much.
# 7  
Old 04-04-2013
I'm getting the following error when I try the awk command suggested:

-bash-4.1$ awk 'NR==FNR {a[$1}++;next} $1 in a' missing.txt local.txt > missing_subset.txt

Error Code:
awk: NR==FNR {A[$1}++ ; next} $1 in A
awk: ^ syntax error
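
Comparing the command as typed with the one suggested in post #5, the array subscript is closed with `}` instead of `]` (`a[$1}` rather than `a[$1]`), which would be consistent with the syntax error shown. A corrected sketch, keeping the poster's file names but using made-up contents so it can be run on its own:

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins for the poster's files.
printf '%s\n' 'id_2' > missing.txt
printf '%s\n' 'id_1 a1 a2' 'id_2 b1 b2' > local.txt

# The subscript must close with ] before the ++ operator.
awk 'NR==FNR { a[$1]++; next } $1 in a' missing.txt local.txt > missing_subset.txt

cat missing_subset.txt
```

Note that awk array names are case-sensitive, so the same name (`a` or `A`) has to be used in both places.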
 