11-23-2011
How to remove a subset of data from a large dataset based on values on one line
Hello. I was wondering if anyone could help. I have a file containing a large table in the format:
marker1 marker2 marker3 marker4
position1 position2 position3 position4
genotype1 genotype2 genotype3 genotype4
with marker being a name, position a numeric measure of distance and genotype also a number.
I need to remove columns based on the values in the "position" line, i.e. I need a script that takes each marker's position and removes the adjacent columns that lie within a certain distance of that marker, where distance is measured by the values in the position line. So if the file looked like this
rs1 rs2 rs3 rs4 rs5
1 2 3 4 5
2 3 1 1 2
and I was dealing with rs3 and the distance I wanted to remove was 1, I would want the output:
rs1 rs3 rs5
1 3 5
2 1 2
Does anyone know a way to do this? I appreciate any help and I hope I haven't been too confusing!
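One possible approach, sketched in Python for the single-marker case described above. It assumes the table is whitespace-delimited, that "within a certain distance" is inclusive (a column exactly 1 away is dropped when the distance is 1, as in the rs3 example), and that the chosen marker itself is always kept. The function name `drop_neighbours` and its parameters are made up for illustration.

```python
def drop_neighbours(lines, target, distance):
    """Drop columns whose position lies within `distance` of `target`'s position.

    `lines` holds the whitespace-delimited rows: markers, positions, genotypes.
    Assumption: the comparison is inclusive and the target column is kept.
    """
    rows = [line.split() for line in lines]
    markers = rows[0]
    positions = [float(p) for p in rows[1]]
    # Raises ValueError if the target marker is not in the header row.
    target_pos = positions[markers.index(target)]
    # Indices of the columns to keep: the target, plus anything far enough away.
    keep = [
        i for i, (name, pos) in enumerate(zip(markers, positions))
        if name == target or abs(pos - target_pos) > distance
    ]
    # Rebuild every row from the kept columns only.
    return [" ".join(row[i] for i in keep) for row in rows]


example = ["rs1 rs2 rs3 rs4 rs5", "1 2 3 4 5", "2 3 1 1 2"]
for line in drop_neighbours(example, "rs3", 1):
    print(line)
```

Run on the five-marker example, this prints the three lines given as the wanted output (rs1, rs3 and rs5 with their positions and genotypes). Doing the same for every marker in turn would need a rule for which marker takes priority when neighbourhoods overlap, which the question leaves open.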