Size Selecting rows


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Size Selecting rows
# 1  
Old 08-14-2018
Size Selecting rows

I have a rather convoluted script that I use to extract information from CSV files:


Code:
sed '1d' PeakTable.txt | awk -F ',' '!/Ladder/{ if ( $4 > 430 && $4 < 490 && $5 > 45 ) print $2, $5; else print $2, 0 }' | awk '{a[$1]+=$2}END{for(i in a){print i, a[i]}}' | sed 's/\([A-z]\)\([1-9]\) /\10\2 /' | sort | awk '{if ($2 > 10) print $1, $2, 800 / $2; else print $0, 0}' | awk '{print "PCR_Plate", NR, $3, 1, "Deadpool" }' | sed '1 i\Source Well_Source Volume Destination_Well Destination' | sed 's/ /,/g' > output.txt

Here is the breakdown:

Code:
sed '1d'                                                                                             # delete first line
awk -F ',' '!/Ladder/{ if ( $4 > 430 && $4 < 490 && $5 > 45 ) print $2, $5; else print $2, 0 }'      # Exclude rows including "Ladder" and selecting the ones with values between 430-490 in clumn 4; and greather than 45 in column 5. 
awk '{a[$1]+=$2}END{for(i in a){print i, a[i]}}'                                                     # collapsing rows with identical values in column 1
sed 's/\([A-z]\)\([1-9]\) /\10\2 /' | sort                                                           # adding 0 to single digits and sorting
awk '{if ($2 > 10) print $1, $2, 800 / $2; else print $0, 0}'                                        # if column 2 > 10, divide 800 by value in column 2 and add it to the 3rd column. Otherwise print the line and add a 0 in 3rd column
awk '{print "PCR_Plate", NR, $3, 1, "Deadpool" }'                                                    # add constants to columns 1, 2 (consecutive numbers), value in column 3, 1, Deadpool 
sed '1 i\Source Well_Source Volume Destination_Well Destination'                                     # add headers
sed 's/ /,/g'                                                                                        # replace blank spaces with commas -CSV format

I would like to have one well structured awk script handling the entire process. Any help will be greatly appreciated.

PS. I am attaching both the infile and outfiles.

Last edited by RudiC; 08-15-2018 at 06:00 PM.. Reason: making comments more readable.
# 2  
Old 08-15-2018
Unfortunately I'm not at my own *nix system but on a - hrrrrgh - windows machine, so I can't test what I'm proposing here. It might still be a pointer in a desireable direction. Read and apply to taste:
Code:
awk -F, '
NR == 1    {print "Source,Well_Source,Volume,Destination_Well,Destination"
            OFS = ","
            next
           }
 
!/Ladder/  {A[sprintf ("%s%02d", substr ($2, 1, 1), substr ($2, 2))] += ( $4 > 430 && $4 < 490 && $5 > 45 ) ? $5 : 0
           }
END        {for (a in A) print "PCR_Plate", ++CNT, A[a] > 10)?800/A[a]:0, 1, "Deadpool"
           }
 ' peaktable.txt

and sort to taste as my awk doesn't offer a sort function.
I'm not sure I caught all subtleties of your request, please come back with any questions that might arise.




EDIT: Well, back at my desk, able to test, and I modified above to satisfy the request:
Code:
sort -t, -k2.1,2.1 -k2.2,2n /tmp/PeakTable.txt |
awk -F, '
NR == 1         {print "Source,Well_Source,Volume,Destination_Well,Destination"
                 OFS = ","
                 next
                }

/Ladder/        {next
                }

LAST2 && 
$2 != LAST2     {print  "PCR_Plate", ++CNT, (SUM > 10)?800/SUM:0, 1, "Deadpool"
                 SUM = 0
                }

                {SUM   += ( $4 > 430 && $4 < 490 && $5 > 45 ) ? $5 : 0
                 LAST2  = $2
                }

 ' > /tmp/output.txt

yields exactly the output desired. No END section needed as the header is sorted last and prints the H12 entry...

Last edited by RudiC; 08-15-2018 at 07:08 PM..
This User Gave Thanks to RudiC For This Post:
# 3  
Old 08-18-2018
Rudi
Thanks a TON! One more favor, could you please explain me how
LAST2 &&
$2 != LAST2


works in your script?


Appreciate it!
# 4  
Old 08-18-2018
It means
IF LAST2 has a value which means we're at least in input line 2 (remember variables are unset thus "empty" thus FALSE at the beginning of the script; minor risk that $2 will be 0 or empty doesn't exist with your data)

AND $2 differs from LAST2 which means we have a new $2 value,
THEN execute the print and reset SUM.

Last edited by RudiC; 08-18-2018 at 10:29 AM..
This User Gave Thanks to RudiC For This Post:
# 5  
Old 08-18-2018
Thanks !
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Help with shell script: selecting rows that have the same values in two columns

Hello, everyone I am beginner for shell programming. I want to print all lines that have the same values in first two columns data: a b 1 2 a a 3 4 b b 5 6 a b 4 6 what I expected is : a a 3 4 b b 5 6 but I searched for one hour in... (2 Replies)
Discussion started by: nengcheng
2 Replies

2. UNIX for Dummies Questions & Answers

How to separate a single column file into files of the same size (i.e. number of rows)?

I have a text file with 1,000,000 rows (It is a single column text file of numbers). I would like to separate the text file into 100 files of equal size (i.e. number of rows). The first file will contain the first 10,000 rows, the second row will contain the second 10,000 rows (rows 10,001-20,000)... (2 Replies)
Discussion started by: evelibertine
2 Replies

3. Shell Programming and Scripting

Selecting rows from a pipe delimited file based on condition

HI all, I have a simple challenge for you.. I have the following pipe delimited file 2345|98|1809||x|969|0 2345|98|0809||y|0|537 2345|97|9809||x|544|0 2345|97|0909||y|0|651 9685|98|7809||x|321|0 9685|98|7909||y|0|357 9685|98|7809||x|687|0 9685|98|0809||y|0|234 2315|98|0809||x|564|0 ... (2 Replies)
Discussion started by: nithins007
2 Replies

4. UNIX for Dummies Questions & Answers

Removing duplicate rows & selecting only latest date

Gurus, From a file I need to remove duplicate rows based on the first column data but also we need to consider a date column where we need to keep the latest date (13th column). Ex: Input File: Output File: I know how to take out the duplicates but I couldn't figure out... (5 Replies)
Discussion started by: shash
5 Replies

5. Shell Programming and Scripting

rows to columns - different size

hi buddies; i want to convert lines to tabs with a different sized. this is my text: 192.14.2.1 Sector=1 height 2500 Sector=3 height 2500 Sector=2 height 2500 PredefRbsScannerGpeh=1 fileLocation /c/pm_data/ SectorAntenna=3,AntennaBranch=A fqBandHighEdge 21550... (3 Replies)
Discussion started by: gc_sw
3 Replies

6. Shell Programming and Scripting

Selecting rows based on values in columns

Hi My pipe delimited .txt file contains rows with 10 columns. Can anyone advise how I output to file only those rows with the letters ‘ci' as the first 2 characters in the 3rd column ? Many thanks (4 Replies)
Discussion started by: malts18
4 Replies

7. Shell Programming and Scripting

Deleting specific rows in large files having rows greater than 100000

Hi Guys, I need help in modifying a large text file containing more than 1-2 lakh rows of data using unix commands. I am quite new to the unix language the text file contains data in a pipe delimited format sdfsdfs sdfsdfsd START_ROW sdfsd|sdfsdfsd|sdfsdfasdf|sdfsadf|sdfasdf... (9 Replies)
Discussion started by: manish2009
9 Replies

8. UNIX for Dummies Questions & Answers

Help selecting some rows with awk

Hi there, I have a text file with several colums separated by "|;#" I need to search the file extracting all columns starting with the value of "1" or "2" saving in a separate file just the first 7 columns of each row maching the criteria, with replacement of the saparators in the nearly created... (2 Replies)
Discussion started by: capnino
2 Replies

9. Shell Programming and Scripting

Selecting rows with a specific criteria

Hi, I want a UNIX command that can filter out rows with certain criteria. The file is tab deliminated. Row one is just a value. Basically what I want to do is select based on the name and character at the end (o). So lets lets say i want a row that has WashU and (o) then it would print... (2 Replies)
Discussion started by: phil_heath
2 Replies

10. Programming

selecting rows with specific IDs for downstream analysis

Hi, I'm working hard on SQL and I came across a hurdle I'm hoping you can help me out with. I have two tables table1 headers: chrom start end name score strand 11 9720685 9720721 U0 0 + 21 9721043 9721079 U0 0 - 1 9721093 9721129 U0 0 + 20 ... (2 Replies)
Discussion started by: labrazil
2 Replies
Login or Register to Ask a Question