Post 302844148 by Diya123 on Friday 16th of August 2013 04:51:13 PM
Getting unique based on clusters

Hi,

I have a file with 25 clusters, and each cluster has multiple rows. I need to find the unique genes in each cluster and list them against that cluster. Here is a sample of the input:

Code:
Annotation Cluster 2	Enrichment Score: 10.199579524507685											
Category	Term	Count	%	PValue	Genes	List Total	Pop Hits	Pop Total	Fold Enrichment	Bonferroni	Benjamini	FDR
GOTERM_CC_FAT	GO:0031012~extracellular matrix	38	9.268292683	1.42E-14	WNT5B, LTBP2, TNC, SPOCK1, POSTN, MMP3, MMP2, MMP1, TGFB2, OGN, LAMB4, SMOC2, CD44, CRISPLD2, TGFBI, COL6A3, COL6A2, COL6A1, LOX, THBS1, SPON2, TFPI2, COL11A1, FN1, COL18A1, COL4A1, LGALS3, FBN1, CCDC80, MGP, NID1, SPARC, COL5A1, PRELP, ADAMTS7, THSD4, VCAN, COL1A1	307	345	12782	4.585903791	3.92E-12	9.81E-13	1.88E-11
SP_PIR_KEYWORDS	extracellular matrix	30	7.317073171	2.91E-14	WNT5B, TNC, SPOCK1, POSTN, MMP3, MMP2, MMP1, SMOC2, LAMB4, OGN, TGFBI, COL6A3, COL6A2, COL6A1, SPON2, COL11A1, FN1, SPP1, COL18A1, COL4A1, FBN1, CCDC80, NID1, SPARC, COL5A1, PRELP, ADAMTS7, THSD4, VCAN, COL1A1	403	241	19235	5.941435087	1.16E-11	3.87E-12	4.06E-11
GOTERM_CC_FAT	GO:0005578~proteinaceous extracellular matrix	34	8.292682927	1.26E-12	WNT5B, LTBP2, TNC, SPOCK1, POSTN, MMP3, MMP2, MMP1, LAMB4, SMOC2, OGN, TGFBI, COL6A3, COL6A2, COL6A1, LOX, SPON2, TFPI2, COL11A1, FN1, COL18A1, COL4A1, LGALS3, FBN1, CCDC80, MGP, NID1, SPARC, COL5A1, PRELP, ADAMTS7, THSD4, VCAN, COL1A1	307	320	12782	4.423737785	3.48E-10	6.97E-11	1.67E-09
GOTERM_CC_FAT	GO:0044420~extracellular matrix part	16	3.902439024	1.21E-07	COL18A1, COL4A1, TNC, FBN1, CCDC80, NID1, SPARC, COL5A1, SMOC2, LAMB4, COL6A3, COL6A1, LOX, COL1A1, COL11A1, FN1	307	117	12782	5.693699713	3.34E-05	5.56E-06	1.60E-04
GOTERM_CC_FAT	GO:0005604~basement membrane	11	2.682926829	1.59E-05	COL18A1, SMOC2, LAMB4, COL4A1, TNC, FBN1, CCDC80, NID1, SPARC, COL5A1, FN1	307	78	12782	5.871627829	0.004382322	5.49E-04	0.020991649
												
Annotation Cluster 3	Enrichment Score: 8.477392514797849											
Category	Term	Count	%	PValue	Genes	List Total	Pop Hits	Pop Total	Fold Enrichment	Bonferroni	Benjamini	FDR
GOTERM_BP_FAT	GO:0001568~blood vessel development	27	6.585365854	1.05E-10	TNFRSF12A, PGF, FOXO1, ANPEP, MMP2, TGFB2, CD44, HMOX1, IL1B, LOX, THBS1, KLF5, COL18A1, FLT1, IL8, EPAS1, MYO1E, TGFBR2, APOLD1, ITGA4, ARHGAP24, COL5A1, CDH13, FOXC2, FOXC1, COL1A1, PLAU	316	245	13528	4.717850685	2.10E-07	1.05E-07	1.80E-07
GOTERM_BP_FAT	GO:0001944~vasculature development	27	6.585365854	1.79E-10	TNFRSF12A, PGF, FOXO1, ANPEP, MMP2, TGFB2, CD44, HMOX1, IL1B, LOX, THBS1, KLF5, COL18A1, FLT1, IL8, EPAS1, MYO1E, TGFBR2, APOLD1, ITGA4, ARHGAP24, COL5A1, CDH13, FOXC2, FOXC1, COL1A1, PLAU	316	251	13528	4.605073377	3.58E-07	1.19E-07	3.07E-07
GOTERM_BP_FAT	GO:0001525~angiogenesis	18	4.390243902	6.10E-08	KLF5, COL18A1, FLT1, IL8, EPAS1, TNFRSF12A, PGF, TGFBR2, APOLD1, ANPEP, ARHGAP24, TGFB2, CDH13, HMOX1, IL1B, FOXC2, THBS1, PLAU	316	148	13528	5.206637017	1.22E-04	8.70E-06	1.05E-04
GOTERM_BP_FAT	GO:0048514~blood vessel morphogenesis	21	5.12195122	1.08E-07	KLF5, COL18A1, FLT1, EPAS1, IL8, TNFRSF12A, PGF, MYO1E, TGFBR2, APOLD1, ANPEP, ITGA4, ARHGAP24, TGFB2, CDH13, HMOX1, IL1B, FOXC2, FOXC1, THBS1, PLAU	316	211	13528	4.260723499	2.15E-04	1.27E-05	1.85E-04

My output should be

Annotation cluster1 ABC,DEF,XYZ,MNO

Can I do this in awk?

Thanks
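
Yes, awk can handle this. Below is a minimal sketch, not a definitive answer. It assumes the file is tab-separated with the gene list in column 6 (as in the sample above), that each cluster starts with a line beginning "Annotation Cluster", and that "unique" means each gene is listed once per cluster even if it appears in several rows. The function name unique_genes is made up for illustration.

```shell
# unique_genes FILE
# Prints one line per cluster: the cluster label, a space, then the
# de-duplicated genes of that cluster joined by commas (first-seen order).
# Assumption: tab-separated input, gene list in field 6.
unique_genes() {
  awk -F'\t' '
    # Print the genes collected so far and reset state for the next cluster.
    function emit() {
      if (label != "") print label " " list
      list = ""
      split("", seen)            # portable way to empty an array
    }
    /^Annotation Cluster/ { emit(); label = $1; next }   # a new cluster begins
    $1 == "Category" || $6 == "" { next }                # skip header and blank rows
    {
      n = split($6, genes, /, */)                        # genes are comma-separated
      for (i = 1; i <= n; i++) {
        g = genes[i]
        if (g != "" && !(g in seen)) {
          seen[g] = 1
          list = (list == "" ? g : list "," g)
        }
      }
    }
    END { emit() }                                       # flush the last cluster
  ' "$1"
}
```

Call it as unique_genes clusters.txt, where clusters.txt is a placeholder for your own file name. If you instead want genes that occur in only one cluster across the whole file, this needs a different approach: collect the genes per cluster first, then count how many clusters each gene appears in.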
 
