Shell Programming and Scripting: post 302844148 by Diya123 on Friday 16th of August 2013 04:51:13 PM
Getting unique based on clusters

Hi,

I have a file with 25 clusters, and each cluster has multiple rows. I need to find the unique genes in each cluster and assign them to that cluster.

Code:
Annotation Cluster 2	Enrichment Score: 10.199579524507685											
Category	Term	Count	%	PValue	Genes	List Total	Pop Hits	Pop Total	Fold Enrichment	Bonferroni	Benjamini	FDR
GOTERM_CC_FAT	GO:0031012~extracellular matrix	38	9.268292683	1.42E-14	WNT5B, LTBP2, TNC, SPOCK1, POSTN, MMP3, MMP2, MMP1, TGFB2, OGN, LAMB4, SMOC2, CD44, CRISPLD2, TGFBI, COL6A3, COL6A2, COL6A1, LOX, THBS1, SPON2, TFPI2, COL11A1, FN1, COL18A1, COL4A1, LGALS3, FBN1, CCDC80, MGP, NID1, SPARC, COL5A1, PRELP, ADAMTS7, THSD4, VCAN, COL1A1	307	345	12782	4.585903791	3.92E-12	9.81E-13	1.88E-11
SP_PIR_KEYWORDS	extracellular matrix	30	7.317073171	2.91E-14	WNT5B, TNC, SPOCK1, POSTN, MMP3, MMP2, MMP1, SMOC2, LAMB4, OGN, TGFBI, COL6A3, COL6A2, COL6A1, SPON2, COL11A1, FN1, SPP1, COL18A1, COL4A1, FBN1, CCDC80, NID1, SPARC, COL5A1, PRELP, ADAMTS7, THSD4, VCAN, COL1A1	403	241	19235	5.941435087	1.16E-11	3.87E-12	4.06E-11
GOTERM_CC_FAT	GO:0005578~proteinaceous extracellular matrix	34	8.292682927	1.26E-12	WNT5B, LTBP2, TNC, SPOCK1, POSTN, MMP3, MMP2, MMP1, LAMB4, SMOC2, OGN, TGFBI, COL6A3, COL6A2, COL6A1, LOX, SPON2, TFPI2, COL11A1, FN1, COL18A1, COL4A1, LGALS3, FBN1, CCDC80, MGP, NID1, SPARC, COL5A1, PRELP, ADAMTS7, THSD4, VCAN, COL1A1	307	320	12782	4.423737785	3.48E-10	6.97E-11	1.67E-09
GOTERM_CC_FAT	GO:0044420~extracellular matrix part	16	3.902439024	1.21E-07	COL18A1, COL4A1, TNC, FBN1, CCDC80, NID1, SPARC, COL5A1, SMOC2, LAMB4, COL6A3, COL6A1, LOX, COL1A1, COL11A1, FN1	307	117	12782	5.693699713	3.34E-05	5.56E-06	1.60E-04
GOTERM_CC_FAT	GO:0005604~basement membrane	11	2.682926829	1.59E-05	COL18A1, SMOC2, LAMB4, COL4A1, TNC, FBN1, CCDC80, NID1, SPARC, COL5A1, FN1	307	78	12782	5.871627829	0.004382322	5.49E-04	0.020991649
												
Annotation Cluster 3	Enrichment Score: 8.477392514797849											
Category	Term	Count	%	PValue	Genes	List Total	Pop Hits	Pop Total	Fold Enrichment	Bonferroni	Benjamini	FDR
GOTERM_BP_FAT	GO:0001568~blood vessel development	27	6.585365854	1.05E-10	TNFRSF12A, PGF, FOXO1, ANPEP, MMP2, TGFB2, CD44, HMOX1, IL1B, LOX, THBS1, KLF5, COL18A1, FLT1, IL8, EPAS1, MYO1E, TGFBR2, APOLD1, ITGA4, ARHGAP24, COL5A1, CDH13, FOXC2, FOXC1, COL1A1, PLAU	316	245	13528	4.717850685	2.10E-07	1.05E-07	1.80E-07
GOTERM_BP_FAT	GO:0001944~vasculature development	27	6.585365854	1.79E-10	TNFRSF12A, PGF, FOXO1, ANPEP, MMP2, TGFB2, CD44, HMOX1, IL1B, LOX, THBS1, KLF5, COL18A1, FLT1, IL8, EPAS1, MYO1E, TGFBR2, APOLD1, ITGA4, ARHGAP24, COL5A1, CDH13, FOXC2, FOXC1, COL1A1, PLAU	316	251	13528	4.605073377	3.58E-07	1.19E-07	3.07E-07
GOTERM_BP_FAT	GO:0001525~angiogenesis	18	4.390243902	6.10E-08	KLF5, COL18A1, FLT1, IL8, EPAS1, TNFRSF12A, PGF, TGFBR2, APOLD1, ANPEP, ARHGAP24, TGFB2, CDH13, HMOX1, IL1B, FOXC2, THBS1, PLAU	316	148	13528	5.206637017	1.22E-04	8.70E-06	1.05E-04
GOTERM_BP_FAT	GO:0048514~blood vessel morphogenesis	21	5.12195122	1.08E-07	KLF5, COL18A1, FLT1, EPAS1, IL8, TNFRSF12A, PGF, MYO1E, TGFBR2, APOLD1, ANPEP, ITGA4, ARHGAP24, TGFB2, CDH13, HMOX1, IL1B, FOXC2, FOXC1, THBS1, PLAU	316	211	13528	4.260723499	2.15E-04	1.27E-05	1.85E-04

My output should be:

Annotation cluster1 ABC,DEF,XYZ,MNO

Can I do this in awk?

Thanks
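The question asks about awk; as a language-neutral illustration, here is a minimal Python sketch of the same per-cluster de-duplication. It assumes the file is tab-separated as in the sample, that each cluster starts with a line beginning "Annotation Cluster", and that the Genes column (the sixth field) holds names separated by ", ". The function name unique_genes_per_cluster and the script name are hypothetical.

```python
import sys
from typing import Dict, Iterable, List

GENES_COL = 5  # "Genes" is the 6th tab-separated column in the sample header


def unique_genes_per_cluster(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each "Annotation Cluster N" label to its genes,
    de-duplicated in first-seen order."""
    clusters: Dict[str, List[str]] = {}
    seen: set = set()
    current = None
    for raw in lines:
        fields = raw.rstrip("\r\n").split("\t")
        first = fields[0].strip()
        if first.startswith("Annotation Cluster"):
            current = first          # start a new cluster
            clusters[current] = []
            seen = set()             # uniqueness is tracked per cluster
        elif current is None or first in ("", "Category") or len(fields) <= GENES_COL:
            continue                 # skip blank/tab-only lines, header rows, short rows
        else:
            for gene in fields[GENES_COL].split(","):
                gene = gene.strip()
                if gene and gene not in seen:
                    seen.add(gene)
                    clusters[current].append(gene)
    return clusters


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 cluster_genes.py clusters.txt
    with open(sys.argv[1], encoding="utf-8") as fh:
        for label, genes in unique_genes_per_cluster(fh).items():
            print(label, ",".join(genes))
```

Each cluster prints on one line; for the sample above, the first line begins "Annotation Cluster 2 WNT5B,LTBP2,TNC,SPOCK1,". In awk the same bookkeeping would use an associative array keyed by gene name that is emptied whenever a new "Annotation Cluster" line starts.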
 

iconv_mac_cyr(5)					Standards, Environments, and Macros					  iconv_mac_cyr(5)

NAME
iconv_mac_cyr - code set conversion tables for Macintosh Cyrillic

DESCRIPTION
The following code set conversions are supported:

+--------------+--------+-------------+--------+-------------------------+
|                     Code Set Conversions Supported                     |
+--------------+--------+-------------+--------+-------------------------+
| Code         | Symbol | Target Code | Symbol | Target Output           |
+--------------+--------+-------------+--------+-------------------------+
| Mac Cyrillic | mac    | ISO 8859-5  | iso5   | ISO 8859-5 Cyrillic     |
| Mac Cyrillic | mac    | KOI8-R      | koi8   | KOI8-R                  |
| Mac Cyrillic | mac    | PC Cyrillic | alt    | Alternative PC Cyrillic |
| Mac Cyrillic | mac    | MS 1251     | win5   | Windows Cyrillic        |
+--------------+--------+-------------+--------+-------------------------+

CONVERSIONS
The conversions are performed according to the following tables. All values in the tables are given in octal.

Mac Cyrillic to ISO 8859-5

For the conversion of Mac Cyrillic to ISO 8859-5, all characters not in the following table are mapped unchanged.

+--------------+--------------+--------------+--------------+
|                   Conversions Performed                   |
+--------------+--------------+--------------+--------------+
| Mac Cyrillic | ISO 8859-5   | Mac Cyrillic | ISO 8859-5   |
+--------------+--------------+--------------+--------------+
| 24           | 4            | 276          | 252          |
| 200          | 260          | 277          | 372          |
| 201          | 261          | 300          | 370          |
| 202          | 262          | 301          | 245          |
| 203          | 263          | 302-311      | 40           |
| 204          | 264          | 312          | 240          |
| 205          | 265          | 313          | 242          |
| 206          | 266          | 314          | 362          |
| 207          | 267          | 315          | 254          |
| 210          | 270          | 316          | 374          |
| 211          | 271          | 317          | 365          |
| 212          | 272          | 320-327      | 40           |
| 213          | 273          | 330          | 256          |
| 214          | 274          | 331          | 376          |
| 215          | 275          | 332          | 257          |
| 216          | 276          | 333          | 377          |
| 217          | 277          | 334          | 360          |
| 220          | 300          | 335          | 241          |
| 221          | 301          | 336          | 361          |
| 222          | 302          | 337          | 357          |
| 223          | 303          | 340          | 320          |
| 224          | 304          | 341          | 321          |
| 225          | 305          | 342          | 322          |
| 226          | 306          | 343          | 323          |
| 227          | 307          | 344          | 324          |
| 230          | 310          | 345          | 325          |
| 231          | 311          | 346          | 326          |
| 232          | 312          | 347          | 327          |
| 233          | 313          | 350          | 330          |
| 234          | 314          | 351          | 331          |
| 235          | 315          | 352          | 332          |
| 236          | 316          | 353          | 333          |
| 237          | 317          | 354          | 334          |
| 240-246      | 40           | 355          | 335          |
| 247          | 246          | 356          | 336          |
| 250-252      | 40           | 357          | 337          |
| 253          | 242          | 360          | 340          |
| 254          | 362          | 361          | 341          |
| 255          | 40           | 362          | 342          |
| 256          | 243          | 363          | 343          |
| 257          | 363          | 364          | 344          |
| 260-263      | 40           | 365          | 345          |
| 264          | 366          | 366          | 346          |
| 265-266      | 40           | 367          | 347          |
| 267          | 250          | 370          | 350          |
| 270          | 244          | 371          | 351          |
| 271          | 364          | 372          | 352          |
| 272          | 247          | 373          | 353          |
| 273          | 367          | 374          | 354          |
| 274          | 251          | 375          | 355          |
| 275          | 371          | 376          | 356          |
| 375          | 370          |              |              |
+--------------+--------------+--------------+--------------+

Mac Cyrillic to KOI8-R

For the conversion of Mac Cyrillic to KOI8-R, all characters not in the following table are mapped unchanged.
+--------------+--------------+--------------+--------------+
|                   Conversions Performed                   |
+--------------+--------------+--------------+--------------+
| Mac Cyrillic | KOI8-R       | Mac Cyrillic | KOI8-R       |
+--------------+--------------+--------------+--------------+
| 24           | 4            | 276          | 272          |
| 200          | 341          | 277          | 252          |
| 201          | 342          | 300          | 250          |
| 202          | 367          | 301          | 265          |
| 203          | 347          | 302-311      | 40           |
| 204          | 344          | 312          | 240          |
| 205          | 345          | 313          | 261          |
| 206          | 366          | 314          | 241          |
| 207          | 372          | 315          | 274          |
| 210          | 351          | 316          | 254          |
| 211          | 352          | 317          | 245          |
| 212          | 353          | 320-327      | 40           |
| 213          | 354          | 330          | 276          |
| 214          | 355          | 331          | 256          |
| 215          | 356          | 332          | 277          |
| 216          | 357          | 333          | 257          |
| 217          | 360          | 334          | 260          |
| 220          | 362          | 335          | 263          |
| 221          | 363          | 336          | 243          |
| 222          | 364          | 337          | 321          |
| 223          | 365          | 340          | 301          |
| 224          | 346          | 341          | 302          |
| 225          | 350          | 342          | 327          |
| 226          | 343          | 343          | 307          |
| 227          | 376          | 344          | 304          |
| 230          | 373          | 345          | 305          |
| 231          | 375          | 346          | 326          |
| 232          | 377          | 347          | 332          |
| 233          | 371          | 350          | 311          |
| 234          | 370          | 351          | 312          |
| 235          | 374          | 352          | 313          |
| 236          | 340          | 353          | 314          |
| 237          | 361          | 354          | 315          |
| 240-246      | 40           | 355          | 316          |
| 247          | 266          | 356          | 317          |
| 250-252      | 40           | 357          | 320          |
| 253          | 261          | 360          | 322          |
| 254          | 241          | 361          | 323          |
| 255          | 40           | 362          | 324          |
| 256          | 262          | 363          | 325          |
| 257          | 242          | 364          | 306          |
| 260-263      | 40           | 365          | 310          |
| 264          | 246          | 366          | 303          |
| 265-266      | 40           | 367          | 336          |
| 267          | 270          | 370          | 333          |
| 270          | 264          | 371          | 335          |
| 271          | 244          | 372          | 337          |
| 272          | 267          | 373          | 331          |
| 273          | 247          | 374          | 330          |
| 274          | 271          | 375          | 334          |
| 275          | 251          | 376          | 300          |
| 375          | 370          |              |              |
+--------------+--------------+--------------+--------------+

Mac Cyrillic to PC Cyrillic

For the conversion of Mac Cyrillic to PC Cyrillic, all characters not in the following table are mapped unchanged.
+--------------+--------------+--------------+--------------+
|                   Conversions Performed                   |
+--------------+--------------+--------------+--------------+
| Mac Cyrillic | PC Cyrillic  | Mac Cyrillic | PC Cyrillic  |
+--------------+--------------+--------------+--------------+
| 24           | 4            | 355          | 255          |
| 240-334      | 40           | 356          | 256          |
| 335          | 360          | 357          | 257          |
| 336          | 361          | 360          | 340          |
| 337          | 357          | 361          | 341          |
| 340          | 240          | 362          | 342          |
| 341          | 241          | 363          | 343          |
| 342          | 242          | 364          | 344          |
| 343          | 243          | 365          | 345          |
| 344          | 244          | 366          | 346          |
| 345          | 245          | 367          | 347          |
| 346          | 246          | 370          | 350          |
| 347          | 247          | 371          | 351          |
| 350          | 250          | 372          | 352          |
| 351          | 251          | 373          | 353          |
| 352          | 252          | 374          | 354          |
| 353          | 253          | 375          | 355          |
| 354          | 254          | 376          | 356          |
| 303          | 366          |              |              |
+--------------+--------------+--------------+--------------+

Mac Cyrillic to MS 1251

For the conversion of Mac Cyrillic to MS 1251, all characters not in the following table are mapped unchanged.

+--------------+--------------+--------------+--------------+
|                   Conversions Performed                   |
+--------------+--------------+--------------+--------------+
| Mac Cyrillic | MS 1251      | Mac Cyrillic | MS 1251      |
+--------------+--------------+--------------+--------------+
| 24           | 4            | 255          | 40           |
| 200          | 300          | 256          | 201          |
| 201          | 301          | 257          | 203          |
| 202          | 302          | 260-263      | 40           |
| 203          | 303          | 264          | 263          |
| 204          | 304          | 266          | 264          |
| 205          | 305          | 267          | 243          |
| 206          | 306          | 270          | 252          |
| 207          | 307          | 271          | 272          |
| 210          | 310          | 272          | 257          |
| 211          | 311          | 273          | 277          |
| 212          | 312          | 274          | 212          |
| 213          | 313          | 275          | 232          |
| 214          | 314          | 276          | 214          |
| 215          | 315          | 277          | 234          |
| 216          | 316          | 300          | 274          |
| 217          | 317          | 301          | 275          |
| 220          | 320          | 302          | 254          |
| 221          | 321          | 303-306      | 40           |
| 222          | 322          | 307          | 253          |
| 223          | 323          | 310          | 273          |
| 224          | 324          | 311          | 205          |
| 225          | 325          | 312          | 240          |
| 226          | 326          | 313          | 200          |
| 227          | 327          | 314          | 220          |
| 230          | 330          | 315          | 215          |
| 231          | 331          | 316          | 235          |
| 232          | 332          | 317          | 276          |
| 233          | 333          | 320          | 226          |
| 234          | 334          | 321          | 227          |
| 235          | 335          | 322          | 223          |
| 236          | 336          | 323          | 224          |
| 237          | 337          | 324          | 221          |
| 240          | 206          | 325          | 222          |
| 241          | 260          | 326          | 40           |
| 242          | 245          | 327          | 204          |
| 243          | 40           | 330          | 241          |
| 244          | 247          | 331          | 242          |
| 245          | 267          | 332          | 217          |
| 246          | 266          | 333          | 237          |
| 247          | 262          | 334          | 271          |
| 250          | 256          | 335          | 250          |
| 252          | 231          | 336          | 270          |
| 253          | 200          | 337          | 377          |
| 254          | 220          | 362          | 324          |
+--------------+--------------+--------------+--------------+
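The rule stated for each table above (every value is octal, bytes that are not listed pass through unchanged, and ranges such as 260-263 all become 40, an ASCII space) amounts to a 256-entry byte translation table. The Python sketch below builds one from a handful of rows copied from the Mac Cyrillic to MS 1251 table. The names build_table, MAC_TO_1251, mac_to_1251 and matches_python_codec are hypothetical, and the cross-check assumes that Python's standard mac_cyrillic and cp1251 codecs correspond to the mac and win5 code sets. It illustrates the lookup rule only; it is not how the conversion modules under /usr/lib/iconv are implemented.

```python
from typing import Dict, Iterable, Tuple


def build_table(single: Dict[int, int],
                ranges: Iterable[Tuple[int, int, int]] = ()) -> bytes:
    """Return a 256-byte table for bytes.translate().

    Bytes that are not listed map to themselves ("mapped unchanged");
    each (first, last, target) range is inclusive, like "260-263 -> 40".
    """
    table = list(range(256))
    for first, last, target in ranges:
        for b in range(first, last + 1):
            table[b] = target
    for src, dst in single.items():
        table[src] = dst
    return bytes(table)


# Hypothetical excerpt of the Mac Cyrillic to MS 1251 table, written as
# Python octal literals (0o...) because the man page gives values in octal.
MAC_TO_1251 = build_table(
    single={
        0o200: 0o300,  # Cyrillic capital A
        0o201: 0o301,
        0o237: 0o337,  # Cyrillic capital YA
        0o335: 0o250,
        0o336: 0o270,
        0o337: 0o377,
    },
    ranges=[(0o260, 0o263, 0o40), (0o303, 0o306, 0o40)],  # become a space
)


def mac_to_1251(data: bytes) -> bytes:
    """Translate Mac Cyrillic bytes using the excerpted table."""
    return data.translate(MAC_TO_1251)


def matches_python_codec(byte: int) -> bool:
    """Compare one byte's table result with Python's codec round trip."""
    expected = bytes([byte]).decode("mac_cyrillic").encode("cp1251")
    return mac_to_1251(bytes([byte])) == expected
```

For the rows excerpted here the table and Python's codecs agree (octal 200 becomes 300 either way); outside the Cyrillic letters the two are not guaranteed to match, so the man page tables remain the reference for this conversion.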
FILES
/usr/lib/iconv/*.so
    conversion modules
/usr/lib/iconv/*.t
    conversion tables
/usr/lib/iconv/iconv_data
    list of conversions supported by conversion tables

SEE ALSO
iconv(1), iconv(3C), iconv(5)

SunOS 5.10                     18 Apr 1997                     iconv_mac_cyr(5)
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.