Data parsing


 
# 1  
Old 09-29-2011

Hi,
I have a data file that is divided into compartments by lines of dashes (---------). I would like to extract (parse) some of the names and numbers from it using either awk or sed.
The file has the following format:
Code:
[CGT]CATGC[AT]

Best GO enrichment: 

Genes/ORF that have the motifs (genes are sorted by max(pa+pd+po)):

A1_1100
  Expression: 103

  Position	 pa	 pd	 po
      -313	5.3	0.0	0.0	tgctttggctGCATGCAttttttatat
      -286	5.3	0.0	0.0	atgcaaaattGCATGCAacgcatgcag
      -277	5.3	0.0	0.0	tgcatgcaacGCATGCAgattttacat
      -314	5.3	0.0	0.0	ttgctttggcTGCATGCattttttata
      -287	5.3	0.0	0.0	tatgcaaaatTGCATGCaacgcatgca

A1_3110
  Expression: 103

  Position	 pa	 pd	 po
      -382	5.3	0.0	0.0	aataataattGCATGCAtgcaattttt
      -104	5.3	0.0	0.0	taccgtcactGCATGCAttacgtgttt
      -383	5.3	0.0	0.0	gaataataatTGCATGCatgcaatttt
      -105	5.3	0.0	0.0	ataccgtcacTGCATGCattacgtgtt

A1_1690
  Expression: 44

  Position	 pa	 pd	 po
      -274	4.8	0.0	0.0	ttaactagttGCATGCAtgaaagaaag
      -275	4.8	0.0	0.0	tttaactagtTGCATGCatgaaagaaa
      -239	4.8	0.0	0.0	aataaatgatTGCATGCgactagaata
---------------------------------------------------------------------------

CC[CT]CAC.

Best GO enrichment: 

Genes/ORF that have the motifs (genes are sorted by max(pa+pd+po)):

A1_2970
  Expression: 38

  Position	 pa	 pd	 po
      -315	2.7	0.0	1.1	ttttggacgcAGTGAGGattaaaatat

A1_3030
  Expression: 38

  Position	 pa	 pd	 po
       -57	2.7	0.0	1.1	aaacgggaaaAGTGAGGaataaatgag
---------------------------------------------------------------------------
.TTCCA.

Best GO enrichment: 

Genes/ORF that have the motifs (genes are sorted by max(pa+pd+po)):

A3_3490
  Expression: 104

  Position	 pa	 pd	 po
      -210	2.7	0.0	0.0	tttaactgaaTTTCCAAttttagttac

A3_880
  Expression: 104

  Position	 pa	 pd	 po
      -245	2.7	0.0	0.0	aaccctattaTTTCCAAatataaaatc
      -317	2.7	0.0	0.0	aaagttgaagCTGGAACactcaaatat
---------------------------------------------------------------------------

What I need is an output file in the following format:

Code:
[CGT]CATGC[AT]
A1_1100 103
A1_3110 103
A1_1690 44

CC[CT]CAC.
A1_2970 38
A1_3030 38

.TTCCA.
A3_3490 104
A3_880 104

Please let me know how this can be done.
# 2  
Old 09-29-2011
Code:
 awk '{printf NF==1?RS $0:FS $0}' infile |awk '/Best GO/{print $1}/Expression/{print $1,$3}/----/{printf RS}'

# 3  
Old 09-29-2011
I got the following syntax error:
Code:
awk: syntax error at source line 1
 context is
	{printf >>>  NF== <<< 
awk: illegal statement at source line 1
awk: illegal statement at source line 1

Would you please look into it?
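The error context points at the `NF==` in the unparenthesized printf argument of the one-liner from post #2. Some awk implementations do not accept a bare comparison in that position, and a common workaround is to wrap the whole expression in parentheses and pass it through an explicit "%s" format. What follows is only a sketch under the assumption that this is the cause here. The function name extract_two_stage and the abbreviated sample text are made up for illustration, and the END rule that terminates the last joined line is an addition to the original one-liner.

```shell
# Abbreviated sample in the format from post #1 (tabs replaced by spaces).
sample='[CGT]CATGC[AT]

Best GO enrichment:

Genes/ORF that have the motifs (genes are sorted by max(pa+pd+po)):

A1_1100
  Expression: 103

  Position   pa   pd   po
      -313  5.3  0.0  0.0  tgctttggctGCATGCAttttttatat

A1_1690
  Expression: 44

  Position   pa   pd   po
      -274  4.8  0.0  0.0  ttaactagttGCATGCAtgaaagaaag
---------------------------------------------------------------------------

CC[CT]CAC.

Best GO enrichment:

Genes/ORF that have the motifs (genes are sorted by max(pa+pd+po)):

A1_2970
  Expression: 38

  Position   pa   pd   po
      -315  2.7  0.0  1.1  ttttggacgcAGTGAGGattaaaatat
---------------------------------------------------------------------------'

# Hypothetical helper: same two-stage idea as the one-liner in post #2.
# Reads the named files, or standard input when no file is given.
extract_two_stage() {
    # Stage 1: a line with exactly one field starts a new output line;
    # every other line is appended to the current one after a space.
    # The expression is parenthesized and printed through "%s", so the
    # data itself is never used as a printf format string.
    awk '{ printf "%s", (NF == 1 ? RS $0 : FS $0) } END { printf "%s", RS }' "$@" |
    # Stage 2: pick the motif, the gene/expression pairs, and turn each
    # dashed separator into an empty line.
    awk '/Best GO/    { print $1 }
         /Expression/ { print $1, $3 }
         /----/       { printf "%s", RS }'
}

printf '%s\n' "$sample" | extract_two_stage
```

With a real data file the call would be `extract_two_stage infile`. Each dashed separator becomes an empty line, so the output ends with one empty line after the last compartment.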
# 4  
Old 09-29-2011
Use nawk or /usr/xpg4/bin/awk on Solaris.
# 5  
Old 09-29-2011
Sorry, I am on a Mac.
# 6  
Old 09-29-2011
Can you show us the awk version?

Code:
awk -V

# 7  
Old 09-29-2011
On my machine, awk -V does not seem to be the right command for showing the version.
Code:
awk: unknown option -V ignored

awk: no program given
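The `-V` option is understood by GNU awk, but other implementations use different flags. A small sketch that tries the flags I know of in turn: `--version` (GNU awk and the BSD/macOS awk), `-version` (BSD/macOS awk) and `-W version` (mawk). The function name awk_version is made up for illustration, and the "unknown awk" fallback is an assumption for implementations that offer no version flag at all.

```shell
# Hypothetical helper: report the version of whatever "awk" is on the PATH.
awk_version() {
    # Standard input comes from /dev/null so that an unrecognised flag
    # cannot leave awk waiting for input.
    v=$(awk --version 2>/dev/null </dev/null | head -n 1)
    [ -n "$v" ] || v=$(awk -version 2>/dev/null </dev/null | head -n 1)
    [ -n "$v" ] || v=$(awk -W version 2>/dev/null </dev/null | head -n 1)
    [ -n "$v" ] || v="unknown awk"
    printf '%s\n' "$v"
}

awk_version
```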

---------- Post updated at 10:17 PM ---------- Previous update was at 10:08 PM ----------

I got it:

awk version 20070501
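That version string appears to belong to the BSD awk that ships with Mac OS X, not to nawk on Solaris or GNU awk. A single-pass alternative that sticks to plain awk constructs may be easier to run there. This is a sketch, not a tested drop-in: it assumes every compartment starts with the motif on its first non-blank line, that separators are lines beginning with at least five dashes, and that a gene/ORF name is the only single-word line inside a compartment. The function name extract_one_pass and the abbreviated sample are made up for illustration.

```shell
# Abbreviated sample in the format from post #1 (tabs replaced by spaces).
sample='CC[CT]CAC.

Best GO enrichment:

Genes/ORF that have the motifs (genes are sorted by max(pa+pd+po)):

A1_2970
  Expression: 38

  Position   pa   pd   po
      -315  2.7  0.0  1.1  ttttggacgcAGTGAGGattaaaatat

A1_3030
  Expression: 38

  Position   pa   pd   po
       -57  2.7  0.0  1.1  aaacgggaaaAGTGAGGaataaatgag
---------------------------------------------------------------------------
.TTCCA.

Best GO enrichment:

Genes/ORF that have the motifs (genes are sorted by max(pa+pd+po)):

A3_3490
  Expression: 104

  Position   pa   pd   po
      -210  2.7  0.0  0.0  tttaactgaaTTTCCAAttttagttac
---------------------------------------------------------------------------'

# Hypothetical helper: one awk process, no joining of lines.
# Reads the named files, or standard input when no file is given.
extract_one_pass() {
    awk '
        BEGIN    { want_motif = 1 }              # the file starts with a motif line
        /^-----/ { want_motif = 1; next }        # separator: a new compartment follows
        NF == 0  { next }                        # skip blank lines
        want_motif {                             # first non-blank line of a compartment
            if (seen++) print ""                 # empty line between compartments
            print $1
            want_motif = 0
            next
        }
        NF == 1  { gene = $1; next }             # a lone word is taken to be a gene/ORF name
        $1 == "Expression:" { print gene, $2 }   # pair the name with its expression value
    ' "$@"
}

printf '%s\n' "$sample" | extract_one_pass
```

With a real data file the call would be `extract_one_pass infile > outfile`. Unlike a pipeline that joins lines first, this version keeps only one gene name in memory at a time and prints an empty line between compartments, not after the last one.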