awk for text processing


 
# 1  
Old 05-22-2018
Hi, my file is in this format:
Code:
"[ {
  \"_id\": \"56190\",
  \"_score\": 1.0,
  \"generif\": [
    {
      \"pubmed\": 21764855,
      \"text\": \"loss of RNPC1 in mouse embryonic fibroblasts increased the level of p53 protein, leading to enhanced premature senescence in a p53-dependent manner\"
    },
    {
      \"pubmed\": 22371495,
      \"text\": \"a novel mechanism by which HuR is regulated by RNPC1 via mRNA stability and HuR is a mediator of RNPC1-induced growth suppression.\"
    },
    {
      \"pubmed\": 22508983,
      \"text\": \"knockdown of p73 or p21, another target of RNPC1, attenuates the inhibitory effect of RNPC1 on cell proliferation and premature senescence, whereas combined knockdown of p73 and p21 completely abolishes it\"
    },
    {
      \"pubmed\": 23836903,
      \"text\": \"knockdown of MIC-1 can decrease RNPC1-induced cell growth suppression.\"
    },
    {
      \"pubmed\": 25512531,
      \"text\": \"Rbm38 deficiency markedly decreases the tumor penetrance in mice heterozygous for p53 via enhanced p53 expression.\"
    },
    {
      \"pubmed\": 28850611,
      \"text\": \"The hearts of Rbm38 -/- mice were mildly hypertrophic, but cardiac function was not affected. Furthermore, Rbm38 deficiency did not affect cardiac remodeling (i.e. hypertrophy, LV dilation and fibrosis) or performance (i.e. fractional shortening) after pressure-overload induced by transverse aorta constriction.\"
    }
  ],
  \"symbol\": \"Rbm38\"
} ]"

I want to convert it to a more readable format:

Code:
_id pubmed  text  symbol    
67196 18667844  Overexpression of UBE2T in NIH3T3 cells significantly promoted colony formation in mouse cell cultures  Ube2t
56190 21764855  loss of RNPC1 in mouse embryonic fibroblasts increased the level of p53 protein, leading to enhanced premature senescence in a p53-dependent manner Rbm38
56190 22371495  a novel mechanism by which HuR is regulated by RNPC1 via mRNA stability and HuR is a mediator of RNPC1-induced growth suppression Rbm38
56190 22508983  knockdown of p73 or p21, another target of RNPC1, attenuates the inhibitory effect of RNPC1 on cell proliferation and premature senescence, whereas combined knockdown of p73 and p21 completely abolishes it  Rbm38
56190 23836903  knockdown of MIC-1 can decrease RNPC1-induced cell growth suppression.  Rbm38
56190 25512531  Rbm38 deficiency markedly decreases the tumor penetrance in mice heterozygous for p53 via enhanced p53 expression Rbm38
56190 28850611  The hearts of Rbm38 -/- mice were mildly hypertrophic, but cardiac function was not affected. Furthermore, Rbm38 deficiency did not affect cardiac remodeling (i.e. hypertrophy, LV dilation and fibrosis) or performance (i.e. fractional shortening) after pressure-overload induced by transverse aorta constriction. Rbm38

# 2  
Old 05-22-2018
Welcome to the forum.

Any attempts / ideas / thoughts from your side?

Where does the 67196 18667844 info come from?
# 3  
Old 05-22-2018
I am not sure how to edit my post. I included that line by mistake; please read the expected output from the second line onwards. I was trying something like the command below to get what I wanted:

Code:
sed 's/:/\t/' t.txt | awk '{ gsub(/\[/,"") }1'

# 4  
Old 05-22-2018
How about
Code:
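awk -F: '
                {gsub (/[\\",]/, _)             # strip backslashes, quotes and commas; _ is unset, so it acts as ""
                }
/^ *_id/        {ID = $2
                }
/^ *pubmed/     {PM[++CUR] = $2                 # start a new generif entry
                }
/^ *text/       {TX[CUR] = $2                   # text belonging to the latest pubmed ID
                }
/^ *symbol/     {SY = $2
                }
/^ *\} \]/      {for (p = 1; p <= CUR; p++) print ID, PM[p], TX[p], SY     # counted loop keeps input order; "for (p in PM)" does not guarantee it
                 CUR = 0
                 split ("", PM)
                }
' file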
 56190  21764855  loss of RNPC1 in mouse embryonic fibroblasts increased the level of p53 protein leading to enhanced premature senescence in a p53-dependent manner  Rbm38
 56190  22371495  a novel mechanism by which HuR is regulated by RNPC1 via mRNA stability and HuR is a mediator of RNPC1-induced growth suppression.  Rbm38
 56190  22508983  knockdown of p73 or p21 another target of RNPC1 attenuates the inhibitory effect of RNPC1 on cell proliferation and . . .   Rbm38
 56190  23836903  knockdown of MIC-1 can decrease RNPC1-induced cell growth suppression.  Rbm38
 56190  25512531  Rbm38 deficiency markedly decreases the tumor penetrance in mice heterozygous for p53 via enhanced p53 expression.  Rbm38
 56190  28850611  The hearts of Rbm38 -/- mice were mildly hypertrophic but cardiac function was not affected. Furthermore Rbm38 deficiency did not affect . . .  Rbm38
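
The output above loses the commas inside the text, has no header line, and would cut a text at its first colon because of -F:. A variant that avoids this might look like the sketch below. It assumes the file always has one key per line, as in the sample, and that _id is the first key of each record. The names flatten_generif, value and emit are made up for this example and do not come from the thread.

```sh
#!/bin/sh
# flatten_generif (hypothetical name): print one tab-separated row per
# pubmed/text pair, with a header line, keeping punctuation inside the text.
flatten_generif() {
    awk '
    # Return what follows the first colon, without the trailing comma
    # and without the surrounding double quotes.
    function value(s) {
        s = substr(s, index(s, ":") + 1)
        sub(/^[ \t]*/, "", s)
        sub(/,[ \t]*$/, "", s)
        sub(/^"/, "", s)
        sub(/"$/, "", s)
        return s
    }
    # Print the rows collected for the current record, in input order.
    function emit(   i) {
        for (i = 1; i <= n; i++) print id, pm[i], tx[i], sym
        n = 0
    }
    BEGIN { OFS = "\t"; print "_id", "pubmed", "text", "symbol" }
    {
        gsub(/\\"/, "\"")            # turn each \" back into a plain quote
        line = $0
        sub(/^[ \t]*/, "", line)     # drop the indentation
    }
    # Assumption: _id opens a record, so flush the previous one first.
    line ~ /^"_id":/    { emit(); id = value(line) }
    line ~ /^"pubmed":/ { pm[++n] = value(line) }
    line ~ /^"text":/   { tx[n] = value(line) }
    line ~ /^"symbol":/ { sym = value(line) }
    END { emit() }
    ' "$@"
}
```

With the sample saved as t.txt, running flatten_generif t.txt should print the header followed by six rows for ID 56190. Because the value is taken after the first colon only, colons and commas inside the text are left alone.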
