Shell script to extract data in a file

 
# 1  
Old 02-15-2017

I have a 5GB file, and I want to extract particular patterns from it.

This is my script:
Code:
count=`grep -wc "MSISDN" file_name`
k=1
>OUTPUT
>OUTPUT_Final
while [ $k -le $count ]
do
cat file_name | awk -F":" -v var="$k" '$1=="MSISDN" {m++}m==var{print; exit}' >> OUTPUT
cat file_name |awk -F":" -v var="$k" '$1=="IMSI" {m++}m==var{print; exit}' >> OUTPUT
cat file_name |awk -F":" -v var="$k" '$1=="NAM"  {m++}m==var{print; exit}' >> OUTPUT
cat file_name | awk -F":" -v var="$k" '$1=="TS11" {m++}m==var{print; exit}' >> OUTPUT
cat file_name | awk -F":" -v var="$k" '$1=="TS21" {m++}m==var{print; exit}' >> OUTPUT
cat file_name | awk -F":" -v var="$k" '$1=="TS22" {m++}m==var{print; exit}' >> OUTPUT
cat file_name | awk -F":" -v var="$k" '$1=="TS62" {m++}m==var{print; exit}' >> OUTPUT
cat file_name | awk -F":" -v var="$k" '$1=="BAIC" {m++}m==var{print; exit}' >> OUTPUT
cat file_name | awk -F":" -v var="$k" '$1=="BAOC" {m++}m==var{print; exit}' >> OUTPUT
cat file_name | awk -F":" -v var="$k" '$1=="APNID1" {m++}m==var{print; exit}' >> OUTPUT
cat file_name | awk -F":" -v var="$k" '$1=="APNID2" {m++}m==var{print; exit}' >> OUTPUT
echo " " >> OUTPUT
k=`expr $k + 1`
done
paste -d"," - - - - - - - - - - - - <OUTPUT > OUTPUT_Final


So from my file, called file_name, I want to extract only the values of MSISDN, NAM, OBO, TS, etc., append the results to the OUTPUT file, and then use the paste command to put each record's values on a single line.
The script works fine on a small file, but on the 5GB file it has been running for 2 days.

Please, I need help!

Last edited by Corona688; 02-15-2017 at 01:38 PM..
# 2  
Old 02-15-2017
Please use code tags for code.

I'm not surprised it takes days to run: you are running 22 external commands (11 cat and 11 awk) per loop iteration, that is, once per MSISDN record, and each awk rereads the file from the top until it reaches the k-th match, when you probably only meant to process one record. So instead of reading the file once, you scan it up to 11*n times, with n being the number of records in the file.

How about:
Code:
awk '{ A[$1]++ }
END {
        for(X in A) { printf("%s%s", P, X); P="\t" }
        printf("\n");
        P="" ;
        for(X in A) { printf("%s%s", P, A[X]); P="\t" }
        printf("\n"); }' < inputfile > outputfile

If that doesn't work, please show the input you have and the output you want.
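
For reference, a minimal sketch of what that counting idea reports, run on a made-up three-line sample (the sample and the variable names are assumptions, not part of the real file). Unlike the script above, it prints one "key count" pair per line and pipes the result through sort, because the order of for (X in A) is unspecified. Note that with the default field separator the key still carries its colon.

```shell
# Hypothetical three-line sample; the real input is the 5GB file.
sample=$(mktemp)
printf '%s\n' 'MSISDN: 111' 'NAM: 1' 'MSISDN: 222' > "$sample"

# Count how often each first field occurs, in a single pass over the file.
counts=$(awk '{ A[$1]++ } END { for (X in A) print X, A[X] }' "$sample" | sort)
printf '%s\n' "$counts"
rm -f "$sample"
```

On this sample it reports MSISDN: twice and NAM: once. The point is that one awk process reads the file exactly once, however many different keys it has to track.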
# 3  
Old 02-15-2017
Without reasonable, representative input and output samples it's difficult to see WHAT you really want done. To me it seems you have count records in the file, and want to print the records' corresponding lines, blockwise. (Which is not what you specify verbally: "... extract only the values MSISDN,NAM,OBO,TS,etc...", except $1 is the only field in a line)
- Are a single record's lines contiguous? Or do records overlap?
- Are the input records in the order that you want them printed?
- Are there more fields in a line or just $1? Should those be printed?
- Are there more lines that you want suppressed?
# 4  
Old 02-15-2017
Let's say this is the input: 2 blocks of lines (actually there are 4 million).
A dn: serv=... line marks the beginning of a block, and a blank line marks its end.
From each block I want to extract only MSISDN, IMSI, NAM, TS11, TS21, TS22, BAIC, APNID, OBO and OBI; the extracted values of each block should then be on one line, separated by semicolons or colons.


Code:
dn:serv=CSPS,mscId=00015640ccf345a7914718d3d3eff6f6,ou=multiSCs,dc=mtncg
structuralObjectClass: CP1
objectClass: CP1
objectClass: CUDBServiceAuxiliary
objectClass: CP2
objectClass: CP3
objectClass: CP4
objectClass: CP5
objectClass: CP6
objectClass: CP7
objectClass: CP8
objectClass: CP9
objectClass: CPA
objectClass: CPB
objectClass: CPC
objectClass: CPD
objectClass: CPE
objectClass: CPF
objectClass: CPG
objectClass: CPH
objectClass: CPI
objectClass: CPJ
objectClass: CPK
objectClass: CPL
objectClass: CPM
objectClass: CPM1
objectClass: CPM2
objectClass: CPM3
objectClass: CPM4
objectClass: CPZ
objectClass: CP04s
objectClass: CP0A
objectClass: CP11
entryDS: 1
nodeId: 1
createTimestamp: 20170104222519Z
modifyTimestamp: 20170113111014Z
MSISDN: 242064493944
IMSI: 629100113334650
NAM: 1
CDC: 6
CSP: 6
SUBSCSPVERS: 3
PDPCP: 12
SUBSPDPCPVERS: 1
RSA: 3
SUBSRSAVERS: 10
APNID1: 2
APNVERS1: 1
APNID2: 1
APNVERS2: 1
APNID3: 0
APNVERS3: 1
EQOSIDV1:: AAAC
EQOSIDV2:: AAAC
EQOSIDV3:: AAAC
serv: CSPS
CSLOC: 2
RVLRI: 0
RSGSNI: 0
GSMUEFEAT: 0
OBO: 1
OBI: 1
MCA: 1
CAT: 10
DBSG: 1
OFA: 0
SOCB: 1
PWD: 0000
PWDC: 0
SOCFB: 0
SOCFNRC: 0
SOCFNRY: 0
SOCFU: 0
SODCF: 0
SOSDCF: 7
SOCLIP: 0
SOCLIR: 0
SOCOLP: 0
BS26: 1
BS3G: 1
TS11: 1
TS21: 1
TS22: 1
TS62: 1
CAW: 1
HOLD: 1
MPTY: 1
OICK: 10
BAIC: 1
BAOC: 1
BICRO: 1
BOIC: 1
BOIEXH: 1
CFB: 1
CFNRC: 1
CFNRY: 1
CFU: 1
CLIP: 1
CAWTS10ST: 8
CFBTS10ST: 8
CFUTS10ST: 8
CFNRCTS10ST: 8
CFNRYTS10ST: 8
BAICTS10ST: 8
BAOCTS10ST: 8
BICROTS10ST: 8
BOICTS10ST: 8
BOIEXHTS10ST: 8
BAICTS20ST: 8
BAOCTS20ST: 8
BICROTS20ST: 8
BOICTS20ST: 8
BOIEXHTS20ST: 8
CAWTS60ST: 8
CFBTS60ST: 8
CFUTS60ST: 8
CFNRCTS60ST: 8
CFNRYTS60ST: 8
BAICTS60ST: 8
BAOCTS60ST: 8
BICROTS60ST: 8
BOICTS60ST: 8
BOIEXHTS60ST: 8
CAWBS30ST: 8
CFBBS30ST: 8
CFUBS30ST: 8
CFNRCBS30ST: 8
CFNRYBS30ST: 8
BAICBS30ST: 8
BAOCBS30ST: 8
BICROBS30ST: 8
BOICBS30ST: 8
BOIEXHBS30ST: 8
CAWBS20ST: 8
CFBBS20ST: 8
CFUBS20ST: 8
CFNRCBS20ST: 8
CFNRYBS20ST: 8
BAICBS20ST: 8
BAOCBS20ST: 8
BICROBS20ST: 8
BOICBS20ST: 8
BOIEXHBS20ST: 8
EQOSID1: 0
PDPTYPE1: 0
VPAA1: 0
EQOSID2: 0
PDPTYPE2: 0
VPAA2: 0
EQOSID3: 0
PDPTYPE3: 0
VPAA3: 0

dn: serv=CSPS,mscId=0001b7b4ad4d44bbb73484e858270eb3,ou=multiSCs,dc=mtncg
structuralObjectClass: CP1
objectClass: CP1
objectClass: CUDBServiceAuxiliary
objectClass: CP2
objectClass: CP3
objectClass: CP4
objectClass: CP5
objectClass: CP6
objectClass: CP7
objectClass: CP8
objectClass: CP9
objectClass: CPA
objectClass: CPB
objectClass: CPC
objectClass: CPD
objectClass: CPE
objectClass: CPF
objectClass: CPG
objectClass: CPH
objectClass: CPI
objectClass: CPJ
objectClass: CPK
objectClass: CPL
objectClass: CPM
objectClass: CPM1
objectClass: CPM2
objectClass: CPM3
objectClass: CPM4
objectClass: CPZ
objectClass: CP04s
objectClass: CP0A
objectClass: CP11
entryDS: 1
nodeId: 1
createTimestamp: 20170119174955Z
modifyTimestamp: 20170119174956Z
MSISDN: 242068626345
IMSI: 629100114187228
NAM: 0
CDC: 3
CSP: 6
SUBSCSPVERS: 3
PDPCP: 12
SUBSPDPCPVERS: 1
RSA: 3
SUBSRSAVERS: 10
APNID1: 2
APNVERS1: 1
APNID2: 1
APNVERS2: 1
APNID3: 0
APNVERS3: 1
EQOSIDV1:: AAAC
EQOSIDV2:: AAAC
EQOSIDV3:: AAAC
serv: CSPS
CSLOC: 2
PSLOC: 2
RVLRI: 0
RSGSNI: 0
GSMUEFEAT: 0
MCA: 1
CAT: 10
DBSG: 1
OFA: 0
SOCB: 1
PWD: 0000
PWDC: 0
SOCFB: 0
SOCFNRC: 0
SOCFNRY: 0
SOCFU: 0
SODCF: 0
SOSDCF: 7
SOCLIP: 0
SOCLIR: 0
SOCOLP: 0
BS26: 1
BS3G: 1
TS11: 1
TS21: 1
TS22: 1
TS62: 1
CAW: 1
HOLD: 1
MPTY: 1
OICK: 10
BAIC: 1
BAOC: 1
BICRO: 1
BOIC: 1
BOIEXH: 1
CFB: 1
CFNRC: 1
CFNRY: 1
CFU: 1
CLIP: 1
CAWTS10ST: 8
CFBTS10ST: 8
CFUTS10ST: 8
CFNRCTS10ST: 8
CFNRYTS10ST: 8
BAICTS10ST: 8
BAOCTS10ST: 8
BICROTS10ST: 8
BOICTS10ST: 8
BOIEXHTS10ST: 8
BAICTS20ST: 8
BAOCTS20ST: 8
BICROTS20ST: 8
BOICTS20ST: 8
BOIEXHTS20ST: 8
CAWTS60ST: 8
CFBTS60ST: 8
CFUTS60ST: 8
CFNRCTS60ST: 8
CFNRYTS60ST: 8
BAICTS60ST: 8
BAOCTS60ST: 8
BICROTS60ST: 8
BOICTS60ST: 8
BOIEXHTS60ST: 8
CAWBS30ST: 8
CFBBS30ST: 8
CFUBS30ST: 8
CFNRCBS30ST: 8
CFNRYBS30ST: 8
BAICBS30ST: 8
BAOCBS30ST: 8
BICROBS30ST: 8
BOICBS30ST: 8
BOIEXHBS30ST: 8
CAWBS20ST: 8
CFBBS20ST: 8
CFUBS20ST: 8
CFNRCBS20ST: 8
CFNRYBS20ST: 8
BAICBS20ST: 8
BAOCBS20ST: 8
BICROBS20ST: 8
BOICBS20ST: 8
BOIEXHBS20ST: 8
EQOSID1: 0
PDPTYPE1: 0
VPAA1: 0
EQOSID2: 0
PDPTYPE2: 0
VPAA2: 0
EQOSID3: 0
PDPTYPE3: 0
VPAA3: 0


Moderator's Comments:
Mod Comment Please use CODE tags as required by forum rules!

Last edited by RudiC; 02-15-2017 at 03:31 PM.. Reason: Added CODE tags.
# 5  
Old 02-15-2017
Something along these lines...
Run it as awk -f gil.awk myInputFile, where gil.awk is:
Code:
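BEGIN {
  # split each line at the colon plus any blanks that follow it
  FS=": *"
  OFS=","

  # tags to extract, in output column order
  tags="MSISDN,IMSI,NAM,TS11,TS21,TS22,TS62,BAIC,BAOC,APNID1,APNID2,OBO,OBI"
  ntagsA=split(tags, tA, OFS)
  # map each tag name to its output column number
  for(i=1; i<=ntagsA;i++)
    tagsA[tA[i]]=i

  # start with an empty output record
  split("", outA)
}

# print one record: the collected values in column order
function outRec(a,   i)
{
  for(i=1; i<=ntagsA;i++)
    printf("%s%s", a[i], (i==ntagsA)?ORS:OFS)
}
# header line
FNR==1 { print tags}
# a dn line starts a new block: print the previous record, then reset
$1=="dn" {
   if (1 in outA) outRec(outA)
   split("", outA)
}
# remember the value of every wanted tag
$1 in tagsA {
  outA[tagsA[$1]]=$2
}
# print the last record
END {
  if (1 in outA) outRec(outA)
}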
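
To try the approach above before letting it loose on the 5GB file, something like the following sketch may help. It writes a cut-down copy of gil.awk (same logic, but only three tags, purely to keep the sample short) and a made-up two-record input into a temporary directory, then runs awk -f on them. The directory, the sample values and the shortened tag list are all assumptions for illustration.

```shell
work=$(mktemp -d)

# Cut-down copy of gil.awk: same logic, only three tags (illustration only).
cat > "$work/gil.awk" <<'EOF'
BEGIN {
  FS = ": *"; OFS = ","
  tags = "MSISDN,IMSI,NAM"
  ntagsA = split(tags, tA, OFS)
  for (i = 1; i <= ntagsA; i++) tagsA[tA[i]] = i
  split("", outA)
}
function outRec(a,   i) {
  for (i = 1; i <= ntagsA; i++)
    printf("%s%s", a[i], (i == ntagsA) ? ORS : OFS)
}
FNR == 1    { print tags }
$1 == "dn"  { if (1 in outA) outRec(outA); split("", outA) }
$1 in tagsA { outA[tagsA[$1]] = $2 }
END         { if (1 in outA) outRec(outA) }
EOF

# Made-up two-record sample in the same layout as the input above.
cat > "$work/sample.txt" <<'EOF'
dn: serv=CSPS,mscId=aaa
MSISDN: 242000000001
IMSI: 629000000000001
NAM: 1
CDC: 6

dn: serv=CSPS,mscId=bbb
MSISDN: 242000000002
IMSI: 629000000000002
NAM: 0
EOF

result=$(awk -f "$work/gil.awk" "$work/sample.txt")
printf '%s\n' "$result"
rm -rf "$work"
```

This should print the header MSISDN,IMSI,NAM followed by one comma-separated line per block: 242000000001,629000000000001,1 and 242000000002,629000000000002,0. The unwanted CDC line is skipped, and the file is read only once.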


Last edited by vgersh99; 02-15-2017 at 04:22 PM..
# 6  
Old 02-15-2017
Similar problems have been solved in these fora umpteen times. Would this adaption of one of those come close to what you need?
Code:
awk -F: '
BEGIN                   {HD="MSISDN,IMSI,NAM,TS11,TS21,TS22,TS62,BAIC,BAOC,APNID1,APNID2"
#                        print HD
                         HDCnt  = split(HD, HDArr, ",")
                         NXTREC = "dn"
                         HDCM   = ","HD","
                        }

#                       {gsub (/[\t ]*|\*/, "", $1)}

$1 == NXTREC && PR      {for (i=1; i<=HDCnt; i++) printf "%s,", RES[HDArr[i]]
                         printf RS
                         delete RES
                        }

$1 == NXTREC            {PR=1}

HDCM ~ "," $1 ","       {RES[$1]=$0
                        }

END                     {for (i=1; i<=HDCnt; i++) printf "%s,", RES[HDArr[i]]
                         printf RS
                        }
' FS=":" OFS="," file
MSISDN: 242064493944,IMSI: 629100113334650,NAM: 1,TS11: 1,TS21: 1,TS22: 1,TS62: 1,BAIC: 1,BAOC: 1,APNID1: 2,APNID2: 1,
MSISDN: 242068626345,IMSI: 629100114187228,NAM: 0,TS11: 1,TS21: 1,TS22: 1,TS62: 1,BAIC: 1,BAOC: 1,APNID1: 2,APNID2: 1,
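
The output above keeps the tag name in front of every value and ends each line with a comma. If only the bare values are wanted, one possible variation (a sketch, not the poster's code) is to store $2 with its leading blanks stripped and to print the separator between fields only. The sample data, the shortened tag list and the names WANT and emit are assumptions made for this illustration.

```shell
# Made-up two-record sample in the same layout as the input above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
dn: serv=CSPS,mscId=aaa
MSISDN: 242000000001
NAM: 1

dn: serv=CSPS,mscId=bbb
MSISDN: 242000000002
NAM: 0
EOF

values=$(awk -F: '
# Shortened tag list (an assumption, to keep the sample small).
BEGIN { HDCnt = split("MSISDN,NAM", HDArr, ",")
        for (i = 1; i <= HDCnt; i++) WANT[HDArr[i]] = 1 }

# Print the collected values in header order, then empty the array.
function emit(   i) {
    for (i = 1; i <= HDCnt; i++)
        printf "%s%s", RES[HDArr[i]], (i < HDCnt ? "," : "\n")
    split("", RES)
}

$1 == "dn" && PR { emit() }        # a new block starts: flush the previous one
$1 == "dn"       { PR = 1 }
$1 in WANT       { v = $2; sub(/^[ \t]+/, "", v); RES[$1] = v }   # value only
END              { if (PR) emit() }
' "$sample")
printf '%s\n' "$values"
rm -f "$sample"
```

For the sample this gives 242000000001,1 and 242000000002,0, one line per block. Because the line is split at every colon, this only suits values that contain no colon themselves, which holds for the numeric tags listed here.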

# 7  
Old 02-15-2017
Thanks man, but I still have some questions. Where does the program really start? And is gil.awk a file that I should save with the .awk extension?

---------- Post updated at 03:16 PM ---------- Previous update was at 03:11 PM ----------

Yes, I just hope it will give me the results fast. Thanks man, I'll get back to you once I've run it.
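
On the question of where an awk program starts, a small sketch may make the order visible: the BEGIN block runs once before any input is read, each pattern/action rule is then tried against every input line from top to bottom, and the END block runs once after the last line. The two input lines here are made up for the demonstration.

```shell
# Feed two made-up lines to awk and show which part of the program runs when.
order=$(printf 'first line\nsecond line\n' | awk '
BEGIN { print "BEGIN: runs once, before any input is read" }
      { print "main:  runs for input line " NR ": " $0 }
END   { print "END:   runs once, after the last line (" NR " lines read)" }
')
printf '%s\n' "$order"
```

As for the file: the program text can be saved under any name and passed with awk -f; the .awk extension is only a convention, so awk -f gil.awk myInputFile works as long as gil.awk is the name the file was saved under.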