Split large xml into multiple files and with header and footer in file


 
# 22  
Hi Don Cragun,

Please disregard the awk commands above; they would only be confusing. Below is the sample XML file.
I want the value of each Job_Id element to be extracted and assigned to a variable NEW_VAR.

Output Expected:
Code:
 NEW_VAR ='30544,30545,30546'

I will pass this value to a database later.

Code:
 <?xml version='1.0' encoding='UTF-8'?><S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/"><S:Body><ns5:DoPublishFromImportResponse xmlns:ns12="oracle/documaker/schema/ws/publishing/doGetPublishingInfo/v1/response" xmlns:ns11="oracle/documaker/schema/ws/publishing/doPublishFromImport/v1/response" xmlns:ns10="oracle/documaker/schema/ws/publishing/doPublishFromFactory/v1/response" xmlns:ns9="oracle/documaker/schema/ws/publishing/doGetPublishingInfo/v1" xmlns:ns8="oracle/documaker/schema/ws/publishing/doGetPublishingInfo/v1/request" xmlns:ns7="oracle/documaker/schema/ws/publishing/doPublishFromImport/v1/request" xmlns:ns6="oracle/documaker/schema/ws/publishing/doPublishFromImport/v1" xmlns:ns5="oracle/documaker/schema/ws/publishing" xmlns:ns4="oracle/documaker/schema/ws/publishing/doPublishFromFactory/v1" xmlns:ns3="oracle/documaker/schema/common" xmlns:ns2="oracle/documaker/schema/ws/publishing/doPublishFromFactory/v1/request" xmlns="oracle/documaker/schema/ws/publishing/common"><ns5:DoPublishFromImportResponseV1><Result>0</Result><ServiceTimeMillis>13</ServiceTimeMillis><ns6:JobResponse CorrelationId="?"><ns11:JobPayloadType>1</ns11:JobPayloadType><ns11:JobPriority>10</ns11:JobPriority><ns11:JobStatus>111</ns11:JobStatus><ns11:JobUnique_Id>010d9363-6362-4f66-a48a-b3a1e4b90bc9</ns11:JobUnique_Id><ns11:Job_Id>30544</ns11:Job_Id></ns6:JobResponse><ns6:ServiceInfo><ns3:Operation>doPublishFromImport</ns3:Operation><ns3:Version><ns3:Number>1</ns3:Number><ns3:Used>true</ns3:Used></ns3:Version></ns6:ServiceInfo></ns5:DoPublishFromImportResponseV1></ns5:DoPublishFromImportResponse></S:Body></S:Envelope><?xml version='1.0' encoding='UTF-8'?><S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/"><S:Body><ns5:DoPublishFromImportResponse xmlns:ns12="oracle/documaker/schema/ws/publishing/doGetPublishingInfo/v1/response" xmlns:ns11="oracle/documaker/schema/ws/publishing/doPublishFromImport/v1/response" 
xmlns:ns10="oracle/documaker/schema/ws/publishing/doPublishFromFactory/v1/response" xmlns:ns9="oracle/documaker/schema/ws/publishing/doGetPublishingInfo/v1" xmlns:ns8="oracle/documaker/schema/ws/publishing/doGetPublishingInfo/v1/request" xmlns:ns7="oracle/documaker/schema/ws/publishing/doPublishFromImport/v1/request" xmlns:ns6="oracle/documaker/schema/ws/publishing/doPublishFromImport/v1" xmlns:ns5="oracle/documaker/schema/ws/publishing" xmlns:ns4="oracle/documaker/schema/ws/publishing/doPublishFromFactory/v1" xmlns:ns3="oracle/documaker/schema/common" xmlns:ns2="oracle/documaker/schema/ws/publishing/doPublishFromFactory/v1/request" xmlns="oracle/documaker/schema/ws/publishing/common"><ns5:DoPublishFromImportResponseV1><Result>0</Result><ServiceTimeMillis>14</ServiceTimeMillis><ns6:JobResponse CorrelationId="?"><ns11:JobPayloadType>1</ns11:JobPayloadType><ns11:JobPriority>10</ns11:JobPriority><ns11:JobStatus>111</ns11:JobStatus><ns11:JobUnique_Id>f8268dda-9357-45ec-baab-e6fbb30744bd</ns11:JobUnique_Id><ns11:Job_Id>30545</ns11:Job_Id></ns6:JobResponse><ns6:ServiceInfo><ns3:Operation>doPublishFromImport</ns3:Operation><ns3:Version><ns3:Number>1</ns3:Number><ns3:Used>true</ns3:Used></ns3:Version></ns6:ServiceInfo></ns5:DoPublishFromImportResponseV1></ns5:DoPublishFromImportResponse></S:Body></S:Envelope><?xml version='1.0' encoding='UTF-8'?><S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/"><S:Body><ns5:DoPublishFromImportResponse xmlns:ns12="oracle/documaker/schema/ws/publishing/doGetPublishingInfo/v1/response" xmlns:ns11="oracle/documaker/schema/ws/publishing/doPublishFromImport/v1/response" xmlns:ns10="oracle/documaker/schema/ws/publishing/doPublishFromFactory/v1/response" xmlns:ns9="oracle/documaker/schema/ws/publishing/doGetPublishingInfo/v1" xmlns:ns8="oracle/documaker/schema/ws/publishing/doGetPublishingInfo/v1/request" xmlns:ns7="oracle/documaker/schema/ws/publishing/doPublishFromImport/v1/request" 
xmlns:ns6="oracle/documaker/schema/ws/publishing/doPublishFromImport/v1" xmlns:ns5="oracle/documaker/schema/ws/publishing" xmlns:ns4="oracle/documaker/schema/ws/publishing/doPublishFromFactory/v1" xmlns:ns3="oracle/documaker/schema/common" xmlns:ns2="oracle/documaker/schema/ws/publishing/doPublishFromFactory/v1/request" xmlns="oracle/documaker/schema/ws/publishing/common"><ns5:DoPublishFromImportResponseV1><Result>0</Result><ServiceTimeMillis>12</ServiceTimeMillis><ns6:JobResponse CorrelationId="?"><ns11:JobPayloadType>1</ns11:JobPayloadType><ns11:JobPriority>10</ns11:JobPriority><ns11:JobStatus>111</ns11:JobStatus><ns11:JobUnique_Id>35b40e14-77b8-4f63-80c4-6ac0d8020985</ns11:JobUnique_Id><ns11:Job_Id>30546</ns11:Job_Id></ns6:JobResponse><ns6:ServiceInfo><ns3:Operation>doPublishFromImport</ns3:Operation><ns3:Version><ns3:Number>1</ns3:Number><ns3:Used>true</ns3:Used></ns3:Version></ns6:ServiceInfo></ns5:DoPublishFromImportResponseV1></ns5:DoPublishFromImportResponse></S:Body></S:Envelope><?xml version='1.0' encoding='UTF-8'?><S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/"><S:Body><ns5:DoPublishFromImportResponse xmlns:ns12="oracle/documaker/schema/ws/publishing/doGetPublishingInfo/v1/response" xmlns:ns11="oracle/documaker/schema/ws/publishing/doPublishFromImport/v1/response" xmlns:ns10="oracle/documaker/schema/ws/publishing/doPublishFromFactory/v1/response" xmlns:ns9="oracle/documaker/schema/ws/publishing/doGetPublishingInfo/v1" xmlns:ns8="oracle/documaker/schema/ws/publishing/doGetPublishingInfo/v1/request" xmlns:ns7="oracle/documaker/schema/ws/publishing/doPublishFromImport/v1/request" xmlns:ns6="oracle/documaker/schema/ws/publishing/doPublishFromImport/v1" xmlns:ns5="oracle/documaker/schema/ws/publishing" xmlns:ns4="oracle/documaker/schema/ws/publishing/doPublishFromFactory/v1" xmlns:ns3="oracle/documaker/schema/common" xmlns:ns2="oracle/documaker/schema/ws/publishing/doPublishFromFactory/v1/request" 
xmlns="oracle/documaker/schema/ws/publishing/common"><ns5:DoPublishFromImportResponseV1><Result>0</Result><ServiceTimeMillis>15</ServiceTimeMillis><ns6:JobResponse CorrelationId="?"><ns11:JobPayloadType>1</ns11:JobPayloadType><ns11:JobPriority>10</ns11:JobPriority><ns11:JobStatus>111</ns11:JobStatus><ns11:JobUnique_Id>9e4e8e04-167f-46dd-9801-27776728fe05</ns11:JobUnique_Id><ns11:Job_Id>30547</ns11:Job_Id></ns6:JobResponse><ns6:ServiceInfo><ns3:Operation>doPublishFromImport</ns3:Operation><ns3:Version><ns3:Number>1</ns3:Number><ns3:Used>true</ns3:Used></ns3:Version></ns6:ServiceInfo></ns5:DoPublishFromImportResponseV1></ns5:DoPublishFromImportResponse></S:Body></S:Envelope><?xml version='1.0' encoding='UTF-8'?><S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/"><S:Body><ns5:DoPublishFromImportResponse xmlns:ns12="oracle/documaker/schema/ws/publishing/doGetPublishingInfo/v1/response" xmlns:ns11="oracle/documaker/schema/ws/publishing/doPublishFromImport/v1/response" xmlns:ns10="oracle/documaker/schema/ws/publishing/doPublishFromFactory/v1/response" xmlns:ns9="oracle/documaker/schema/ws/publishing/doGetPublishingInfo/v1" xmlns:ns8="oracle/documaker/schema/ws/publishing/doGetPublishingInfo/v1/request" xmlns:ns7="oracle/documaker/schema/ws/publishing/doPublishFromImport/v1/request" xmlns:ns6="oracle/documaker/schema/ws/publishing/doPublishFromImport/v1" xmlns:ns5="oracle/documaker/schema/ws/publishing" xmlns:ns4="oracle/documaker/schema/ws/publishing/doPublishFromFactory/v1" xmlns:ns3="oracle/documaker/schema/common" xmlns:ns2="oracle/documaker/schema/ws/publishing/doPublishFromFactory/v1/request" xmlns="oracle/documaker/schema/ws/publishing/common"><ns5:DoPublishFromImportResponseV1><Result>0</Result><ServiceTimeMillis>15</ServiceTimeMillis><ns6:JobResponse 
CorrelationId="?"><ns11:JobPayloadType>1</ns11:JobPayloadType><ns11:JobPriority>10</ns11:JobPriority><ns11:JobStatus>111</ns11:JobStatus><ns11:JobUnique_Id>cfd9fba3-bc37-4f2f-936e-7b38f7c59f57</ns11:JobUnique_Id><ns11:Job_Id>30548</ns11:Job_Id></ns6:JobResponse><ns6:ServiceInfo><ns3:Operation>doPublishFromImport</ns3:Operation><ns3:Version><ns3:Number>1</ns3:Number><ns3:Used>true</ns3:Used></ns3:Version></ns6:ServiceInfo></ns5:DoPublishFromImportResponseV1></ns5:DoPublishFromImportResponse></S:Body></S:Envelope>

Moderator's Comments:
Please use CODE tags when displaying sample input (which you did), sample output (which you did not), and code segments (which you did not).

Moderator's Comments:
Duplicated update message deleted to save space and to reduce confusion about possible differences in the given sample XML file contents.

Last edited by Don Cragun; 02-10-2019 at 11:25 PM.. Reason: Add missing CODE tags and remove duplicated message update.
# 23  
Hi Karthik,
PLEASE pay attention to what you are doing! There cannot be a <space> between the name of a shell variable and the <equals-sign> that follows it if you are trying to assign a value to that variable. This has been said several times in this thread and yet you still write that you want the result to be:
Code:
 NEW_VAR ='30544,30545,30546'

which, as stated before, tells the shell to run a utility named NEW_VAR with one operand: the string =30544,30545,30546. Note that the operand does not contain the <single-quote> characters; the shell removes them as it prepares the arguments to be passed to the NEW_VAR utility when it is invoked.
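
To make the difference concrete, here is a minimal sketch that can be run in any POSIX shell. It assumes that no utility named NEW_VAR exists on your PATH; the variable name status is used only for illustration.

```shell
#!/bin/sh
# Correct: no space before '=', so the shell performs an assignment.
NEW_VAR='30544,30545,30546'
printf 'assigned: %s\n' "$NEW_VAR"

# Incorrect: with a space, the shell searches for a command named NEW_VAR
# and passes it the single operand =30544,30545,30546 (quotes removed).
# It runs in a subshell with stderr discarded so the error message is hidden.
( NEW_VAR ='30544,30545,30546' ) 2>/dev/null
status=$?

# A command that cannot be found yields exit status 127.
printf 'exit status of the mistaken form: %s\n' "$status"
```

The first form leaves NEW_VAR set; the second never touches the variable at all, which is why the result cannot be used later in the script.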

Note also that you have not told us what operating system you're using. A sample file that is 8,157 bytes long and contains only a single line is not a text file on many BSD, Linux, and UNIX systems, because a line in a text file may not be longer than LINE_MAX bytes (a limit that POSIX only requires to be at least 2,048). The awk, sed, and most other standard text processing utilities have undefined behavior if the input files being processed are not text files.

Note also that you say that the output to be produced from your sample input should have three numbers (Job IDs), but there are five Job IDs in the sample input. Why shouldn't all five values be extracted from the XML file?

If we assume that the awk utility on your system can handle text files with unlimited line lengths, the following might do what you want:
Code:
NEW_VAR=$(awk -v sq="'" -F'<ns11:Job_Id>' '
		{	for(i = 2; i <= NF; i++) {
				sub(/<.*/, "", $i)
				printf("%s%s", cnt++ ? "," : sq, $i)
			}
		}
		END {	print sq
		}' file
	)

printf 'NEW_VAR has been assigned the value: %s\n' "$NEW_VAR"

which, on macOS Mojave version 10.14.3, produces the output:
Code:
NEW_VAR has been assigned the value: '30544,30545,30546,30547,30548'

if the file named file contains the sample data you provided in post #22 in this thread.
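
If awk on a given system cannot cope with one very long line, one possible workaround is to let tr break the input at every < first, since tr accepts any type of file on standard input. The following is only a sketch, not part of the solution above: the two-record sample written to a temporary file is made up for illustration, and it assumes the IDs always appear in <ns11:Job_Id> elements as in post #22.

```shell
#!/bin/sh
# Hypothetical sample: two concatenated responses on one line, no trailing newline.
sample=$(mktemp) || exit 1
printf '%s' '<a><ns11:Job_Id>30544</ns11:Job_Id></a><a><ns11:Job_Id>30545</ns11:Job_Id></a>' > "$sample"

# tr turns every "<" into a newline, so each tag starts its own much shorter
# line, such as "ns11:Job_Id>30544". The echo guarantees a final newline.
# awk then prints the text after ">" for the wanted tag, comma separated.
ids=$( { tr '<' '\n' < "$sample"; echo; } |
       awk -F'>' '$1 == "ns11:Job_Id" { printf("%s%s", n++ ? "," : "", $2) }
                  END { print "" }' )

# Wrap the list in single quotes, as requested in post #22.
NEW_VAR="'$ids'"

printf 'NEW_VAR has been assigned the value: %s\n' "$NEW_VAR"
rm -f "$sample"
```

Splitting on < keeps the awk program independent of how the XML was wrapped or concatenated, at the cost of treating the input as plain text rather than parsing it as XML.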
# 24  
I am using Linux and the file is an .xml file, and I need all of the job_id values as you mentioned, not just three.

--- Post updated at 04:42 AM ---

Thanks a lot, it worked. My apologies for all the confusion.
# 25  
Quote:
Originally Posted by karthik
I am using Linux and the file is an .xml file, and I need all of the job_id values as you mentioned, not just three.
OK. So does the code I suggested in post #23 produce the output you want if you change the name of the file in the script to match the name of your input file?
# 26  
Yes, Don Cragun, it worked. Thanks for your help.
# 27  
Hi Don Cragun/Rudic,

I have built the script based on all the inputs. One last thing is renaming the files: it is still creating just one standard file name. Kindly assist.

The command below is not creating unique names as expected.

sample input:
Code:
filename: sampletest.xml
          sampletest_111.xml

Actual Output:
Code:
Extrfile001.xml just 1 file is getting created

Expected Output:
Code:
Extrfile001.xml
Extrfile002.xml

Code:
arr=($(ls | grep "../Inbound/Extrfile[0-9]*.xml"))

Code:
#!/bin/sh

# Add all Input files to array
FileList=($(ls | grep "../Inbound/sampletest*\\_[0-9]"))
  
echo  "$FileList"  

#loop array for Input files

for x in "${FileList[@]}"
do
 #for each element in array
 

#File Split Begin
awk -f xml_tag_handler.awk -f File_split.awk OUT=$x"" ROWS="500" $x $x
mv $x ../Staging
done

rm Response.xml Extr*.xml 


for f in ../Inbound/sampletest_*
  do    TMP="${f/sampletest_/Extrfile}"
         mv "$f" "${TMP%.*}"
  done

# add all files to array
arr=($(ls | grep "../Inbound/Extrfile[0-9]*.xml"))


Last edited by Don Cragun; 02-18-2019 at 12:19 AM.. Reason: Remove quote tags around update; get rid of earlier version of duplicated post.
# 28  
Hi karthik,
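The two command substitutions in your script that run ls | grep "../Inbound/..." will ALWAYS expand to nothing: ls invoked with no operands lists only the names in the current directory, none of which can contain a <slash>, so its output will NEVER yield any string matching ../. With FileList empty, the body of your first for loop never runs. Therefore the script you showed us is logically equivalent to the script: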
Code:
#!/bin/sh

# Add all Input files to array
FileList=()
  
echo  ""  
#loop array for Input files
#for each element in array
#File Split Begin

rm Response.xml Extr*.xml

for f in ../Inbound/sampletest_*
  do    TMP="${f/sampletest_/Extrfile}"
         mv "$f" "${TMP%.*}"
  done

# add all files to array
arr=()

I assume that you are not getting what you want because you never run any of the awk scripts in your shell script; you only move around and change the names of files that already existed before you started running this script.
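
A sketch of how the file list could be built with shell pathname expansion instead of ls | grep follows. It is not your original script: it runs in a scratch directory created with mktemp, the two empty sampletest_*.xml files are placeholders, and the naming scheme (Extrfile plus a three-digit counter) is an assumption based on the expected output in post #27. The awk split step is left out because those awk scripts have not been shown in this thread.

```shell
#!/bin/sh
# Work in a scratch directory so the sketch is safe to run anywhere.
# Inbound and sampletest_*.xml come from post #27; the files here are empty placeholders.
work=$(mktemp -d) || exit 1
mkdir "$work/Inbound"
: > "$work/Inbound/sampletest_111.xml"
: > "$work/Inbound/sampletest_112.xml"

# Let the shell expand the pattern into the positional parameters;
# no ls, grep, or (non-POSIX) array is needed.
set -- "$work"/Inbound/sampletest_*.xml
if [ ! -e "$1" ]; then
    echo "no input files found" >&2
    exit 1
fi

n=0
for f in "$@"
do
    n=$((n + 1))
    # Assumed naming scheme: same directory, Extrfile + three-digit counter.
    new=$(printf '%s/Extrfile%03d.xml' "${f%/*}" "$n")
    mv "$f" "$new"
done

# Show the result, then remove the scratch directory.
listing=$(ls "$work/Inbound")
printf '%s\n' "$listing"
rm -rf "$work"
```

Because the pattern is expanded by the shell itself, the path prefix is kept on every name, and the check on "$1" catches the case where nothing matches and the pattern is left unexpanded.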
