

Importing a unix file dump into a PC capable database


 
# 1  
Old 07-23-2002

My development team has been trying to figure out how to import a unix data dump into SQL Server or convert it into an intermediate file format for several days.

The data dump in question looks like this:
$RecordID: 1<eof>
$Version: 1<eof>
Category: 1<eof>
Poster: John Doe<eof>
ProductName: Test Product<eof>
SKU: 10045689<eof>
Line1: Test Product Line 1 Description<eof>
Line2: Test Product Line 2 Description<eof>
Comments: Test Product Comments<eof>
<eor>

There are nearly 100,000 records with nearly 4,000 distinct fields that vary based on the product's category. The field order varies per record within each category. When data does not exist for a given field, the field/value pair is simply excluded from the dump.
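A format like this can be consumed one record at a time, so memory use stays flat regardless of the size of the dump. A minimal sketch in Python, assuming (per the sample above) that each field line ends with a literal `<eof>` token and records are separated by an `<eor>` line:

```python
# Streaming parser sketch: yields one record at a time so memory use is
# bounded by the largest record, not the whole file. Assumes "<eof>" ends
# each field line and "<eor>" separates records, as in the sample dump.

def read_records(path):
    """Yield one dict of field -> value per record."""
    record = {}
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line == "<eor>":              # end-of-record marker
                if record:
                    yield record
                record = {}
                continue
            if line.endswith("<eof>"):       # strip the field terminator
                line = line[:-len("<eof>")]
            if ": " in line:
                field, value = line.split(": ", 1)
                record[field] = value
    if record:                               # tolerate a missing final <eor>
        yield record
```

Because it is a generator, the caller can translate and insert each record as it arrives instead of accumulating 400 million field names in memory.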

We were going to write a parsing application that converted this dump to XML, read it into a dataset, and then uploaded the dataset. About halfway through that development, however, I realized that the parsing program would require a minimum of eight gigabytes to run. That obviously won't work.
(100,000 records X 4,000 fields = 400,000,000 fields) X 20 bytes per field name = 8,000,000,000 or 8 billion bytes

Do you know anyone who could tell us an easier way to import this unix dump into SQL? I'm sure there is a standard way of dealing with these dumps, but no one on our team has any experience with unix.

Any help or referral would be GREATLY APPRECIATED. This problem is holding up our entire development process.

Sincerely,

Dalton D. Franklin, MCP
Chief Executive Officer
Simplicity Technology
http://www.simplicitycorp.net
daltonf@simplicitycorp.net
615-327-9797 Telephone
615-985-0060 Fax
# 2  
Old 07-23-2002
Some comments:

1) However the conversion/translation is done, you want to do it using an on-the-fly approach, not by building a giant data structure (XML or whatever) that contains all the data. If you are using XML, there are two general approaches: one builds a tree in memory (DOM-style), and the other processes the XML piecewise as it streams past (SAX-style). You want the latter.
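To make the piecewise approach concrete, here is a sketch using `iterparse` from Python's standard library, which fires an event as each element closes instead of building the whole tree. The `record` tag name is hypothetical:

```python
# Piecewise XML processing sketch: iterparse streams events as elements
# close, and elem.clear() discards each subtree after use, so only one
# <record> is ever held in memory. Tag names here are illustrative.
import io
import xml.etree.ElementTree as ET

def stream_records(xml_source):
    """Yield one dict per <record> element without building the full tree."""
    for event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "record":
            yield {child.tag: child.text for child in elem}
            elem.clear()   # free the subtree we just consumed
```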

2) Some more information about the eventual resting place would be helpful. Do you have a single relation defined with 4000-or-so columns?

3) Assuming a simple table such as in #2, what I would think about is writing a simple program to turn the original file into a sequence of DML INSERT statements to load the database.
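One way to sketch that step, assuming parameterized statements rather than hand-quoted SQL (the table name and record contents below are purely illustrative):

```python
# Sketch of turning one parsed record into a parameterized INSERT.
# Binding values as parameters (the "?" placeholders) sidesteps the
# quoting problems of pasting raw values into SQL text.

def make_insert(table, record):
    """Return (sql, params) for a single record dict."""
    cols = sorted(record)                       # deterministic column order
    col_list = ", ".join(cols)
    placeholders = ", ".join("?" for _ in cols)
    sql = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
    params = [record[c] for c in cols]
    return sql, params
```

Each (sql, params) pair can then be handed to whatever database driver is in use, one record at a time, keeping the whole pipeline streaming.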

I could help you figure this out if I had more information about the input, the database schema, and the tools available. If you want, you can contact me offline.
Marc Rochkind
# 3  
Old 07-23-2002
Dalton Franklin

OK here's the situation..

One of my larger suppliers is giving me detailed information about the products they sell for my website. This information was submitted to them by the MANUFACTURERS of those products.

There are roughly 200 categories. Each category has a different number of fields, but the maximum number of fields in any category is 80 (all categories have at least one field).

We are going to put this data into a ProductDetails table. In addition to regular product data (SKU, ManufacturerID, etc.), there will be eighty varchar fields named "Field1", "Field2", etc. An XML file will map those fields to the custom field names defined for each category.

We pondered completely normalizing the database by using a one-to-one related table for each category, but that would overcomplicate the parsing, and this works fine, I believe.

The records in the dump are NOT sorted by field.

Each record in the same category does NOT have the same number of fields; fields that were not filled out by the manufacturer are not shown at all.

Each record in the same category does NOT use the same field order. (I'm thinking XML will solve this problem if I can get the file into XML)

Please email me at daltonf@simplicitycorp.net, if you don't mind. I would be glad to provide you with detailed information about the file, or a copy of it if you like. I just can't post it here on this discussion board for legal reasons.

Thanks very much and have a great day.
# 4  
Old 07-23-2002
Sorry to be a pain, but, as a moderator, one of my duties is to mention our rules. In particular we seem to be straying from:
Quote:
(10) Don't post your email address and ask for an email reply. The forums are for the benefit of all, so all Q&A should take place in the forums.
# 5  
Old 07-23-2002
How big is the non-XML dump file?

I am thinking that Perl with Win32::ODBC under Win32 could be really handy for processing the dump piecewise and inserting into SQL Server. I don't believe that Win32::ODBC exists under Unix, so it would have to be done under Win32 with ActiveState Perl.

You could have one hash that defines the <SomeField>-to-<Field_sub_x> relationship, reload a working hash for each record, translate the old field names into the new field names, and insert via ODBC into the desired table.
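Perl hashes map directly onto Python dicts, so the translation step can be sketched either way. Here it is in Python, with entirely hypothetical category-to-column mappings:

```python
# Field-name translation sketch: one mapping dict per category renames
# source field names to destination columns before the row is inserted.
# The mapping contents below are hypothetical placeholders.

FIELD_MAP = {
    "1": {"ProductName": "Field1", "Line1": "Field2", "Line2": "Field3"},
}

def translate(record):
    """Rename category-specific fields to their destination columns."""
    mapping = FIELD_MAP.get(record.get("Category"), {})
    return {mapping.get(k, k): v for k, v in record.items()}
```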

But, due to the programmability of Perl, you could actually dump the products intelligently into normalized tables based upon product type (if the product types are discernible based upon field name, catalog numbers, etc.).

The Win32::ODBC isn't the fastest thing around, but it could be used for this task.
# 6  
Old 07-23-2002
Importing a Unix file dump into SQL Server

The vendor is getting back to me on the size of the full file (right now I'm working with a sample).

I do know that there are approximately 100,000 products listed in field:value pairs.

In terms of database design, this data is going to be used to create a web store that will allow users to compare products in the same category by searching, filtering, or viewing the products' specifications.

Do you think it would be better to:
(a) Have one Product_Details table that had the fields present in all categories and "Field1", "Field2", etc. with XML defining the names of the fields based on the category in the web front-end

OR

(b) Have multiple Product_Details tables, each with all of the common product detail fields plus the category-specific fields.
(Each product would be related to only one Product_Details table in this case, so this latter method SHOULDN'T take up any more space.)

Thank you for your help in advance.
# 7  
Old 07-23-2002
Hmmmmmm....

Maybe you could create a single UberTable with a union of all the fields in the database, intelligently populated for similar properties, and do away with an XML-based data dictionary (I could see that getting pretty nitty-gritty quickly). The downside is that the table would have a bunch of columns that may not be fully utilized and could be rather large; the upside is that the columns would have actual names without an extra lookup during the query process, and it would be easier for humans to follow.

The real question is how disparate the metadata for each category really is. It wouldn't surprise me if the categories had a lot of similar data... "Catalog_Number" is the same as "CatNo" is the same as "Cat_No" (et cetera). If you could find the intersection of the common fields and then just append the balance to the table, you could reduce the total number of columns and make the database easier to query and index.

I am trying to remember the proper mathematical terminology, but my Discrete Math is a tad rusty at the moment.
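The terms are just set intersection and union: the fields common to every category are the intersection of the per-category field sets, and the full UberTable column list is their union. A small sketch with made-up field sets:

```python
# Set-algebra sketch: intersection gives the columns shared by every
# category; union gives the full UberTable column list. Field names
# below are made up for illustration.

def common_and_all(field_sets):
    """Return (fields in every category, fields in any category)."""
    common = set.intersection(*field_sets)
    everything = set.union(*field_sets)
    return common, everything
```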

The parser would need some logic to split the fields into the proper columns: instead of one hash, one hash per category/manufacturer/grouping that contains the From/To mapping used when populating the UberTable. A lot of work would have to be put into the From/To relationship during conversion, but the amount of work saved on the database end may justify the cost on the front side of the conversion.

Hopefully I am not just running on about things that have already been discussed by your own development team.
 
