Importing a Unix file dump into a PC-capable database


 
# 1  
Old 07-23-2002
Importing a Unix file dump into a PC-capable database

My development team has spent several days trying to figure out how to import a Unix data dump into SQL Server, or to convert it into an intermediate file format.

The data dump in question looks like this:
$RecordID: 1<eof>
$Version: 1<eof>
Category: 1<eof>
Poster: John Doe<eof>
ProductName: Test Product<eof>
SKU: 10045689<eof>
Line1: Test Product Line 1 Description<eof>
Line2: Test Product Line 2 Description<eof>
Comments: Test Product Comments<eof>
<eor>

There are nearly 100,000 records, drawing on nearly 4,000 possible fields that vary based on the product's category. The field order varies per record within each category. When no data exists for a given field, the field/value pair is simply omitted from the dump.

We were going to write a parsing application that converted this dump to XML, read it into a dataset, and then uploaded the dataset. About halfway through that development, however, I realized that the parsing program would require a minimum of eight gigabytes of memory to run. That obviously won't work.
(100,000 records x 4,000 fields = 400,000,000 fields) x 20 bytes per field name = 8,000,000,000 bytes, or 8 GB

Do you know anyone who could tell us an easier way to import this Unix dump into SQL Server? I'm sure there is a standard way of dealing with these dumps, but no one on our team has any experience with Unix.

Any help or referral would be GREATLY APPRECIATED. This problem is holding up our entire development process.

Sincerely,

Dalton D. Franklin, MCP
Chief Executive Officer
Simplicity Technology
http://www.simplicitycorp.net
daltonf@simplicitycorp.net
615-327-9797 Telephone
615-985-0060 Fax
# 2  
Old 07-23-2002
Some comments:

1) However the conversion/translation is done, you want to do it using an on-the-fly approach, not by building a giant data structure (XML or whatever) that contains all the data. If you are using XML, there are two general approaches: one builds a tree of the whole document (DOM-style), and the other processes the XML piecewise as it streams past (SAX-style). You want the latter.
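
For instance, a minimal Perl sketch of the piecewise style (a sketch only; it assumes, hypothetically, that the converter wraps each record as <record><field name="SKU">...</field>...</record>):

use strict;
use warnings;
use XML::Parser;    # expat-based; fires callbacks instead of building a tree

my ($field, %record);
my $parser = XML::Parser->new(Handlers => {
    Start => sub {
        my ($expat, $tag, %attr) = @_;
        $field = $attr{name} if $tag eq 'field';
    },
    Char => sub {
        my ($expat, $text) = @_;
        $record{$field} .= $text if defined $field;
    },
    End => sub {
        my ($expat, $tag) = @_;
        if    ($tag eq 'field')  { undef $field }
        elsif ($tag eq 'record') { handle_record(\%record); %record = (); }
    },
});
$parser->parsefile('dump.xml');    # memory stays bounded by one record

sub handle_record {
    my ($rec) = @_;    # build and run one INSERT here, for example
    print "record with ", scalar(keys %$rec), " fields\n";
}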

2) Some more information about the eventual resting place would be helpful. Do you have a single relation defined with 4000-or-so columns?

3) Assuming a simple table such as in #2, what I would think about is writing a simple program to turn the original file into a sequence of DML INSERT statements to load the database, along these lines:
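
A rough sketch of that in Perl, reading the dump format from the first post directly (no XML step at all); the <eof>/<eor> markers are taken literally from the sample, and ProductDetails is a placeholder table name, not the real schema:

#!/usr/bin/perl
use strict;
use warnings;

# Streams the dump record by record, so memory use is bounded by a
# single record rather than the whole 100,000-record file.
my %record;
while (my $line = <>) {
    chomp $line;
    $line =~ s/<eof>\s*$//;              # strip the end-of-field marker
    if ($line =~ /^<eor>/) {             # end of record: emit and reset
        emit_insert(\%record) if %record;
        %record = ();
    }
    elsif ($line =~ /^\$?(\w+):\s*(.*)/) {
        $record{$1} = $2;                # field name => value
    }
}
emit_insert(\%record) if %record;        # in case the final <eor> is missing

sub emit_insert {
    my ($rec)  = @_;
    my @fields = sort keys %$rec;
    my @values = map { (my $v = $rec->{$_}) =~ s/'/''/g; "'$v'" } @fields;
    print "INSERT INTO ProductDetails (", join(', ', @fields),
          ") VALUES (", join(', ', @values), ");\n";
}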

I could help you figure this out if I had more information about the input, the database schema, and the tools available. If you want, you can contact me offline.
Marc Rochkind
# 3  
Old 07-23-2002
Dalton Franklin

OK, here's the situation...

One of my larger suppliers is giving me detailed information about the products they sell for my website. This information was submitted to them by the MANUFACTURERS of those products.

There are roughly 200 categories. Each category has a different number of fields, but the maximum number of fields in any category is 80 (all categories have at least one field).

We are going to put this data into a ProductDetails table. In addition to regular product data (SKU, ManufacturerID, etc.), there will be eighty varchar fields named "Field1", "Field2", etc. An XML file will map those fields to the custom field names defined for each category.

We pondered completely normalizing the database by using a one-to-one related table for each category, but that would overcomplicate the parsing, and this works fine, I believe.

The records in the dump are NOT sorted by field.

Each record in the same category does NOT have the same number of fields; fields that were not filled out by the manufacturer are not shown at all.

Each record in the same category does NOT use the same field order. (I'm thinking XML will solve this problem if I can get the file into XML)

Please email me at daltonf@simplicitycorp.net, if you don't mind. I would be glad to provide you with detailed information about the file, or a copy of it if you like. I just can't post it here on this discussion board for legal reasons.

Thanks very much and have a great day.
# 4  
Old 07-23-2002
Sorry to be a pain, but, as a moderator, one of my duties is to mention our rules. In particular we seem to be straying from:
Quote:
(10) Don't post your email address and ask for an email reply. The forums are for the benefit of all, so all Q&A should take place in the forums.
# 5  
Old 07-23-2002
How big is the non-XML dump file?

I am thinking that Perl with Win32::ODBC under Win32 could be real handy for processing the dump piecewise and inserting into SQL Server. I don't believe that Win32::ODBC exists under Unix, so it would have to be done under Win32 ActiveState Perl.

You could have one hash that defines the <SomeField>-to-<Field_sub_x> relationship, load a hash per record time and time again, translate the old field names into the new field names, and insert via ODBC into the desired table.
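
A sketch only; the DSN, the field mapping, and the ProductDetails table below are invented placeholders:

use strict;
use warnings;
use Win32::ODBC;    # ActiveState Perl under Win32

my $db = Win32::ODBC->new("DSN=ProductDB") or die Win32::ODBC::Error();

# The one hash: old dump field name => new column name.
my %field_map = (
    SKU      => 'SKU',
    Line1    => 'Field1',
    Line2    => 'Field2',
    Comments => 'Field3',
);

# Called once per record parsed out of the dump.
sub insert_record {
    my ($rec) = @_;
    my (@cols, @vals);
    for my $old (keys %$rec) {
        next unless exists $field_map{$old};   # skip unmapped fields
        (my $v = $rec->{$old}) =~ s/'/''/g;    # escape single quotes
        push @cols, $field_map{$old};
        push @vals, "'$v'";
    }
    my $sql = sprintf "INSERT INTO ProductDetails (%s) VALUES (%s)",
                      join(',', @cols), join(',', @vals);
    $db->Sql($sql) and die $db->Error();       # Sql() is true only on error
}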

But, due to the programmability of Perl, you could actually dump the products intelligently into normalized tables based upon product type (if the product types are discernible based upon field name, catalog numbers, etc.).

Win32::ODBC isn't the fastest thing around, but it could be used for this task.
# 6  
Old 07-23-2002
Importing a Unix file dump into SQL Server

The vendor is getting back to me on the size of the full file (right now I'm working with a sample).

I do know that there are approximately 100,000 products listed in field:value pairs.

In terms of database design, this data is going to be used to create a web store that will allow users to compare products in the same category by searching, filtering, or viewing the products' specifications.

Do you think it would be better to:
(a) Have one Product_Details table that has the fields present in all categories plus "Field1", "Field2", etc., with XML defining the names of those fields based on the category in the web front-end

OR

(b) Have multiple Product_Details tables, each with all of the common product detail fields plus that category's specific fields.
(Each product would be related to only one Product_Details table, in this case, so this latter method SHOULDN'T take up any more space.)

Thank you for your help in advance.
# 7  
Old 07-23-2002
Hmmmmmm....

Maybe you could create a single UberTable with a union of all the fields in the database, intelligently populated for similar properties, and do away with an XML-based data dictionary (I could see that getting pretty nitty-gritty quickly). The downside would be that the table would have a bunch of columns that may not be fully utilized and could be rather large in size, but the upside would be that the columns would have actual names, without an extra lookup during the query process, and be easier for humans to follow.

The real question is how disparate the metadata for each category really is. It wouldn't surprise me if the categories had a lot of similar data... "Catalog_Number" is the same as "CatNo" is the same as "Cat_No" (et cetera, et cetera). If you could find the intersection of the common fields and then just append the balance to the table, you could reduce the total number of columns and make the database easier to query and index.

I am trying to remember the proper mathematical terminology (the union and intersection of the field sets, I believe), but my Discrete Math is a tad bit rusty at the moment.

The parser would have to have some logic to split the fields into the proper columns. Instead of one hash, use one hash per category/manufacturer/grouping that contains the From/To mapping when populating the UberTable, along the lines of the sketch below. A lot of work would have to be put into the From/To relationship during conversion, but the amount of work saved on the database end may justify the cost on the front side of the conversion.
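
Something like this, where the category names and From/To pairs are invented for illustration:

use strict;
use warnings;

# One From/To hash per category.
my %by_category = (
    Monitors => { Catalog_Number => 'CatalogNumber', Size  => 'ScreenSize' },
    Printers => { CatNo          => 'CatalogNumber', Speed => 'PagesPerMinute' },
);

# Rename one parsed record's fields into UberTable column names.
sub to_uber_columns {
    my ($category, $rec) = @_;
    my $map = $by_category{$category}
        or warn "no map for category '$category'\n" and return {};
    my %out;
    while (my ($from, $to) = each %$map) {
        $out{$to} = $rec->{$from} if exists $rec->{$from};
    }
    return \%out;
}

# e.g. to_uber_columns('Monitors', { Catalog_Number => 'X100', Size => '17in' })
# returns { CatalogNumber => 'X100', ScreenSize => '17in' }.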

Hopefully I am not just running on about things that have already been discussed by your own development team.
 