The UNIX and Linux Forums  

#1, 07-23-2002, DaltonF (Registered User)
Importing a unix file dump into a PC capable database

For several days, my development team has been trying to figure out how to import a Unix data dump into SQL Server, or how to convert it into an intermediate file format.

The data dump in question looks like this:
$RecordID: 1<eof>
$Version: 1<eof>
Category: 1<eof>
Poster: John Doe<eof>
ProductName: Test Product<eof>
SKU: 10045689<eof>
Line1: Test Product Line 1 Description<eof>
Line2: Test Product Line 2 Description<eof>
Comments: Test Product Comments<eof>
<eor>

There are nearly 100,000 records with nearly 4,000 fields that vary based on the product's category. The field order varies per record within each category. When data does not exist for a given field, the field/value pair is simply omitted from the dump.

We were going to write a parsing application that converted this dump to XML, read it into a dataset, and then uploaded the dataset. About halfway through that development, however, I realized that the parsing program would require a minimum of eight gigabytes of memory to run. That obviously won't work.
(100,000 records x 4,000 fields = 400,000,000 fields) x 20 bytes per field name = 8,000,000,000 bytes, or 8 billion bytes

Do you know anyone who could tell us an easier way to import this Unix dump into SQL Server? I'm sure there is a standard way of dealing with these dumps, but no one on our team has any experience with Unix.

Any help or referral would be GREATLY APPRECIATED. This problem is holding up our entire development process.

Sincerely,

Dalton D. Franklin, MCP
Chief Executive Officer
Simplicity Technology
http://www.simplicitycorp.net
daltonf@simplicitycorp.net
615-327-9797 Telephone
615-985-0060 Fax

#2, 07-23-2002, Marc Rochkind (Guest)
Some comments:

1) However the conversion/translation is done, you want to do it using an on-the-fly approach, not by building a giant data structure (XML or whatever) that contains all the data. If you are using XML, there are two general approaches that are used: one builds a tree, and the other processes the XML piecewise. You want the latter.

2) Some more information about the eventual resting place would be helpful. Do you have a single relation defined with 4000-or-so columns?

3) Assuming a simple table such as the one in #2, what I would think about is writing a simple program to turn the original file into a sequence of DML INSERT statements to load the database.
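
As a rough illustration of points 1 and 3, here is a minimal Python sketch that reads the dump one record at a time and turns each record into an INSERT statement. It assumes every field sits on its own line ending in `<eof>` and every record ends with a line containing only `<eor>`, as in the sample above. The table name `Products`, the helper names, and the choice to strip the leading `$` from field names are assumptions made for the example, not details from the thread.

```python
import io
from typing import Dict, Iterator, TextIO

EOF_MARK = "<eof>"  # end-of-field marker, as in the sample dump
EOR_MARK = "<eor>"  # end-of-record marker, as in the sample dump


def iter_records(stream: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one record at a time; only the current record is held in memory."""
    record: Dict[str, str] = {}
    for raw in stream:
        line = raw.strip()
        if line == EOR_MARK:
            if record:
                yield record
            record = {}
        elif line.endswith(EOF_MARK):
            # Split on the first ": " only, so values may contain colons.
            name, _, value = line[: -len(EOF_MARK)].partition(": ")
            record[name] = value
    if record:  # tolerate a final record with no trailing <eor>
        yield record


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def to_insert(table: str, record: Dict[str, str]) -> str:
    """Build one INSERT naming only the fields present in this record."""
    # Assumption: a leading "$" is dropped so the name can serve as a column.
    columns = ", ".join("[" + name.lstrip("$") + "]" for name in record)
    values = ", ".join(sql_literal(v) for v in record.values())
    return f"INSERT INTO {table} ({columns}) VALUES ({values});"


sample = io.StringIO(
    "$RecordID: 1<eof>\n"
    "Poster: John Doe<eof>\n"
    "SKU: 10045689<eof>\n"
    "<eor>\n"
)
statements = [to_insert("Products", rec) for rec in iter_records(sample)]
print(statements[0])
```

Because `iter_records` is a generator, memory use is bounded by the largest single record rather than by the size of the file. For a real load, the statements would be written out or executed as they are produced instead of being collected in a list.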

I could help you figure this out if I had more information about the input, the database schema, and the tools available. If you want, you can contact me offline.

#3, 07-23-2002, DaltonF (Registered User)
Dalton Franklin

OK here's the situation..

One of my larger suppliers is giving me detailed information about the products they sell for my website. This information was submitted to them by the MANUFACTURERS of those products.

There are roughly 200 categories. Each category has a different number of fields, but the maximum number of fields in any category is 80 (all categories have at least one field).

We are going to put this data into a ProductDetails table. In addition to regular product data (SKU, ManufacturerID, etc.), there will be eighty varchar fields named "Field1", "Field2", etc. An XML file will map those fields to the custom field names defined for each category.

We pondered completely normalizing the database by using a one-to-one related table for each category, but that would overcomplicate the parsing, and this design works fine, I believe.

The records in the dump are NOT sorted by field.

Each record in the same category does NOT have the same number of fields; fields that were not filled out by the manufacturer are not shown at all.

Each record in the same category does NOT use the same field order. (I'm thinking XML will solve this problem if I can get the file into XML)
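
One possible way to handle both the missing fields and the varying field order, without an XML intermediate, is to hold each parsed record in a dictionary and place its values into the eighty fixed slots using the category's field map. The sketch below is in Python. The layout of the mapping XML is hypothetical, since the thread does not show it, and `load_slots` and `to_row` are names invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

MAX_FIELDS = 80  # from the post: eighty custom varchar fields per row

# Hypothetical mapping document; the real XML layout is not shown in the thread.
MAPPING_XML = """
<categories>
  <category id="1">
    <field slot="1" name="Line1"/>
    <field slot="2" name="Line2"/>
    <field slot="3" name="Comments"/>
  </category>
</categories>
"""


def load_slots(xml_text: str) -> Dict[str, Dict[str, int]]:
    """Return {category id: {custom field name: slot number}}."""
    root = ET.fromstring(xml_text)
    return {
        cat.get("id"): {f.get("name"): int(f.get("slot")) for f in cat.findall("field")}
        for cat in root.findall("category")
    }


def to_row(record: Dict[str, str], slots: Dict[str, Dict[str, int]]) -> List[Optional[str]]:
    """Place each custom field in its slot; absent fields stay None (NULL)."""
    row: List[Optional[str]] = [None] * MAX_FIELDS
    for name, slot in slots[record["Category"]].items():
        row[slot - 1] = record.get(name)
    return row


slots = load_slots(MAPPING_XML)
# Fields arrive in a different order here, and Line2 is missing entirely.
record = {
    "Comments": "Test Product Comments",
    "Category": "1",
    "Line1": "Test Product Line 1 Description",
}
row = to_row(record, slots)
print(row[:3])
```

The lookup is by field name, so the order in which fields appear in the dump does not matter, and a field the manufacturer left out simply becomes a NULL in its column.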

Please email me at daltonf@simplicitycorp.net, if you don't mind. I would be glad to provide you with detailed information about the file, or a copy of it if you like. I just can't post it here on this discussion board for legal reasons.

Thanks very much and have a great day.

#4, 07-23-2002, Perderabo (Forum Staff)
Sorry to be a pain, but, as a moderator, one of my duties is to mention our rules. In particular we seem to be straying from:
Quote:
(10) Don't post your email address and ask for an email reply. The forums are for the benefit of all, so all Q&A should take place in the forums.

#5, 07-23-2002, auswipe (Forum Advisor)
How big is the non-XML dump file?

I am thinking that Perl with Win32::ODBC under Win32 could be really handy for processing the dump piecewise and inserting it into SQL Server. I don't believe that Win32::ODBC exists under Unix, so it would have to be done under Win32 with ActiveState Perl.

You could keep one hash that defines the <SomeField> to <Field_sub_x> relationship, then load a second hash with each record in turn, translate the old field names into the new field names, and insert the result via ODBC into the desired table.
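
The same idea can be sketched in Python rather than Perl, with a dictionary playing the role of the hash. To keep the example runnable anywhere, an in-memory `sqlite3` database stands in for SQL Server over ODBC, which is a substitution and not what the post proposes. The contents of `FIELD_MAP` and the table layout are hypothetical.

```python
import sqlite3
from typing import Dict

# Hypothetical translation table: dump field name -> destination column.
FIELD_MAP = {
    "SKU": "SKU",
    "ProductName": "ProductName",
    "Line1": "Field1",
    "Line2": "Field2",
}


def translate(record: Dict[str, str]) -> Dict[str, str]:
    """Rename known fields; fields without a mapping are dropped here."""
    return {FIELD_MAP[k]: v for k, v in record.items() if k in FIELD_MAP}


def insert_record(conn: sqlite3.Connection, table: str, row: Dict[str, str]) -> None:
    """Insert one row; values are bound as parameters, not pasted into the SQL."""
    columns = ", ".join(row)
    marks = ", ".join("?" for _ in row)
    conn.execute(f"INSERT INTO {table} ({columns}) VALUES ({marks})", list(row.values()))


# Assumption: sqlite3 in memory stands in for the real SQL Server connection.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProductDetails (SKU TEXT, ProductName TEXT, Field1 TEXT, Field2 TEXT)"
)

record = {"SKU": "10045689", "Line1": "Test Product Line 1 Description", "Poster": "John Doe"}
insert_record(conn, "ProductDetails", translate(record))
conn.commit()
stored = conn.execute("SELECT SKU, Field1, Field2 FROM ProductDetails").fetchall()
print(stored)
```

Column names come only from the trusted mapping, while the values from the dump are passed as bound parameters, so quotes inside product descriptions need no special handling.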

But, due to the programmability of Perl, you could actually dump the products intelligently into normalized tables based upon product type (if the product types are discernible from field names, catalog numbers, etc.).

The Win32::ODBC isn't the fastest thing around, but it could be used for this task.

#6, 07-23-2002, DaltonF (Registered User)
Importing a Unix file dump into SQL Server

The vendor is getting back to me on the size of the full file (right now I'm working with a sample).

I do know that there are approximately 100,000 products listed in field:value pairs.

In terms of database design, this data is going to be used to create a web store that will allow users to compare products in the same category by searching, filtering, or viewing the products' specifications.

Do you think it would be better to:
(a) Have one Product_Details table containing the fields present in all categories plus "Field1", "Field2", etc., with XML in the web front-end defining the names of those fields for each category

OR

(b) Have multiple Product_Details tables, each with all of the common product detail fields plus the category-specific fields.
(Each product would be related to only one Product_Details table, in this case, so this latter method SHOULDN'T take up any more space)

Thank you for your help in advance.

#7, 07-23-2002, auswipe (Forum Advisor)
Hmmmmmm....

Maybe you could create a single UberTable with a union of all the fields in the database, intelligently populated for similar properties, and do away with an XML-based data dictionary (I could see that getting pretty nitty-gritty quickly). The downside would be that the table would have a bunch of columns that may not be fully utilized and could be rather large in size, but the upside would be that the columns would have actual names without an extra lookup during the query process and would be easier for humans to follow.

The real question is how disparate the meta-data for each category really is. It wouldn't surprise me if the categories had a lot of similar data... "Catalog_Number" is the same as "CatNo" is the same as "Cat_No" (et cetera, et cetera). If you could find the intersection of the common fields and then just append the balance to the table, you could reduce the total number of columns and make the database easier to query and index.

I am trying to remember the proper mathematical terminology, but my Discrete Math is a tad bit rusty at the moment.

The parser would have to have some logic to split the fields into the proper columns. Instead of one hash, there would be one hash per category/manufacturer/grouping that contains the From/To mapping used when populating the UberTable. A lot of work would have to be put into the From/To relationship during conversion, but the amount of work saved on the database end may justify the cost on the front side of the conversion.
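
In set terms, the shared columns are the intersection of the per-category field sets, the UberTable's columns are their union, and the "balance" is the difference between the two. A small Python sketch of that calculation, plus the per-category From/To hashes, might look like the following. The category names, field names, and alias table are all made up for illustration.

```python
from typing import Dict, Set

# Hypothetical alias table: variant spellings -> one canonical column name.
ALIASES = {"CatNo": "Catalog_Number", "Cat_No": "Catalog_Number"}

# Hypothetical field lists for three categories.
CATEGORY_FIELDS: Dict[str, Set[str]] = {
    "monitors": {"SKU", "CatNo", "ScreenSize"},
    "printers": {"SKU", "Cat_No", "PagesPerMinute"},
    "cables": {"SKU", "Catalog_Number", "Length"},
}


def canonical(fields: Set[str]) -> Set[str]:
    """Replace each known variant spelling with its canonical name."""
    return {ALIASES.get(name, name) for name in fields}


normalized = [canonical(fields) for fields in CATEGORY_FIELDS.values()]
common = set.intersection(*normalized)   # columns every category shares
all_columns = set.union(*normalized)     # full column list of the UberTable
specific = all_columns - common          # the "balance" appended to the table

# One From/To hash per category, as the post suggests.
from_to = {
    category: {name: ALIASES.get(name, name) for name in fields}
    for category, fields in CATEGORY_FIELDS.items()
}

print(sorted(common))
print(sorted(specific))
print(from_to["printers"]["Cat_No"])
```

Building the alias table is the manual part: someone has to decide which field names really mean the same thing before the sets can be compared.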

Hopefully I am not just running on about things that have already been discussed by your own development team.