UNIX for Dummies Questions & Answers
Importing a unix file dump into a PC capable database
My development team has spent several days trying to figure out how to import a Unix data dump into SQL Server, or to convert it into an intermediate file format.

The data dump in question looks like this:

    $RecordID: 1<eof>
    $Version: 1<eof>
    Category: 1<eof>
    Poster: John Doe<eof>
    ProductName: Test Product<eof>
    SKU: 10045689<eof>
    Line1: Test Product Line 1 Description<eof>
    Line2: Test Product Line 2 Description<eof>
    Comments: Test Product Comments<eof>
    <eor>

There are nearly 100,000 records with nearly 4,000 fields that vary based on the product's category. The field order varies per record within each category. When data does not exist for a given field, the field/value pair is simply excluded from the dump.

We were going to write a parsing application that converted this dump to XML, read it into a dataset, and then uploaded the dataset. About halfway through that development, however, I realized that the parsing program would require a minimum of eight gigabytes to run. That obviously won't work.

(100,000 records x 4,000 fields = 400,000,000 fields) x 20 bytes per field name = 8,000,000,000, or 8 billion bytes

Do you know anyone who could tell us an easier way to import this Unix dump into SQL Server? I'm sure there is a standard way of dealing with these dumps, but no one on our team has any experience with Unix.

Any help or referral would be GREATLY APPRECIATED. This problem is holding up our entire development process.

Sincerely,
Dalton D. Franklin, MCP
Chief Executive Officer
Simplicity Technology
http://www.simplicitycorp.net
daltonf@simplicitycorp.net
615-327-9797 Telephone
615-985-0060 Fax
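For reference, the record format shown in this post can be read one record at a time, so memory use stays proportional to a single record instead of the whole file. The following is only a minimal Python sketch under stated assumptions: `<eof>` ends every field, `<eor>` ends every record, values never contain those markers, and no value spans more than one physical line. The function name `iter_records` is hypothetical.

```python
from typing import Dict, Iterable, Iterator

FIELD_END = "<eof>"   # end-of-field marker, as seen in the sample
RECORD_END = "<eor>"  # end-of-record marker, as seen in the sample


def iter_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per record; only the current record is held in memory."""
    record: Dict[str, str] = {}
    for raw in lines:
        # A physical line may hold one field or several, so split on the marker.
        for chunk in raw.split(FIELD_END):
            chunk = chunk.strip()
            # A record marker may sit alone or directly before the next field.
            while chunk.startswith(RECORD_END):
                if record:
                    yield record
                record = {}
                chunk = chunk[len(RECORD_END):].strip()
            if not chunk:
                continue
            # Split on the first colon only, so values may contain colons.
            name, sep, value = chunk.partition(":")
            if sep:
                record[name.strip()] = value.strip()
    if record:  # tolerate a final record with no closing marker
        yield record


if __name__ == "__main__":
    sample = [
        "$RecordID: 1<eof>",
        "Category: 1<eof>",
        "Poster: John Doe<eof>",
        "<eor>",
    ]
    for rec in iter_records(sample):
        print(rec)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly and the dump is never loaded whole.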
Some comments:
1) However the conversion/translation is done, you want to do it using an on-the-fly approach, not by building a giant data structure (XML or whatever) that contains all the data. If you are using XML, there are two general approaches: one builds a tree, and the other processes the XML piecewise. You want the latter.

2) Some more information about the eventual resting place would be helpful. Do you have a single relation defined with 4000-or-so columns?

3) Assuming a simple table such as the one in #2, what I would think about is writing a simple program to turn the original file into a sequence of DML INSERT statements to load the database.

I could help you figure this out if I had more information about the input, the database schema, and the tools available. If you want, you can contact me offline.
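To illustrate comment 3, here is a rough Python sketch of turning one already-parsed record into an INSERT statement. It assumes the record arrives as a dict of field name to value and that column names match field names exactly; the table name `Products` and both function names are hypothetical. It uses SQL Server's square-bracket identifier quoting and doubles embedded single quotes in values.

```python
from typing import Dict


def sql_literal(value: str) -> str:
    """Quote a string value by doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def sql_identifier(name: str) -> str:
    """Bracket-quote an identifier, doubling any closing bracket inside it."""
    return "[" + name.replace("]", "]]") + "]"


def insert_statement(table: str, record: Dict[str, str]) -> str:
    """Build one INSERT naming only the columns present in this record."""
    columns = ", ".join(sql_identifier(name) for name in record)
    values = ", ".join(sql_literal(value) for value in record.values())
    return f"INSERT INTO {sql_identifier(table)} ({columns}) VALUES ({values});"


if __name__ == "__main__":
    # Hypothetical record; fields missing from the dump are simply not listed.
    print(insert_statement("Products", {"SKU": "10045689", "ProductName": "Bob's Widget"}))
```

Since each statement lists only the fields a record actually has, missing fields fall back to the column default (typically NULL). For a real load, parameterized inserts or a bulk-load tool would generally be safer than hand-built SQL text.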
Dalton Franklin
OK here's the situation..
One of my larger suppliers is giving me detailed information about the products they sell for my website. This information was submitted to them by the MANUFACTURERS of those products.

There are roughly 200 categories. Each category has a different number of fields, but the maximum number of fields in any category is 80 (all categories have at least one field).

We are going to put this data into a ProductDetails table. In addition to regular product data (SKU, ManufacturerID, etc.), there will be eighty varchar fields named "Field1", "Field2", etc. An XML file will map those fields to the custom field names defined for each category. We pondered completely normalizing the database by using a one-to-one related table for each category, but that would overcomplicate the parsing, and this works fine, I believe.

The records in the dump are NOT sorted by field. Each record in the same category does NOT have the same number of fields; fields that were not filled out by the manufacturer are not shown at all. Each record in the same category does NOT use the same field order. (I'm thinking XML will solve this problem if I can get the file into XML.)

Please email me at daltonf@simplicitycorp.net, if you don't mind. I would be glad to provide you with detailed information about the file, or a copy of it if you like. I just can't post it here on this discussion board for legal reasons.

Thanks very much and have a great day.
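The "Field1", "Field2", ... scheme described in this post can be sketched as a per-category lookup that hands out the next free slot the first time a custom field name is seen. This is only an illustration in Python under assumptions: the class name `SlotMap` and its methods are hypothetical, the record passed in contains only category-specific fields, and writing the finished mapping out as XML is left aside.

```python
from typing import Dict

MAX_SLOTS = 80  # the post states that no category has more than 80 fields


class SlotMap:
    """Assign each custom field name in a category to a generic FieldN column."""

    def __init__(self) -> None:
        # category -> {custom field name -> generic column name}
        self.slots: Dict[str, Dict[str, str]] = {}

    def column_for(self, category: str, field: str) -> str:
        fields = self.slots.setdefault(category, {})
        if field not in fields:
            if len(fields) >= MAX_SLOTS:
                raise ValueError(f"category {category!r} exceeds {MAX_SLOTS} fields")
            fields[field] = f"Field{len(fields) + 1}"
        return fields[field]

    def to_row(self, category: str, record: Dict[str, str]) -> Dict[str, str]:
        """Rename a record's custom fields to their generic columns."""
        return {self.column_for(category, name): value for name, value in record.items()}


if __name__ == "__main__":
    slot_map = SlotMap()
    print(slot_map.to_row("1", {"Line1": "first", "Line2": "second"}))
    # Field order and missing fields do not matter: names decide the column.
    print(slot_map.to_row("1", {"Line2": "only this one"}))
```

Because the lookup is keyed by field name, records with a different field order or with fields left out still land in consistent columns within a category.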
Sorry to be a pain, but, as a moderator, one of my duties is to mention our rules. In particular we seem to be straying from:
Quote:
Importing a Unix file dump into SQL Server
The vendor is getting back to me on the size of the full file (right now I'm working with a sample).
I do know that there are approximately 100,000 products listed in field:value pairs.

In terms of database design, this data is going to be used to create a web store that will allow users to compare products in the same category by searching, filtering, or viewing the products' specifications. Do you think it would be better to:

(a) Have one Product_Details table containing the fields present in all categories plus "Field1", "Field2", etc., with XML defining the names of those fields, based on the category, in the web front-end

OR

(b) Have multiple Product_Details tables, each with all of the common product detail fields plus the category-specific fields. (Each product would be related to only one Product_Details table in this case, so this latter method SHOULDN'T take up any more space.)

Thank you for your help in advance.
Hmmmmmm....
Maybe you could create a single UberTable with a union of all the fields in the database, intelligently populated for similar properties, and do away with an XML-based data dictionary (I could see that getting pretty nitty-gritty quickly). The downside would be that the table would have a bunch of columns that may not be fully utilized and could be rather large in size, but the upside would be that the columns would have actual names without an extra lookup during the query process and would be easier for humans to follow.

The real question is how disparate the metadata for each category really is. It wouldn't surprise me if the categories had a lot of similar data... "Catalog_Number" is the same as "CatNo" is the same as "Cat_No" (et cetera, et cetera). If you could find the intersection of the common fields and then just append the balance to the table, you could reduce the total number of columns and make the database easier to query and index. I am trying to remember the proper mathematical terminology, but my Discrete Math is a tad bit rusty at the moment.

The parser would have to have some logic to split the fields into the proper columns. Instead of one hash, it would use one hash per category/manufacturer/grouping that contains the From/To mapping used when populating the UberTable. A lot of work would have to be put into the From/To relationship during conversion, but the amount of work saved on the database end may justify the cost on the front side of the conversion.

Hopefully I am not just running on about things that have already been discussed by your own development team.
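The From/To idea above can be sketched roughly as follows, in Python. Each category gets its own hash from source field name to canonical column name, and plain set operations then give the common columns (the intersection) and the balance (the union minus the intersection). Everything named here is an assumption for illustration: the category names, the `FROM_TO` table, and both function names are hypothetical, and only the "Catalog_Number" aliases come from the post.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical From/To table: one hash per category, source name -> canonical name.
FROM_TO: Dict[str, Dict[str, str]] = {
    "printers": {"CatNo": "Catalog_Number"},
    "monitors": {"Cat_No": "Catalog_Number"},
}


def canonical_fields(category: str, fields: Iterable[str]) -> Set[str]:
    """Rename a category's source fields; unmapped names pass through unchanged."""
    mapping = FROM_TO.get(category, {})
    return {mapping.get(name, name) for name in fields}


def column_plan(fields_by_category: Dict[str, Iterable[str]]) -> Tuple[List[str], List[str]]:
    """Return (columns common to every category, remaining columns)."""
    sets = [canonical_fields(cat, fields) for cat, fields in fields_by_category.items()]
    if not sets:
        return [], []
    common = set.intersection(*sets)
    balance = set.union(*sets) - common
    return sorted(common), sorted(balance)


if __name__ == "__main__":
    common, balance = column_plan({
        "printers": ["CatNo", "PagesPerMinute"],
        "monitors": ["Cat_No", "ScreenSize"],
    })
    print("common:", common)
    print("balance:", balance)
```

The hard part remains building the From/To hashes themselves; the sketch only shows that once they exist, the column list for the combined table falls out mechanically.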