08-22-2016
They should be unique indeed, but unfortunately different manufacturers can somehow end up with the same code. For example, EAN 7636490074196 is a Seagate 2TB SSD but also a LaCie external HDD.
The problem is not the joining itself, but the fact that the same code is used multiple times.
I am not sure if this explanation will be sufficient, but I will try.
The script we use for joining on barcodes needs to be adapted so that it checks the manufacturer in both files against a third file that lists all the different spellings. This third file exists purely so that manufacturers like HP, WD etc. are matched instead of being ignored each time.
To describe it in steps:
- Each line from the supplier file gets matched against the website file; the complete line is carried along.
- The manufacturer from the website file is then looked up in the manufacturer file, which lists the different ways that manufacturer may be written by different suppliers.
- The matching entry from that lookup is checked against the manufacturer in the supplier file.
- If they are the same, the normal loop continues. If there is a difference, the record is written to a new file that will be picked up for manual checking.
- This manual file needs to contain the full line from both the website and the supplier, so we can check whether it is a spelling error or an EAN error.
I hope this clarifies it a bit.
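A minimal sketch of those steps in awk, to make the idea concrete. The file names, the field order (ean;manufacturer;rest) and the semicolon delimiter are assumptions for illustration only; the real files and the existing loop will differ.

```shell
#!/bin/sh
# Sketch only -- sample inputs stand in for the real supplier/website files.
cat > website.csv <<'EOF'
7636490074196;Seagate;2TB SSD
4712345678901;HP;Laser printer
EOF

cat > aliases.csv <<'EOF'
seagate;seagate technology
hp;hewlett-packard;hewlett packard
EOF

cat > supplier.csv <<'EOF'
7636490074196;Seagate Technology;2TB SSD drive
4712345678901;Lenovo;Laser printer X100
EOF

awk -F';' '
    # 1) website file: remember the full line and the manufacturer per EAN
    FNR == NR { site[$1] = $0; site_mfr[$1] = $2; next }

    # 2) manufacturer file: map every known spelling to its canonical name
    FILENAME == "aliases.csv" {
        for (i = 1; i <= NF; i++) canon[tolower($i)] = tolower($1)
        next
    }

    # 3) supplier file: join on EAN, then compare the two manufacturers
    {
        ean = $1; mfr = tolower($2)
        if (!(ean in site)) next                          # no EAN match at all
        if (canon[mfr] != "" && canon[mfr] == canon[tolower(site_mfr[ean])])
            print $0 > "matched.csv"                      # same manufacturer: normal loop
        else
            print site[ean] ";" $0 > "manual_check.csv"   # full website + supplier line for review
    }
' website.csv aliases.csv supplier.csv
```

With the sample data above, the Seagate line lands in matched.csv, while the EAN that the website lists as HP but the supplier lists as Lenovo goes to manual_check.csv with both full lines, ready for the manual spelling-vs-EAN check.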
Last edited by rbatte1; 09-14-2016 at 05:53 AM..
Reason: Converted text numbered list to formatted number-list