SAS or SSD for Ubuntu 14.04 and data analysis


# 1  

I am building a workstation and have a performance question. I am a scientist who works with big data (average file size 30-50 GB). My OS is Ubuntu 14.04, and so far I have dual Xeon E5-2630 CPUs (6 cores each) with 128 GB of RAM. I/O buffering is an issue, so I am adding a PCIe card (256/512?) and either two SSDs or two SAS drives for the OS and software. Since the PCIe card will be separate and used mainly for file transfer, would SAS or SSD be a better fit for the OS? I am leaning towards SAS because of the buffering issue, but wanted to ask more knowledgeable users. I forgot to mention that there will also be a separate 1 or 2 TB drive. Any recommendations for the size of the SAS or SSD drives? Thanks.
# 4  
SSDs are more than an order of magnitude faster than high-RPM SAS disks.
SSD capacity is limited, usually to 1-2 TB per drive. With 128 GB of memory, you could easily load whatever file you want from SSD into memory; the usual term for this is a RAM disk, which Ubuntu supports. Ubuntu also caches files very effectively without much human intervention other than configuration.
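
To make the sizing argument concrete, here is a minimal Python sketch (not from the original post) that checks whether a file of a given size would fit in RAM with some headroom left for the OS and applications. The function names and the 25% headroom figure are assumptions chosen for illustration, and `total_ram_bytes` relies on `os.sysconf`, which is only available on Unix-like systems.

```python
import os

GB = 1024 ** 3  # bytes per gibibyte


def total_ram_bytes():
    """Physical RAM in bytes, read via sysconf (Unix-like systems only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def fits_in_ram(file_bytes, ram_bytes, headroom=0.25):
    """Return True if the file fits in RAM after reserving a fraction
    (headroom, an assumed 25% by default) for the OS and applications."""
    return file_bytes <= ram_bytes * (1.0 - headroom)
```

For the numbers in this thread, `fits_in_ram(50 * GB, 128 * GB)` is true: a 50 GB file is well under the 96 GB left after the assumed headroom.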

Learn about pdflush; see the article "The Linux Page Cache and pdflush".

There is also vmtouch, which lets you force any file to be read entirely into memory. That would definitely favor SSD.

https://hoytech.com/vmtouch/ Also note some other tools on that site.

So, I would suggest: SSDs and vmtouch (or an analogous tool).
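
For anyone curious what "an analogous tool" might look like, below is a minimal Python sketch, not vmtouch itself, that reads a file sequentially so the kernel pulls it into the page cache. The function name `warm_page_cache` and the 8 MB chunk size are assumptions for illustration. Unlike vmtouch's locking mode, this does not pin pages, so the kernel may still evict them under memory pressure.

```python
import os


def warm_page_cache(path, chunk_size=8 * 1024 * 1024):
    """Read a file front to back so the kernel caches its pages.

    Returns the number of bytes read. The data itself is discarded;
    the point is only the side effect of populating the page cache.
    """
    total = 0
    # buffering=0 gives a raw file object, so Python keeps no extra copy.
    with open(path, "rb", buffering=0) as f:
        # Hint to the kernel that the whole file will be needed soon
        # (length 0 means "to end of file"). Not available on all platforms.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_WILLNEED)
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

Calling `warm_page_cache("/path/to/dataset")` before an analysis run (the path here is a placeholder) means the first pass pays the disk read cost up front, which is where a fast SSD helps most.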
This User Gave Thanks to jim mcnamara For This Post:
# 6  
Not directly related, but I had a long workshop yesterday about our new storage system (EMC VMax 200k). EMC claims that they had intended the 300 GB 15k SAS drives for high performance but are now phasing them out because (quoting from memory) with the development of flash SSDs it's just not worth it any more. They also claim that, because they use SLC-based hardware, they see even lower disk-replacement rates than with rotational disks, even in heavy-duty transactional storage systems, and that the much lower energy consumption of SSDs compared to 15k SAS disks contributes to this. There is simply less heat involved, and that shows when you pack some ~2500 disks into a rack.

You haven't said where you are going to place the workstation, but in case it is going to be somewhere near your desk: 15k disks are awfully LOUD in addition to being premier heating devices, while SSDs are completely silent.

I hope this helps.

bakunin