Tuesday, April 24, 2018

Frankencomputer for deep learning: running Ubuntu 16.04 on a Dell Precision T5500

Choosing a second-hand computer:

As my previous laptop died of heat, I was looking for a new machine on a tight budget. Having read the post from Denys Katerenchuck on a budget-friendly PC for deep learning, it was tempting to find an old machine capable of powering a GPU such as a GTX 960 or better.
Old workstations are capable of powering such GPUs. The Dell Precision T series is attractive because it comes with a large PSU: 875 W in the case of the T5500.

Many people tweak this model for gaming (thanks for the link), for example this one. Several videos show such Dell Precision workstations running games:
    I bought a Dell Precision T5500 on eBay (12 GB RAM, Xeon E5520 CPU, no hard drive, no cable, 875 W PSU) for 133€ (~100€ for the machine and ~30€ for shipping from the UK to France).
    3x4 GB RAM
    Dust included

    PCI slots

    When first connected to a TV used as a monitor (no hard drive at this stage), an error message was displayed (due to the size of the screen?):


    Fortunately, initial tests showed no hardware problems:


    A 128 GB SSD from a previous laptop was connected to SATA1:


    Ubuntu 16.04 was installed from a USB key. Installing Ubuntu took 15 minutes.
    Some BIOS settings were modified to boot from the SSD. SATA-1 was checked and the other settings were unchecked:


    The RAID configuration was switched from "RAID-on" to "RAID auto detect/ AHCI":
    Powering up looks like this:
     

    If needed, the boot-info is here.

    CPU

    This Precision T5500 comes with a 0D883F motherboard, which accepts E55xx or E56xx CPUs. This is not the best possible board for a T5500. It currently has a single Xeon E5520 CPU.

    GPU card:

    The graphics card provided with the computer can't be used for deep learning, nor even for 3D with GeoGebra! It's a:
     VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV635 [Radeon HD 3650/3750/4570/4580]


    In 2016, Denys Katerenchuck recommended the GTX 960 4 GB as a good tradeoff between price and performance. In 2018, such a card can be found on eBay for less than 150€. The Dell T5500 can power a GTX 1050 from its PCIe slot alone. It also has a six-pin power connector (left), enough to power an EVGA GTX 1060 for example.

    There's an interesting post about powering GPUs with 6-pin and 8-pin cables. Again, according to Akshat Verma (see the answer about the T5500), it seems possible to plug a GTX 1070 into a T5500.
    Some users reported that they installed a GTX 960 and even a GTX 1080.
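
The power reasoning above can be made explicit with a little arithmetic. This is a minimal sketch, assuming the PCI Express limits (75 W from an x16 slot, 75 W from a six-pin connector, 150 W from an eight-pin connector) and the reference board power published by NVIDIA for each card. Partner cards, especially factory-overclocked ones, may draw more, so the figures are only indicative.

```python
# Power limits defined by the PCI Express specification, in watts.
SLOT_W = 75
CONNECTOR_W = {"6-pin": 75, "8-pin": 150}

# Reference board power (TDP) in watts. Assumed from NVIDIA's published
# specifications; partner cards may differ.
REFERENCE_TDP_W = {
    "GTX 1050": 75,
    "GTX 960": 120,
    "GTX 1060": 120,
    "GTX 1070": 150,
    "GTX 1080": 180,
}


def available_power(connectors):
    """Return the in-spec power available from the slot plus the given cables."""
    return SLOT_W + sum(CONNECTOR_W[c] for c in connectors)


def fits(card, connectors):
    """True if the card's reference TDP is within the in-spec power budget."""
    return REFERENCE_TDP_W[card] <= available_power(connectors)


if __name__ == "__main__":
    for card in REFERENCE_TDP_W:
        print(card,
              "| slot only:", fits(card, []),
              "| slot + 6-pin:", fits(card, ["6-pin"]))
```

On paper, the slot alone covers only the GTX 1050, while the slot plus one six-pin cable covers everything in this table except the GTX 1080. Reports of a working GTX 1080 therefore suggest an extra cable or adapter was used.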

    BIOS


    Dell splash screen visible on first boot
    The BIOS version is A02, from 2009. Dell provides updates for A02 for both Windows and Linux (for Red Hat, not Ubuntu). It seems that the BIOS has to be upgraded in order to install a recent GPU.
    Questions about the BIOS version and the GTX 960 can be found online; the BIOS upgrade is a recurring question.
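
On Linux, the installed BIOS version can be checked without rebooting into setup. This is a minimal sketch, assuming the kernel exposes the DMI fields under /sys/class/dmi/id (the usual location on Ubuntu) and that Dell versions follow the letter-plus-number pattern seen here (A02, A17). The helper names are made up for illustration.

```python
from pathlib import Path

DMI_DIR = Path("/sys/class/dmi/id")  # usual sysfs location for DMI data


def read_dmi(field, base=DMI_DIR):
    """Return a DMI field such as 'bios_version', or None if unreadable."""
    try:
        return (Path(base) / field).read_text().strip()
    except OSError:
        return None


def bios_number(version):
    """Turn a Dell-style version such as 'A02' into the integer 2.

    Assumes one leading letter followed by digits.
    """
    return int(version[1:])


def is_older(current, latest="A17"):
    """True if `current` is older than `latest` (A17 is the latest listed here)."""
    return bios_number(current) < bios_number(latest)


if __name__ == "__main__":
    for field in ("bios_version", "bios_date", "board_name"):
        print(field, "=", read_dmi(field))
    version = read_dmi("bios_version")
    if version and version[1:].isdigit():
        print("older than A17:", is_older(version))
```

The board_name field is also a convenient way to see which motherboard variant is installed.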

    BIOS version history

    Latest version is A17

     

    Last update / BIOS / Fixes & Enhancements


    07 Jun 2013
    1. Enhanced Broadcom onboard NIC 5761 support.
    2. Enhanced CCTK support for Turbo Boost setting.
    27 Feb 2013
    1. Enhanced PSA support.
    2. Update PSA to A4527.
    3. Enhanced Dual CPU memory configurations support.
    04 Dec 2012
    1. Enhanced Windows 8 support.
    07 Jul 2012
    1. Add High IO performance option in Setup.
    2. Disable QPI L1 when C-States have been disabled.
    3. Add the changes that ties ASPM to the C-State switch.
    4. Add TSEG protection feature ENABLE_TSEG_SECURITY.
    07 Jul 2012
    1. Update new RAID OROM v10.8.0.1303.
    03 Nov 2011
    1. Enhanced TPM support.
    03 Nov 2011
    The following changes have been made to BIOS A09 to A10:

    1. Added support for newer Processors.
    2. Added support for bridged AGP video cards.
    3. Updated the copyright to 2003.
    4. Fixed UDMA support for 48-bit LBA hard drives (> 137 GB).
    5. Fixed problem where system would occasionally shut off when Ctrl-Alt-Del is pressed.
    6. Free up unused portions of E000:0000 to E000:FFFF memory for memory managers to use.
    07 Jul 2012
    1.Added new Hard Disk master password algorithm support.
    2.Added support to install single graphic card in 2nd PCIEx16 slot.
    3.Enhanced PCIE Slot interrupt handling.
    4.Enhanced the memory map algorithm for 128GB configuration.
    07 Jul 2012
    1.Enhanced method to check the failure memory
    2.Updated ACPI SRAT table information
    07 Jul 2012
    1. Enhanced security device support
    07 Jul 2012
    1. Updated Intel Xeon Processor 5600/3600 Series microcode to rev 10.
    2. Updated Intel Xeon Processor 5500/5600 Platform Reference Code to revision P2.91.
    3. Enhanced TPM remotely provision.
    4. Enhance VT-d
    07 Jul 2012
    1. Updated to the latest Intel (R) Xeon (R) Processor 5600 Series microcode
    07 Jul 2012
    1. Supported E5620, E5630, E5640, X5650, X5660, X5667, X5670, X5677, X5680, W3680 CPUs.
    2. Updated Intel Xeon 5500/5600 Platform Reference Code to revision P2.7
    3. Removed S1 support.
    4. Removed "Optional HDD fan" support.
    5. Reported riser's DIMMs information (asset tag, serial number) in SMBIOS.
    6. Used the same fan setting before and after S3.
    7. Enhanced the compatibility with certain PCI-Express Gen1 cards.
    8. Enhanced NUMA under RHEL5.3.
    9. Removed HDD Acoustic support.
    07 Jul 2012
    1. Added error detection for bad monitors or cables when entering setup menu.
    2. Enhanced PCIe bar allocation.
    3. Enhanced algorithm for Graphics card with multiple OPROMs.
    4. Updated BIOS fan descriptions.
    5. Fixed BIOS cannot boot from SATA CD/DVD when setting USB controller to "No Boot" in BIOS setup menu.
    07 Jul 2012
    1. Implemented fix for intermittent boot issue with 6.4 GT/s CPUs.
    2. Added feature to display a message if the DIMM configuration is not optimal.
    3. Updated Intel(R) Memory Reference Code.
    4. Added microcode update revision 11 for Nehalem D0-step.
    5. Fixed boot issue with RAID and ATI FireMV 2450.
    6. Added TCM support.
    7. Added updated IO programming.
    8. Fixed possible hang condition when VT-d is enabled.
    9. Moved "Optional HDD Fan" to the "Post Behavior" section in Setup.
    10. Updated SMBIOS tables.
    11. Added support for Windows 7.
    12. Updated fan settings.
    13. Corrected memory channel information in Setup.
    14. Improved the allocation of system resources.
    One user's account shows that he had a hard time upgrading the firmware. He reported: "Oh I should state that when I got the machine it had bios version A5 on it. I was able to get to A9, but I had to flash each BIOS in order to get to that point."
    Other feedback from the web advises upgrading the BIOS directly from the current version to the latest one.

    Upgrading from Windows yields:


    From Ubuntu or FreeDOS:

    There are different possible ways to upgrade the BIOS:
    By the way, the T5500 has so far refused to boot from a FreeDOS USB stick...

    Motherboard versions 

    It seems that the motherboard can come in different flavours:
    • D883F
    • CRH6C
    • W1G7K 
    The most capable motherboard seems to be the CRH6C model, which supports Xeon CPUs up to the X5690. One user reported using two X5660 CPUs on the D883F model with the latest BIOS (should be A16).

     Upgrading the GPU

    An Asus GTX 960 4 GB Turbo was ordered, and eventually arrived:

    The card is long for a T5500; some room has to be made to fit it in the case:
    The blue plastic locker was removed.
    The hard disk support was partially cut and bent.
    When the T5500 was powered on, it simply booted (with BIOS A02!) and opened a Linux session (with the nouveau driver):

    So no firmware update was needed!

    Some software was installed (NVIDIA PPA, driver 384.130, ...), which yields:

    The sensors report the same GPU temperature:
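
For scripted checks, the GPU name, driver version and temperature can also be read programmatically. This is a minimal sketch, assuming nvidia-smi (shipped with the NVIDIA driver) is on the PATH. It relies on its --query-gpu and --format=csv options; the function names are only illustrative.

```python
import subprocess

QUERY = "name,driver_version,temperature.gpu"


def parse_gpu_csv(text):
    """Parse 'nvidia-smi --format=csv,noheader,nounits' output into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        name, driver, temp = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "driver": driver, "temp_c": int(temp)})
    return gpus


def query_gpus():
    """Run nvidia-smi and return one dict per GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader,nounits"],
        check=True, stdout=subprocess.PIPE, universal_newlines=True,
    ).stdout
    return parse_gpu_csv(out)


if __name__ == "__main__":
    try:
        for gpu in query_gpus():
            print(gpu)
    except (OSError, subprocess.CalledProcessError):
        # nvidia-smi is missing or the driver is not loaded.
        print("nvidia-smi is not available")
```

The parsing is kept separate from the subprocess call so it can be tried on a saved output line even on a machine without the driver.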

    CUDA 9 + cuDNN 7 on Ubuntu 16.04

    Several tutorials are available for installing CUDA 9 + cuDNN:
    After installation, the NVIDIA driver was updated to 396.26:

    Checking install

    Copy the CUDA sample files somewhere in the home directory and build them:


    Launch deviceQuery, for example:


    The GTX 960 card (PCIe 3) is plugged into a PCIe 2 x16 slot. How do the card and the computer communicate? Maybe bandwidthTest can bring some insight:
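
As a point of comparison for whatever bandwidthTest reports, the theoretical ceiling of each PCIe generation can be worked out from its transfer rate and line encoding. This is a minimal sketch using the figures from the PCIe specifications (5 GT/s with 8b/10b encoding for PCIe 2.0, 8 GT/s with 128b/130b encoding for PCIe 3.0). Measured throughput is always lower because of protocol overhead.

```python
# Per-lane raw transfer rate (GT/s) and encoding efficiency per PCIe generation.
PCIE_GEN = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
}


def link_bandwidth_gb_s(gen, lanes=16):
    """Theoretical one-direction bandwidth in GB/s (10^9 bytes per second)."""
    rate_gt_s, efficiency = PCIE_GEN[gen]
    bits_per_s = rate_gt_s * 1e9 * efficiency * lanes
    return bits_per_s / 8 / 1e9


def negotiated_gen(card_gen, slot_gen):
    """A PCIe link trains at the highest generation both ends support."""
    return min(card_gen, slot_gen)


if __name__ == "__main__":
    gen = negotiated_gen(card_gen=3, slot_gen=2)
    print("link runs as PCIe gen", gen)
    print("x16 ceiling: %.2f GB/s" % link_bandwidth_gb_s(gen))
    print("a PCIe 3 x16 slot would allow: %.2f GB/s" % link_bandwidth_gb_s(3))
```

Under these assumptions the GTX 960 should fall back to a PCIe 2.0 link, with a ceiling of 8 GB/s in each direction, about half of what a PCIe 3.0 slot would offer.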