IBM And NVIDIA Power New Scale-Out Gear For AI

Accelerating deep learning (DL) training – on GPUs, TPUs, FPGAs or other accelerators – is in the early days of scale-out architecture, much like the server market was in the mid-2000s. DL training enables the advanced pattern recognition behind modern artificial intelligence (AI) based services. NVIDIA GPUs have been a major driver of DL development and commercialization, but IBM just made an important contribution to scale-out DL acceleration. Understanding what IBM did, and how that work advances AI deployments, takes some explanation.

Scale Matters

Key Definitions (chart: TIRIAS Research)

Inference scales-out. Trained DL models can be simplified for faster processing with good enough pattern recognition to create profitable services. Inference can scale-out as small individual tasks running on multiple inexpensive servers. There is a lot of industry investment aimed at lowering the cost of delivering inference; we’ll discuss that in a future article.

The immediate challenge for creating deployable inference models is that, today, training scales-up. Training requires large data sets and high numeric precision; aggressive system designs are needed to meet real-world training times and accuracies. But cloud economics are driven by scale-out.

The challenge for cloud companies deploying DL-based AI services, such as Microsoft’s Cortana, Amazon’s Alexa and Google Home, is that DL training has not scaled well. Poor off-the-shelf scaling is mostly due to the immature state of DL acceleration, forcing service providers to invest (in aggregate) hundreds of millions of dollars in research and development (R&D), engineering and deployment of proprietary scale-out systems.

NVLink Scales-Up in Increments of Eight GPUs

GPU evolution has been a key part of DL success over recent years. General purpose processors were, and still are, too slow at processing DL math with large training data sets. NVIDIA invested early in leveraging GPUs for DL acceleration, in both new GPU architectures to further accelerate DL and in DL software development tools to enable easy access to GPU acceleration.

An important part of NVIDIA’s GPU acceleration strategy is NVLink. NVLink is a scale-up high-speed direct GPU-to-GPU interconnect architecture that directly connects two to eight GPU sockets. NVLink enables GPUs to train together with minimum processor intervention. Prior to NVLink, GPUs did not have the low-latency interconnect, data flow control sophistication, or unified memory space needed to scale-up by themselves. NVIDIA implements NVLink using its SXM2 socket instead of PCIe.

NVIDIA’s DGX-1, Microsoft’s Open Compute Project (OCP) Project Olympus HGX-1 GPU chassis and Facebook’s “Big Basin” server contribution to OCP are very similar designs that each house eight NVIDIA Tesla SXM2 GPUs. The DGX-1 design includes a dual-processor x86 server node in the chassis, while the HGX-1 and Big Basin designs must be paired with separate server chassis.

Microsoft’s HGX-1 can bridge four GPU chassis by using its PCIe switch chips to connect the four NVLink domains to one to four server nodes. While all three designs are significant feats of server architecture, the HGX-1’s 32-GPU design limit presents a practical upper limit for directly connected scale-up GPU systems.

Microsoft HGX-1 motherboard with eight SXM2 sockets (four populated). Image: TIRIAS Research

The list price for each DGX-1 is $129,000 using NVIDIA’s P100 SXM2 GPU and $149,000 using its V100 SXM2 GPU (including the built-in dual-processor x86 server node). While this price range is within reach of some high-performance computing (HPC) cluster bids, it is not a typical cloud or academic purchase.

Original Design Manufacturers (ODMs) like Quanta Cloud Technology (QCT) manufacture variants of OCP’s HGX-1 and Big Basin chassis, but do not publish pricing. NVIDIA P100 modules are priced from about $5,400 to $9,400 each. Because NVIDIA’s SXM2 GPUs account for most of the cost of both Big Basin and HGX-1, we believe that system pricing for both is in the range of $50,000 to $70,000 per chassis unit (not including matching x86 servers), in cloud-sized purchase quantities.
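As a rough cross-check of that estimate, here is our own back-of-envelope arithmetic using only the module prices cited above (the variable and function names are ours, for illustration):

```python
# Illustrative estimate only; figures are the article's published module
# prices, not vendor quotes.
P100_MODULE_PRICE = (5_400, 9_400)   # low/high per-module price, USD
GPUS_PER_CHASSIS = 8

def chassis_gpu_cost(price_range, gpus=GPUS_PER_CHASSIS):
    """GPU bill-of-materials range for one eight-GPU chassis."""
    low, high = price_range
    return low * gpus, high * gpus

low, high = chassis_gpu_cost(P100_MODULE_PRICE)
print(f"GPU modules alone: ${low:,} to ${high:,} per chassis")
# GPU modules alone: $43,200 to $75,200 per chassis
```

The GPU modules alone span roughly $43,000 to $75,000, which is consistent with a $50,000 to $70,000 per-chassis estimate once volume discounts and the rest of the bill of materials are factored in.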

Facebook’s Big Basin Performance Claims

Facebook published a paper in June describing how it connected 32 Big Basin systems over its internal network to aggregate 256 GPUs and train a ResNet-50 image recognition model in under an hour with about 90% scaling efficiency and 72% accuracy.

While 90% scaling efficiency is an impressive achievement for state-of-the-art, there are several challenges with Facebook’s paper.
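For reference, “scaling efficiency” here means measured cluster throughput divided by what perfect linear scaling would deliver. A minimal sketch (the function and the sample throughput numbers are our own illustration, not figures from the paper):

```python
def scaling_efficiency(cluster_images_per_sec, single_gpu_images_per_sec, n_gpus):
    """Measured throughput as a fraction of perfect linear scaling."""
    return cluster_images_per_sec / (single_gpu_images_per_sec * n_gpus)

# Hypothetical illustration: if a single GPU processed 200 images/s, a
# 256-GPU cluster at 46,080 images/s would sit at Facebook's reported ~90%.
print(scaling_efficiency(46_080, 200, 256))  # 0.9
```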

The eight-GPU Big Basin chassis is the largest possible scale-up NVIDIA NVLink instance. It is expensive, even assuming an enterprise buyer could purchase OCP gear at all. Plus, Facebook’s paper does not mention which OCP server chassis design and processor model it used for its benchmarks. The processor choice may be a moot point, because if you are not a cloud giant, it is very difficult to buy a Big Basin chassis or any of the other OCP servers that Facebook uses internally. Using different hardware, your mileage is guaranteed to vary.

Facebook also does not divulge the operating system or development tools used in the paper, because Facebook has its own internal cloud instances and development environments. No one else has access to them.

The net effect is that it is nearly impossible to replicate Facebook’s achievement if you are not Facebook.

Facebook Big Basin server. Image: TIRIAS Research

IBM Scales-Out with Four GPUs in a System

IBM recently published a paper as a follow-up to the Facebook paper. IBM’s paper describes how to train a ResNet-50 model in under an hour at 95% scaling efficiency and 75% accuracy, using the same data sets that Facebook used for training. IBM’s paper is notable in several ways:

  1. Not only did IBM beat Facebook on all the metrics, but 95% efficiency is very nearly linear scaling.
  2. Anyone can buy the equipment and software to replicate IBM’s work. Equipment, operating systems and development environments are called out in the paper.
  3. IBM used smaller scale-out units than Facebook. Assuming Facebook used its standard dual-socket compute chassis, IBM has half the ratio of GPUs to CPUs – Facebook uses a 4:1 ratio and IBM uses a 2:1 ratio.

IBM sells its OpenPOWER “Minsky” deep learning reference design as the Power Systems S822LC for HPC. IBM’s PowerAI software platform with Distributed Deep Learning (DDL) libraries includes IBM-Caffe and “topology aware communication” libraries. PowerAI DDL is specific to OpenPOWER-based systems, so it will run on similar POWER8 Minsky-based designs and upcoming POWER9 “Zaius”-based systems (Zaius was designed by Google and Rackspace), such as those shown at various events by Wistron, E4, Inventec and Zoom.

PowerAI DDL enables creating large scale-out systems out of smaller, more affordable, GPU-based scale-up servers. It optimizes communications between GPU-based servers based on network topology, the capabilities of each network link, and the latencies for each phase of a DL model.

IBM used 64 Power System S822LC systems, each with four NVIDIA Tesla P100 SXM2-connected GPUs and two POWER8 processors, for a total of 256 GPUs – matching Facebook’s paper. Even with twice as many IBM GPU-accelerated chassis required to host the same number of GPUs as in Facebook’s system, IBM achieved a higher scaling efficiency than Facebook. That is no small feat.
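Putting the two papers’ cluster shapes side by side, a simple arithmetic sketch using the figures reported above (the dictionary layout is ours, for illustration):

```python
# Cluster shapes and scaling efficiencies as reported in the two papers.
ibm = {"chassis": 64, "gpus_per_chassis": 4, "efficiency": 0.95}
fb  = {"chassis": 32, "gpus_per_chassis": 8, "efficiency": 0.90}

for name, cfg in (("IBM", ibm), ("Facebook", fb)):
    gpus = cfg["chassis"] * cfg["gpus_per_chassis"]
    # Effective speedup: GPU count discounted by the scaling efficiency.
    print(f"{name}: {gpus} GPUs, ~{gpus * cfg['efficiency']:.0f}x single-GPU throughput")
# IBM: 256 GPUs, ~243x single-GPU throughput
# Facebook: 256 GPUs, ~230x single-GPU throughput
```

At the same 256-GPU scale, the five-point efficiency gap is worth roughly thirteen single-GPU-equivalents of throughput.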

IBM Power System S822LC with two POWER8 processors (silver heat sinks) and four NVIDIA Tesla P100 SXM2 modules. Image: TIRIAS Research

Commercial availability of IBM’s S822LC for low volume buyers will be a key element enabling academic and enterprise researchers to buy a few systems and test IBM’s hardware and software scaling efficiencies. The base price for an IBM S822LC for Big Data (without GPUs) is $6,400, so the total price of a S822LC for High Performance Computing should be in the $30,000 to $50,000 ballpark (including the dual-processor POWER8 server node), depending on which P100 model is installed and other options.

Half the battle is knowing that something can be done. We believe IBM’s paper and product availability will spur a lot of DL development work by other hardware and software vendors.

— The author and members of the TIRIAS Research staff do not hold equity positions in any of the companies mentioned. TIRIAS Research tracks and consults for companies throughout the electronics ecosystem from semiconductors to systems and sensors to the cloud.

[“Source-forbes”]

After Musk Remark, Zuckerberg Shares One Reason Why He’s So Optimistic About AI


HIGHLIGHTS

  • The battle of billionaire geeks continues
  • After Musk insulted Zuckerberg, Facebook chief executive has responded
  • Zuckerberg says he remains optimistic about AI

Hours after billionaire Elon Musk publicly cast aspersions on Mark Zuckerberg’s knowledge, saying the Facebook chief executive’s understanding of artificial intelligence is “limited,” Zuckerberg has reaffirmed why he is so optimistic about the nascent technology. To recall, Musk was responding to Zuckerberg’s comments made during a Facebook Live broadcast, where the Facebook CEO called out naysayers.

In a public post, Zuckerberg congratulated his company’s AI research division for winning the best paper award at the “top” computer vision conference for its research on “densely connected convolutional networks.”

In the same post, Zuckerberg shared “one reason” why he is so optimistic about AI. These efforts, he said, could bring “improvements in basic research across so many different fields — from diagnosing diseases to keep us healthy, to improving self-driving cars to keep us safe, and from showing you better content in News Feed to delivering you more relevant search results.”

“Every time we improve our AI methods, all of these systems get better. I’m excited about all the progress here and its potential to make the world better,” said Zuckerberg, whose company already uses a range of AI-powered tools to, among other things, serve relevant posts to around two billion people on the planet.

Zuckerberg’s remarks come merely hours after Tesla and SpaceX founder and CEO Elon Musk criticised Zuckerberg’s inability to foresee the evil side of artificial intelligence. Musk believes that these AI efforts need to be regulated by the government, because otherwise there is a chance that one day AI-powered robots would kill humans, in what he describes as the “doomsday” scenario.

Over the weekend, in a Facebook Live session, Zuckerberg, without calling out Musk, said “naysayers’” predictions about “doomsday scenarios” were “irresponsible.” When a user asked about Musk’s views on Zuckerberg’s remarks, Musk tweeted Tuesday that he has spoken to Mark Zuckerberg and reached the conclusion that his understanding of AI is limited.

[“Source-gadgets.ndtv”]

Boltt Wants to Be Your Digital Fitness Coach With an AI Play


HIGHLIGHTS

  • Boltt is offering a range of hardware products
  • The companion app comes with AI-enabled coach
  • Boltt products now available to pre-order in India

Although the fitness wearable craze seems to be slowing down now, you’ll still frequently run into people wearing some kind of device on their wrist, which tracks data such as steps, heart rate, and so on. One of the big problems is that the user tends to be clueless about what this data means. An Indian company, Boltt, wants to address this, with a companion app that quantifies and measures the recorded data, and then uses an AI-enabled coach to guide the user on the next steps, in order to cut down the abandonment rate for wearables.

“What we saw was that over time, people were buying wearables but they didn’t know what to do with the data,” says Aayushi Kishore, co-founder at Boltt. “What do you do with 10,000 steps a day or burning 500 calories a day? This is where Boltt steps in.”

Boltt’s AI-enabled personal coach is called ‘B’; it analyses data gathered by the wearable, including sleep, fitness, nutrition, and activity, offering customised guidance.


Boltt’s AI fitness coach is available in the app as a text- and voice-based coach, offering insights when any of the company’s fitness devices (a smart band, shoes, and stride sensor) is connected with the app. The AI-enabled coach can provide customised and real-time coaching to users without the hassles of time or geography. The company is also offering third-party integration, which means that if a user has been using a Fitbit then that data can be synced to Boltt.

“A company in wearable segment usually has three elements: hardware, software, and services,” says Boltt’s founder, Arnav Kishore, a former tennis player. “Within the hardware part, we have about three categories of fitness wearables. We have got a form of sensor which is on your wrist for 24×7 tracking, second is a bunch of heart rate sensors as well, which can be on your chest, and the third one is stride sensor which tracks user’s biomechanical data like how fast one runs, and similar data.”

“The Stride sensor can be clipped on to your regular shoe or it could be within the embedded solution which is the smart shoe product that we have in our portfolio,” added [Arnav] Kishore.

“The sensors on our wearables are fundamentally tracking biomechanical data in the raw form,” he said. “But, the real magic lies within the software. Once the data is transmitted to your mobile application, all the inference and intelligence is happening from there on.”


By focusing on the software guidance, Boltt wants to address one of the bigger issues in the wearables market – many people buy fitness trackers with the best intentions, but then simply stop using them.

“The idea behind this is using the raw data to dig out patterns,” says Arnav. “We believe that the only way a user will improve – or at least try to improve – is with utilisation of the data.”

Companies in India like Goqii have tried the coaching route with human coaches, but there are limitations. A human coach’s biggest limitation is that they can comprehend only so much data in a limited time.

“If at a software level, we can connect all the dots and give you automated feedback in a fraction of a second that’s where we think the future lies,” says Aayushi.

“We have tried to replicate the process of human thinking in the form of an artificial intelligent coach,” adds Arnav. “How that works is all the data that comes is typically seen by an expert who would take into account your current condition and how well are you performing. What’s your current fitness level is. This is, however, very limited when it comes to human mind.”

“The more we have injected this intelligence in the form of AI, machine learning, and cognitive computing that’s the reason why we are able comprehend so much data in fraction of second and give a user guidance in return,” he said.


At the same time, the Kishores reiterated that user privacy is of the utmost importance, and all data is stored locally on your device. “We can assure that the data cannot be seen by anyone except the Boltt team, to prevent misuse in any way,” says Arnav.

The current lineup of products covers ‘connected sneaker’, with embedded sensors, a stride sensor, which can be clipped to any shoe, and a fitness tracker smart band, which can track movement, sleep, and give activity reminders.

Boltt recently started taking pre-orders for its products via the company’s site, Boltt.com, instead of other e-commerce channels. However, the Kishores say that Boltt will be opening channels both online and offline as it progresses and that there is also a B2B component to its go-to-market strategy.

[“Source-gadgets.ndtv”]

Acquisitions Accelerate as Tech Giants Seek to Build AI Smarts


HIGHLIGHTS

  • Major tech companies are betting big on artificial intelligence
  • As a result, they are acquiring more AI startups than ever
  • Apple, Google, Uber, and Ford have acquired a sizeable number of startups

A total of 34 artificial intelligence startups were acquired in the first quarter of this year, more than twice the number in the year-ago quarter, according to the research firm CB Insights.

Tech giants seeking to reinforce their leads in artificial intelligence or make up for lost ground have been the most aggressive buyers. Alphabet Inc’s Google has acquired 11 AI startups since 2012, the most of any firm, followed by Apple Inc, Facebook Inc and Intel Corp, respectively, according to CB Insights.

The companies declined to comment on their acquisition strategies. A spokesman for Apple did confirm the company’s recent purchase of Lattice Data, a startup that specialises in working with unstructured data.

The first quarter also saw one of the largest deals to date as Ford Motor Co invested $1 billion in Argo AI, founded by former executives on self-driving teams at Google and Uber Technologies Inc.

Startups are looking to go deep on applications of artificial intelligence to specific fields, such as health and retail, industry observers say, rather than compete directly with established companies.

“What you will see is very big players will build platform services, and startup communities will migrate more to applied intelligent apps,” said Matt McIlwain, managing director of Madrona Venture Group.

Healthcare startup Forward, for example, is using artificial intelligence to crunch data that can inform doctors’ recommendations.

“For people who really want to focus on core AI problems, it makes a lot of sense to be in bigger companies,” said Forward Chief Executive Officer Adrian Aoun, who previously worked at Google. “But for folks who really want to prove a new field, a new area, it makes more sense to be separate.”

Artificial intelligence companies that do remain independent field a steady stream of suitors: Matthew Zeiler, chief executive of Clarifai, which specialises in image and video recognition, said he has been approached about a dozen times by prospective acquirers since starting the company in late 2013.

Clarifai’s pitch to customers such as consumer goods company Unilever Plc and hotel search firm Trivago is bolstered by its narrow focus on artificial intelligence.

“(Google) literally competes with almost every company on the planet,” Zeiler said. “Are you going to trust them with being your partner for AI?”

Tech giants have been locked in a bidding war for academics specializing in artificial intelligence. Startups rarely have the capital to compete, but a company with a specialized mission can win over recruits, said Vic Gundotra, chief executive of AliveCor, which makes an AI-driven portable heart monitor.

“They say, ‘I want to come here and work on a project that might save my mother’s life,'” Gundotra said.

© Thomson Reuters 2017

For the latest tech news and reviews, follow Gadgets 360 on Twitter, Facebook, and subscribe to our YouTube channel.
Tags: AI, AI Startups, Artificial Intelligence, Uber, Ford, Facebook, Apple, Alphabet, Startups, Apps, Science, Intel

[“Source-ndtv”]