It is hard to imagine a software tool more closely tied to Big Data than TensorFlow. Originating within Google’s Artificial Intelligence (AI) organization, TensorFlow is a tool for processing large amounts of historical (training) data in order to make inferences about new data. TensorFlow helps search engines integrate new data quickly into search results. It also lets individuals automate the search through the pictures on their phones to identify friends and family. TensorFlow was built by engineers working at the forefront of big data applications to address the challenge of classifying new information in near real time. Although originally used only internally at Google, the project was released under the open-source Apache 2.0 license on November 9th, 2015. Today it ranks as the #1 machine learning technology, with a 45% market share as measured by Datanyze.com.
To understand what makes this technology interesting and powerful, we should begin by looking at the nature of Big Data. You could easily be forgiven for thinking that “Big Data” refers simply to large volumes of data. In fact, “Big Data” generally refers not just to the volume of data, but also to the frequency with which the data arrives (its velocity) and to the difficulty of processing the data due to a lack of unifying structure (its variety). Viewed this way, even mobile phones and small laptop computers can benefit from Big Data solutions.
In its most basic form, TensorFlow is a tool that helps individuals organize their data. What sets TensorFlow apart is its ability to assign new data to categories very quickly. The organization and categorization of data is learned through a supervised training process using Neural Networks (so named because these networks loosely resemble the way our brains process data). Once this supervised training process is complete, TensorFlow can deploy the resulting models within applications on mobile phones to help people categorize their personal information.
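To make the idea of supervised training concrete, here is a minimal sketch in plain Python of the simplest possible "network": a single artificial neuron (a perceptron). It does not use TensorFlow, and the toy data set and the names `examples`, `predict`, and `train` are assumptions made purely for illustration. The neuron is shown labeled examples, and its weights are nudged whenever its guess is wrong.

```python
# Hypothetical toy data: two binary features per example, with a label
# that is 1 only when both features are 1 (a logical AND).
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(weights, bias, features):
    """Categorize one example: 1 if the weighted sum is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total > 0 else 0


def train(examples, epochs=20):
    """Supervised training: adjust the weights whenever a prediction is wrong."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)  # -1, 0, or +1
            weights = [w + error * x for w, x in zip(weights, features)]
            bias += error
    return weights, bias


weights, bias = train(examples)
predictions = [predict(weights, bias, features) for features, _ in examples]
print(predictions)
```

After training, the predictions match the labels in the toy data. Real neural networks stack many such units and are trained on far larger data sets, which is where the long training times come from.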
As discussed, the term Big Data refers to the volume, velocity, and variety of data that requires processing. What makes Big Data, well, BIG, is that before tools were developed specifically to address these challenges, it was not possible to process this data often enough to categorize information in a timely manner. Even with the advances available today, initial training for large neural networks can take thousands of hours on powerful servers.
How Tensors Change the Game
TensorFlow is a tool that developers can use to manage the relationships between complex geometric objects (termed Tensors). Complex geometric objects, or Tensors, allow physicists to solve problems in areas such as fluid dynamics and elasticity. Applying these mathematical principles, computer scientists can quantify relationships between digital objects and examine how these digital data objects interact and relate to each other conceptually.
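In day-to-day programming, a tensor is usually handled as a multi-dimensional array of numbers, described by its rank (number of dimensions) and its shape (size along each dimension). The sketch below uses plain Python nested lists rather than TensorFlow itself. The helper `shape` and the sample values are assumptions made for illustration only.

```python
def shape(tensor):
    """Return the shape of a regular, non-empty nested list as a tuple."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]  # descend one level; assumes every level is non-empty
    return tuple(dims)


scalar = 3.0                           # rank 0: a single number
vector = [1.0, 2.0, 3.0]               # rank 1: a list of numbers
matrix = [[1.0, 2.0], [3.0, 4.0]]      # rank 2: rows and columns
# rank 3: a tiny 2x4 "image" with 3 color channels per pixel
image = [[[0, 0, 0] for _ in range(4)] for _ in range(2)]

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("image", image)]:
    print(name, "rank", len(shape(value)), "shape", shape(value))
```

A photo, a sentence, or a batch of sensor readings can each be expressed in this form, which is what lets one tool process very different kinds of data in a uniform way.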
Fluid dynamics requires quantifying how high-velocity, high-volume streams of fluid, such as water, interact with a variety of different materials and obstacles (such as rocks in a river). Let us take a white-water river as an example. A river will have rocks and other obstacles that large amounts of water will interact with as it flows downhill. Each obstacle in the river will cause the water to change direction and speed. You can imagine how difficult it would be to calculate the effects of obstacles on each water molecule, one at a time, in order to understand the flow of the entire river. Although this would indeed provide a very precise model of the river, it would take so long that the river would likely have changed by the time the processing was completed (perhaps a tree fell into the river upstream).
Obviously, we don’t need to model each and every water molecule in order to get a solid understanding of the river flow. Instead, our brains understand the river by simplifying the input and making generalizations. Computer central processing units (CPUs), historically optimized for precision and linear speed, are poorly suited to address problems that involve massive numbers of simultaneous calculations and generalizations. Yet these parallel and generalized calculations are precisely what is required for realistic graphics within video games. Video game developers faced the same problem as physicists measuring fluid dynamics within a river. Specifically, 1) precision was less important than processing information quickly, and 2) all calculations needed to be processed and available at the same time (only presenting half of the screen image within a video game was not an option).
These real-world needs for processing high-velocity data led to increasingly parallel graphics processing units (GPUs) that excelled at taking large numbers of simple mathematical tasks and performing them simultaneously. In the fall of 2018, NVIDIA, the current leader in GPU components, released its next generation of consumer graphics cards aimed at processing high-volume, high-velocity data. These cards now include dedicated hardware (Tensor Cores) that processes data faster by using lower-precision operations (INT8 versus FP32).
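The trade-off between precision and speed can be illustrated with a small sketch of quantization: mapping floating-point values onto 8-bit integers and back. This is plain Python, not a description of how Tensor Cores work internally. The symmetric scaling scheme, the function names, and the sample weights are all assumptions chosen for illustration.

```python
def quantize_int8(values):
    """Map floats onto integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the 8-bit integers."""
    return [q * scale for q in quantized]


weights = [0.82, -1.27, 0.05, 0.4]          # hypothetical model weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(quantized)   # small integers that fit in one byte each
print(max_error)   # the precision given up in exchange
```

Each value now fits in one byte instead of four, and the rounding error stays within half of one quantization step. Hardware that operates on such compact values can move and combine more of them per cycle, at the cost of that small loss of precision.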
What is TensorFlow Again?
Earlier, we described TensorFlow as a tool for processing large amounts of historical data in order to make inferences about new data. TensorFlow accomplishes this by helping developers manage the flow of information: it generates graph representations of the attributes and values used during the categorization process. It also lets developers balance precision against speed by routing the flow of data, during training, to processing units that have different precision levels and capacities. Once a model has been trained through a supervised learning process, TensorFlow can produce a static, compiled version of it for deployment onto mobile phones and other less powerful computing devices. Using TensorFlow to direct the flow of information to specialized hardware (such as Tensor Cores) yields a large increase in performance. That, in turn, reduces the time needed to process the big data sets behind the tools that everyday consumers use on their mobile phones and desktop computers.
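The idea of a graph representation can be sketched in a few lines of plain Python. The example below is a conceptual illustration only and is not TensorFlow's API: the `Node` class and the helpers `placeholder`, `constant`, `add`, `mul`, and `evaluate` are hypothetical names. The computation is first described as a graph of operations and only later run on actual input values.

```python
class Node:
    """One step in a computation graph: an operation plus its input nodes."""

    def __init__(self, op, inputs=(), value=None, name=""):
        self.op = op
        self.inputs = inputs
        self.value = value
        self.name = name


def placeholder(name):
    return Node("input", name=name)      # value supplied when the graph runs


def constant(value):
    return Node("const", value=value)    # value fixed when the graph is built


def add(a, b):
    return Node("add", inputs=(a, b))


def mul(a, b):
    return Node("mul", inputs=(a, b))


def evaluate(node, feed):
    """Walk the graph recursively and compute the value of `node`."""
    if node.op == "input":
        return feed[node.name]
    if node.op == "const":
        return node.value
    left, right = (evaluate(n, feed) for n in node.inputs)
    if node.op == "add":
        return left + right
    if node.op == "mul":
        return left * right
    raise ValueError("unknown operation: " + node.op)


# Describe y = w * x + b once, then run it on different inputs.
x = placeholder("x")
y = add(mul(constant(2.0), x), constant(0.5))

print(evaluate(y, {"x": 3.0}))
print(evaluate(y, {"x": -1.0}))
```

Because the graph is a data structure that exists separately from any particular run, a framework is free to inspect it, decide which hardware should execute each node, and save a fixed version of it for deployment on a smaller device.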