2018 was an exciting year for Google's research teams, with our work advancing technology in many ways, including fundamental computer science research results and publications, the application of our research to emerging areas new to Google (such as healthcare and robotics), open source software contributions and strong collaborations with Google product teams, all aimed at providing useful tools and services. Below, we highlight just some of our efforts from 2018, and we look forward to what will come in the new year. For a more comprehensive look, please see our publications in 2018.
Ethical Principles and AI
Over the past few years, we have observed major advances in AI and the positive impact it can have on our products and the everyday lives of our billions of users. For those of us working in this field, we care deeply that AI is a force for good in the world, and that it is applied ethically, and to problems that are beneficial to society. This year we published the Google AI Principles, supported with a set of responsible AI practices outlining technical recommendations for implementation. In combination they provide a framework for us to evaluate our own development of AI, and we hope that other organizations can also use these principles to help shape their own thinking. It's important to note that because this field is evolving quite rapidly, best practices in some of the principles noted, such as "Avoid creating or reinforcing unfair bias" or "Be accountable to people", are also changing and improving as we and others conduct new research in areas like ML fairness and model interpretability. This research in turn leads to advances in our products to make them more inclusive and less biased, such as our work on reducing gender biases in Google Translate, and allows the exploration and release of more inclusive image datasets and models that enable computer vision to work for the diversity of global cultures. Furthermore, this work allows us to share best practices with the broader research community with the Fairness Module in the Machine Learning Crash Course.
AI for Social Good
The potential of AI to make dramatic impacts on many areas of social and societal importance is clear. One example of how AI can be applied to real-world problems is our work on flood prediction. In collaboration with many teams across Google, this research aims to provide accurate and timely fine-grained information about the likely extent and scope of flooding, enabling those in flood-prone regions to make better decisions about how best to protect themselves and their property.
A second example is our work on earthquake aftershock prediction, where we showed that a machine learning (ML) model can predict aftershock locations much more accurately than traditional physics-based models. Perhaps more importantly, because the ML model was designed to be interpretable, scientists have been able to make new discoveries about the behavior of aftershocks, leading to not only more accurate predictions, but also new levels of understanding.
We have also seen a huge number of external parties, sometimes in collaboration with Google researchers and engineers, using open source software like TensorFlow to tackle a wide range of scientific and social problems, such as using convolutional neural networks to identify humpback whale calls, detecting new exoplanets, identifying diseased cassava plants and more.
To spur creative activity in this area, we announced the Google AI for Social Impact Challenge in collaboration with Google.org, whereby individuals and organizations can receive grants from a total of $25M of funding, along with mentorship and advice from Google research scientists, engineers and other experts as they work to take a project with large potential social impact from idea to reality.
Much of our research centered on using ML and computer science to help our users accomplish things faster and more effectively. Often, these results in collaborations with various product teams to release the fruits of this research in various product features and settings. One example is Google Duplex, a system that requires research in natural language and dialogue understanding, speech recognition, text-to-speech, user understanding and effective UI design to all come together to enable an experience whereby a user can say "Can you book me a haircut at 4 PM today?", and a virtual agent will interact on your behalf over the telephone to handle the necessary details.
Other examples include Smart Compose, a tool that uses predictive models to give relevant suggestions about how to compose emails, making the process of email composition faster and easier, and Sound Search, a technology built on the Now Playing feature that enables you to discover what song is playing fast and accurately. Additionally, Smart Linkify in Android shows how we can use an on-device ML model to make many different kinds of text that appear on the screen of your phone more useful by understanding the kind of text you're selecting (e.g. knowing that something is an address, so we can offer a shortcut to a maps or direction link).
An important focus in our research is helping to make products like the Google Assistant support more languages and allow better understanding of semantic similarity, even when very different ways of expressing the same concept or idea are used. Underlying new product capabilities like these is research we performed on improving the quality of both speech synthesis and text-to-speech for languages without much training data available.
Quantum computing is an emerging paradigm for computing that promises the ability to solve challenging problems that no classical computer can solve. We have been actively pursuing research in this area for the past several years, and we believe the field is on the cusp of demonstrating this capability for at least one problem (so-called quantum supremacy), which will be a watershed event for the field. Over the last year we produced a number of exciting new results, including the development of Bristlecone, a new 72-qubit quantum computing device, which scales the size of problems that can be tackled in quantum computers in the run-up towards quantum supremacy.
We also released Cirq, an open source programming framework for quantum computers, and explored how quantum computers could be used for neural networks. Finally, we shared our experience and techniques for understanding performance fluctuations in quantum processors, and shared some thoughts on how quantum computers might be useful as a computational substrate for neural networks. We're looking forward to exciting results in the quantum computing space in 2019!
Natural Language Understanding
Natural language research at Google had an exciting 2018, with a mix of basic research as well as product-focused collaborations. We developed improvements to our Transformer work from 2017, resulting in a new parallel-in-time version of the model called the Universal Transformer that shows strong gains across a number of natural language tasks including translation and linguistic reasoning. We also developed BERT, the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus, that can then be fine-tuned on a wide variety of natural language tasks using transfer learning. BERT shows significant improvements over previous state-of-the-art results on 11 natural language tasks.
In addition to collaborating with various research teams to enable Smart Compose and Duplex (discussed previously), we worked to make the Google Assistant handle multilingual use cases better, with the goal of making the Assistant naturally conversational for all users.
Our perception research tackles the hard problems of allowing computers to understand images, sounds, music and video, as well as providing more powerful tools for image capture, compression, processing, creative expression, and augmented reality. In 2018, our technology improved Google Photos' ability to organize the content that users most care about, such as people and pets. Google Lens and the Assistant enabled users to learn about the natural world, answer questions in real-time, and do more with Lens in Google Images. A key aspect of the Google AI mission is to empower others to benefit from our technology, and we've made a lot of progress this year in improving capabilities and building blocks that are parts of Google APIs. Examples include improved and new capabilities in vision and video in Cloud ML APIs and face-related on-device building blocks through ML Kit.
In 2018, our contributions to academic research included advances in deep learning for 3D scene understanding, such as stereo magnification, which enables synthesizing novel photorealistic views of a scene. Our ongoing research on better understanding images and video enables users to find, organize, enhance and improve images and video in Google products such as Photos, YouTube, Search and more. In 2018, notable advances included a fast bottom-up model for joint pose estimation and person instance segmentation, a system for visualizing complex motion, a systemwhich models spatio-temporal relations between people and objects and improvements in video action recognition based on distillation and 3D convolutions.
In the audio domain, we proposed a method for unsupervised learning of semantic audio representations as well as significant improvements to expressive and human-like speech synthesis. Multimodal perception is an increasingly important research topic. Looking to Listencombines visual and auditory cues in an input video to isolate and enhance the speech of desired speakers in a video. This technology could support a range of applications, from speech enhancement and recognition in videos, through video conferencing, to improved hearing aids, especially in situations where multiple people are speaking.
Enabling perception on resource-constrained platforms has becoming increasingly important. MobileNetV2 is Google's next-generation mobile computer vision model and our MobileNets are used widely across academia and industry. MorphNet proposes an efficient method for learning the structure of deep networks that results in across-the-board performance improvements on image and audio models while respecting computational resource constraints, and more recent work on automatic generation of mobile network architectures demonstrates that even higher performance is possible.
The improvements in quality and versatility of cell phone cameras over the last few years has been nothing short of remarkable. A modest part of this is improvements in the actual physical sensors used in phones, but a much greater part of it is due to advances in the scientific field of computational photography. Our research teams publish their new research techniques, and work closely with the Android and Consumer Hardware teams at Google to deliver this research into your hands in the latest Pixel and Android phones and other devices. In 2014, we introduced HDR+, a technique whereby the camera captures a burst of frames, aligns the frames in software, and merges them together with computational software. Originally in the HDR+ work, this was to enable pictures to have higher dynamic range than was possible with a single exposure. However, capturing a burst of frames and then performing computational analysis of these frames is a general approach that has enabled many advances in cameras in 2018. For example, it allowed the development of Motion Photos in Pixel 2 and the Augmented Reality mode in Motion Stills.
This year, one of our primary efforts in computational photography research was to create a new capability called Night Sight, which enables Pixel phone cameras to "see in the dark", earning praise by both press and users. Of course, Night Sight is just one of the new software-enabled camera features our teams have developed to help you take the perfect photo, including using ML to provide better portrait mode shots, seeing better and further with Super Res Zoom and capturing special moments with Top Shot and Google Clips.
Algorithms and Theory
Algorithms are the backbone of Google systems and touch all our products, from routing algorithms behind Google trips to consistent hashing for Google cloud. Over the past year, we continued our research in algorithms and theory covering a wide range of areas from theoretical foundations to applied algorithms, and from graph mining to privacy-preserving computation. Our work in optimization spans areas from studying continuous optimization for machine learning to distributed combinatorial optimization. In the former area, our work on studying convergence of stochastic optimization algorithms for training neural networks (which won an ICLR 2018 Best Paper Award) exhibited issues with popular gradient-based optimization methods (such as some variants of ADAM), but provided a solid foundation for new gradient-based optimization methods.
In distributed optimization, we worked to improve the round and communication complexity of well-studied combinatorial optimization problems such as matchings in graphs via round compressionand via core-sets, as well as submodular maximization, and k-core decomposition. On the more applied side, we developed algorithmic techniques for solving set cover at scale via sketching and for solving balanced partitioning and hierarchical clustering for graphs with trillions of edges. Our work on online delivery services was nominated for the best paper award at WWW'18. Finally, our open source optimization OR-tools platform won 4 gold medals at the 2018 Minizinc constraint programming competition.
In algorithmic choice theory, we have proposed new models and investigated the problems ofreconstruction and learning a mixture of multinomial logits. We also studied the classes of functions learnable by neural networks and how to use machine-learned oracles to improve classiconline algorithms.
Understanding learning techniques with strong privacy guarantees is of great importance for us at Google. In this context, we developed two new means of analyzing how differential privacy can be amplified by iteration and by shuffling. We also applied differential privacy techniques to design incentive-aware learning methods that are robust against gaming. Such learning techniques have applications in efficient online market design. Our new research in the area of market algorithmsinclude also techniques to help advertisers test incentive compatibility of ad auctions, and optimizing ad refresh for in-app advertising. We also pushed the boundaries of state-of-the-art dynamic mechanisms for repeated auctions, and presented dynamic auctions that are robust against lack of prediction of future, against noisy forecasts, or against heterogenous buyer behaviour, and extend our results to dynamic double auctions. Finally, in the context of robustness in online optimization and online learning, we developed new online allocation algorithms for stochastic input with traffic spikes and new bandit algorithms robust to corrupted data.
A large part of our research on software systems continues to relate to building machine-learning models and to TensorFlow in particular. For example, we published on the design and implementation of dynamic control flow for TensorFlow 1.0. Some of our newer research introduces a system that we call Mesh TensorFlow, which makes it easy to specify large-scale distributed computations with model parallelism, sometimes with billions of parameters. As another example, we released a library for scalable deep neural ranking using TensorFlow.
We also released JAX, an accelerator-backed variant of NumPy that supports automatic differentiation of Python functions to arbitrary order. While JAX is not part of TensorFlow, it leverages some of the same underlying software infrastructure (e.g. XLA), and some of its ideas and algorithms have been helpful to our TensorFlow projects. Finally, we continued our research on the security and privacy of machine learning, and our development of open source frameworks for safety and privacy in AI systems, such as CleverHans and TensorFlow Privacy.
Another important research direction for us is the application of ML to software systems, at many levels of the stack. For instance, we continued work on placement of computations onto devices, with a hierarchical model, and we contributed to learning memory access patterns. We also continued to explore how learned indices could be used to replace traditional index structures in database systems and storage systems. As I wrote last year, we believe that we are just scratching the surface in terms of the use of machine learning in computer systems.
In 2018 we learned about Spectre and Meltdown, new classes of serious security vulnerabilities in modern computer processors, thanks to Google's Project Zero team in collaboration with others. These and related vulnerabilities will keep computer architecture researchers quite busy. In our continuing efforts to model CPU behavior, our Compiler Research team integrated their tool for measuring machine instruction latency and port pressure into LLVM, making possible better compilation decisions.
Google products, our Cloud offerings and inference for machine learning models depend critically on the ability to provide large-scale, reliable, efficient technical infrastructure for computing, storage and networking. A few research highlights from the past year include the evolution of Google's Software Defined Networking WAN, a stand-alone, federated query processing platform that executes SQL queries against data stored in different file-based formats, in many storage systems (BigTable, Spanner, Google Spreadsheets, etc.) and a report on our extensive use of code review, investigating the motivations behind code review at Google, current practices, and developers' satisfaction and challenges.
Running a large-scale web service such as content hosting, requires load balancing with stability in a dynamic environment. We developed a consistent hashing scheme with tight provable guaranteeson the maximum load of each server, and deployed it for our cloud customers in Google Cloud Pub/Sub. After making an earlier version of our paper available, engineers at Vimeo found the paper, implemented and open sourced it in haproxy, and used it for their load balancing project at Vimeo. The results were dramatic: applying these algorithmic ideas helped them decrease the cache bandwidth by a factor of almost 8, eliminating a scaling bottleneck.
AutoML, also known as meta-learning, is the use of machine learning to automate some aspects of machine learning. We have been performing research in this space for many years, and the long-term goal is to develop learning systems that can learn to take a new problem and solve it automatically, using insights and capabilities derived from other problems that have been previously solved. Our earlier work in this space has mostly used reinforcement learning, but we are also interested in the use of evolutionary algorithms. Last year we showed how evolutionary algorithms can be used to automatically discover state-of-the-art neural network architectures for a variety of visual tasks. We also explored how reinforcement learning can be applied to other problems than just neural network architecture search, showing that it can be used to 1) automatically generate image transformation sequences that improve the accuracy of a wide variety of image models, and 2) find new symbolic optimization expressions that are more effective than the commonly used optimization update rules. Our work on AdaNet showed how to have a fast and flexible AutoML algorithm with learning guarantees.
Another focus for us was on automatically discovering neural network architectures that are computationally efficient, so that they can run in environments such as mobile phones or autonomous vehicles that have tight constraints on either computational resources or on inference time. For this, we showed that combining the accuracy of a model with its inference computation time in the reward function for a reinforcement learning architecture search can find models that are highly accurate while meeting particular performance constraints. We also explored using ML to learn to automatically compress ML models to have fewer parameters and use less computational resources.
Tensor Processing Units (TPUs) are Google's internally-developed ML hardware accelerators, designed from the ground up to power both training and inference at scale. TPUs have enabled Google research breakthroughs such as BERT (discussed previously), and they also allow researchers around the world to build on Google research via open source and to pursue new breakthroughs of their own. For example, anyone can fine-tune BERT on TPUs for free via Colab, and the TensorFlow Research Cloud has given thousands of researchers the opportunity to benefit from even larger amounts of free Cloud TPU computing power. We've also made multiple generations of TPU hardware commercially available as Cloud TPUs, including ML supercomputers called Cloud TPU Pods that make large-scale ML training much more accessible. Internally, in addition to enabling faster advances in ML research, TPUs have driven major improvements across Google's core products, including Search, YouTube, Gmail, Google Assistant, Google Translate, and many others. We look forward to seeing ML teams both here at Google and elsewhere achieve even more with ML via the unprecedented computing scale that TPUs provide.
Open Source Software and Datasets
Releasing open source software and the creation of new public datasets are two major ways that we contribute to the research and software engineering communities. One of our largest efforts in this space is TensorFlow, a widely popular system for ML computations that we released in November 2015. We celebrated TensorFlow's third birthday in 2018, and during this time, TensorFlow has been downloaded more than 30M times, with over 1700 contributors adding 45,000 commits. In 2018, TensorFlow had eight major releases and added major capabilities such as eager execution and distribution strategies. We launched public design reviews engaging the community in the development process, and we engaged contributors via special interest groups. With the launches of associated products such as TensorFlow Lite, TensorFlow.js and TensorFlow Probability, the TensorFlow ecosystem grew dramatically in 2018.
In addition to continued work on existing open source ecosystems, in 2018 we introduced a new framework for flexible and reproducible reinforcement learning, new visualization tools to rapidly understand the characteristics of a dataset (without needing to write any code), added a high-level library for expressing machine learning problems that involve learning-to-rank (the process of ordering a list of items in a way that maximizes the utility of the entire list, applicable across domains that include search engines, recommender systems, machine translation, dialogue systems and even computational biology), released a framework for fast and flexible AutoML solutions with learning guarantees, a library for doing in-browser realtime t-SNE visualizations using TensorFlow.js and added FHIR tools and software for working with electronic healthcare data(discussed in the healthcare section of this post).
From time-to-time, we also help establish new kinds of challenges for the research community, so that we can all work together on solving difficult research problems. Often these are done with the release of a new dataset, but not always. This year, we established new challenges around the Inclusive Images Challenge, to work towards making more robust models that are free from many kinds of biases, the iNaturalist 2018 Challenge which aims to enable computers' fine-grained discrimination of visual categories (such as species of plants in an image), a Kaggle "Quick, Draw!" Doodle Recognition Challenge to create a better classifier for the QuickDraw challenge game, and Conceptual Captions, a larger-scale image captioning dataset and challenge aimed at enabling better image captioning model research.
In 2018, we made significant progress towards our goal of understanding how ML can teach robots how to act in the world, achieving a new milestone in the ability to teach robots to grasp novel objects (best systems paper at CoRL'18), and using it to learn about objects without human supervision. We've also made progress in learning robot motion by combining ML and sampling-based methods (best paper in service robotics at ICRA'18) and learning robot geometry for faster planning. We've made great strides in our ability to better perceive the structure of the world from autonomous observation. For the first time, we've been able to successfully train deep reinforcement learning models online on real robots, and are finding new, theoretically grounded ways, to learn stable approaches to robot control.
Applications of AI to Other Fields
In 2018, we have applied ML to a wide variety of problems in the physical and biological sciences. Using ML, we can supply scientists with the equivalent of hundreds or thousands of research assistants digging through data, which then frees the scientists to become more creative and productive.
Our Nature Methods paper on high-precision automated reconstruction of neurons proposed a new model that improves the accuracy of automated interpretation of connectomics data by an order of magnitude over previous deep learning techniques.
Some other examples of applying ML to science include:
Finding new planets outside our solar system by data mining light curves of stars
Recognizing the origin or function of short DNA sequences
Automatically detecting out-of-focus microscope images
Digitally creating images of the same cells with multiple stains
Automatically mapping mass spectrometry output to peptide chains
For the past several years, we have been applying ML to health, an area that affects every one of us, and is also one where we believe ML can make a tremendous difference by augmenting the intuitions and experience of healthcare professionals. Our general approach in this space is to collaborate with healthcare organizations to tackle basic research problems (using feedback from clinical experts to make our results more robust), and then publish the results in well-respected, peer-reviewed scientific and clinical journals. Once the research has been clinically and scientifically validated, we then conduct user and HCI research to understand how we can deploy this in real-world clinical settings. In 2018, we expanded our efforts across the broad space of computer-aided diagnostics to clinical task predictions as well.
At the end of 2016, we published work showing that a model trained to assess retinal fundus images for signs of diabetic retinopathy was able to perform on-par to slightly-better than U.S. medical-board-certified ophthalmologists at this task in a retrospective study. In 2018, we were able to show that by having the training images labeled by retinal specialists and by using an adjudicated protocol (where multiple retinal specialists convene and have to arrive at a single collective assessment for each fundus image), we could arrive at a model that is on-par with retinal specialists. Later, we published an evaluation that showed how pairing ophthalmologists and this ML model allow them to make more accurate decisions than either alone. We have deployed this diabetic retinopathy detection system in partnership with our Alphabet colleagues at Verily at over 10 sites including Aravind Eye Hospitals in India and at Rajavithi Hospital affiliated with the Ministry of Health in Thailand.
We interact with the external research community in many different ways, including faculty engagement and student support. We are proud to host hundreds of undergraduate, M.S. and Ph.D. students as interns during the academic year, as well as providing multi-year Ph.D. fellowships to students throughout North America, Europe, and the Middle East. In addition to financial support, each of the fellowship recipients is assigned one or more Google researchers as a mentor, and we bring together all the fellows for an annual Google Ph.D. Fellowship Summit, where they are exposed to state-of-the-art research being pursued at Google and given the opportunity to network with Google's researchers as well as other PhD Fellows from around the world.
Complementing this fellowship program is the Google AI Residency, a way of allowing people who want to learn to conduct deep learning research to spend a year working alongside and being mentored by researchers at Google. Now in its third year, residents are embedded in various teams across Google's global offices, pursuing research in areas such as machine learning, perception, algorithms and optimization, language understanding, healthcare and much more. With applications having just closed for the fourth year of this program, we are excited to see the research the new cohort of residents will pursue in 2019.
Each year, we also support a number of faculty members and students on research projects through our Google Faculty Research Awards program. In 2018, we also continued to host workshops at Google locations for faculty and graduate students in particular areas, including a workshop on AI/ML Research and Practice hosted in our Bangalore, India office, an Algorithms & Optimization Workshop hosted in our Zürich office, a workshop on healthcare applications of ML hosted in Sunnyvale and a workshop on Fairness and Bias in ML hosted in our Cambridge, MA office.
We believe that contributing openly to the broader research community is a critical part of supporting a healthy and productive research ecosystem. In addition to our open source and dataset releases, much of our research is published openly in top conference venues and journals, and we actively participate in the organization and sponsorship of conferences, all across the spectrum of different disciplines. For just a small sample, see our involvement at ICLR 2018, NAACL 2018, ICML 2018, CVPR 2018, NeurIPS 2018, ECCV 2018 and EMNLP 2018. Googlers also participated extensively in ASPLOS, HPCA, ICSE, IEEE Security & Privacy, OSDI, SIGCOMM, and many other conferences in 2018.
New Places, New Faces
In 2018, we were excited to welcome many new people with a wide range of backgrounds into our research organization. We announced our first AI research office in Africa, located in Accra, Ghana. We expanded our AI research presence in Paris, Tokyo and Amsterdam, and opened a research lab in Princeton. We continue to hire talented people into our offices all over the world, and you can learn more about joining our research efforts here.
Looking Forward to 2019
This blog post summarizes just a small fraction of the research performed in 2018. As we look back on 2018, we're excited (and proud!) of the breadth and depth of what we have accomplished. In 2019, we look forward to having even more impact on Google's direction and products, as well as on the broader research and engineering community!