People are increasingly embracing the latest in technology, with terms like “digital twin” and “metaverse” filtering into everyday conversations. Kioxia’s Ryohei Orihara and NVIDIA’s Takeshi Izaki are on the front line of AI development. What technologies do they have their eyes on, and how do they envision the future? Finally, what possibilities lie in store for future collaborations between Kioxia and NVIDIA?
Ryohei Orihara of Kioxia and Takeshi Izaki of NVIDIA are at the vanguard of R&D in the fields of AI and deep learning. As their conversation continues, they discuss their predictions for how these technologies will be adopted in the near future. From potential applications for AI to its expansion into the metaverse, the future they envision is becoming more likely with every passing day.
Orihara: We used a generative adversarial network (GAN)—a type of deep learning model known for its ability to generate images and other media—for Tezuka 2020. Just being able to generate images was enough for us engineers at first, but as time went on, we wanted to have more precise control over what was created. That was the impetus for NVIDIA’s StyleGAN. More recent research has been focused on combining GAN technology with natural language processing (NLP) in an attempt to exert better control over generative AIs, which I find fascinating.
Izaki: So the AI can, for example, generate an image from a written description?
Orihara: Exactly. We are now able to use natural language when giving instructions. For example, we can say, “Change this part of the picture in this way.”
Izaki: Which is thanks to the massive strides we’ve made in natural language processing over the past few years. Both language recognition and language interpretation have improved enormously since the development of transformer models, which are mainly used in the field of NLP. Transformer models are quickly evolving, with researchers set to apply them to image recognition and incorporate them into generative networks.
Orihara: In the early days of deep learning, methods that had been proven in image recognition were applied to natural language processing. The opposite is true of transformer models—methods that worked well in natural language processing have been applied to image recognition. It is interesting to see this back-and-forth flow of ideas.
Izaki: Speaking of language processing, GPT-3 is another example of how a transformer model can be used in natural language processing. GPT-3 is a text-generating language model that can create AI-written blog posts and news articles. If someone creates a model that, for example, allows animated characters to perform actions and converse with humans in a natural way, it will serve a purpose similar to that of an intelligent conversational robot. I think we will see these kinds of models being created more and more frequently.
──You mentioned news articles. What other everyday applications do you envision for AI?
Izaki: I think most people are familiar with the example of chatbots, which are taking the place of call centers. Many financial institutions have already implemented AI-based chatbots, and a variety of cloud service companies are offering them as a SaaS option. As time goes on, these will become more intelligent, even acquiring the ability to read emotions from the sound of a person’s voice and respond accordingly. It won’t be long before their accuracy increases and their human interactions become more natural.
Another example would be virtual salespeople for retail settings. The idea is to have visitors search for a product on an LCD screen, where the virtual salesperson would interpret their request correctly and even guide them to the product’s location in the store. Unlike a human salesperson, this virtual person would be connected to a network, so it would be able to access data that would allow it to display personalized recommendations. This will also become more common.
Orihara: The ability to recognize images and voices would help their interactions become more natural and human-like.
Izaki: Yes, and gesture recognition would make them even more lifelike. Before long, we should be able to create multimodal systems that combine voice and image for smoother interactions. This might allow an AI to read a person’s emotions from their facial expressions, for example.
Orihara: More and more companies are holding virtual meetings, but people dislike teleconferencing because it is difficult to convey non-verbal information, which accounts for over half of human communication. I believe that multimodal systems will make up for such issues.
I have another interesting example from my own experience. When I was in France for a conference, I made a restaurant reservation over email. I wrote the text in English and then converted it to French using Google Translate. When I showed up at the restaurant, I found that the staff didn’t speak any English, so I was forced to communicate in broken French. But partway through my meal, the server brought me a note asking me to write down what I wanted to say since I had successfully communicated in French with the initial email. So Google Translate is able to pass the Turing test.
Izaki: Google Translate has recently become much more accurate, to the point where you can just copy the translation and use it as is. As you mentioned with the Turing test, it has reached the point where it can be mistaken for a human being.
──On the topic of cutting-edge technology, could you tell us about GauGAN?
Izaki: About two years ago, there was an explosion of interest in GANs—people were publishing paper after paper about them. Things have calmed down a bit since then, but many new GAN algorithms are still being developed. At NVIDIA, we published a paper on GauGAN, which is a model that lets you draw a simple graphic and designate certain areas as, for example, rock or sky. It will then create a photorealistic image based on those specifications. The GauGAN2 algorithm is even more advanced. Also, the StyleGAN model, which we used to design the characters for Tezuka 2020, is now able to create backgrounds as well. There are so many new developments when it comes to creative AI models.
Recently, I’ve been keeping an eye on neural radiance field (NeRF) algorithms, which can convert 2D pictures into 3D scenes. They allow you to accurately reconstruct a scene in three dimensions from flat images and even move through the space and change your point of view. I think the shift into three dimensions will lead to more creativity and more AI implementations in a variety of fields.
──As the metaverse continues to grow, do you think there will be more AI-based creations in 3D spaces?
Izaki: NVIDIA has a platform called Omniverse, which was developed to enable 3D design collaboration and real-time simulation in virtual spaces. It’s a collaborative environment in which designers located in different places and using different 3D design applications can still collaborate in real time from their various devices.
For example, in the past, you would create a physical model of your factory’s automation machines and transport robots, which would allow you to simulate the factory’s operations so you could optimize travel routes and test assembly line arrangements and personnel allocations. On Omniverse, multiple people can collaboratively edit a single virtual model of a structure, such as a factory, while running virtual simulations of its real-world operations. In the future, I anticipate that construction will increasingly involve a process where every part is simulated in virtual spaces that incorporate every aspect, from the surrounding environment to the amount of sunlight entering the building, as the structure is being built.
Instead of doing creative work at a desk or computer, people will move into virtual spaces that allow for more interactive collaboration.
Orihara: As I’m listening to you speak, it strikes me that this transition is already happening in programming, which is my area of expertise. Thanks to the metaverse, what is already common in programming and software development will expand into the world of design and other fields related to tangible things.
Izaki: Programming is evolving rapidly, and so are the tools we use. There are people who do hard coding, of course, but there are also some who work with no-code and low-code tools. Going forward, I think we will see more situations in which people of various skill levels can work together to create something.
It used to be that programs could only be understood by programmers, but now they are so commonplace that even laypeople can use them. This, naturally, changes the types of things that are created, which in turn leads to greater diversification.
Orihara: It is the democratization of programming.
Izaki: Yes, exactly.
Next Steps for Kioxia and NVIDIA
Tezuka 2020 represents the first collaboration between Kioxia and NVIDIA, but what will be their next project? Orihara and Izaki discuss the two companies’ boundless curiosity and ideas about topics as varied as digital humans and Earth simulations, as well as necessary skills for the rapidly growing field of AI development.
Orihara: Since NVIDIA is continuing its research on deep learning, in GANs as well as other models, I am looking forward to continuing to use the software resources you publish on sites like GitHub. We at Kioxia hope to start contributing to the open source community as well.
Izaki: Your company produces memory and other storage devices, so we have a business relationship even outside of this project. The idea of reviving Osamu Tezuka through manga started from the idea of memory—but creation can take many other forms as well, such as image and sound. I’m sure you will continue exploring the uses of memory devices to develop services that can provide something new and valuable to the world. And I am excited to see how we at NVIDIA can help, whether it be through our algorithms, our GPUs, or something else.
──If Kioxia and NVIDIA were to create a project even more impactful than Tezuka 2020, what field would that be in?
Orihara: Given the companies involved, it would be something that requires a great deal of computing power and a great volume of data.
Izaki: That makes me think of digital humans. Constructing a digital human would require a 360-degree camera and a tremendous amount of data, including voice data and human behavioral data, as well as significant computing power and a large storage capacity. So that might be a possible area of collaboration.
Another thing we are trying to create at NVIDIA is a simulation of the planet—Earth’s digital twin. But it will take a tremendous amount of data to model the entire planet.
──An Earth simulation would certainly be impactful.
Izaki: Recently, there has been a lot of talk of sequestering carbon dioxide underground, so we are trying to simulate how this would affect the Earth. Only with a proper model will we be able to see what is happening, and of course that will require massive amounts of data. However, it is directly relevant to our lives. And although the length of a human life is very short from the perspective of the planet as a whole, the lives of people in the future are dependent on what we humans are doing now. So we need to simulate the future accurately and examine how to improve the lives of those who come after us. I think our findings from these simulations would be a worthwhile product of AI.
Orihara: Without building simulations that allow us to grasp the future in a realistic way, we will not be able to change our current behavior to ensure a better future. I imagine it would also be helpful to incorporate concepts from cognitive science and psychology.
──What qualities will be required of engineers in the future?
Izaki: Resources such as AI models are becoming readily accessible in the open source world. Algorithms from the latest papers are uploaded to GitHub at the time of publication, so it’s easy to download and run even the newest algorithms all by yourself.
This has eliminated the need to study every aspect of a new technology from scratch. Now, we can use it right away and see how it works and then think about how to improve it. So newcomers to the industry should start by familiarizing themselves with as many of these new technologies as possible.
It is much easier to implement something with an algorithm than with mechanical technologies, so I recommend focusing on applying these algorithms—seeing how they can be used in real-life services—and then incorporating them in a business’s operations. That will give you feedback that can inspire the next improvement and then the next improvement and so on. It is also important to gather information from a global perspective. If your focus is only on Japan, you will get left behind.
Orihara: AI is inherently interdisciplinary, an intersection of mathematics, physics, psychology, literature, and more. The topics you study in other fields can often be quite useful, so I want to urge up-and-coming researchers to develop their knowledge of a variety of topics.
──It seems likely that AI will be implemented in a variety of fields, but how do you think that expansion will happen?
Izaki: I think the lines between industries will blur. Our business used to be concentrated in the manufacturing industry. However, since we started working with AI, it has expanded to include not only manufacturing but also entertainment—as in the case of Tezuka 2020—as well as Internet services, retail, and even fishing and agriculture. There are possibilities for AI in all kinds of places; it is singlehandedly erasing the boundaries between industries.
I honestly do not know what forms of knowledge will come into play or how. But as long as we keep combining domain-specific knowledge with AI technology, I’m sure we will keep developing new solutions. For that reason, I believe that developing a broad range of knowledge while also delving deeper into the knowledge you already have will be very helpful in AI.
The content and profile are current as of the time of the interview (February 2022).