Chinese Chatbots and the Rise of AI Risks


In the global race to develop artificial intelligence (AI) chatbots, Chinese censorship may hamper China's competitiveness and pose compliance, reputational and other cyber risks for multinational organizations that use Chinese chatbots abroad, especially if they operate in China. In November 2022, U.S. firm OpenAI's release of ChatGPT kicked off a global race to develop similar AI chatbots, heightening technological competition between the United States and China in particular. Following ChatGPT's introduction into the market, several Chinese companies raced to release their own models. Among them was internet company Baidu, which unveiled its Ernie Bot in March, though the bot is only available to those who apply for and receive an access code and is still awaiting official governmental approval.

In April, Chinese e-commerce company Alibaba announced its AI chatbot, "Tongyi Qianwen," which will be integrated into all Alibaba products, including workplace communication platform DingTalk and its Tmall Genie Internet of Things platform. It will also incorporate features such as image processing and text-to-image generation.
Also in April, Chinese internet platform Kunlun Tech launched its "Tiangong" chatbot, which it claims is the only chatbot in China with the same training metric levels as ChatGPT, capable of conducting question-and-answer interactions with users. However, the bot is still in the testing phase, available only by invitation.
Chinese technology company Tencent Holdings reportedly also has a development team creating an AI chatbot called "HunyuanAide."

Homegrown Chinese chatbots will be primarily trained on information scraped from the heavily censored Chinese internet space, which will constrain their outputs compared to their Western-developed counterparts. ChatGPT and other chatbots from Western firms are trained on huge datasets of information scraped from the internet. While this presents its own challenges, such as potentially incorporating biased or extremist content, overall, far more data can be acquired and processed. By contrast, China's "great firewall" prevents information the government does not approve of from being publicly accessible within its borders. The Chinese Communist Party (CCP) is careful to maintain a tight grip on the information space, which unpredictable AI chatbots can threaten. To hedge against the risk that the bots will output information that does not align with CCP interests, the Cyberspace Administration of China (CAC) announced rules in April governing the development of chatbots in the country. While the agency hopes to support AI innovation, it requires that generative AI content be in line with China's "core socialist values." Companies must heed the CCP's censorship rules by avoiding information that undermines "state power" or national unity. Providers will be responsible for the legitimacy of the information used to train generative AI products. Users will also be required to submit real identities and other related information, potentially enabling the CCP to tie information entered into the bots back to specific identities. Additionally, the human fine-tuning stage of chatbot training will play a critical role in how the CCP governs censorship of this technology to ensure that the outputs align with CCP values.
This often tedious process allows humans to correct bot outputs, usually with the intention of minimizing bias, and to incorporate additional (or specific, in this case) examples that a bot could produce for an answer to a particular prompt, especially if there is limited data on the subject available in the training dataset.

Reuters was able to access and test Baidu's Ernie, posing several questions about Chinese President Xi Jinping. The bot was able to give descriptions of Xi's role and education but mostly responded to the questions by saying, "As an AI large-scale model, I have not learnt how to answer that question, you can ask me some other questions, I will do my best to help you solve them." It also responded similarly to questions about China's 1989 Tiananmen Square protests and Chinese treatment of Uyghur Muslims. The chatbot occasionally responded with "let's change the subject and start again" when prompted with sensitive topics, including questions about U.S. President Joe Biden or former President Donald Trump.
A Wall Street Journal study showed similar results with both Ernie and other bots, which shut down when prompted with questions about Chinese or American politics. Any potentially sensitive topic was met with the response that the input "could not pass a safety review."
If the chatbots primarily include data from within the Chinese internet space, the bots will rely almost exclusively on Chinese language data. This means that the utility of Chinese chatbots will be limited outside of the country.

Chinese companies may also be constrained in their ability to produce effective chatbots due to export restrictions on the advanced semiconductor technology needed to power such AI systems, but they will undertake measures to circumvent these restrictions. In October 2022, the U.S. government announced restrictions to cut off Chinese access to advanced semiconductor technology in an effort to thwart China's AI and military progress; since then, a handful of allies have followed suit, including the Netherlands and Japan, which, with the United States, have a near monopoly on the market for highly advanced chipmaking gear. China's Baidu is confident that it has access to adequate GPUs to power its chatbot models. Huawei, a Chinese technology company previously banned in several Western countries from supplying 5G network equipment, asserts that China's chip industry will strengthen over time. However, questions remain as to whether Chinese AI will be able to stay competitive, as China's chip industry relies on foreign technology and domestic firms are highly unlikely to be able to substitute for every stage of the production process. For instance, in March, Huawei and other Chinese companies announced that they had created electronic chip design tools for semiconductors sized at 14 nanometers and above. However, such chips are quite large compared to the much smaller, more powerful and more efficient chips needed to support more advanced technology such as AI. Furthermore, a self-reliant chip-making industry will take a long time to develop and will face significant technological challenges, potentially leaving China behind in its AI development. In contrast, the West will further advance using its access to relevant semiconductor technology.

Because AI chips are small and lightweight and do not require support after they are sold, they are ideal goods for smuggling. Though there is no concrete evidence of chip smuggling following the October export restrictions, chip smuggling has historically been a common practice in China. Smuggling the chips is potentially easy in small quantities but more difficult in the quantities needed to train large AI models, meaning any attempt would be a difficult and likely slow undertaking.
Though the United States and several allies have implemented export restrictions on advanced semiconductor manufacturing technology, there are other countries without restrictions that China could utilize as alternatives. South Korea is a smaller player in the broader semiconductor industry, but it has not yet issued export restrictions for advanced semiconductor manufacturing technology. China is also circumventing export controls by importing chips to subsidiaries in India and allowing Chinese programmers to access their computing capacity via the cloud.
Well-resourced Chinese AI companies can also use chips that comply with export control performance thresholds, which operate with lower processing power and slower interconnect speeds. U.S. chipmaker Nvidia has reduced the capabilities of some of its most advanced chips so that they are legal to export to China, and experts at the U.K.-based Centre for the Governance of AI said in April that the impact on these chips' performance would be less than 10% compared to those available on the international market.

A Potential Chinese Advantage in Weaker Data Protections
While Western-developed chatbots will have an advantage in being trained on a wider range of information that governments do not censor, they may be constrained by data privacy requirements. Western chatbot developers are grappling with how to best adapt to their governments' privacy regulations, notably the European Union's General Data Protection Regulation (GDPR), which, over time, could hamper advancement to such an extent that China may be able to gain a strategic advantage. OpenAI has already expressed concerns that it may not even be able to operate ChatGPT in Europe because strict GDPR standards conflict with the way large language models work, as they require large swaths of (often personal or proprietary) data to function properly. Though Chinese authorities have also raised concerns about the risks of AI, they are mainly concerned with the risks to political stability and international security rather than data privacy. In fact, Chinese authorities at all levels of government are known to collect (and share among themselves and with state-owned firms) a vast amount of data from citizens. This volume of data has already helped China make progress in a variety of AI technologies and, over time, could give Chinese firms an advantage over Western competitors, which may face greater limits on their ability to collect data from citizens. It should be noted that Chinese private tech companies could potentially be reprimanded for violating China's Personal Information Protection Law as the government seeks to control the abuse of data by big-tech firms. However, the government still has a vested interest in accessing as much data as possible, looking to mobilize large datasets as an economic catalyst for China's data economy. This is especially true because China sees AI as a key technology necessary for national economic growth and global tech leadership.

If Chinese chatbots are able to overcome these challenges and be effectively used outside of the country, organizations that use them will face various data privacy and cybersecurity risks. If Chinese companies are able to obtain enough resources, including advanced semiconductor technology, to develop multiple versions of chatbots, such as one for domestic use and another for external use, the bots may be more effective. They might include language offerings other than Chinese and have access to additional training data that does not exist within China's domestic internet space. In this scenario, Chinese chatbots would not be subject to the same stringent privacy rules as their Western counterparts, allowing their developers to build more advanced models more rapidly. However, organizations that use these bots will face data privacy risks, as any data entered into the model can be used to train future models. This is true for many existing Western AI chatbots too, but there are additional layers of concern for Chinese-developed chatbots, as current CAC regulations require users to submit identification information before using the tools. Because the Chinese government is so data-driven, it would likely impose similar requirements that maximize user data collection. There are also risks that Chinese authorities would be able to access data entered into Chinese-owned bots, fueling espionage and national security concerns likely much larger than those expressed in the ongoing controversies over the Chinese-owned short-form video app TikTok. The use of Chinese chatbots would also raise questions about whether the tools have extended permissions on a user's device. While OpenAI and other Western AI firms have faced similar criticisms, there is a distinction between the third parties that OpenAI may be sharing information with for marketing and advertising purposes and the more sensitive user data that could potentially be shared with the Chinese government.

Because these chatbots will likely collect user data that can be tied back to specific identities, Chinese chatbots developed for international use could also be leveraged as a means of influence to advance CCP ideology. Just as Western companies like OpenAI filter out biased or harmful information, Chinese companies would implement similar restrictions, but their content filtering would likely be politicized. For instance, they may opt (or be required) to filter out information that the government might view as Western "propaganda" or as contrary to Chinese interests if spread. The government may even require chatbots to promote information in their responses that endorses specific Chinese government narratives.

For multinational organizations with locations in China, there are reputational, compliance, surveillance and other risks in incorporating AI chatbots into business operations inside the country. Many organizations in numerous sectors have begun incorporating AI chatbot services into daily tasks and services, relying on their efficiency to carry out business operations. If these organizations have locations in China (or staff are required to frequently travel to China for business purposes), they may be unable to use their preferred AI chatbot service, as foreign chatbots such as ChatGPT are often banned. Unless they forgo their use, organizations would have to turn to Chinese offerings, which may not be as effective for work in other languages or in producing the right types of content. This poses financial and reputational risks to organizations if a mistake is made that affects the quality of their services or reflects poorly on them. Output from these chatbots may also be biased, inaccurate or contain political undertones that advance CCP interests. Organizations that use these outputs could open themselves up to reputational damage. There may also be compliance issues that prevent organizations from using these chatbots in the first place, which poses operational risks if they rely on AI for daily work and are unable to provide the same services without it. Furthermore, multinational organizations operating in China will likely face even greater data privacy risks than those associated with the potential development of Chinese firms' internationally focused chatbots. Any information that is included in a prompt, as well as user information, will be logged and could be used to train future models, potentially revealing business information to other users, including Chinese business rivals who may have connections to the CCP.
These chatbots may also be used as an avenue for surveillance, as this information will also be accessible by the government and definitively tied back to identity, portending risks for corporate espionage as China seeks to gain an advantage in numerous sectors in the midst of economic competition with the West.

AI start-up Yuanyu Intelligent's ChatYuan was suspended within days of its launch after it responded to prompts about the Chinese economy, saying there was "no room for optimism," citing challenges such as pollution, lack of investment and a housing bubble. Baidu's Ernie bot maintains a list of banned keywords to filter out, including content involving politics.
Businesses operating in China already face data privacy risks, and many operate under the assumption that the government can access any information sent digitally, as technology firms are required to share data with the government under China's Internet Security Law and National Intelligence Law. Illustrating the risks that could come with Chinese chatbots, Pinduoduo, a popular Chinese retail app, has the ability to spy on its users, accessing notifications and private messages and monitoring activity on other apps to collect massive troves of user data without explicit consent.
Chinese chatbots are likely to generate larger data privacy concerns than those faced by other bots, such as ChatGPT, where information entered into prompts can be included in training datasets and repeated to future users. Though OpenAI has introduced an option for users to opt out of the data collection process, it is unlikely that Chinese companies will do so, given the CCP's already extensive data collection practices. In fact, since these chatbots represent a lucrative opportunity for collecting vast troves of user data, the Chinese government likely has even stronger incentives to collect as much data as possible.