Open source doesn't work for AI?


Clearly, we need to do something about how we talk about open source and openness in general. It's been clear since at least 2006, when I rightly got smacked down for calling out Google and Yahoo for holding back on open source. As Tim O'Reilly wrote at the time, in a cloud era of open source, "one of the motivations to share, the necessity of giving a copy of the source in order to let someone run your program, is truly gone. Not only is it no longer required, in the case of the largest applications, it's no longer possible."

That impossibility of sharing has roiled the definition of open source over the past decade, and, as Mike Loukides recently noted, it is now affecting the way we think about artificial intelligence (AI). There has never been a more important time to collaborate on AI, yet there has also never been a time when doing so was more difficult. As Loukides describes, "Because of their scale, large language models have a significant problem with reproducibility."

Just as with cloud back in 2006, the companies doing the most interesting work in AI may struggle to "open source" in the ways we have traditionally expected. Even so, this doesn't mean they can't still be open in meaningful ways.

According to Loukides, though many companies may claim to be involved in AI, there are really just three pushing the industry forward: Facebook, OpenAI, and Google. What do they have in common? The ability to run massive models at scale. In other words, they're doing AI in a way that you and I can't. They're not trying to be secretive; they simply have infrastructure, and the knowledge of how to run that infrastructure, that you and I don't.

"You can download the source code for Facebook's OPT-175B," Loukides acknowledges, "but you won't be able to train it yourself on any hardware you have access to. It's too large even for universities and other research institutions. You still have to take Facebook's word that it does what it says it does." This, despite Facebook's big announcement that it was "sharing Open Pretrained Transformer (OPT-175B) ... to allow for more community engagement in understanding this foundational new technology."
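To make the scale problem concrete, here is a back-of-the-envelope sketch of my own (not from the article): even holding a 175-billion-parameter model in memory is out of reach for most institutions, assuming the common figures of 2 bytes per parameter for fp16 weights and roughly 16 bytes per parameter for mixed-precision training state (weights, gradients, and optimizer moments).

```python
# Rough illustration of why a 175B-parameter model like OPT-175B
# cannot be trained on hardware most researchers can access.
# Assumptions (mine, not the article's): fp16 weights at 2 bytes/param,
# and ~16 bytes/param of training state for Adam-style mixed precision.

PARAMS = 175e9  # 175 billion parameters

weights_gb = PARAMS * 2 / 1e9       # fp16 inference weights alone
train_state_gb = PARAMS * 16 / 1e9  # weights + gradients + optimizer state

print(f"fp16 weights alone: ~{weights_gb:.0f} GB")
print(f"training state:     ~{train_state_gb:.0f} GB")
print(f"80 GB GPUs needed just to hold training state: ~{train_state_gb / 80:.0f}")
```

Under these assumptions the weights alone are roughly 350 GB, and the training footprint runs into the terabytes before you count activations or data parallelism, which is the gap Loukides is pointing at.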

That sounds great but, as Loukides insists, OPT-175B "probably can't even be reproduced by Google and OpenAI, even though they have sufficient computing resources." Why? "OPT-175B is too closely tied to Facebook's infrastructure (including custom hardware) to be reproduced on Google's infrastructure." Again, Facebook isn't trying to hide what it's doing with OPT-175B. Building such infrastructure is just genuinely hard, and even those with the money and the know-how will end up building something different.

This is exactly the point that Yahoo's Jeremy Zawodny and Google's Chris DiBona made back in 2006 at OSCON. Sure, they could open source all their code, but what would anyone be able to do with it, given that it was built to run at a scale, and in a way, that literally couldn't be reproduced anywhere else?

Back to AI. It's hard to trust AI if we don't understand the science inside the machine. We need to find ways to open up that infrastructure. Loukides has an idea, though it may not satisfy the most zealous free software/AI folks: "The answer is to provide free access to outside researchers and early adopters so they can ask their own questions and see the wide range of results." No, not by giving them keycard access to Facebook's, Google's, or OpenAI's data centers, but through public APIs. It's an interesting idea that just might work.
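One concrete shape such API-based research could take (a sketch of my own; the article doesn't specify a protocol): send a hosted model the same question many times and tally how stable its answers are. The `query_model` function below is a hypothetical stand-in for a real vendor API call, stubbed so the example is self-contained.

```python
from collections import Counter

def query_model(prompt: str, temperature: float = 0.0) -> str:
    """Hypothetical stand-in for a vendor's public inference API.
    A real probe would make an HTTP request here; this stub returns
    a canned answer so the sketch runs without network access."""
    return "Paris" if "capital of France" in prompt else "unknown"

def reproducibility_probe(prompt: str, n: int = 10) -> Counter:
    """Ask the same question n times and tally the distinct answers.
    A wide spread of answers at temperature 0 would itself be a finding
    a researcher could report, without ever seeing the model's weights."""
    return Counter(query_model(prompt, temperature=0.0) for _ in range(n))

results = reproducibility_probe("What is the capital of France?")
print(results)  # with the stub above: Counter({'Paris': 10})
```

The point of the design is Loukides' point: researchers can characterize a model's behavior, its successes and failures, through the query interface alone, which is a meaningful form of openness even without access to code or infrastructure.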

But it's not "open source" in the way that many desire. That's probably OK.

Think differently about open

In 2006, I was happy to rage against the mega open source machines (Google and Yahoo) for not being more open, but that accusation was and is mostly meaningless. Since 2006, for example, Google has packaged and open sourced key infrastructure when doing so met its strategic needs. I've called things like TensorFlow and Kubernetes the open sourcing of on-ramps (TensorFlow) and off-ramps (Kubernetes): either open sourcing an industry standard for machine learning that hopefully leads to more Google Cloud workloads, or ensuring portability between clouds to give Google Cloud more opportunity to win workloads. It's smart business, but it's not open source in some Pollyanna sense.

Nor is Google alone in this; it's just better at open source than most companies. Because open source is inherently selfish, companies and individuals will always open the code that benefits them or their own customers. It has always been this way, and it always will be.

As for Loukides' point about ways to meaningfully open up AI despite the delta between the three AI giants and everyone else: he's not arguing for open source as we have traditionally practiced it under the Open Source Definition. Why? Because as fantastic as that definition is (and it truly is), it has never managed to answer the cloud open source quandary, for both creators and consumers of software, that DiBona and Zawodny laid out at OSCON in 2006. We've had more than a decade, and we're no closer to an answer. Except that we sort of are.

I've argued that we need a new way of thinking about open source licensing, and my thoughts may not be terribly different from how Loukides reasons about AI. The key, as I understand his argument, is to provide enough access for researchers to reproduce the successes and failures of a particular AI model. They don't need full access to all the code and infrastructure to run those models because, as he argues, that would be essentially pointless. In a world where a developer could run an open source program on a laptop and make derivative works, it made sense to require full access to that code. Given the scale and unique complexity of the code running at Google or Microsoft today, this no longer makes sense, if it ever did. Not for all cloud code running at scale, anyway.

We need to ditch our binary view of open source. It has never been a particularly useful lens through which to see the open source world, and it is becoming less useful every day, given our cloud era. As companies and individuals, our goal should be to open access to software in ways that benefit our customers and third-party developers, fostering access and understanding, instead of trying to retrofit a decades-old concept of open source onto the cloud. That retrofit hasn't worked for open source, just as it isn't working for AI. Time to think differently.


This article is adapted from InfoWorld; the original author is Matt Asay.



Page updated: 2024-03-16
