「龙腾网」How would a data scientist solve this real business problem?

Main text


How to solve this real business problem as a data scientist?


Comments
Lyndon D'Arcy
How would a data scientist solve this business problem?
Original question details: “Suppose I have a dataset of all the boat owners that sold a boat for the last 7 years....
If I'm trying to create a predictive model of people that own boats and are likely to be selling their boats...in the near future
What sort of data points and data sets as well as software tools would be needed”
There are two issues with your dataset.
You are trying to solve a two-outcome (binomial) classification problem.
You want to predict, based on who owns a boat today, what will be the outcome in the future - sell, or don't sell.

Unfortunately, everyone in your dataset sold their boat. That means that your model will always predict a sale outcome, because that's all it knows.
What you need is a dataset of all boat owners, regardless of whether they sold in a given year or not. Then you can start to build a meaningful classifier.
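To make the first issue concrete, here is a minimal Python sketch using made-up records. The field names, the snapshot date, and the reading of "near future" as 12 months are all assumptions for illustration; none of them come from the original question. It labels every owner as sell or don't sell, and shows that a sellers-only table contains a single class, which leaves a classifier nothing to separate.

```python
from datetime import date, timedelta

# Hypothetical registry of ALL boat owners as of a snapshot date.
all_owners = ["a1", "a2", "a3", "a4", "a5"]

# Hypothetical sales records: owner id -> date the boat was sold.
sales = {
    "a2": date(2023, 3, 14),
    "a5": date(2024, 8, 2),
}

SNAPSHOT = date(2023, 1, 1)    # the "today" we predict from (assumed)
HORIZON = timedelta(days=365)  # "near future" taken as 12 months (assumed)


def label(owner_id):
    """Return 1 if the owner sold within the horizon after the snapshot, else 0."""
    sold_on = sales.get(owner_id)
    if sold_on is None:
        return 0
    return int(SNAPSHOT < sold_on <= SNAPSHOT + HORIZON)


# Labels built from the full owner registry: both outcomes are present.
full_labels = {owner: label(owner) for owner in all_owners}

# Labels built from the sales table alone: every row is a seller.
sellers_only_labels = {owner: 1 for owner in sales}

print(full_labels)
print(set(sellers_only_labels.values()))
```

With the full registry, the non-sellers supply the negative examples; with the sales table alone, the set of labels collapses to a single value.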
The second issue to be wary of is that a dataset built from sales records may contain information that only became known after the sale event, for example the sale price. Make sure this sort of information is not included in the model, since it is not known at the time the predictions are made.
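One way to guard against this is to keep an explicit list of post-sale fields and strip them before training. The sketch below is only an illustration: every field name in it is hypothetical, and which fields count as "known only after the sale" has to be decided for the real dataset.

```python
# Hypothetical raw record for one owner; every field name is an assumption.
record = {
    "owner_id": "a2",
    "boat_age_years": 9,
    "years_owned": 6,
    "marina_fee_per_year": 4200,
    "sale_price": 31000,    # only known after the sale
    "days_on_market": 48,   # only known after the sale
    "sold": 1,              # the target, not a feature
}

# Fields that only exist once a sale has happened (assumed list).
POST_SALE_FIELDS = {"sale_price", "days_on_market"}
TARGET = "sold"
ID_FIELD = "owner_id"


def split_features_and_target(row):
    """Keep only fields known at prediction time; return the target separately."""
    excluded = POST_SALE_FIELDS | {TARGET, ID_FIELD}
    features = {k: v for k, v in row.items() if k not in excluded}
    return features, row[TARGET]


features, target = split_features_and_target(record)
print(features)
print(target)
```

Keeping the exclusion list in one named constant makes it easy to review with someone who knows how the sales data was collected.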
A final thought on other datasets that might be useful.
Why do people sell boats? Too expensive? They don't use it any more? Upgrade to a better boat? Moving city and can't take it with them?
If you can get an expert to give you a breakdown of the main reasons that people sell boats, that will help point you to the data sources that will give you the most predictive value.

Ricardo Vladimiro
I'll try to go through all of the points you raised.
Features don't really apply here. No modelling is needed to solve this problem; you could use it, but you don't actually need it. Picking the top words is a straightforward statistical test. I don't know your dataset, though, or whether you are able to create the one you need.
The target value depends on what you want to test. This depends on a number of variables.
In the end you mention both models and problems, which suggests to me that you don't really know what you want to do. My best advice is not to do it yourself.
The best solution to this is to hire a statistician or data analyst, preferably one that is able to handle observational studies.

Colleen Farrelly
As a data scientist, what's the complex real life problem you have ever solved (using data science)?
By far, problems involving human behavior and biological responses, given a very incomplete set of predictors. At Kaplan, we've been able to predict with a high degree of accuracy (usually >95% with current models) which students will drop out at a given point in time and which students are at risk of failing exit exams, using predictors that, at the start of the projects, I wouldn't have guessed were associated with the behavior.

Jason T Widjaja
How can a data scientist negotiate with other parties to get access to their data?
Originally Answered: What is a data scientist without data? How can you negotiate with other parties to gain access to data?
A data scientist without data is like a printer without ink - full of capability but unable to function without the required raw material.
But printers need more than ink to function well. Just as a printer is tasked to print useful content rather than spraying ink randomly on a page, data scientists also need use cases and well formed projects to investigate.
The art of translating business problems into data science use cases, and getting the requisite data to do so, is a crucial but underrated skill. I suspect the lack of this skill is the cause of many failures in data science investments. Having senior executives hire a team of data scientists and declare with much fanfare ‘go forth and do AI and machine learning!’ is neither necessary nor sufficient.

Having gone through this use case hunting and data acquisition dozens of times with my team, here are five things that I have found helpful in getting access to data:
Don’t show off tech - show off the potential to solve problems.
Align every project to business strategy. There are multiple possible projects in every business department, but the ones that ultimately get support are usually those that most directly and significantly impact business metrics.
Run open learning sessions using data generated to approximate real data sets. There is a lot of interest in data science, machine learning and AI at the moment.
And most importantly:
Communicate with empathy to your audience.
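For the point about open learning sessions, generated data can stand in for records that cannot yet be shared. The following is a rough Python sketch under stated assumptions: the boat-owner fields, the value ranges, and the rule linking ownership length to the chance of a sale are all invented for illustration and are not taken from any real dataset.

```python
import random

random.seed(0)  # fixed seed so the session material is reproducible


def make_synthetic_owner(i):
    """Generate one made-up boat owner; all fields and ranges are assumptions."""
    years_owned = random.randint(0, 20)
    boat_age = years_owned + random.randint(0, 15)
    # Assumed, purely illustrative rule: longer ownership raises the sale odds.
    p_sell = min(0.05 + 0.02 * years_owned, 0.6)
    return {
        "owner_id": f"synthetic-{i}",
        "years_owned": years_owned,
        "boat_age_years": boat_age,
        "sold": int(random.random() < p_sell),
    }


synthetic_owners = [make_synthetic_owner(i) for i in range(1000)]
print(synthetic_owners[0])
print(sum(o["sold"] for o in synthetic_owners) / len(synthetic_owners))
```

Because no row corresponds to a real person, a table like this can be handed out freely in a workshop while access to the real data is still being negotiated.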



Page updated: 2024-03-12
