...Drug sales data_weixin_26714477's blog-CSDN Blog_sales in Python

Source: CSDN technical community. Published: 2021-03-24

All we need to do now is take this value and print it as the result.

# Displaying results
for field in result.columns.values[0:1]:
    print('The drug most often sold on Mondays in 2017 is ' + str(field))
    print('with the volume of ' + str(round(result[field].iloc[0], 2)))

The drug most often sold on Mondays in 2017 is N02BE
with the volume of 1160.56

In the exercises above, we loaded data sets into Pandas DataFrames and looked for specific information using operations such as grouping, sorting and summarising. These exercises are good practice for this kind of data manipulation, and we will use the same operations often in the next exercises.

What might medicine sales be in January 2020?

We will now look at regression, which is a very common data science task. The idea of regression is to predict the value of a dependent variable based on the values of one or more independent variables. By applying different regression methods to past data, we can try to predict future values.

In this exercise, we will try to predict sales volume in future months using the data recorded between 2014 and 2019.

Preprocessing

Looking at the data sets, we can see that the data is of good quality, but there are some records where the sales value is 0 for at least one group of drugs. This is usually something we need to take care of before running any machine learning model. In this case we have two options: remove the rows where the recorded sales value is 0 for at least one group of drugs, or replace the 0 values with the mean or median value for the group. For simplicity, we will remove all such records, but we recommend repeating this exercise with the 0 values replaced by the mean or median for the group to see whether you get better results.
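The two options described above can be sketched as follows. This is a minimal illustration rather than the notebook's own code: the table is made up, and the column names `datum` and `M01AB` are assumptions (only `N02BE` appears in the results above).

```python
import pandas as pd

# Hypothetical monthly sales table; column names and values are made up for illustration.
df = pd.DataFrame({
    "datum": ["2014-01-31", "2014-02-28", "2014-03-31", "2014-04-30"],
    "N02BE": [1160.5, 0.0, 980.2, 1034.0],
    "M01AB": [120.0, 133.5, 0.0, 141.0],
})
drug_cols = ["N02BE", "M01AB"]

# Option 1: drop every row where at least one drug group has a recorded value of 0.
dropped = df[(df[drug_cols] != 0).all(axis=1)].reset_index(drop=True)

# Option 2: replace each 0 with the median of the non-zero values in that column.
imputed = df.copy()
for col in drug_cols:
    median = imputed.loc[imputed[col] != 0, col].median()
    imputed[col] = imputed[col].replace(0, median)
```

Option 1 is simpler but discards whole rows, while option 2 keeps every month at the cost of introducing estimated values.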

Another important feature of the data is that the sales data for 2019 is incomplete. The last recorded day is 8 October, which means the sales data for October is incomplete. Because the regression methods in this exercise use only monthly sales data, we have excluded October 2019 from the analysis.
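One way to exclude the incomplete month is to filter on the date before aggregating. The sketch below assumes daily records with a date column named `datum`; both the column name and the values are hypothetical.

```python
import pandas as pd

# Hypothetical daily records; the "datum" column name and the values are assumptions.
daily = pd.DataFrame({
    "datum": pd.to_datetime(["2019-09-29", "2019-09-30", "2019-10-01", "2019-10-08"]),
    "N02BE": [30.0, 28.5, 31.0, 27.0],
})

# Keep only records before 1 October 2019, so the partial month is excluded.
complete = daily[daily["datum"] < pd.Timestamp("2019-10-01")]

# Aggregate the remaining daily records into monthly totals.
monthly = complete.groupby(complete["datum"].dt.to_period("M"))["N02BE"].sum()
```

If the data is already stored as one row per month, the same filter can be applied directly to the monthly table.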

Models and Techniques

We will use Pandas for reading the CSV data files and for data preprocessing, and the Scikit-learn Python library for the regression models.

For data visualisation, we will use the Matplotlib Python library.

We will use the following regression models:

Linear Regression
Polynomial Regression
Support Vector Regression (SVR)

The Scikit-learn library includes implementations of all of the above models.

We have already loaded all the required Scikit-learn modules at the beginning of our notebook.
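For reference, the three models might be constructed with Scikit-learn roughly as shown below. The data is synthetic, and the polynomial degree and the SVR settings are assumptions chosen for illustration, not values taken from the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR

# Synthetic data: month sequence numbers and made-up sales values on a straight line.
X = np.arange(1, 13).reshape(-1, 1)
y = 100.0 + 5.0 * X.ravel()

models = {
    "Linear Regression": LinearRegression(),
    # Polynomial regression is a polynomial feature expansion followed by linear regression.
    "Polynomial Regression": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    # Hyperparameters here are illustrative assumptions.
    "SVR": SVR(kernel="rbf", C=1000.0),
}

predictions = {}
for name, model in models.items():
    model.fit(X, y)
    # Predict the value for the next month in the sequence (month 13).
    predictions[name] = float(model.predict(np.array([[13]]))[0])
```

Because all three objects share the same `fit` and `predict` interface, they can be trained and evaluated in a single loop.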

We will split the data, using 70% of it for training the models and 30% for testing.
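A 70/30 split can be done with Scikit-learn's `train_test_split`. The sketch below uses synthetic data, and the `random_state` value is an assumption added only to make the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 20 month sequence numbers and made-up sales values.
X = np.arange(1, 21).reshape(-1, 1)
y = np.linspace(900.0, 1200.0, 20)

# test_size=0.3 keeps 30% of the rows for testing and 70% for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
```

With 20 rows this yields 14 training rows and 6 test rows.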

We will use a Voting Regressor to combine the different machine learning regressors and return the average of their predicted values. We do this to balance out the individual regressors' weaknesses.
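The averaging behaviour can be illustrated with Scikit-learn's `VotingRegressor`. This is a sketch on synthetic data with only two of the regressors; the estimator names and the SVR settings are assumptions.

```python
import numpy as np
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

# Synthetic data: a rising trend with a made-up periodic component.
X = np.arange(1, 25).reshape(-1, 1)
y = 1000.0 + 10.0 * X.ravel() + 25.0 * np.sin(X.ravel())

voting = VotingRegressor(estimators=[
    ("linear", LinearRegression()),
    ("svr", SVR(kernel="rbf", C=1000.0)),
])
voting.fit(X, y)

# Predictions for the next two months from each fitted regressor and from the ensemble.
X_new = np.array([[25], [26]])
individual = np.array([est.predict(X_new) for est in voting.estimators_])
combined = voting.predict(X_new)
```

With no weights given, `combined` is the plain mean of the rows of `individual`, so a regressor that overshoots can be partly offset by one that undershoots.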

Calculations and Plots

We will display the individual results for all regressions and for the Voting Regressor, and we will plot all regressions on a chart to visually assess how the data is scattered and where the regression curves fall among the data set values.

Because this is a relatively big project, it is a good idea to organise the code using functions.

Let's start with the function that will scatter our train and test data on the chart.

def scatterData(X_train, y_train, X_test, y_test, title):
    plt.title('Prediction using ' + title)
    plt.xlabel('Month sequence', fontsize=20)
    plt.ylabel('Sales', fontsize=20)

Article link: http://drugssales.immuno-online.com/view-695008.html
