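SQL Prompt Tutorial: Why Is SELECT * (BP005) Bad in Production Code? (Part 2)
SQL Prompt looks up your database's object names, syntax, and code snippets as you type and suggests suitable completions. Its automatic script formatting keeps code simple and readable, which is especially useful when a developer is not very familiar with a script. SQL Prompt works as soon as it is installed and can greatly improve coding efficiency. Users can also customize it so that it behaves the way they expect.
If SQL Prompt warns you about the use of the asterisk, or "star" (*), in a SELECT statement, consider replacing it with an explicit column list. Doing so prevents unnecessary network load and query performance problems, and avoids trouble when the column order changes under an insert into a table. This article covers the second half of the tutorial: the rest of "Why is SELECT * bad in production code?" (continuing from Part 1), followed by "SELECT * in applications".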
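Misinterpretations
With SELECT *, you cannot be sure that your code will always return the same columns in the same order, which means it is not resilient to database refactoring. Upstream modifications to the table source can change the order or the number of columns. If you use INSERT INTO…SELECT * to transfer data, the best outcome you can hope for is an error, because the consequences of assigning data to the wrong destination column can be horrifying.
I'll demonstrate how dangerous this can be if you use it in production code and then need to do some database refactoring. Here, we make a mistake while copying sensitive information. It is very easy to do, and it could lead to financial irregularities without triggering any error. If you are of a nervous disposition, look away now.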
/* we create a table just for our testing */
CREATE TABLE dbo.ExchangeRates --lets pretend we have this data
  (
  CurrencyRateDate DATETIME NOT NULL,
  AverageRate MONEY NOT NULL,
  EndOfDayRate MONEY NOT NULL,
  FromCurrency NVARCHAR(50) NOT NULL,
  FromRegion NVARCHAR(50) NOT NULL,
  ToCurrency NVARCHAR(50) NOT NULL,
  ToRegion NVARCHAR(50) NOT NULL
  );
/* we now steal data for it from AdventureWorks next-door */
INSERT INTO dbo.ExchangeRates
  SELECT CurrencyRate.CurrencyRateDate, CurrencyRate.AverageRate,
    CurrencyRate.EndOfDayRate, Currency.Name AS FromCurrency,
    CountryRegion.Name AS FromRegion, CurrencyTo.Name AS ToCurrency,
    CountryRegionTo.Name AS ToRegion
    FROM Adventureworks2016.Sales.CurrencyRate
      INNER JOIN Adventureworks2016.Sales.Currency
        ON CurrencyRate.FromCurrencyCode = Currency.CurrencyCode
      INNER JOIN Adventureworks2016.Sales.CountryRegionCurrency
        ON Currency.CurrencyCode = CountryRegionCurrency.CurrencyCode
      INNER JOIN Adventureworks2016.Person.CountryRegion
        ON CountryRegionCurrency.CountryRegionCode = CountryRegion.CountryRegionCode
      INNER JOIN Adventureworks2016.Sales.Currency AS CurrencyTo
        ON CurrencyRate.ToCurrencyCode = CurrencyTo.CurrencyCode
      INNER JOIN Adventureworks2016.Sales.CountryRegionCurrency AS CountryRegionCurrencyTo
        ON CurrencyTo.CurrencyCode = CountryRegionCurrencyTo.CurrencyCode
      INNER JOIN Adventureworks2016.Person.CountryRegion AS CountryRegionTo
        ON CountryRegionCurrencyTo.CountryRegionCode = CountryRegionTo.CountryRegionCode;
GO
/* so we start our test by creating a view to show exchange rates from Ecuador */
CREATE VIEW dbo.EquadorExhangeRates
AS
SELECT ExchangeRates.CurrencyRateDate, ExchangeRates.AverageRate,
  ExchangeRates.EndOfDayRate, ExchangeRates.FromCurrency,
  ExchangeRates.FromRegion, ExchangeRates.ToCurrency, ExchangeRates.ToRegion
  FROM dbo.ExchangeRates
  WHERE ExchangeRates.FromRegion = 'Ecuador';
GO
/* now we just fill a table variable with the first ten rows from the view and display them */
DECLARE @MyUsefulExchangeRates TABLE
  (
  CurrencyRateDate DATETIME NOT NULL,
  AverageRate MONEY NOT NULL,
  EndOfDayRate MONEY NOT NULL,
  FromCurrency NVARCHAR(50) NOT NULL,
  FromRegion NVARCHAR(50) NOT NULL,
  ToCurrency NVARCHAR(50) NOT NULL,
  ToRegion NVARCHAR(50) NOT NULL
  );
INSERT INTO @MyUsefulExchangeRates
  (CurrencyRateDate, AverageRate, EndOfDayRate, FromCurrency, FromRegion, ToCurrency, ToRegion)
  SELECT * --this isn't good at all
    FROM dbo.EquadorExhangeRates;
--display the first ten rows from the table to see what we have
SELECT TOP 10 UER.CurrencyRateDate, UER.AverageRate, UER.EndOfDayRate,
  UER.ToCurrency, UER.ToRegion, UER.FromCurrency, UER.FromRegion
  FROM @MyUsefulExchangeRates AS UER
  ORDER BY UER.CurrencyRateDate DESC;
GO
/* end of first part. Now someone decides to alter the view */
ALTER VIEW dbo.EquadorExhangeRates
AS
SELECT ExchangeRates.CurrencyRateDate, ExchangeRates.AverageRate,
  ExchangeRates.EndOfDayRate, ExchangeRates.ToCurrency, ExchangeRates.ToRegion,
  ExchangeRates.FromCurrency, ExchangeRates.FromRegion
  FROM dbo.ExchangeRates
  WHERE ExchangeRates.FromRegion = 'Ecuador';
GO
/* we repeat the routine to extract the first ten rows exactly as before */
DECLARE @MyUsefulExchangeRates TABLE
  (
  CurrencyRateDate DATETIME NOT NULL,
  AverageRate MONEY NOT NULL,
  EndOfDayRate MONEY NOT NULL,
  FromCurrency NVARCHAR(50) NOT NULL,
  FromRegion NVARCHAR(50) NOT NULL,
  ToCurrency NVARCHAR(50) NOT NULL,
  ToRegion NVARCHAR(50) NOT NULL
  );
INSERT INTO @MyUsefulExchangeRates
  (CurrencyRateDate, AverageRate, EndOfDayRate, FromCurrency, FromRegion, ToCurrency, ToRegion)
  SELECT * --bad, bad, bad
    FROM dbo.EquadorExhangeRates;
--check that the data is the same. It isn't is it? No sir!
SELECT TOP 10 UER.CurrencyRateDate, UER.AverageRate, UER.EndOfDayRate,
  UER.ToCurrency, UER.ToRegion, UER.FromCurrency, UER.FromRegion
  FROM @MyUsefulExchangeRates AS UER
  ORDER BY UER.CurrencyRateDate DESC;
GO
/* now just tidy up and tear down */
DROP VIEW dbo.EquadorExhangeRates;
DROP TABLE dbo.ExchangeRates;
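Here are the "before" and "after" results…
As you can see, by switching the "to" and "from" columns we have "unintentionally" corrupted the data. Spelling out the column list may look verbose in your code. However, it runs no slower, and can even run faster, than specifying every column with an asterisk and assuming the columns arrive in a particular order.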
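The same failure can be reproduced on a much smaller scale. The following is a minimal sketch, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for SQL Server; the table rates, the view rates_view, and the helper copy_rates are hypothetical names invented for the illustration and are not part of the tutorial's AdventureWorks script.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rates (rate REAL, from_currency TEXT, to_currency TEXT);
    INSERT INTO rates VALUES (0.92, 'US Dollar', 'Euro');
    CREATE VIEW rates_view AS
        SELECT rate, from_currency, to_currency FROM rates;
""")


def copy_rates(select_list):
    """Copy the view into a fresh table using the given select list."""
    conn.executescript(f"""
        DROP TABLE IF EXISTS copy_of_rates;
        CREATE TABLE copy_of_rates
            (rate REAL, from_currency TEXT, to_currency TEXT);
        INSERT INTO copy_of_rates (rate, from_currency, to_currency)
            SELECT {select_list} FROM rates_view;
    """)
    return conn.execute(
        "SELECT rate, from_currency, to_currency FROM copy_of_rates"
    ).fetchone()


before_star = copy_rates("*")

# Someone refactors the view and swaps the two currency columns.
conn.executescript("""
    DROP VIEW rates_view;
    CREATE VIEW rates_view AS
        SELECT rate, to_currency, from_currency FROM rates;
""")

# SELECT * is matched by position, so 'Euro' lands in from_currency
# and no error is raised.
after_star = copy_rates("*")

# An explicit column list is matched by name, so the copy stays correct.
after_explicit = copy_rates("rate, from_currency, to_currency")

print("SELECT * before refactoring:", before_star)
print("SELECT * after refactoring: ", after_star)
print("explicit list after:        ", after_explicit)
```

Both columns share a type, so the database has no reason to object; only the explicit list survives the reordering of the view.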
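Constraint problems
When we use SELECT * across a number of joined tables, we can, and probably will, get duplicate column names. Here is a simple query from AdventureWorks: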
SELECT *
  FROM HumanResources.Employee AS e
    INNER JOIN Person.Person AS p
      ON p.BusinessEntityID = e.BusinessEntityID
    INNER JOIN HumanResources.EmployeeDepartmentHistory AS edh
      ON e.BusinessEntityID = edh.BusinessEntityID
    INNER JOIN HumanResources.Department AS d
      ON edh.DepartmentID = d.DepartmentID
  WHERE (edh.EndDate IS NULL);
This code will list the duplicated column names:
DECLARE @SourceCode NVARCHAR(4000)='
SELECT *
  FROM HumanResources.Employee AS e
    INNER JOIN Person.Person AS p
      ON p.BusinessEntityID = e.BusinessEntityID
    INNER JOIN HumanResources.EmployeeDepartmentHistory AS edh
      ON e.BusinessEntityID = edh.BusinessEntityID
    INNER JOIN HumanResources.Department AS d
      ON edh.DepartmentID = d.DepartmentID
  WHERE (edh.EndDate IS NULL);
--'
SELECT Count(*) AS Duplicates, name
  FROM sys.dm_exec_describe_first_result_set(@SourceCode, NULL, 1)
  GROUP BY name
  HAVING Count(*) > 1
  ORDER BY Count(*) DESC;
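This causes problems for any application that tries to make sense of such a result by picking out columns by name. If you try to create a temporary table from the result, SELECT…INTO will fail.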
SELECT *
  INTO MyTempTable
  FROM HumanResources.Employee AS e
    INNER JOIN Person.Person AS p
      ON p.BusinessEntityID = e.BusinessEntityID
    INNER JOIN HumanResources.EmployeeDepartmentHistory AS edh
      ON e.BusinessEntityID = edh.BusinessEntityID
    INNER JOIN HumanResources.Department AS d
      ON edh.DepartmentID = d.DepartmentID
  WHERE (edh.EndDate IS NULL);

Msg 2705, Level 16, State 3, Line 19
Column names in each table must be unique. Column name 'BusinessEntityID' in table 'MyTempTable' is specified more than once.
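Again, this means that your SELECT * code is fragile. If someone renames a column in one table, it can create a duplicate column in a SELECT * INTO somewhere else, and you are left scratching your head, wondering why a routine that worked perfectly well has suddenly crashed.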
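A client application can run a similar check before it relies on column names. The sketch below assumes SQLite through Python's sqlite3 module rather than SQL Server, and uses two hypothetical, cut-down tables; it reads the column names from cursor.description instead of sys.dm_exec_describe_first_result_set, and the helper duplicate_columns is an invented name. Note that SQLite's own handling of duplicate names in CREATE TABLE … AS SELECT differs from the SQL Server error shown above, so only the detection step is illustrated.

```python
import sqlite3
from collections import Counter

# Assumption: two cut-down, hypothetical tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee
        (business_entity_id INTEGER, job_title TEXT, modified_date TEXT);
    CREATE TABLE person
        (business_entity_id INTEGER, last_name TEXT, modified_date TEXT);
""")


def duplicate_columns(query):
    """Return the sorted column names that occur more than once in a result."""
    cursor = conn.execute(query)
    names = [column[0] for column in cursor.description]
    return sorted(name for name, count in Counter(names).items() if count > 1)


star_query = """
    SELECT *
      FROM employee AS e
        INNER JOIN person AS p
          ON p.business_entity_id = e.business_entity_id
"""

explicit_query = """
    SELECT e.business_entity_id, e.job_title, p.last_name
      FROM employee AS e
        INNER JOIN person AS p
          ON p.business_entity_id = e.business_entity_id
"""

print("SELECT * duplicates:     ", duplicate_columns(star_query))
print("explicit list duplicates:", duplicate_columns(explicit_query))
```

With the explicit list, each name appears once, so code that looks columns up by name has nothing to trip over.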
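There is one place where SELECT * has a special meaning and cannot be replaced. This is when you are converting the result to JSON and need the joined tables embedded in the result as objects.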
SELECT *
  FROM HumanResources.Employee AS employee
    INNER JOIN Person.Person AS person
      ON person.BusinessEntityID = employee.BusinessEntityID
    INNER JOIN HumanResources.EmployeeDepartmentHistory AS history
      ON employee.BusinessEntityID = history.BusinessEntityID
    INNER JOIN HumanResources.Department AS d
      ON history.DepartmentID = d.DepartmentID
  WHERE (history.EndDate IS NULL)
  FOR JSON AUTO
This will give you… (I'm showing only the first document in the array)
[{
  "BusinessEntityID": 1,
  "NationalIDNumber": "295847284",
  "LoginID": "adventure-works\\ken0",
  "JobTitle": "Chief Executive Officer",
  "BirthDate": "1969-01-29",
  "MaritalStatus": "S",
  "Gender": "M",
  "HireDate": "2009-01-14",
  "SalariedFlag": true,
  "VacationHours": 99,
  "SickLeaveHours": 69,
  "CurrentFlag": true,
  "rowguid": "F01251E5-96A3-448D-981E-0F99D789110D",
  "ModifiedDate": "2014-06-30T00:00:00",
  "person": [{
    "BusinessEntityID": 1,
    "PersonType": "EM",
    "NameStyle": false,
    "FirstName": "Ken",
    "MiddleName": "J",
    "LastName": "Sánchez",
    "EmailPromotion": 0,
    "Demographics": "<IndividualSurvey xmlns=\"http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey\"><TotalPurchaseYTD>0<\/TotalPurchaseYTD><\/IndividualSurvey>",
    "rowguid": "92C4279F-1207-48A3-8448-4636514EB7E2",
    "ModifiedDate": "2009-01-07T00:00:00",
    "history": [{
      "BusinessEntityID": 1,
      "DepartmentID": 16,
      "ShiftID": 1,
      "StartDate": "2009-01-14",
      "ModifiedDate": "2009-01-13T00:00:00",
      "d": [{
        "DepartmentID": 16,
        "Name": "Executive",
        "GroupName": "Executive General and Administration",
        "ModifiedDate": "2008-04-30T00:00:00"
      }]
    }]
  }]
}]
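There is no clash here, because each ModifiedDate column is encapsulated in the object that represents its source table.
The equivalent XML looks like this: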
<employee BusinessEntityID="1" NationalIDNumber="295847284" LoginID="adventure-works\ken0"
          JobTitle="Chief Executive Officer" BirthDate="1969-01-29" MaritalStatus="S" Gender="M"
          HireDate="2009-01-14" SalariedFlag="1" VacationHours="99" SickLeaveHours="69" CurrentFlag="1"
          rowguid="F01251E5-96A3-448D-981E-0F99D789110D" ModifiedDate="2014-06-30T00:00:00">
  <person BusinessEntityID="1" PersonType="EM" NameStyle="0" FirstName="Ken" MiddleName="J"
          LastName="Sánchez" EmailPromotion="0" rowguid="92C4279F-1207-48A3-8448-4636514EB7E2"
          ModifiedDate="2009-01-07T00:00:00">
    <Demographics>
      <IndividualSurvey xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey">
        <TotalPurchaseYTD>0</TotalPurchaseYTD>
      </IndividualSurvey>
    </Demographics>
    <history BusinessEntityID="1" DepartmentID="16" ShiftID="1" StartDate="2009-01-14"
             ModifiedDate="2009-01-13T00:00:00">
      <d DepartmentID="16" Name="Executive" GroupName="Executive General and Administration"
         ModifiedDate="2008-04-30T00:00:00"/>
    </history>
  </person>
</employee>
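To see why nesting removes the name clash, it may help to build both shapes by hand. This is only an illustrative sketch, assuming SQLite through Python's sqlite3 module and two hypothetical, cut-down tables; in SQL Server, FOR JSON AUTO and FOR XML AUTO produce the nesting for you, whereas here the split between the two tables is hard-coded.

```python
import json
import sqlite3

# Assumption: hypothetical cut-down tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER, job_title TEXT, modified_date TEXT);
    CREATE TABLE person (id INTEGER, last_name TEXT, modified_date TEXT);
    INSERT INTO employee VALUES (1, 'Chief Executive Officer', '2014-06-30');
    INSERT INTO person VALUES (1, 'Sánchez', '2009-01-07');
""")

cursor = conn.execute("""
    SELECT *
      FROM employee AS e
        INNER JOIN person AS p ON p.id = e.id
""")
names = [column[0] for column in cursor.description]
row = cursor.fetchone()

# Flat shape: later columns overwrite earlier ones that share a name,
# so six columns collapse into four keys.
flat = dict(zip(names, row))

# Nested shape: each table gets its own object, so both modified_date
# values survive. The split after three columns is hard-coded here.
employee_part = dict(zip(names[:3], row[:3]))
person_part = dict(zip(names[3:], row[3:]))
document = {**employee_part, "person": [person_part]}

print(len(names), "columns,", len(flat), "keys in the flat dictionary")
print(json.dumps(document, ensure_ascii=False, indent=2))
```

The flat dictionary silently keeps only the last value for each repeated name, which is the kind of ambiguity the nested output avoids.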
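Maintainability
When you lay out code, specifying the columns not only avoids mistakes in assigning values to the right columns or variables, but also makes the code more readable. Wherever you can, spell out the names of the columns involved, if only for the sake of your future self or of the poor soul who will one day have to maintain your code. Sure, the code looks a little more cumbersome, but if a fairy appeared on your shoulder and told you that your code would be clearer and more reliable if you typed it out twice, wouldn't you do it?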
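SELECT * in applications
Sometimes you will see long-running queries that request every column and that originate from an application, often one that uses LINQ. Usually this is not deliberate: the developer has made the mistake of not specifying the columns, and an innocent-looking LINQ query is translated into SELECT * or into a column list that contains every column. If the WHERE clause is too general, or has been left out altogether, the consequences are compounded, because the network is usually the slowest component and all that unnecessary data piles up on it.
For example, using AdventureWorks and LINQPad, you can do this in LINQ: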
Persons.OrderBy (p => p.BusinessEntityID).Take (100)
…which LINQ translates into the query that is actually executed. You can see that it selects every column…
SELECT TOP (100) [t0].[BusinessEntityID], [t0].[PersonType], [t0].[NameStyle],
  [t0].[Title], [t0].[FirstName], [t0].[MiddleName], [t0].[LastName],
  [t0].[Suffix], [t0].[EmailPromotion], [t0].[AdditionalContactInfo],
  [t0].[Demographics], [t0].[rowguid] AS [Rowguid], [t0].[ModifiedDate]
FROM [Person].[Person] AS [t0]
ORDER BY [t0].[BusinessEntityID]
Likewise, this expression
from row in Persons select row
…will deliver every column of every row in the entire table.
SELECT [t0].[BusinessEntityID], [t0].[PersonType], [t0].[NameStyle],
  [t0].[Title], [t0].[FirstName], [t0].[MiddleName], [t0].[LastName],
  [t0].[Suffix], [t0].[EmailPromotion], [t0].[AdditionalContactInfo],
  [t0].[Demographics], [t0].[rowguid] AS [Rowguid], [t0].[ModifiedDate]
FROM [Person].[Person] AS [t0]
By contrast, this…
from row in Persons.Where(i => i.LastName == "Bradley") select row.FirstName+" "+row.LastName
…translates into the more sensible:
-- Region Parameters
DECLARE @p0 NVarChar(1000) = 'Bradley'
DECLARE @p1 NVarChar(1000) = ' '
-- EndRegion
SELECT ([t0].[FirstName] + @p1) + [t0].[LastName] AS [value]
FROM [Person].[Person] AS [t0]
WHERE [t0].[LastName] = @p0
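Conclusion
The general code smell is requesting more data than you need. It is almost always better, and faster, to let the data source do the filtering for you. SELECT *, although perfectly legitimate in some circumstances, is usually a sign of this more general problem. For developers who are fluent in C# or VB but not in SQL, it is tempting to download entire rows, or even entire tables, and do the filtering on more familiar territory. The extra network load and latency ought to be enough in themselves to discourage the practice, but they are often mistaken for "a slow database". A long column list, which often names every column, is almost as harmful as SELECT *, although SELECT * carries additional risks whenever any refactoring takes place.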
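The difference between filtering in the application and filtering at the source can be sketched in a few lines. The example assumes SQLite through Python's sqlite3 module in place of SQL Server and LINQ, with a hypothetical person table whose demographics column plays the part of a wide column that nobody asked for; it counts the values fetched rather than measuring real network traffic.

```python
import sqlite3

# Assumption: a small hypothetical table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, demographics TEXT)"
)
conn.executemany(
    "INSERT INTO person (first_name, last_name, demographics) VALUES (?, ?, ?)",
    [
        ("Ken", "Sánchez", "x" * 1000),
        ("Amy", "Bradley", "x" * 1000),
        ("Ian", "Bradley", "x" * 1000),
        ("Gail", "Erickson", "x" * 1000),
    ],
)

# The smell: pull every column of every row, then filter in application code.
all_rows = conn.execute("SELECT * FROM person ORDER BY id").fetchall()
filtered_in_app = [f"{r[1]} {r[2]}" for r in all_rows if r[2] == "Bradley"]
values_fetched_by_star = sum(len(r) for r in all_rows)

# Better: let the database filter the rows and return only the value needed.
narrow_rows = conn.execute(
    "SELECT first_name || ' ' || last_name FROM person "
    "WHERE last_name = ? ORDER BY id",
    ("Bradley",),
).fetchall()
filtered_in_db = [r[0] for r in narrow_rows]
values_fetched_by_narrow = sum(len(r) for r in narrow_rows)

print(filtered_in_app, values_fetched_by_star)
print(filtered_in_db, values_fetched_by_narrow)
```

Both approaches produce the same two names, but the first fetches sixteen values, including four copies of the wide column, to arrive at them, while the second fetches two.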