Working with Subquery in the SQL procedure, Summaries of Database Management Systems (DBMS)

This paper will explore how subqueries work properly with different components of SQL such as WHERE, HAVING, FROM,. SELECT, SET clause. It also demonstrates ...

Typology: Summaries

2022/2023

Uploaded on 03/01/2023

parvini
parvini 🇺🇸

4.5

(15)

243 documents

1 / 6

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
WORKING WITH SUBQUERY IN THE SQL PROCEDURE
Lei Zhang, Domain Solutions Corp., Cambridge, MA
Danbo Yi, Abt Associates, Cambridge, MA
ABSTRACT
Proc SQL is a major contribution to the SAS®/BASE
system. One of powerful features in SQL procedure
is subquery, which provides great flexibility in
manipulating and querying data in multiple tables
simultaneously. However, subquery is the subtlest
part of the SQL procedure. Users have to
understand the correct way to use subqueries in a
variety of situations. This paper will explore how
subqueries work properly with different components
of SQL such as WHERE, HAVING, FROM,
SELECT, SET clause. It also demonstrates when
and how to convert subqueries into JOINs solutions
to improve the query performance.
INTRODUCTION
One of powerful features in the SAS SQL procedure
is its subquery, which allow a SELECT statement to
be nested inside the clauses of another SELECT, or
inside an INSERT, UPDATE or DELETE statement,
even or inside another subquery. Depending on the
clause that contains it, a subquery can select a
single value or multiple values from related tables. If
more than one subquery is used in a query-
expression, the innermost query is evaluated first,
then the next innermost query, and so on. PROC
SQL allows a subquery (contained in parentheses)
to be used at any point in a query expression, but
user must understand when and where to use
uncorrelated/correlated subqueries, non-scalar and
scalar subqueries, or the combinations. This paper
will discuss the subquery usage with following
sections:
Subquery in Where clause
Subquery in Having clause
Subquery in From clause
Subquery in Select clause
Subquery in SET clause
Subquery and JOINs
The examples in this paper refer to an imaginary set
of clinical trial data, but the techniques used will
apply to other industries as well. There are three
SAS data sets in the example database: Data set
DEMO contains variables PATID (Patient ID), AGE,
and SEX. Data set VITAL contains variables PATID,
VISID (Visit ID), VISDATE (Visit date), DIABP and
SYSBP (patient’s diastolic/ systolic blood pressure
which is measured twice at a visit). Data set
VITMEAN contains patient’s mean diastolic/ systolic
blood pressure at each visit recorded by doctors in
variables PATID, VISID, VISDATE, MDIABP and
MSYSBP. The source SAS codes to create the three
data sets are listed in the appendix.
SUBQUERY IN THE WHERE CLAUSE
We begin with WHERE clause because it is the
most common place to use subqueries. Consider
following two queries:
Example 1: Find patients who have records in table
DEMO but not in table VITAL.
Proc SQL;
Select patid from demog
Where patid not in (select patid from vital);
Result:
PATID
-----
150
Example 2: Find patients who didn’t have their
blood pressures measured twice at a visit.
Proc SQL;
select patid from vital T1
where 2>(select count(*)
from vital T2
where T1.patid=T2.patid
and
T1.visid=T2.visid);
Result::
PATID
-----
110
140
In the example 1, The SELECT expression in the
parentheses is an un-correlated subquery. It is an
inner query that is evaluated before the outer (main)
query and its results are not dependent on the
pf3
pf4
pf5

Partial preview of the text

Download Working with Subquery in the SQL procedure and more Summaries Database Management Systems (DBMS) in PDF only on Docsity!

WORKING WITH SUBQUERY IN THE SQL PROCEDURE

Lei Zhang, Domain Solutions Corp., Cambridge, MA

Danbo Yi, Abt Associates, Cambridge, MA

ABSTRACT

Proc SQL is a major contribution to the SAS®/BASE system. One of powerful features in SQL procedure is subquery, which provides great flexibility in manipulating and querying data in multiple tables simultaneously. However, subquery is the subtlest part of the SQL procedure. Users have to understand the correct way to use subqueries in a variety of situations. This paper will explore how subqueries work properly with different components of SQL such as WHERE, HAVING, FROM, SELECT, SET clause. It also demonstrates when and how to convert subqueries into JOINs solutions to improve the query performance.

INTRODUCTION

One of powerful features in the SAS SQL procedure is its subquery, which allow a SELECT statement to be nested inside the clauses of another SELECT, or inside an INSERT, UPDATE or DELETE statement, even or inside another subquery. Depending on the clause that contains it, a subquery can select a single value or multiple values from related tables. If more than one subquery is used in a query- expression, the innermost query is evaluated first, then the next innermost query, and so on. PROC SQL allows a subquery (contained in parentheses) to be used at any point in a query expression, but user must understand when and where to use uncorrelated/correlated subqueries, non-scalar and scalar subqueries, or the combinations. This paper will discuss the subquery usage with following sections:

  • Subquery in Where clause
  • Subquery in Having clause
  • Subquery in From clause
  • Subquery in Select clause
  • Subquery in SET clause
  • Subquery and JOINs

The examples in this paper refer to an imaginary set of clinical trial data, but the techniques used will apply to other industries as well. There are three SAS data sets in the example database: Data set DEMO contains variables PATID (Patient ID), AGE, and SEX. Data set VITAL contains variables PATID, VISID (Visit ID), VISDATE (Visit date), DIABP and SYSBP (patient’s diastolic/ systolic blood pressure which is measured twice at a visit). Data set VITMEAN contains patient’s mean diastolic/ systolic

blood pressure at each visit recorded by doctors in variables PATID, VISID, VISDATE, MDIABP and MSYSBP. The source SAS codes to create the three data sets are listed in the appendix.

SUBQUERY IN THE WHERE CLAUSE

We begin with WHERE clause because it is the most common place to use subqueries. Consider following two queries: Example 1: Find patients who have records in table DEMO but not in table VITAL. Proc SQL; Select patid from demog Where patid not in (select patid from vital);

Result :

PATID

150

Example 2 : Find patients who didn’t have their blood pressures measured twice at a visit. Proc SQL;

select patid from vital T where 2>(select count(*) from vital T where T1.patid=T2.patid and T1.visid=T2.visid);

Result::

PATID

110 140

In the example 1, The SELECT expression in the parentheses is an un-correlated subquery. It is an inner query that is evaluated before the outer (main) query and its results are not dependent on the

values in the main query. It is not a scalar subquery because it will return multiple values.

In the example2, the SELECT expression in the parenthesis is a correlated subquery because the subquery refers to values from variables T1.PATID and T1.VISID in a table T1 of the outer query. The correlated subquery is evaluated for each row in the outer query. With correlated subqueries, PROC SQL executes the subquery and the outer query together. The subquery is also a scalar subquery because aggregate function COUNT(*) always returns one value with one column for each row values from the outer query at a time.

Multiple subqueries can be used in the SQL WHERE statement.

Example 3 : Find patients who have the maximum number of visits.

Proc SQL;

Select distinct patid from vitmean as a Where not exists (select visid from vitmean Except Select visid from vitmean as b Where a.patid=b.patid);

Result::

PATID

120

Below is a nested correlated subquery solution to example 3.

Proc SQL;

Select distinct patid from vitmean as a where not exists (select distinct visid from vitmean as b where not exists (select visid from vitmean as c where a.patid=c.patid and b.visid=c.visid));

This solution is using a double negative structure. The original question can be put in another way: Find each patient for whom no visit exists in which the patient concerned has never had. The two nested subqueryies produce a list of visits for whom a given patient didn’t have. The main query presents those patients for whom the result table of the subquery is empty. SQL procedure determines for each patient separately whether the subquery yields no result.

In the WHERE clause, IN, ANY, ALL, EXISTS predicates are allowed to use any type of subqueries (see example 1 and 3 above); However, only scalar subqueries can be used in relation operators (>, =, <, etc), BETWEEN, and LIKE predicates. One way to turn a subquery to into a scalar subquery is using aggregate functions. Consider Example 4 below, which two scalar subqueries are used in the BETWEEN predicate.

Example 4 : Find patients whose age is in the average age +/- 5 years. Proc SQL; Select patid from demog where age between (select Avg(age) from demog)- and (select avg(age) from demog)+5; Result:

PATID

110

SUBQUERY IN THE HAVING CLAUSE

Users can use one or more subqueries in the Having clause as long as they do not produce multiple values. It means that the most outer subquery in the Having clause must be scalar subquery, either uncorrelated or correlated. The following query is an example,

Example 4 : Find patients who didn’t have maximun number of visits. Proc SQL; Select patid From vitmean as a Group by patid

(select patid, visid from VITAL) AS T on T1.visid=T2.visid and T1.patid < T2.patid Group BY T1.PATID, T2.PATID Having count()=(Select count() From vital as T Where T3.PATID=T1.PATID) and count()= (Select Count() From Vital as T

Where T4.PATID=T2.PATID); RESULT :

PATID PATID

100 130

SUBQUERY IN THE SELECT CLAUSE

SAS SQL can use a scalar subquery or even a correlated scalar subquery in the SELECT clause. If the subquery results are an empty, the result is a missing value. This feature lets you write a LEFT JOIN as a query within SELECT clause. Here is an example.

Example 7 : Compute the difference between recorded mean of SYSBP from VITMEAN table and calculated mean of SYSBP from VITAL for each patient and visit.

Proc SQL; Select a.patid, a.visid, a.msysbp - (select avg(sysbp) from vital as b where a.patid=b.patid and a.visid=b.visid) as diff from vitmean as a;

Result:

PATID VISID DIFF 100 1 0 100 2 0 100 3 0 110 1 0 110 3 0 120 1 0 120 2 34 120 3 7. 120 4 0 130 1 -0. 130 2 -0. 130 3 -0. 140 1 0 140 2 0

The subquery in the SQL clause is a correlated scalar subquery. It will calculate average systolic blood pressure based on VITAL table for each patient visit from table VITMEAN. The main query then use the result to calculate the difference between recorded systolic blood pressure and calculated systolic blood pressure for each patient visit. The process will be repeated until all patients in table VITMEAN are checked.

SUBQUERY IN THE SET CLAUSE

You can use scalar subquery and/or correlated scalar subquery in the SET clause of the Update statement to modify a table based on itself or another table. Here is an example

Example 8: Modify the mdiabp and msysbp in the VITMEAN using values calculated from table Vital;

proc sql; UPDATE VITMEAN as a Set mdiabp=(select avg(b.diabp) from VITAL as b where a.patid=b.patid and a.visid=b.visid), msysbp=(select avg(c.sysbp) from VITAL as c wherea.patid=c.patid and a.visid=c.visid);

SUBQUERY AND JOINS

Some users find subquery difficult to read, and consequently fear that subqueries are difficult to use. Indeed, many subqueries can be replaced by one or more join queries. However, if you can’t construct an equivalent join query, the only alternative will be to use a series of queries and pass the results from one query to another. This is obviously less efficient than using subqueries. Subqueries can perform wonders of selection and comparison so that one set of statements and nested subqueries can replace several non- subquery statements. This imporove efficiency and could even effect performance.

Generally whether using a join or a subquery is simply a matter of individual preference: they can often be used interchangeably to solve a given problem. Users will find that subqueries exceed the capabilities of join for certain queries. In some cases, a subquery is the shortest route to the results a user need. However, if users can find an equivalent join query for a subquery, it can replace the subquery to improve the performance, For example, the JOIN solution to the example 1 is:

proc SQL; select a.patid from demog as a left join (select distinct patid from vital) as b on a.patid=b.patid where b.patid is NULL;

and the JOIN solution to Example 7 is.

Proc SQL; select a.patid, b.visid, max(msysbp)-avg(b.sysbp) as diff from vitmean as a left join vital as b on a.patid=b.patid and a.visid=b.visid group by a.patid, b.visid;

In some situation, it may be worthwhile to look into rewriting such subqueries as JOINs, but the development and maintenance of such codes may have them less desirable than subqueries because JOINs solution may improve the performance at the cost of losing readability.

CONCLUSIONS

We summarize the usages of subquery in the different SQL components as follows:

SQL Components

Usage

Relation Operator (<, =, >, etc)

Scalar uncorrealted/correlated subquery

Between, Like

Scalar uncorrealted/correlated subquery IN, ANY, ALL, SOME, Exists

Uncorrelated/Correlated subquery

FROM Clause Uncorrelated subquery Having Clause

Scalar uncorrealted/correlated subquery Select Clause

Scalar uncorrealted/correlated subquery SET clause Scalar uncorrealted/correlated subquery

SAS and SAS/BASE mentioned in this paper are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

REFERENCES

SAS Institute, Inc. (1992), SAS Guide to the SQL Procedure: Usage and Reference, Version 6, First Edition SAS Institute, Inc. (1994), SAS Technical Report P- 222, Changes and Enhancements to Base SAS Software, Release 6.07. Getting started with the SQL, Procedure, First Edition

AUTHOR’S ADDRESS

Lei Zhang DomianPharma Inc. 10 Maguire Road Lexington, MA 02140, USA Tel: 781-778- Fax: 781-778- e-mail: [email protected]

Danbo Yi Abt Associates Inc. 55 Wheeler Street Cambridge, MA 02138-1168, USA Tel: 617-349- Fax: 617-520- e-mail: [email protected]