WORKING WITH SUBQUERY IN THE SQL PROCEDURE

Lei Zhang, Domain Solutions Corp., Cambridge, MA

Danbo Yi, Abt Associates, Cambridge, MA

ABSTRACT

PROC SQL is a major contribution to the SAS®/BASE system. One of the most powerful features of the SQL procedure is the subquery, which provides great flexibility in manipulating and querying data in multiple tables simultaneously. However, the subquery is also the subtlest part of the SQL procedure, and users have to understand the correct way to use subqueries in a variety of situations. This paper explores how subqueries work with different components of SQL, such as the WHERE, HAVING, FROM, SELECT, and SET clauses. It also demonstrates when and how to convert subqueries into JOIN solutions to improve query performance.

INTRODUCTION

One of the powerful features of the SAS SQL procedure is the subquery, which allows a SELECT statement to be nested inside the clauses of another SELECT statement, inside an INSERT, UPDATE, or DELETE statement, or even inside another subquery. Depending on the clause that contains it, a subquery can select a single value or multiple values from related tables. If more than one subquery is used in a query-expression, the innermost query is evaluated first, then the next innermost query, and so on. PROC SQL allows a subquery (enclosed in parentheses) to be used at any point in a query-expression, but the user must understand when and where to use uncorrelated or correlated subqueries, scalar or non-scalar subqueries, or combinations of them. This paper discusses subquery usage in the following sections:

• Subquery in the WHERE clause
• Subquery in the HAVING clause
• Subquery in the FROM clause
• Subquery in the SELECT clause
• Subquery in the SET clause
• Subquery and JOINs

The examples in this paper refer to an imaginary set of clinical trial data, but the techniques apply to other industries as well. There are three SAS data sets in the example database. Data set DEMO contains the variables PATID (patient ID), AGE, and SEX. Data set VITAL contains the variables PATID, VISID (visit ID), VISDATE (visit date), and DIABP and SYSBP (the patient's diastolic and systolic blood pressure, which is measured twice at a visit). Data set VITMEAN contains the patient's mean diastolic and systolic blood pressure at each visit, as recorded by doctors, in the variables PATID, VISID, VISDATE, MDIABP, and MSYSBP. The SAS source code that creates the three data sets is listed in the appendix.
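To try the queries without the appendix at hand, a small stand-in database can be built along the following lines. This is only a sketch: every row is hypothetical and is not the authors' data, PATID and VISID are assumed to be numeric, and VISDATE is assumed to be a SAS date value.

```sas
/* Hypothetical stand-in rows; NOT the data from the paper's appendix. */
data demo;
  input patid age sex $;
  datalines;
100 45 M
110 52 F
150 38 M
;
run;

/* Two rows for the same PATID and VISID represent the two readings
   taken at one visit. */
data vital;
  input patid visid visdate :date9. diabp sysbp;
  format visdate date9.;
  datalines;
100 1 05JAN2001 80 120
100 1 05JAN2001 82 124
110 1 08JAN2001 78 118
;
run;

/* One row per patient and visit, holding the recorded means. */
data vitmean;
  input patid visid visdate :date9. mdiabp msysbp;
  format visdate date9.;
  datalines;
100 1 05JAN2001 81 122
110 1 08JAN2001 78 118
;
run;

proc sql;
  /* Write the column names and types of each table to the SAS log. */
  describe table demo, vital, vitmean;
quit;
```

Keeping the stand-in tables this small makes it easy to check each query result by eye.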

SUBQUERY IN THE WHERE CLAUSE

We begin with the WHERE clause because it is the most common place to use subqueries. Consider the following two queries:

Example 1: Find patients who have records in table DEMO but not in table VITAL.

Proc SQL;
  select patid from demo
  where patid not in (select patid from vital);
quit;

Result:

PATID
-----
150
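For comparison, the same question can also be phrased with NOT EXISTS, where the inner query refers to the outer table. The sketch below is an illustration rather than the authors' code; the rows are hypothetical, PATID is assumed to be numeric, and the aliases D and V are arbitrary names.

```sas
/* Hypothetical rows for illustration only. */
data demo;
  input patid age sex $;
  datalines;
100 45 M
110 52 F
150 38 M
;
run;

data vital;
  input patid visid;
  datalines;
100 1
110 1
;
run;

proc sql;
  /* The inner query references d.patid, so it is evaluated
     against each row of DEMO in turn. */
  select d.patid
    from demo d
    where not exists (select *
                        from vital v
                        where v.patid = d.patid);
quit;
```

With these rows, only patient 150 has no matching row in VITAL, so the query selects PATID 150, the same patient that the NOT IN form would select.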

Example 2: Find patients who did not have their blood pressure measured twice at a visit.

Proc SQL;
  select patid from vital T1
  where 2 > (select count(*)
             from vital T2
             where T1.patid=T2.patid
               and T1.visid=T2.visid);
quit;

Result:

PATID
-----
110
140
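One way to see what the inner query in Example 2 computes is to list the number of readings for each patient and visit, then run the query itself. The following is a minimal sketch on hypothetical rows (not the appendix data), with the hypothetical column alias NREADINGS.

```sas
/* Hypothetical VITAL rows: patients 110 and 140 each have a visit
   with only one reading. */
data vital;
  input patid visid diabp sysbp;
  datalines;
100 1 80 120
100 1 82 124
110 1 78 118
140 1 90 130
140 2 88 128
140 2 86 126
;
run;

proc sql;
  /* Readings per patient and visit: the count that the inner
     query produces for each row of the outer query. */
  select patid, visid, count(*) as nreadings
    from vital
    group by patid, visid;

  /* The query from Example 2, run on the rows above. */
  select patid from vital T1
  where 2 > (select count(*)
             from vital T2
             where T1.patid=T2.patid
               and T1.visid=T2.visid);
quit;
```

For these rows the counts are 2 for patient 100 at visit 1, 1 for patient 110 at visit 1, 1 for patient 140 at visit 1, and 2 for patient 140 at visit 2, so the second query selects patients 110 and 140.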

In Example 1, the SELECT expression in parentheses is an uncorrelated subquery. It is an inner query that is evaluated before the outer (main) query, and its results do not depend on the outer query.


SUBQUERY IN THE HAVING CLAUSE


SUBQUERY IN THE SELECT CLAUSE

SUBQUERY IN THE SET CLAUSE

SUBQUERY AND JOINS

CONCLUSIONS