BIOSTATISTICS A Foundation for Analysis in the Health Sciences
Contents:
- INTRODUCTION TO BIOSTATISTICS
- DESCRIPTIVE STATISTICS
- SOME BASIC PROBABILITY CONCEPTS
- PROBABILITY DISTRIBUTIONS
- SOME IMPORTANT SAMPLING DISTRIBUTIONS
- ESTIMATION
- HYPOTHESIS TESTING
- ANALYSIS OF VARIANCE
- SIMPLE LINEAR REGRESSION AND CORRELATION
- MULTIPLE REGRESSION AND CORRELATION
- REGRESSION ANALYSIS: SOME ADDITIONAL TECHNIQUES
- THE CHI-SQUARE DISTRIBUTION AND THE ANALYSIS OF FREQUENCIES
- NONPARAMETRIC AND DISTRIBUTION-FREE STATISTICS
- SURVIVAL ANALYSIS
- VITAL STATISTICS (ONLINE)
CHAPTER 1 INTRODUCTION TO BIOSTATISTICS
The concepts and methods necessary for achieving the first objective, organizing and summarizing data, are presented under the heading of descriptive statistics. The second objective, reaching decisions about a population on the basis of a sample, is reached through the study of what is called inferential statistics.
Discrete Random Variable: A discrete variable is characterized by gaps or interruptions in the values that it can assume. These gaps or interruptions indicate the absence of values between particular values that the variable can assume. Some examples illustrate the point. The number of daily admissions to a general hospital is a discrete random variable since the number of admissions each day must be represented by a whole number, such as 0, 1, 2, or 3. The number of admissions on a given day cannot be a number such as 1.5, 2.997, or 3.333. The number of decayed, missing, or filled teeth per child in an elementary school is another example of a discrete variable.
DEFINITION Statistical inference is the procedure by which we reach a conclusion about a population on the basis of the information contained in a sample that has been drawn from that population.
EXAMPLE 1.4.1 Gold et al. (A-1) studied the effectiveness on smoking cessation of bupropion SR, a nicotine patch, or both, when co-administered with cognitive-behavioral therapy. Consecutive consenting patients assigned themselves to one of the three treatments. For illustrative purposes, let us consider all these subjects to be a population of size N = 189. We wish to select a simple random sample of size 10 from this population whose ages are shown in Table 1.4.1.
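The selection described in Example 1.4.1 can be sketched in Python. The ages from Table 1.4.1 are not reproduced here, so this sketch draws subject numbers rather than ages. The numbering of subjects from 1 to 189, the seed value, and the names `subject_ids` and `sampled_ids` are assumptions made for illustration, and this is not necessarily the selection procedure the text itself uses.

```python
import random

# Assumption: subjects are numbered 1..189, matching the population size N = 189.
N = 189
n = 10
subject_ids = list(range(1, N + 1))

# A fixed seed (arbitrary, for reproducibility only) makes the draw repeatable.
rng = random.Random(2024)

# random.sample draws without replacement, so no subject is chosen twice.
sampled_ids = sorted(rng.sample(subject_ids, n))
print(sampled_ids)
```

Each sampled number would then be looked up in Table 1.4.1 to obtain that subject's age.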
Chapter 1: Introduction to Biostatistics
Chapter Overview
This chapter introduces basic statistical concepts and terminology used in biostatistics. Topics include variables, measurement scales, sampling, statistical inference, the scientific method, and the use of computers in biostatistical analysis.
1.1 Introduction
- Data: Information in the form of numbers.
- Descriptive Statistics: Organizes and summarizes data.
- Inferential Statistics: Makes decisions about a population based on a sample.
1.2 Some Basic Concepts
- Statistics: Study of collecting, organizing, summarizing, analyzing data, and drawing inferences.
- Biostatistics: Application of statistics to biological sciences and medicine.
- Variable: A characteristic that can take on different values.
- Quantitative Variables: Measurable (e.g., height, weight).
- Qualitative Variables: Categorical (e.g., gender, ethnicity).
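- Random Variable: A variable whose values arise as a result of chance factors and so cannot be predicted exactly in advance.
- Discrete Random Variable: Has gaps between the values it can assume (e.g., number of daily hospital admissions).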
- Continuous Random Variable: Can assume any value within an interval (e.g., height).
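The discrete versus continuous distinction can be illustrated with a small Python sketch. The data values and the helper name `only_whole_numbers` are hypothetical. The whole-number check is only a rough heuristic: a continuous quantity recorded to the nearest unit would also pass it, so the classification ultimately depends on what the variable measures, not on how it was recorded.

```python
# Hypothetical values, for illustration only.
daily_admissions = [0, 3, 1, 2, 3]          # counts: whole numbers only
heights_cm = [150.2, 163.75, 171.0, 180.4]  # measurements: any value in a range


def only_whole_numbers(values):
    """Return True when every value is a whole number."""
    return all(float(v).is_integer() for v in values)


print(only_whole_numbers(daily_admissions))  # True
print(only_whole_numbers(heights_cm))        # False
```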
1.3 Measurement and Measurement Scales
- Measurement: Assignment of numbers to objects/events according to rules.
- Measurement Scales:
- Nominal Scale: Classifies data into categories (e.g., male/female).
- Ordinal Scale: Ranks data (e.g., low, medium, high).
- Interval Scale: Orders data with meaningful, equal distances between values but no true zero (e.g., temperature in degrees Celsius).
- Ratio Scale: Measures ratios between data points, with a true zero (e.g., weight).
1.4 Sampling and Statistical Inference
- Statistical Inference: Drawing conclusions about a population based on a sample.
- Simple Random Sample: Every possible sample of size n has the same chance of being selected.
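The definition of a simple random sample can be made concrete by listing every possible sample from a very small population. The sketch below assumes a hypothetical population of five labeled units and samples of size 2 drawn without replacement, with order ignored. Under those assumptions there are C(5, 2) = 10 possible samples, and simple random sampling gives each of them probability 1/10.

```python
from itertools import combinations
from math import comb

# Hypothetical toy population of five labeled units.
population = ['A', 'B', 'C', 'D', 'E']
n = 2

# Enumerate every possible sample of size n (order ignored, no replacement).
all_samples = list(combinations(population, n))
print(len(all_samples))          # 10
print(comb(len(population), n))  # 10, the same count from the binomial coefficient

# Under simple random sampling each sample has probability 1 / C(N, n).
prob_each = 1 / comb(len(population), n)
print(prob_each)                 # 0.1
```

For a realistic population the samples cannot be listed one by one, but the same count is available from `comb(N, n)`.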
1.5 The Scientific Method and the Design of Experiments
- Scientific Method: Systematic approach to research.
- Experiment Design: Planning experiments to ensure valid and reliable results.
1.6 Computers and Biostatistical Analysis
- Use of Computers: Essential for data analysis in biostatistics.
1.7 Summary
- Understanding basic statistical concepts and terminology is essential for biostatistical analysis.
- Learning outcomes include selecting random samples, understanding the scientific method, and appreciating the role of computers in data analysis.
Examples Using Python and R
Example 1: Descriptive Statistics
Python:
import pandas as pd
# Example data
data = {'Height': [150, 160, 170, 180, 190], 'Weight': [50, 60, 70, 80, 90]}
df = pd.DataFrame(data)
# Descriptive statistics
print(df.describe())
R:
# Example data
data <- data.frame(Height = c(150, 160, 170, 180, 190), Weight = c(50, 60, 70, 80, 90))
# Descriptive statistics
summary(data)
Example 2: Sampling
Python:
import random
# Population
population = list(range(1, 101))
# Simple random sample
sample = random.sample(population, 10)
print(sample)
Example 3: Measurement Scales
Python:
# Nominal Scale
gender = ['male', 'female', 'female', 'male']
print(set(gender))
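# Ordinal Scale: categories have a meaningful order, so sort by rank, not alphabetically
socioeconomic_status = ['low', 'medium', 'high', 'medium']
rank = {'low': 0, 'medium': 1, 'high': 2}
print(sorted(set(socioeconomic_status), key=rank.get))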
# Interval Scale
temperatures = [20, 25, 30, 35]
print([temp - 20 for temp in temperatures])
# Ratio Scale
weights = [50, 60, 70, 80]
print([weight / 50 for weight in weights])
R:
# Nominal Scale
gender <- c('male', 'female', 'female', 'male')
print(unique(gender))
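# Ordinal Scale: an ordered factor keeps the low < medium < high ranking
socioeconomic_status <- factor(c('low', 'medium', 'high', 'medium'), levels = c('low', 'medium', 'high'), ordered = TRUE)
print(sort(unique(socioeconomic_status)))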
# Interval Scale
temperatures <- c(20, 25, 30, 35)
print(temperatures - 20)
# Ratio Scale
weights <- c(50, 60, 70, 80)
print(weights / 50)