SpaceX launch exploration

Analysis and report of SpaceX launches as part of the IBM Data Science capstone project

Data Science

Context

This analysis and report are part of the capstone project of the IBM Data Science Professional Certificate. Here we examine how successfully rockets are launched and their first stages recovered, in relation to payload and other factors.

Executive Summary

Intro: Data collection (API, Web Scraping), Data Processing
Methodology: EDA, EDA with Data Visualization (Folium, Plotly Dash), Predictive Analysis (Classification)
Insights into, and predictions of, launch outcomes in relation to payload and launch site

Project background and context

The aerospace industry is a cost-intensive business. The first stages of rockets are large and expensive.
Recovering the first stage can therefore bring immense cost savings.
First-stage boosters differ in their capability to carry equipment into space (payload).

Problems we want to explore

Insights into the reuse of rocket stages
Determine whether we can predict if the first stage will land, and thereby estimate the launch cost

Methodology

Data collection methodology
SpaceX REST API, web scraping
Data wrangling
JSON normalized, data sampled, null values handled, new aggregated data columns created
Exploratory data analysis (EDA) using visualization and SQL
Interactive visual analytics using Folium and Plotly Dash
Predictive analysis using classification models
Class creation, data standardization, train/test split, hyperparameter search (SVM, Classification Tree, Logistic Regression, KNN)

Data Collection – API and Web Scraping

Data collection from the SpaceX REST API, including additional endpoints
Web scraping of Wikipedia (with BeautifulSoup)
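The API part of the collection step can be sketched with the Python standard library. This is a minimal sketch, not the notebook code. The base URL `https://api.spacexdata.com/v4` and the field names (`flight_number`, `date_utc`, `rocket`, `launchpad`, `name`, `longitude`, `latitude`) are assumptions based on the public v4 SpaceX API and should be checked against live responses. The sketch shows the "additional endpoints" idea: IDs referenced by a launch are resolved through their own endpoints, and each distinct ID is fetched only once.

```python
import json
import urllib.request
from collections.abc import Callable

# Assumed base URL of the public SpaceX REST API (v4); check before use.
BASE_URL = "https://api.spacexdata.com/v4"


def fetch_json(path: str):
    """GET BASE_URL + path and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(BASE_URL + path, timeout=30) as resp:
        return json.load(resp)


def enrich_launches(launches: list, lookup: Callable = fetch_json) -> list:
    """Resolve rocket and launchpad IDs through their own endpoints.

    Each distinct endpoint path is requested only once (simple in-memory cache).
    Field names are assumptions about the v4 response format.
    """
    cache = {}

    def cached(path):
        if path not in cache:
            cache[path] = lookup(path)
        return cache[path]

    rows = []
    for launch in launches:
        rocket = cached(f"/rockets/{launch['rocket']}")
        pad = cached(f"/launchpads/{launch['launchpad']}")
        rows.append({
            "flight_number": launch["flight_number"],
            "date": launch["date_utc"][:10],  # keep YYYY-MM-DD only
            "booster_version": rocket["name"],
            "launch_site": pad["name"],
            "longitude": pad["longitude"],
            "latitude": pad["latitude"],
        })
    return rows
```

In use, the launch list would come from `fetch_json("/launches/past")`. Because `lookup` is a parameter, the enrichment can also be exercised offline with a stub function instead of real HTTP calls.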

Data Wrangling

The API requests returned data in JSON format
The JSON data was normalized into a DataFrame
Additional data was acquired through other API endpoints (rockets, launchpads, etc.)
The data was sampled (head) and null values were handled
The number of launches at each site was calculated
A landing outcome column (landing success class) was created
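The null handling, per-site counts and landing class above can be sketched on plain Python records. This is a sketch under several assumptions. Each row is a dict with the hypothetical keys `payload_mass`, `outcome` and `launch_site`. Missing payload masses are filled with the mean, which is one common choice. The set of "bad" outcome strings is illustrative. It is modeled on labels such as `True ASDS` or `None None` that combine the landing result with the landing type.

```python
from collections import Counter
from statistics import mean

# Assumption: outcome strings are "<result> <landing type>"; these mark a failed
# or absent landing. Illustrative set, not taken from the notebooks.
BAD_OUTCOMES = {"False ASDS", "False Ocean", "False RTLS", "None ASDS", "None None"}


def wrangle(rows: list) -> tuple:
    """Fill missing payload masses, add a landing class, count launches per site."""
    known = [r["payload_mass"] for r in rows if r["payload_mass"] is not None]
    fill_value = mean(known)  # mean imputation (assumed strategy)
    cleaned = []
    for r in rows:
        row = dict(r)  # copy so the input rows stay untouched
        if row["payload_mass"] is None:
            row["payload_mass"] = fill_value
        # 1 = first stage landed successfully, 0 = failed or no landing attempt
        row["landing_class"] = 0 if row["outcome"] in BAD_OUTCOMES else 1
        cleaned.append(row)
    launches_per_site = Counter(row["launch_site"] for row in cleaned)
    return cleaned, launches_per_site
```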

Exploratory Data Analysis (EDA) with Data Visualization

Variables: payload mass, flight number, launch site, orbit type, success rate
Gained insights into the best launch sites (CCAFS)
Success rates in relation to launch sites
The overall success rate has been rising since 2013
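The success rates discussed in this section all reduce to the same group-by: the share of successful landings per group. The following is a minimal sketch. It assumes each row carries a hypothetical `landing_class` (1 = landed, 0 = not landed) plus `launch_site` and `date` fields. The grouping is passed in as a key function.

```python
from collections import defaultdict
from collections.abc import Callable


def success_rate_by(rows: list, key: Callable) -> dict:
    """Share of successful landings (landing_class == 1) per group, sorted by group."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for row in rows:
        group = key(row)
        totals[group] += 1
        successes[group] += row["landing_class"]
    return {group: successes[group] / totals[group] for group in sorted(totals)}


# Example groupings (field names are assumptions):
# by_site = success_rate_by(rows, lambda r: r["launch_site"])
# by_year = success_rate_by(rows, lambda r: r["date"][:4])  # yearly trend
```

The same helper covers the other views in this report. For example, `lambda r: r["orbit"]` groups by orbit type, and `lambda r: r["payload_mass"] < 8000` compares light and heavy payloads. Both key names are assumed.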

Flight Number vs. Launch Site

Payload vs. Launch Site

The VAFB site has no launches with payloads over 10,000 kg
CCAFS has the most launches with the heaviest payloads
CCAFS launches with payloads under 8,000 kg show a higher failure rate

Success Rate vs. Orbit Type

ES-L1, GEO, HEO, SSO: highest success rates, around 100%
GTO: lowest success rate, around 50%
SO: probably no data or 100% failure

Launch Success Yearly Trend

The success rate starts increasing in 2013
Some failures occur after 2017
The success rate increases again around 2018

Success rates of launch sites (Folium Map)

CCAFS (Cape Canaveral) LC 40: most launches, with the most failures
KSC (Kennedy Space Center) LC 39A: most successful launches
CCAFS (Cape Canaveral) SLC 40: fewest launches

Model accuracy of all classification models (bar chart)

Logistic Regression (LR)
Support Vector Machine (SVM)
KNeighborsClassifier (KNN)
These three reach very similar scores of around 0.833
The Decision Tree Classifier has the lowest score: 0.722

Predictive Analysis (Classification)

The data was transformed and preprocessed (standardized)
A prediction class (the landing outcome) was calculated as a NumPy array
The data was split into two parts: a training set to fit the models and a test set to evaluate them on unseen data
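The preprocessing and split described above, followed by a k-nearest-neighbours prediction, can be sketched without external libraries. This is a stand-in for a scikit-learn style workflow rather than the notebook code. The 80/20 split, the seed, `k=3` and all helper names are assumptions. Labels are assumed to be binary (1 = landed, 0 = not landed).

```python
import math
import random


def train_test_split(X, y, test_size=0.2, seed=2):
    """Shuffle row indices reproducibly and hold out `test_size` of the rows."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(X) * test_size)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


def standardize(train, test):
    """Z-score every feature with the mean and std of the training rows only."""
    n = len(train)
    cols = range(len(train[0]))
    means = [sum(row[j] for row in train) / n for j in cols]
    # Population std; fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in train) / n) or 1.0
            for j in cols]

    def scale(rows):
        return [[(row[j] - means[j]) / stds[j] for j in cols] for row in rows]

    return scale(train), scale(test)


def knn_predict(X_train, y_train, X_test, k=3):
    """Predict binary labels by majority vote of the k nearest training rows."""
    preds = []
    for x in X_test:
        nearest = sorted(range(len(X_train)),
                         key=lambda i: math.dist(x, X_train[i]))[:k]
        votes = sum(y_train[i] for i in nearest)  # number of "landed" neighbours
        preds.append(1 if 2 * votes > k else 0)
    return preds
```

A typical call order is `train_test_split`, then `standardize(X_train, X_test)`, then `knn_predict`. In this sketch the data is split before scaling. The scaling statistics therefore come from the training rows only, so no information from the test set leaks into preprocessing.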

Confusion matrix of one of the best-performing models

This KNN model correctly predicts 12 landed labels as landed
It correctly predicts 3 not-landed labels as not landed
It falsely predicts 3 not-landed labels as landed (false positives)
It does not predict any landed label as not landed (no false negatives)
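The four counts above fully determine the usual summary metrics, taking "landed" as the positive class (TP = 12, TN = 3, FP = 3, FN = 0). The accuracy works out to 15/18 ≈ 0.833, which matches the score reported for the best models. The helper name below is hypothetical. The formulas are the standard definitions.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Summary metrics from a binary confusion matrix ("landed" = positive class)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of "landed" predictions that were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual landings that were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts read off the KNN confusion matrix described above
m = classification_metrics(tp=12, tn=3, fp=3, fn=0)
# accuracy = 15/18 ≈ 0.833, precision = 12/15 = 0.8, recall = 12/12 = 1.0
```

The model never misses a real landing (recall 1.0). All of its errors are false alarms that label a failed landing as landed.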

Conclusion

Most successful launches: Kennedy Space Center Launch Complex 39A (KSC LC 39A)
Importance of the launch site: sites differ due to weather and geographic location
Payload mass may make launching or landing more difficult
Different orbits result in different success rates
Based on all the acquired and processed data, we can predict the outcome of a launch / landing: the best-performing models reach an accuracy of around 0.83

This project was part of the IBM Data Science Professional Certificate.

GitHub repository with all notebooks

finis coronat opus
© 2014 — 2024
