Overview

Dataset statistics

Number of variables: 14
Number of observations: 1309
Missing cells: 3855
Missing cells (%): 21.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 143.3 KiB
Average record size in memory: 112.1 B
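
The summary figures above can be cross-checked by hand. The sketch below is only an illustration: the per-column missing counts are copied from the Alerts and Variables sections of this report, and the conversion assumes 1 KiB = 1024 bytes.

```python
# Figures copied from this report; nothing here is recomputed from raw data.
n_variables = 14
n_observations = 1309
missing_per_column = {
    "age": 263, "cabin": 1014, "boat": 823, "body": 1188,
    "home.dest": 564, "fare": 1, "embarked": 2,
}

total_cells = n_variables * n_observations        # every column/row position
missing_cells = sum(missing_per_column.values())  # should match "Missing cells"
missing_pct = 100 * missing_cells / total_cells   # "Missing cells (%)"

# Assumption: 1 KiB = 1024 bytes, spread evenly over all rows.
avg_record_bytes = 143.3 * 1024 / n_observations

print(total_cells, missing_cells, round(missing_pct, 1), round(avg_record_bytes, 1))
```

The seven per-column counts add up to the reported 3855 missing cells, which is about 21.0% of the 18326 cells in the table.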

Variable types

Categorical: 7
Numeric: 5
Unsupported: 2

Alerts

name has a high cardinality: 1307 distinct values (High cardinality)
cabin has a high cardinality: 186 distinct values (High cardinality)
home.dest has a high cardinality: 369 distinct values (High cardinality)
pclass is highly correlated with fare (High correlation)
fare is highly correlated with pclass (High correlation)
pclass is highly correlated with survived and 2 other fields (High correlation)
survived is highly correlated with pclass and 1 other field (High correlation)
age is highly correlated with sibsp and 1 other field (High correlation)
sibsp is highly correlated with pclass and 1 other field (High correlation)
parch is highly correlated with pclass and 2 other fields (High correlation)
sex is highly correlated with survived (High correlation)
survived is highly correlated with sex (High correlation)
pclass is highly correlated with fare and 1 other field (High correlation)
embarked is highly correlated with pclass (High correlation)
age has 263 (20.1%) missing values (Missing)
cabin has 1014 (77.5%) missing values (Missing)
boat has 823 (62.9%) missing values (Missing)
body has 1188 (90.8%) missing values (Missing)
home.dest has 564 (43.1%) missing values (Missing)
name is uniformly distributed (Uniform)
cabin is uniformly distributed (Uniform)
ticket is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
boat is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
sibsp has 891 (68.1%) zeros (Zeros)
parch has 1002 (76.5%) zeros (Zeros)
fare has 17 (1.3%) zeros (Zeros)

Reproduction

Analysis started: 2021-11-14 15:04:59.058179
Analysis finished: 2021-11-14 15:05:03.061251
Duration: 4 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

pclass
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 10.4 KiB

Value | Count
3 | 709
1 | 323
2 | 277

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
3 | 709 | 54.2%
1 | 323 | 24.7%
2 | 277 | 21.2%

Length

Histogram of lengths of the category

survived
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 10.4 KiB

Value | Count
0 | 809
1 | 500

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 809 | 61.8%
1 | 500 | 38.2%

Length

Histogram of lengths of the category

name
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 1307
Distinct (%): 99.8%
Missing: 0
Missing (%): 0.0%
Memory size: 10.4 KiB

Value | Count
Connolly, Miss. Kate | 2
Kelly, Mr. James | 2
Allen, Miss. Elisabeth Walton | 1
Ilmakangas, Miss. Ida Livija | 1
Ilieff, Mr. Ylio | 1
Other values (1302) | 1302

Length

Max length: 82
Median length: 25
Mean length: 27.13063407
Min length: 12

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1305
Unique (%): 99.7%

Sample

1st row: Allen, Miss. Elisabeth Walton
2nd row: Allison, Master. Hudson Trevor
3rd row: Allison, Miss. Helen Loraine
4th row: Allison, Mr. Hudson Joshua Creighton
5th row: Allison, Mrs. Hudson J C (Bessie Waldo Daniels)

Common Values

Value | Count | Frequency (%)
Connolly, Miss. Kate | 2 | 0.2%
Kelly, Mr. James | 2 | 0.2%
Allen, Miss. Elisabeth Walton | 1 | 0.1%
Ilmakangas, Miss. Ida Livija | 1 | 0.1%
Ilieff, Mr. Ylio | 1 | 0.1%
Ibrahim Shawah, Mr. Yousseff | 1 | 0.1%
Hyman, Mr. Abraham | 1 | 0.1%
Humblen, Mr. Adolf Mathias Nicolai Olsen | 1 | 0.1%
Howard, Miss. May Elizabeth | 1 | 0.1%
Horgan, Mr. John | 1 | 0.1%
Other values (1297) | 1297 | 99.1%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
mr | 763 | 14.3%
miss | 260 | 4.9%
mrs | 201 | 3.8%
william | 87 | 1.6%
john | 72 | 1.3%
master | 61 | 1.1%
henry | 49 | 0.9%
charles | 39 | 0.7%
james | 38 | 0.7%
george | 37 | 0.7%
Other values (1940) | 3742 | 70.0%

sex
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 10.4 KiB

Value | Count
male | 843
female | 466

Length

Max length: 6
Median length: 4
Mean length: 4.711993888
Min length: 4

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: female
2nd row: male
3rd row: female
4th row: male
5th row: female

Common Values

Value | Count | Frequency (%)
male | 843 | 64.4%
female | 466 | 35.6%

Length

Histogram of lengths of the category

age
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 98
Distinct (%): 9.4%
Missing: 263
Missing (%): 20.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.88113451
Minimum: 0.1667
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.4 KiB

Quantile statistics

Minimum: 0.1667
5-th percentile: 5
Q1: 21
median: 28
Q3: 39
95-th percentile: 57
Maximum: 80
Range: 79.8333
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 14.4134997
Coefficient of variation (CV): 0.4823611933
Kurtosis: 0.1469499602
Mean: 29.88113451
Median Absolute Deviation (MAD): 8
Skewness: 0.4076718865
Sum: 31255.6667
Variance: 207.7489736
Monotonicity: Not monotonic
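
Several of the age statistics are simple functions of the others. As a rough check, and assuming the report uses the usual textbook definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared), the derived values can be reproduced from the reported ones:

```python
# Values copied from the age summary; the definitions are the standard ones
# and are an assumption about how the report computes them.
minimum, maximum = 0.1667, 80.0
q1, q3 = 21.0, 39.0
mean, std = 29.88113451, 14.4134997

value_range = maximum - minimum  # reported as Range
iqr = q3 - q1                    # reported as Interquartile range (IQR)
cv = std / mean                  # reported as Coefficient of variation (CV)
variance = std ** 2              # reported as Variance

print(round(value_range, 4), iqr, round(cv, 6), round(variance, 4))
```

The results agree with the reported Range, IQR, CV, and Variance to the precision shown.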
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
24 | 47 | 3.6%
22 | 43 | 3.3%
21 | 41 | 3.1%
30 | 40 | 3.1%
18 | 39 | 3.0%
25 | 34 | 2.6%
28 | 32 | 2.4%
36 | 31 | 2.4%
27 | 30 | 2.3%
26 | 30 | 2.3%
Other values (88) | 679 | 51.9%
(Missing) | 263 | 20.1%

Minimum values

Value | Count | Frequency (%)
0.1667 | 1 | 0.1%
0.3333 | 1 | 0.1%
0.4167 | 1 | 0.1%
0.6667 | 1 | 0.1%
0.75 | 3 | 0.2%
0.8333 | 3 | 0.2%
0.9167 | 2 | 0.2%
1 | 10 | 0.8%
2 | 12 | 0.9%
3 | 7 | 0.5%

Maximum values

Value | Count | Frequency (%)
80 | 1 | 0.1%
76 | 1 | 0.1%
74 | 1 | 0.1%
71 | 2 | 0.2%
70.5 | 1 | 0.1%
70 | 2 | 0.2%
67 | 1 | 0.1%
66 | 1 | 0.1%
65 | 3 | 0.2%
64 | 5 | 0.4%

sibsp
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4988540871
Minimum: 0
Maximum: 8
Zeros: 891
Zeros (%): 68.1%
Negative: 0
Negative (%): 0.0%
Memory size: 10.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 2
Maximum: 8
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.041658391
Coefficient of variation (CV): 2.088102348
Kurtosis: 20.0432515
Mean: 0.4988540871
Median Absolute Deviation (MAD): 0
Skewness: 3.844220343
Sum: 653
Variance: 1.085052203
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common Values

Value | Count | Frequency (%)
0 | 891 | 68.1%
1 | 319 | 24.4%
2 | 42 | 3.2%
4 | 22 | 1.7%
3 | 20 | 1.5%
8 | 9 | 0.7%
5 | 6 | 0.5%

Minimum values

Value | Count | Frequency (%)
0 | 891 | 68.1%
1 | 319 | 24.4%
2 | 42 | 3.2%
3 | 20 | 1.5%
4 | 22 | 1.7%
5 | 6 | 0.5%
8 | 9 | 0.7%

Maximum values

Value | Count | Frequency (%)
8 | 9 | 0.7%
5 | 6 | 0.5%
4 | 22 | 1.7%
3 | 20 | 1.5%
2 | 42 | 3.2%
1 | 319 | 24.4%
0 | 891 | 68.1%

parch
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 8
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.385026738
Minimum: 0
Maximum: 9
Zeros: 1002
Zeros (%): 76.5%
Negative: 0
Negative (%): 0.0%
Memory size: 10.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2
Maximum: 9
Range: 9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8655602753
Coefficient of variation (CV): 2.248052382
Kurtosis: 21.54107887
Mean: 0.385026738
Median Absolute Deviation (MAD): 0
Skewness: 3.669078204
Sum: 504
Variance: 0.7491945903
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)

Common Values

Value | Count | Frequency (%)
0 | 1002 | 76.5%
1 | 170 | 13.0%
2 | 113 | 8.6%
3 | 8 | 0.6%
4 | 6 | 0.5%
5 | 6 | 0.5%
6 | 2 | 0.2%
9 | 2 | 0.2%

Minimum values

Value | Count | Frequency (%)
0 | 1002 | 76.5%
1 | 170 | 13.0%
2 | 113 | 8.6%
3 | 8 | 0.6%
4 | 6 | 0.5%
5 | 6 | 0.5%
6 | 2 | 0.2%
9 | 2 | 0.2%

Maximum values

Value | Count | Frequency (%)
9 | 2 | 0.2%
6 | 2 | 0.2%
5 | 6 | 0.5%
4 | 6 | 0.5%
3 | 8 | 0.6%
2 | 113 | 8.6%
1 | 170 | 13.0%
0 | 1002 | 76.5%

ticket
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 10.4 KiB

fare
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 281
Distinct (%): 21.5%
Missing: 1
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.29547928
Minimum: 0
Maximum: 512.3292
Zeros: 17
Zeros (%): 1.3%
Negative: 0
Negative (%): 0.0%
Memory size: 10.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 7.225
Q1: 7.8958
median: 14.4542
Q3: 31.275
95-th percentile: 133.65
Maximum: 512.3292
Range: 512.3292
Interquartile range (IQR): 23.3792

Descriptive statistics

Standard deviation: 51.75866824
Coefficient of variation (CV): 1.5545254
Kurtosis: 27.02798635
Mean: 33.29547928
Median Absolute Deviation (MAD): 6.9042
Skewness: 4.367709134
Sum: 43550.4869
Variance: 2678.959738
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
8.05 | 60 | 4.6%
13 | 59 | 4.5%
7.75 | 55 | 4.2%
26 | 50 | 3.8%
7.8958 | 49 | 3.7%
10.5 | 35 | 2.7%
7.775 | 26 | 2.0%
7.2292 | 24 | 1.8%
7.925 | 23 | 1.8%
26.55 | 22 | 1.7%
Other values (271) | 905 | 69.1%

Minimum values

Value | Count | Frequency (%)
0 | 17 | 1.3%
3.1708 | 1 | 0.1%
4.0125 | 1 | 0.1%
5 | 1 | 0.1%
6.2375 | 1 | 0.1%
6.4375 | 3 | 0.2%
6.45 | 1 | 0.1%
6.4958 | 3 | 0.2%
6.75 | 2 | 0.2%
6.8583 | 1 | 0.1%

Maximum values

Value | Count | Frequency (%)
512.3292 | 4 | 0.3%
263 | 6 | 0.5%
262.375 | 7 | 0.5%
247.5208 | 3 | 0.2%
227.525 | 5 | 0.4%
221.7792 | 4 | 0.3%
211.5 | 5 | 0.4%
211.3375 | 4 | 0.3%
164.8667 | 4 | 0.3%
153.4625 | 3 | 0.2%

cabin
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct: 186
Distinct (%): 63.1%
Missing: 1014
Missing (%): 77.5%
Memory size: 10.4 KiB

Value | Count
C23 C25 C27 | 6
G6 | 5
B57 B59 B63 B66 | 5
C22 C26 | 4
C78 | 4
Other values (181) | 271

Length

Max length: 15
Median length: 3
Mean length: 3.738983051
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 107
Unique (%): 36.3%

Sample

1st row: B5
2nd row: C22 C26
3rd row: C22 C26
4th row: C22 C26
5th row: C22 C26

Common Values

Value | Count | Frequency (%)
C23 C25 C27 | 6 | 0.5%
G6 | 5 | 0.4%
B57 B59 B63 B66 | 5 | 0.4%
C22 C26 | 4 | 0.3%
C78 | 4 | 0.3%
D | 4 | 0.3%
B96 B98 | 4 | 0.3%
F4 | 4 | 0.3%
F33 | 4 | 0.3%
F2 | 4 | 0.3%
Other values (176) | 251 | 19.2%
(Missing) | 1014 | 77.5%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
f | 8 | 2.2%
c25 | 6 | 1.7%
c23 | 6 | 1.7%
c27 | 6 | 1.7%
g6 | 5 | 1.4%
b59 | 5 | 1.4%
b63 | 5 | 1.4%
b66 | 5 | 1.4%
b57 | 5 | 1.4%
b98 | 4 | 1.1%
Other values (192) | 301 | 84.6%

embarked
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.2%
Missing: 2
Missing (%): 0.2%
Memory size: 10.4 KiB

Value | Count
S | 914
C | 270
Q | 123

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: S
2nd row: S
3rd row: S
4th row: S
5th row: S

Common Values

Value | Count | Frequency (%)
S | 914 | 69.8%
C | 270 | 20.6%
Q | 123 | 9.4%
(Missing) | 2 | 0.2%

Length

Histogram of lengths of the category

Pie chart

Words

Value | Count | Frequency (%)
s | 914 | 69.9%
c | 270 | 20.7%
q | 123 | 9.4%

boat
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 823
Missing (%): 62.9%
Memory size: 10.4 KiB

body
Real number (ℝ≥0)

MISSING

Distinct: 121
Distinct (%): 100.0%
Missing: 1188
Missing (%): 90.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 160.8099174
Minimum: 1
Maximum: 328
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 16
Q1: 72
median: 155
Q3: 256
95-th percentile: 307
Maximum: 328
Range: 327
Interquartile range (IQR): 184

Descriptive statistics

Standard deviation: 97.696922
Coefficient of variation (CV): 0.6075304534
Kurtosis: -1.254052417
Mean: 160.8099174
Median Absolute Deviation (MAD): 88
Skewness: 0.09173881554
Sum: 19458
Variance: 9544.688567
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
58 | 1 | 0.1%
285 | 1 | 0.1%
156 | 1 | 0.1%
143 | 1 | 0.1%
120 | 1 | 0.1%
306 | 1 | 0.1%
69 | 1 | 0.1%
188 | 1 | 0.1%
98 | 1 | 0.1%
47 | 1 | 0.1%
Other values (111) | 111 | 8.5%
(Missing) | 1188 | 90.8%

Minimum values

Value | Count | Frequency (%)
1 | 1 | 0.1%
4 | 1 | 0.1%
7 | 1 | 0.1%
9 | 1 | 0.1%
14 | 1 | 0.1%
15 | 1 | 0.1%
16 | 1 | 0.1%
17 | 1 | 0.1%
18 | 1 | 0.1%
19 | 1 | 0.1%

Maximum values

Value | Count | Frequency (%)
328 | 1 | 0.1%
327 | 1 | 0.1%
322 | 1 | 0.1%
314 | 1 | 0.1%
312 | 1 | 0.1%
309 | 1 | 0.1%
307 | 1 | 0.1%
306 | 1 | 0.1%
305 | 1 | 0.1%
304 | 1 | 0.1%

home.dest
Categorical

HIGH CARDINALITY
MISSING

Distinct: 369
Distinct (%): 49.5%
Missing: 564
Missing (%): 43.1%
Memory size: 10.4 KiB

Value | Count
New York, NY | 64
London | 14
Montreal, PQ | 10
Paris, France | 9
Cornwall / Akron, OH | 9
Other values (364) | 639

Length

Max length: 50
Median length: 17
Mean length: 19.16510067
Min length: 5

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 217
Unique (%): 29.1%

Sample

1st row: St Louis, MO
2nd row: Montreal, PQ / Chesterville, ON
3rd row: Montreal, PQ / Chesterville, ON
4th row: Montreal, PQ / Chesterville, ON
5th row: Montreal, PQ / Chesterville, ON

Common Values

Value | Count | Frequency (%)
New York, NY | 64 | 4.9%
London | 14 | 1.1%
Montreal, PQ | 10 | 0.8%
Paris, France | 9 | 0.7%
Cornwall / Akron, OH | 9 | 0.7%
Wiltshire, England Niagara Falls, NY | 8 | 0.6%
Winnipeg, MB | 8 | 0.6%
Philadelphia, PA | 8 | 0.6%
Belfast | 7 | 0.5%
Sweden Winnipeg, MN | 7 | 0.5%
Other values (359) | 601 | 45.9%
(Missing) | 564 | 43.1%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
ny | 175 | 7.4%
(blank) | 173 | 7.3%
new | 123 | 5.2%
york | 116 | 4.9%
england | 99 | 4.2%
london | 44 | 1.8%
pa | 38 | 1.6%
sweden | 38 | 1.6%
nj | 37 | 1.6%
ma | 34 | 1.4%
Other values (452) | 1503 | 63.2%

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
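
As a minimal sketch of that recipe (not the report's own implementation), the example below ranks both variables, giving tied values the average of their ranks, and then applies the Pearson formula to the ranks. The helper names and the toy data are illustrative assumptions.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance over the product of standard deviations (1/n factors cancel)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson(x, y))
```

On this toy data ρ is 1 up to floating-point rounding, while Pearson's r on the raw values stays below 1, which illustrates why ρ is the better fit for nonlinear monotonic relationships.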

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
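
The sketch below follows that definition directly and also illustrates the invariance property: shifting and rescaling each variable by a positive factor leaves r unchanged. The function name and the toy numbers are assumptions made for illustration only.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Population covariance of x and y over the product of their std devs."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Toy data, roughly linear.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Separate changes of location and (positive) scale for each variable.
r_transformed = pearson_r([3 * a + 10 for a in x], [0.5 * b - 4 for b in y])
print(r, r_transformed)
```

Both calls give the same value up to rounding. A negative scale factor on one variable would flip the sign of r but not its magnitude.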

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
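
A minimal sketch of this pair-counting definition follows. It implements the plain form described in the text (often called τ-a); tie-adjusted variants such as τ-b exist, and the text does not say which one the report uses. The function name and data are illustrative assumptions.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Toy data: 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]
print(kendall_tau_a(x, y))
```

For this toy data the result is (7 − 3) / 10 = 0.4.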

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
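
For orientation, here is a sketch of the bias-corrected estimator as it is commonly stated from Bergsma (2013): the squared φ coefficient and the table dimensions are each shrunk by a term of order 1/(n − 1) before taking the ratio. This is an illustration under that assumption, not the report's own code. The function name and the toy contingency tables are hypothetical, and the function assumes no empty rows or columns.

```python
def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows).

    Assumes every row and column total is non-zero.
    """
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return (phi2_corr / min(r_corr - 1, k_corr - 1)) ** 0.5

independent = [[10, 20], [20, 40]]  # rows are proportional
perfect = [[30, 0], [0, 30]]        # each row maps to exactly one column
print(cramers_v_corrected(independent), cramers_v_corrected(perfect))
```

The independent table yields 0 and the perfectly associated table yields 1 (up to rounding), matching the endpoints described above.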

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the coefficient.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

index | pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest
0 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0000 | 0 | 0 | 24160 | 211.3375 | B5 | S | 2 | NaN | St Louis, MO
1 | 1 | 1 | Allison, Master. Hudson Trevor | male | 0.9167 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | 11 | NaN | Montreal, PQ / Chesterville, ON
2 | 1 | 0 | Allison, Miss. Helen Loraine | female | 2.0000 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | NaN | NaN | Montreal, PQ / Chesterville, ON
3 | 1 | 0 | Allison, Mr. Hudson Joshua Creighton | male | 30.0000 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | NaN | 135.0 | Montreal, PQ / Chesterville, ON
4 | 1 | 0 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 25.0000 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | NaN | NaN | Montreal, PQ / Chesterville, ON
5 | 1 | 1 | Anderson, Mr. Harry | male | 48.0000 | 0 | 0 | 19952 | 26.5500 | E12 | S | 3 | NaN | New York, NY
6 | 1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63.0000 | 1 | 0 | 13502 | 77.9583 | D7 | S | 10 | NaN | Hudson, NY
7 | 1 | 0 | Andrews, Mr. Thomas Jr | male | 39.0000 | 0 | 0 | 112050 | 0.0000 | A36 | S | NaN | NaN | Belfast, NI
8 | 1 | 1 | Appleton, Mrs. Edward Dale (Charlotte Lamson) | female | 53.0000 | 2 | 0 | 11769 | 51.4792 | C101 | S | D | NaN | Bayside, Queens, NY
9 | 1 | 0 | Artagaveytia, Mr. Ramon | male | 71.0000 | 0 | 0 | PC 17609 | 49.5042 | NaN | C | NaN | 22.0 | Montevideo, Uruguay

Last rows

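index | pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest
1299 | 3 | 0 | Yasbeck, Mr. Antoni | male | 27.0 | 1 | 0 | 2659 | 14.4542 | NaN | C | C | NaN | NaN
1300 | 3 | 1 | Yasbeck, Mrs. Antoni (Selini Alexander) | female | 15.0 | 1 | 0 | 2659 | 14.4542 | NaN | C | NaN | NaN | NaN
1301 | 3 | 0 | Youseff, Mr. Gerious | male | 45.5 | 0 | 0 | 2628 | 7.2250 | NaN | C | NaN | 312.0 | NaN
1302 | 3 | 0 | Yousif, Mr. Wazli | male | NaN | 0 | 0 | 2647 | 7.2250 | NaN | C | NaN | NaN | NaN
1303 | 3 | 0 | Yousseff, Mr. Gerious | male | NaN | 0 | 0 | 2627 | 14.4583 | NaN | C | NaN | NaN | NaN
1304 | 3 | 0 | Zabour, Miss. Hileni | female | 14.5 | 1 | 0 | 2665 | 14.4542 | NaN | C | NaN | 328.0 | NaN
1305 | 3 | 0 | Zabour, Miss. Thamine | female | NaN | 1 | 0 | 2665 | 14.4542 | NaN | C | NaN | NaN | NaN
1306 | 3 | 0 | Zakarian, Mr. Mapriededer | male | 26.5 | 0 | 0 | 2656 | 7.2250 | NaN | C | NaN | 304.0 | NaN
1307 | 3 | 0 | Zakarian, Mr. Ortin | male | 27.0 | 0 | 0 | 2670 | 7.2250 | NaN | C | NaN | NaN | NaN
1308 | 3 | 0 | Zimmerman, Mr. Leo | male | 29.0 | 0 | 0 | 315082 | 7.8750 | NaN | S | NaN | NaN | NaN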