SP500 prices

Dataset statistics

Number of variables	4
Number of observations	2026
Missing cells	71
Missing cells (%)	0.9%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	267.2 KiB
Average record size in memory	135.1 B

Variable types

Categorical	1
Numeric	2
DateTime	1

Alerts

`DATE` has a high cardinality: 2026 distinct values	High cardinality
`SP500` has 70 (3.5%) missing values	Missing
`DATE` is uniformly distributed	Uniform
`DATE` has unique values	Unique
`Date` has unique values	Unique
`% 1-Day Return` has 71 (3.5%) zeros	Zeros

Reproduction

Analysis started	2022-03-13 23:55:08.555107
Analysis finished	2022-03-13 23:55:09.433427
Duration	0.88 seconds
Software version	pandas-profiling v3.1.0
Download configuration	config.json

DATE
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct	2026
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	132.7 KiB

2014-06-06	1
2019-08-02	1
2019-08-21	1
2019-08-20	1
2019-08-19	1
Other values (2021)	2021

Length

Max length	10
Median length	10
Mean length	10
Min length	10

Characters and Unicode

Total characters	0
Distinct characters	0
Distinct categories	0
Distinct scripts	0
Distinct blocks	0

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	2026
Unique (%)	100.0%

Sample

1st row	2014-06-06
2nd row	2014-06-09
3rd row	2014-06-10
4th row	2014-06-11
5th row	2014-06-12

Common Values

Value	Count	Frequency (%)
2014-06-06	1	< 0.1%
2019-08-02	1	< 0.1%
2019-08-21	1	< 0.1%
2019-08-20	1	< 0.1%
2019-08-19	1	< 0.1%
2019-08-16	1	< 0.1%
2019-08-15	1	< 0.1%
2019-08-14	1	< 0.1%
2019-08-13	1	< 0.1%
2019-08-12	1	< 0.1%
Other values (2016)	2016	99.5%

Length

Histogram of lengths of the category

Value	Count	Frequency (%)
2014-06-06	1	< 0.1%
2014-07-02	1	< 0.1%
2014-06-13	1	< 0.1%
2014-06-16	1	< 0.1%
2014-06-17	1	< 0.1%
2014-06-18	1	< 0.1%
2014-06-19	1	< 0.1%
2014-06-20	1	< 0.1%
2014-06-23	1	< 0.1%
2014-06-24	1	< 0.1%
Other values (2016)	2016	99.5%

Most occurring characters

Value	Count	Frequency (%)
No values found.

Most occurring categories

Value	Count	Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value	Count	Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value	Count	Frequency (%)
No values found.

Most frequent character per block

SP500
Real number (ℝ_≥0)

MISSING

Distinct	1945
Distinct (%)	99.4%
Missing	70
Missing (%)	3.5%
Infinite	0
Infinite (%)	0.0%
Mean	2801.088016

Minimum	1829.08
Maximum	4796.56
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	16.0 KiB

Quantile statistics

Minimum	1829.08
5-th percentile	1961.0275
Q1	2108.905
median	2670.025
Q3	3130.0375
95-th percentile	4456.255
Maximum	4796.56
Range	2967.48
Interquartile range (IQR)	1021.1325

Descriptive statistics

Standard deviation	776.9689512
Coefficient of variation (CV)	0.2773811271
Kurtosis	-0.05808348742
Mean	2801.088016
Median Absolute Deviation (MAD)	548.6
Skewness	0.9570039718
Sum	5478928.16
Variance	603680.7511
Monotonicity	Not monotonic
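Several of the SP500 figures above are derived from the others, so they can be cross-checked with simple arithmetic. The sketch below copies the reported values by hand and recomputes the range, IQR, coefficient of variation, variance and mean. The variable names are illustrative and are not part of the report.

```python
import math

# Values copied from the SP500 statistics in this report.
minimum, maximum = 1829.08, 4796.56
q1, q3 = 2108.905, 3130.0375
mean, std = 2801.088016, 776.9689512
total, n_rows, n_missing = 5478928.16, 2026, 70

value_range = maximum - minimum            # Range = Maximum - Minimum
iqr = q3 - q1                              # IQR = Q3 - Q1
cv = std / mean                            # CV = standard deviation / mean
variance = std ** 2                        # Variance = standard deviation squared
mean_from_sum = total / (n_rows - n_missing)  # Mean over non-missing rows only

print(round(value_range, 2), round(iqr, 4), round(cv, 6))
print(round(variance, 2), round(mean_from_sum, 4))
```

The mean only matches when the sum is divided by the 1956 non-missing observations, not by all 2026 rows.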

Histogram with fixed size bins (bins=50)

Common Values

Value	Count	Frequency (%)
2066.66	2	0.1%
2439.07	2	0.1%
2783.02	2	0.1%
2268.9	2	0.1%
2926.46	2	0.1%
2373.47	2	0.1%
2095.84	2	0.1%
2102.31	2	0.1%
2080.15	2	0.1%
2723.06	2	0.1%
Other values (1935)	1936	95.6%
(Missing)	70	3.5%

Minimum values

Value	Count	Frequency (%)
1829.08	1	< 0.1%
1851.86	1	< 0.1%
1852.21	1	< 0.1%
1853.44	1	< 0.1%
1859.33	1	< 0.1%
1862.49	1	< 0.1%
1862.76	1	< 0.1%
1864.78	1	< 0.1%
1867.61	1	< 0.1%
1868.99	1	< 0.1%

Maximum values

Value	Count	Frequency (%)
4796.56	1	< 0.1%
4793.54	1	< 0.1%
4793.06	1	< 0.1%
4791.19	1	< 0.1%
4786.35	1	< 0.1%
4778.73	1	< 0.1%
4766.18	1	< 0.1%
4726.35	1	< 0.1%
4725.79	1	< 0.1%
4713.07	1	< 0.1%

% 1-Day Return
Real number (ℝ)

ZEROS

Distinct	1955
Distinct (%)	96.5%
Missing	1
Missing (%)	< 0.1%
Infinite	0
Infinite (%)	0.0%
Mean	0.04397695168

Minimum	-11.98405028
Maximum	9.38276571
Zeros	71
Zeros (%)	3.5%
Negative	892
Negative (%)	44.0%
Memory size	16.0 KiB

Quantile statistics

Minimum	-11.98405028
5-th percentile	-1.630657564
Q1	-0.309931843
median	0.03426822371
Q3	0.5022110835
95-th percentile	1.468397098
Maximum	9.38276571
Range	21.36681599
Interquartile range (IQR)	0.8121429265

Descriptive statistics

Standard deviation	1.094093047
Coefficient of variation (CV)	24.8787832
Kurtosis	19.61983268
Mean	0.04397695168
Median Absolute Deviation (MAD)	0.4114931522
Skewness	-0.6434118056
Sum	89.05332716
Variance	1.197039595
Monotonicity	Not monotonic

Histogram with fixed size bins (bins=50)

Common Values

Value	Count	Frequency (%)
0	71	3.5%
1.301700683	1	< 0.1%
-0.0506081527	1	< 0.1%
0.8246825558	1	< 0.1%
-0.7914764079	1	< 0.1%
1.210587535	1	< 0.1%
1.442618345	1	< 0.1%
0.2464268112	1	< 0.1%
-2.929276361	1	< 0.1%
1.476202861	1	< 0.1%
Other values (1945)	1945	96.0%

Minimum values

Value	Count	Frequency (%)
-11.98405028	1	< 0.1%
-9.511268047	1	< 0.1%
-7.596968076	1	< 0.1%
-5.894412157	1	< 0.1%
-5.183082331	1	< 0.1%
-4.886841092	1	< 0.1%
-4.416327867	1	< 0.1%
-4.414239783	1	< 0.1%
-4.335952253	1	< 0.1%
-4.097924428	1	< 0.1%

Maximum values

Value	Count	Frequency (%)
9.38276571	1	< 0.1%
9.287119453	1	< 0.1%
7.033130412	1	< 0.1%
6.241416084	1	< 0.1%
5.995482224	1	< 0.1%
4.959380715	1	< 0.1%
4.939633578	1	< 0.1%
4.603922524	1	< 0.1%
4.220259242	1	< 0.1%
3.90338454	1	< 0.1%

Date
Date

UNIQUE

Distinct	2026
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	16.0 KiB

Minimum	2014-06-06 00:00:00
Maximum	2022-03-11 00:00:00

Histogram

Histogram with fixed size bins (bins=50)

Interactions

Pairwise scatter plots of SP500 and % 1-Day Return.

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
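The calculation described above can be sketched in plain Python: rank both variables, then divide the covariance of the ranks by the product of their standard deviations. The function names and sample data are illustrative. Averaging the ranks of tied values is an assumption here, since the report does not say how ties are handled.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables over the product of their std devs."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    # The 1/n factors of covariance and std devs cancel, so sums suffice.
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# A nonlinear but strictly increasing relationship has rho very close to 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```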

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exact linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
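As a minimal sketch of this formula, the snippet below computes r in plain Python and checks the invariance under changes of location and scale. The function name and sample data are illustrative. The scale factors are positive, because a negative factor would flip the sign of r.

```python
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their std devs."""
    mx, my = mean(x), mean(y)
    # The 1/n factors of covariance and std devs cancel, so sums suffice.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_original = pearson_r(x, y)
# Shift and rescale each variable separately; r should not change.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

print(r_original, r_rescaled)
```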

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
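The pair-counting definition above can be sketched directly. This is the simple variant often called τ-a. Library implementations commonly apply a tie correction (τ-b), so results can differ when the data contain ties. The function name and sample data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, ignoring ties."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Six pairs in total: five concordant and one discordant, so tau = 4/6.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```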

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.

A simple visualization of nullity by column.

The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

First rows

	DATE	SP500	% 1-Day Return	Date
0	2014-06-06	1949.44	NaN	2014-06-06
1	2014-06-09	1951.27	0.093873	2014-06-09
2	2014-06-10	1950.79	-0.024599	2014-06-10
3	2014-06-11	1943.89	-0.353703	2014-06-11
4	2014-06-12	1930.11	-0.708888	2014-06-12
5	2014-06-13	1936.16	0.313454	2014-06-13
6	2014-06-16	1937.78	0.083671	2014-06-16
7	2014-06-17	1941.99	0.217259	2014-06-17
8	2014-06-18	1956.98	0.771889	2014-06-18
9	2014-06-19	1959.48	0.127748	2014-06-19
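The `% 1-Day Return` values in these rows are consistent with a percentage change of `SP500` from one row to the next, with no value for the first row. The report does not say how the column was produced, so the sketch below is an assumption checked only against the first five rows. The helper name is illustrative.

```python
# SP500 closing values for the first five rows of the sample.
prices = [1949.44, 1951.27, 1950.79, 1943.89, 1930.11]


def pct_returns(values):
    """Percentage change from the previous value; None for the first row."""
    returns = [None]
    for prev, cur in zip(values, values[1:]):
        returns.append((cur / prev - 1) * 100)
    return returns


for price, ret in zip(prices, pct_returns(prices)):
    shown = "NaN" if ret is None else f"{ret:.6f}"
    print(price, shown)
```

In pandas the same calculation would typically be written with `Series.pct_change()` multiplied by 100.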

Last rows

	DATE	SP500	% 1-Day Return	Date
2016	2022-02-28	4373.94	-0.244261	2022-02-28
2017	2022-03-01	4306.26	-1.547346	2022-03-01
2018	2022-03-02	4386.54	1.864263	2022-03-02
2019	2022-03-03	4363.49	-0.525471	2022-03-03
2020	2022-03-04	4328.87	-0.793402	2022-03-04
2021	2022-03-07	4201.09	-2.951810	2022-03-07
2022	2022-03-08	4170.70	-0.723384	2022-03-08
2023	2022-03-09	4277.88	2.569832	2022-03-09
2024	2022-03-10	4259.52	-0.429185	2022-03-10
2025	2022-03-11	4204.31	-1.296155	2022-03-11
