Overview

Dataset statistics

Number of variables: 4
Number of observations: 2026
Missing cells: 71
Missing cells (%): 0.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 267.2 KiB
Average record size in memory: 135.1 B
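
The two derived figures above follow from the raw counts. A minimal sketch of the arithmetic, assuming the missing percentage is taken over all cells (observations × variables) and that 1 KiB = 1024 bytes; the variable names are illustrative and not part of the report:

```python
# Figures copied from the dataset statistics above.
n_variables = 4
n_observations = 2026
missing_cells = 71
total_size_kib = 267.2

# Share of missing cells over all cells (rows x columns); assumed definition.
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values the report shows (0.9% and 135.1 B).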

Variable types

Categorical: 1
Numeric: 2
DateTime: 1

Alerts

DATE has a high cardinality: 2026 distinct values (High cardinality)
SP500 has 70 (3.5%) missing values (Missing)
DATE is uniformly distributed (Uniform)
DATE has unique values (Unique)
Date has unique values (Unique)
% 1-Day Return has 71 (3.5%) zeros (Zeros)

Reproduction

Analysis started: 2022-03-13 23:55:08.555107
Analysis finished: 2022-03-13 23:55:09.433427
Duration: 0.88 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

DATE
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 2026
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 132.7 KiB

Value                Count
2014-06-06           1
2019-08-02           1
2019-08-21           1
2019-08-20           1
2019-08-19           1
Other values (2021)  2021

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2026
Unique (%): 100.0%

Sample

1st row: 2014-06-06
2nd row: 2014-06-09
3rd row: 2014-06-10
4th row: 2014-06-11
5th row: 2014-06-12

Common Values

Value                Count  Frequency (%)
2014-06-06           1      < 0.1%
2019-08-02           1      < 0.1%
2019-08-21           1      < 0.1%
2019-08-20           1      < 0.1%
2019-08-19           1      < 0.1%
2019-08-16           1      < 0.1%
2019-08-15           1      < 0.1%
2019-08-14           1      < 0.1%
2019-08-13           1      < 0.1%
2019-08-12           1      < 0.1%
Other values (2016)  2016   99.5%

Length

Histogram of lengths of the category

Value                Count  Frequency (%)
2014-06-06           1      < 0.1%
2014-07-02           1      < 0.1%
2014-06-13           1      < 0.1%
2014-06-16           1      < 0.1%
2014-06-17           1      < 0.1%
2014-06-18           1      < 0.1%
2014-06-19           1      < 0.1%
2014-06-20           1      < 0.1%
2014-06-23           1      < 0.1%
2014-06-24           1      < 0.1%
Other values (2016)  2016   99.5%

Most occurring characters

Value  Count  Frequency (%)
No values found.

Most occurring categories

Value  Count  Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value  Count  Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value  Count  Frequency (%)
No values found.

Most frequent character per block

SP500
Real number (ℝ≥0)

MISSING

Distinct: 1945
Distinct (%): 99.4%
Missing: 70
Missing (%): 3.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 2801.088016
Minimum: 1829.08
Maximum: 4796.56
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 16.0 KiB

Quantile statistics

Minimum: 1829.08
5-th percentile: 1961.0275
Q1: 2108.905
Median: 2670.025
Q3: 3130.0375
95-th percentile: 4456.255
Maximum: 4796.56
Range: 2967.48
Interquartile range (IQR): 1021.1325

Descriptive statistics

Standard deviation: 776.9689512
Coefficient of variation (CV): 0.2773811271
Kurtosis: -0.05808348742
Mean: 2801.088016
Median Absolute Deviation (MAD): 548.6
Skewness: 0.9570039718
Sum: 5478928.16
Variance: 603680.7511
Monotonicity: Not monotonic
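
Several of the SP500 figures above are simple functions of the others. A minimal sketch, assuming the usual definitions (range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared); the variable names are illustrative:

```python
# Values copied from the SP500 statistics above.
minimum, maximum = 1829.08, 4796.56
q1, q3 = 2108.905, 3130.0375
mean, std = 2801.088016, 776.9689512

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance

print(f"Range:    {value_range:.2f}")
print(f"IQR:      {iqr:.4f}")
print(f"CV:       {cv:.6f}")
print(f"Variance: {variance:.2f}")
```

Up to rounding of the inputs, these agree with the reported Range, IQR, CV and Variance.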
Histogram with fixed size bins (bins=50)

Value                Count  Frequency (%)
2066.66              2      0.1%
2439.07              2      0.1%
2783.02              2      0.1%
2268.9               2      0.1%
2926.46              2      0.1%
2373.47              2      0.1%
2095.84              2      0.1%
2102.31              2      0.1%
2080.15              2      0.1%
2723.06              2      0.1%
Other values (1935)  1936   95.6%
(Missing)            70     3.5%

Value    Count  Frequency (%)
1829.08  1      < 0.1%
1851.86  1      < 0.1%
1852.21  1      < 0.1%
1853.44  1      < 0.1%
1859.33  1      < 0.1%
1862.49  1      < 0.1%
1862.76  1      < 0.1%
1864.78  1      < 0.1%
1867.61  1      < 0.1%
1868.99  1      < 0.1%

Value    Count  Frequency (%)
4796.56  1      < 0.1%
4793.54  1      < 0.1%
4793.06  1      < 0.1%
4791.19  1      < 0.1%
4786.35  1      < 0.1%
4778.73  1      < 0.1%
4766.18  1      < 0.1%
4726.35  1      < 0.1%
4725.79  1      < 0.1%
4713.07  1      < 0.1%

% 1-Day Return
Real number (ℝ)

ZEROS

Distinct: 1955
Distinct (%): 96.5%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.04397695168
Minimum: -11.98405028
Maximum: 9.38276571
Zeros: 71
Zeros (%): 3.5%
Negative: 892
Negative (%): 44.0%
Memory size: 16.0 KiB

Quantile statistics

Minimum: -11.98405028
5-th percentile: -1.630657564
Q1: -0.309931843
Median: 0.03426822371
Q3: 0.5022110835
95-th percentile: 1.468397098
Maximum: 9.38276571
Range: 21.36681599
Interquartile range (IQR): 0.8121429265

Descriptive statistics

Standard deviation: 1.094093047
Coefficient of variation (CV): 24.8787832
Kurtosis: 19.61983268
Mean: 0.04397695168
Median Absolute Deviation (MAD): 0.4114931522
Skewness: -0.6434118056
Sum: 89.05332716
Variance: 1.197039595
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                Count  Frequency (%)
0                    71     3.5%
1.301700683          1      < 0.1%
-0.0506081527        1      < 0.1%
0.8246825558         1      < 0.1%
-0.7914764079        1      < 0.1%
1.210587535          1      < 0.1%
1.442618345          1      < 0.1%
0.2464268112         1      < 0.1%
-2.929276361         1      < 0.1%
1.476202861          1      < 0.1%
Other values (1945)  1945   96.0%

Value          Count  Frequency (%)
-11.98405028   1      < 0.1%
-9.511268047   1      < 0.1%
-7.596968076   1      < 0.1%
-5.894412157   1      < 0.1%
-5.183082331   1      < 0.1%
-4.886841092   1      < 0.1%
-4.416327867   1      < 0.1%
-4.414239783   1      < 0.1%
-4.335952253   1      < 0.1%
-4.097924428   1      < 0.1%

Value          Count  Frequency (%)
9.38276571     1      < 0.1%
9.287119453    1      < 0.1%
7.033130412    1      < 0.1%
6.241416084    1      < 0.1%
5.995482224    1      < 0.1%
4.959380715    1      < 0.1%
4.939633578    1      < 0.1%
4.603922524    1      < 0.1%
4.220259242    1      < 0.1%
3.90338454     1      < 0.1%

Date
Date

UNIQUE

Distinct: 2026
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 16.0 KiB
Minimum: 2014-06-06 00:00:00
Maximum: 2022-03-11 00:00:00
Histogram with fixed size bins (bins=50)

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
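
A minimal pure-Python sketch of that calculation, not the report's own implementation: rank both variables (ties get their average rank, an assumption the text does not spell out), then divide the covariance of the ranks by the product of their standard deviations. The helper names `ranks`, `pearson` and `spearman` and the sample data are illustrative.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    mx, my = mean(x), mean(y)
    # The 1/n factors of covariance and standard deviations cancel.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank variables."""
    return pearson(ranks(x), ranks(y))

# Illustrative data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print("spearman:", spearman(x, y))
print("pearson: ", pearson(x, y))
```

On this monotonic but nonlinear data, ρ is 1 up to floating-point rounding while Pearson's r is lower.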

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
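
A minimal pure-Python sketch of that formula, using population covariance and standard deviations (the choice between population and sample estimators does not change r, since the normalising factors cancel). The function name `pearson` and the data are illustrative; the second call checks the location and scale invariance described above with positive scale factors.

```python
from statistics import mean

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (sx * sy)

# Illustrative, nearly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_rescaled = pearson([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

print(r, r_rescaled)
```

The two printed values agree up to floating-point rounding.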

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
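
A minimal sketch of the calculation as described, which corresponds to the variant usually called τ-a. Libraries often default to τ-b, which additionally corrects for ties (for example `scipy.stats.kendalltau`), so results can differ on tied data. The function name `kendall_tau_a` and the data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Tied pairs (sign == 0) count as neither.
    return (concordant - discordant) / len(pairs)

# Illustrative data: 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]
print(kendall_tau_a(x, y))  # (7 - 3) / 10 = 0.4
```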

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

      DATE        SP500    % 1-Day Return  Date
0     2014-06-06  1949.44  NaN             2014-06-06
1     2014-06-09  1951.27  0.093873        2014-06-09
2     2014-06-10  1950.79  -0.024599       2014-06-10
3     2014-06-11  1943.89  -0.353703       2014-06-11
4     2014-06-12  1930.11  -0.708888       2014-06-12
5     2014-06-13  1936.16  0.313454        2014-06-13
6     2014-06-16  1937.78  0.083671        2014-06-16
7     2014-06-17  1941.99  0.217259        2014-06-17
8     2014-06-18  1956.98  0.771889        2014-06-18
9     2014-06-19  1959.48  0.127748        2014-06-19
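
The report does not state how % 1-Day Return was derived. The first rows are consistent with a plain day-over-day percentage change of SP500, so the sketch below assumes that definition; the variable names are illustrative.

```python
# First five SP500 values from the sample rows above.
closes = [1949.44, 1951.27, 1950.79, 1943.89, 1930.11]

# Assumed definition: 100 * (today - previous) / previous.
# The first row has no previous value, which matches the NaN in the sample.
returns = [float("nan")] + [
    100 * (today - prev) / prev for prev, today in zip(closes, closes[1:])
]

for close, ret in zip(closes, returns):
    print(f"{close:.2f}  {ret:.6f}")
```

Under this assumption the computed values match the listed returns for rows 1 to 4 to six decimal places.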

Last rows

      DATE        SP500    % 1-Day Return  Date
2016  2022-02-28  4373.94  -0.244261       2022-02-28
2017  2022-03-01  4306.26  -1.547346       2022-03-01
2018  2022-03-02  4386.54  1.864263        2022-03-02
2019  2022-03-03  4363.49  -0.525471       2022-03-03
2020  2022-03-04  4328.87  -0.793402       2022-03-04
2021  2022-03-07  4201.09  -2.951810       2022-03-07
2022  2022-03-08  4170.70  -0.723384       2022-03-08
2023  2022-03-09  4277.88  2.569832        2022-03-09
2024  2022-03-10  4259.52  -0.429185       2022-03-10
2025  2022-03-11  4204.31  -1.296155       2022-03-11