Advances in Financial Machine Learning: Part 2 - Chapter 2: Financial Data Structures

2.2 Essential types of financial data

The four types of data:

Fundamental data  Market data    Analytics data           Alternative data
Assets            Price          Analyst recommendations  CCTV images
Liabilities       Volume         Credit ratings           Google searches
Sales             Dividends      Earnings expectations    Twitter chats
Earnings          Open interest  News sentiment           Metadata
Macro             Quotes

2.2.1 Fundamental data

Definition

Fundamental data encompasses information that can be found in regulatory filings and business analytics.

Characteristics

  1. The data becomes available after the end of the reporting period, not at it.
    • The first-quarter financial report of AAPL is not published on 01-Apr of that year. If we timestamp fundamental data at the end of the reporting period instead of at its release date, we use information that was not yet public, which introduces look-ahead bias into backtests and cannot be reproduced in real trading.
  2. Backfilling and restatement of fundamental data
    • Some fundamental data is missing or wrong at its initial release, and is filled in or corrected later. If we give a backfilled or corrected value the same timestamp as the original release, we are again using information that was not known at that time.
  3. Low frequency, so little value remains unexploited
    • Because the data is so accessible to the marketplace, it is rather unlikely that much value is left to exploit. Still, it may be useful in combination with other data types.
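
The first two characteristics both come down to looking data up by release date rather than by reporting period. Here is a minimal sketch of such a point-in-time lookup, assuming each record carries its own release date. The record layout, the function name value_known_on, and every date and value are hypothetical and only for illustration.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical records: (period_end, release_date, reported_value).
# All dates and values are made up for illustration.
REPORTS = [
    (date(2020, 3, 31), date(2020, 5, 1), 1.10),
    (date(2020, 6, 30), date(2020, 8, 1), 1.25),
    (date(2020, 9, 30), date(2020, 11, 1), 0.95),
]

def value_known_on(reports, as_of):
    """Return the latest value released on or before `as_of`, or None."""
    ordered = sorted(reports, key=lambda r: r[1])  # order by release date
    release_dates = [r[1] for r in ordered]
    pos = bisect_right(release_dates, as_of)  # number of reports already released
    return ordered[pos - 1][2] if pos else None

# The first period has ended, but its report is not out yet.
print(value_known_on(REPORTS, date(2020, 4, 15)))  # None
# Only the first report is public on this date.
print(value_known_on(REPORTS, date(2020, 7, 15)))  # 1.1
```

A restated value would be stored as an extra record with its own, later release date, so a lookup before the restatement still returns the originally published number.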

2.2.2 Market data

Definition

Market data includes all trading activity that takes place in an exchange (like CME) or trading venue (like MarketAxess).

Characteristics

  1. The raw feed from a data provider can be difficult to process.
    • The feed may contain all sorts of unstructured information, like FIX messages that allow you to fully reconstruct the trading book, or the full collection of BWIC (bids wanted in competition) responses.
  2. Market data can be very abundant, so storage can become a problem.

2.2.3 Analytics data

Definition

You could think of analytics as derivative data, based on an original source, which could be fundamental, market, alternative, or even a collection of other analytics.

Characteristics

  1. The negative aspects are that analytics may be costly, the methodology used in their production may be biased or opaque, and you will not be the sole consumer.

2.2.4 Alternative data

Definition

  1. Alternative data can be categorized by its source:
    • Individuals (social media, news, etc.)
    • Business processes (transactions, corporate data, government agencies, etc.)
    • Sensors (satellites, geolocation, weather, etc.)

Characteristics

  1. What characterizes alternative data is that it is primary information, that is, information that has not yet made it to the other sources.
  2. Two problematic aspects of alternative data
    • Cost
    • Privacy

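Essay

Huge Depression.

Those two words are the truest portrait of my past few months. I have seen too many impressive people: people who make a lot of money, people who produce outstanding research, people who play many instruments, people with real leadership, people with athletic talent.

Looking back at myself, I feel I have no skill worth showing off.

For a while I was extremely negative. I felt that I could not do anything and could not do anything well. But once I truly calmed down, I realized that working with such a negative mindset could never bring me any progress. I used to be afraid of facing failure, so I instinctively avoided the things I could not do and the things I believed I could never accomplish.

I have always been afraid: afraid of other people's rejection, afraid of their criticism, afraid of their leaving, afraid of their disapproval, afraid of my own failure.

But what exactly am I afraid of? What is there in any of this to fear? Why do I care so much about how others see me, and why do I live in such a stifled way?

At this very moment I suddenly realized that a person cannot put such labels on themselves. I cannot tell myself "I can't" before I have even started, fear failure, and give up that drive. That drive may be what I lack most and what I most need right now. From now on, whatever I do, the answer is always "you can do it". Even if I fail once or twice, I will absolutely not fall down just like that.

What does not kill me only makes me stronger.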

Question 673. Number of Longest Increasing Subsequence

Link

https://leetcode.com/problems/number-of-longest-increasing-subsequence/

My code:

from typing import List

class Solution:
    def findNumberOfLIS(self, nums: List[int]) -> int:
        if len(nums) ==0: return 0
        if len(nums) ==1: return 1
        #ResultList[i] = [longest increasing length ending at nums[i], number of such subsequences]
        ResultList = [[1,1]]
        #For each i, scan all earlier j to find the length first, then the count
        for i in range(1,len(nums)):
            IndexList = []
            Length =1
            Num = 0
            for j in range(i):
                if nums[i]>nums[j]:
                    Length = max(Length,ResultList[j][0]+1)
                    IndexList.append(j)
            if Length != 1:
                for k in IndexList:
                    if ResultList[k][0] == Length -1:
                        Num += ResultList[k][1]
            if Num == 0:
                Num = 1
            ResultList.append([Length,Num])
        LengthList = [ResultList[i][0] for i in range(len(ResultList))]
        MaxLength = max(LengthList)
        Sum  = 0
        for s in range(len(ResultList)):
            if ResultList[s][0] == MaxLength:
                Sum += ResultList[s][1]
        return Sum

Explanation: There are two main problems in this question: 1. how to determine the maximum length, and 2. how to count all the subsequences with that maximum length. Here is the way I solve it. I construct a list L such that L[i] holds two elements: the first is the maximum length of an increasing subsequence that ends at nums[i], and the second is the number of such subsequences.

The recurrence is as follows. For element i, we check every previous element j. If nums[i] is larger than nums[j], then nums[i] can extend any subsequence ending at j, giving length L[j][0] + 1. Taking the maximum over all such j gives the maximum length ending at i. To count, I iterate over the same j again and add up L[j][1] for every j whose length equals that maximum minus one.

Notice that I initialise Length as 1 and Num as 0 because element i might not be larger than any previous element. In that situation the only subsequence ending at i is the element itself, so the length is 1 and the count is set to 1. After iterating through the whole nums list, we find the maximum length, sum the counts of all positions that reach it, and return the result.
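
As a cross-check on the recurrence, here is a hedged single-pass variant that keeps the length and the count in two parallel lists and updates both in the same inner loop. It is a sketch rather than the solution above; the function name count_longest_increasing and the sample inputs are my own.

```python
from typing import List

def count_longest_increasing(nums: List[int]) -> int:
    """Count the strictly increasing subsequences of maximum length."""
    if not nums:
        return 0
    n = len(nums)
    length = [1] * n  # length[i]: longest increasing subsequence ending at i
    count = [1] * n   # count[i]: how many subsequences reach length[i]
    for i in range(n):
        for j in range(i):
            if nums[j] < nums[i]:
                if length[j] + 1 > length[i]:
                    # Strictly longer: restart the count from j.
                    length[i] = length[j] + 1
                    count[i] = count[j]
                elif length[j] + 1 == length[i]:
                    # Same length through a different j: accumulate.
                    count[i] += count[j]
    best = max(length)
    return sum(c for l, c in zip(length, count) if l == best)

print(count_longest_increasing([1, 3, 5, 4, 7]))  # 2
print(count_longest_increasing([2, 2, 2, 2, 2]))  # 5
```

Starting count at 1 plays the same role as the "if Num == 0: Num = 1" fallback: an element that extends nothing still forms one subsequence of length 1.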

 

Python Package Archive

Introduction

This article records the packages, and their underlying functions, that I use or plan to use in quantitative trading.

Packages

1. NumPy

Full information about this package can be found at https://numpy.org/. It is the fundamental package for scientific computing in Python.

2. Statsmodels

Full information about this package can be found at https://www.statsmodels.org/stable/index.html. It is a Python module that provides classes and functions for estimating many different statistical models, as well as for conducting statistical tests and statistical data exploration.

3. Pandas

 

4. Tensorflow / Keras

Question 115. Distinct Subsequences

Link:

 

Code:

class Solution:
    def numDistinct(self, s: str, t: str) -> int:
        #IndexDict[i,j] caches the number of ways to form t[j:] from s[i:]
        IndexDict = dict()
        M = len(s)
        N = len(t)
        def UniqueSubsequence(i,j):
            if (i,j) in IndexDict:
                return IndexDict[i,j]
            #Base case: t fully matched (1 way), or s exhausted or too short to finish t (0 ways)
            if i == M or j == N or M-i<N-j:
                return int(j == N)
            #Either skip s[i], or use it when it matches t[j]
            ans = UniqueSubsequence(i+1,j)
            if s[i] == t[j]:
                ans += UniqueSubsequence(i+1,j+1)
            IndexDict[i,j] = ans
            return ans
        return UniqueSubsequence(0,0)

 

Question 1423. Maximum Points You Can Obtain from Cards

Link

https://leetcode.com/problems/maximum-points-you-can-obtain-from-cards/

My Code:

from typing import List

class Solution:
    def maxScore(self, cardPoints: List[int], k: int) -> int:
        if k>= len(cardPoints):
            return sum(cardPoints)
        else:
            RestLength = len(cardPoints) - k
            TotalSum = sum(cardPoints)
            PartSum = sum(cardPoints[0:RestLength])
            Min =PartSum
            for new in range(RestLength,len(cardPoints)):
                old = new -RestLength
                PartSum= PartSum - cardPoints[old]+cardPoints[new]
                Min = min(Min,PartSum)
            return TotalSum - Min

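Explanation: The idea is clear: no matter how many cards you take from each side, the cards left over form one contiguous block of length total length - k. So we just need to slide a window of that length across the list and find the minimum window sum; the answer is the total sum minus that minimum. Notice that calling the sum function on every window exceeds the time limit, because sum itself iterates over the whole window. Updating the running sum by dropping one card and adding one keeps each step constant time.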
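
To check that claim, here is a small sketch that compares the sliding-window answer with a brute force that tries every split between the left and the right side. Both function names are hypothetical and the inputs are just sample values.

```python
from typing import List

def max_score(card_points: List[int], k: int) -> int:
    """Total minus the cheapest contiguous window of length n - k."""
    n = len(card_points)
    if k >= n:
        return sum(card_points)
    window = n - k
    current = sum(card_points[:window])
    smallest = current
    for right in range(window, n):
        # Slide by one: add the entering card, drop the leaving one.
        current += card_points[right] - card_points[right - window]
        smallest = min(smallest, current)
    return sum(card_points) - smallest

def max_score_brute(card_points: List[int], k: int) -> int:
    """Try every split: i cards from the left, k - i from the right."""
    n = len(card_points)
    k = min(k, n)
    return max(
        sum(card_points[:i]) + sum(card_points[n - (k - i):])
        for i in range(k + 1)
    )

cards = [1, 2, 3, 4, 5, 6, 1]
print(max_score(cards, 3))        # 12
print(max_score_brute(cards, 3))  # 12
```

The brute force recomputes two sums for each split, which is the repeated work the running window sum avoids.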

Question 120. Triangle

Link:

https://leetcode.com/problems/triangle/

My Code:

from typing import List

class Solution:
    def minimumTotal(self, triangle: List[List[int]]) -> int:
        TriangleLength = len(triangle)
        MinTri = [triangle[0]]
        for i in range(1,TriangleLength):
            CheckLen = len(triangle[i])
            MinList = []
            for j in range(CheckLen):
                if j == 0:
                    MinList.append(triangle[i][0]+MinTri[i-1][0])
                elif j == (CheckLen -1):
                    MinList.append(triangle[i][CheckLen-1]+MinTri[i-1][CheckLen-2])
                else:
                    Min = min(MinTri[i-1][j],MinTri[i-1][j-1])+ triangle[i][j]
                    MinList.append(Min)
            MinTri.append(MinList)
        return min(MinTri[TriangleLength-1])

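Explanation: The idea is clear: for each element in the triangle, we take the smaller of the minimum path sums of the elements in the previous row that can reach it (one parent on the left and right edges, two parents elsewhere) and add the element's own value. That gives the minimum path sum ending at this point, and the answer is the minimum over the last row.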
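
Since each row depends only on the row above it, the same recurrence can be run with a single list of running minima. This is a hedged variant, not the code above: the function name minimum_total is my own, and it assumes the triangle has at least one row.

```python
from typing import List

def minimum_total(triangle: List[List[int]]) -> int:
    """Minimum top-to-bottom path sum, keeping only the previous row."""
    best = list(triangle[0])  # minimum path sums ending at each cell of the current row
    for row in triangle[1:]:
        new_best = []
        for j, value in enumerate(row):
            # Parents are at j-1 and j in the previous row, when they exist.
            left = best[j - 1] if j > 0 else float("inf")
            above = best[j] if j < len(best) else float("inf")
            new_best.append(value + min(left, above))
        best = new_best
    return min(best)

print(minimum_total([[2], [3, 4], [6, 5, 7], [4, 1, 8, 3]]))  # 11
```

Using infinity for a missing parent replaces the separate j == 0 and last-column branches with a single min call.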

Question 300. Longest Increasing Subsequence

Link

https://leetcode.com/problems/longest-increasing-subsequence/

My Code:

from typing import List

class Solution:
    def lengthOfLIS(self, nums: List[int]) -> int:
        if len(nums) ==0:
            return 0
        else:
            ResultList = [1]
            for i in range(1,len(nums)):
                Max = 1
                for j in range(i):
                    if nums[i]>nums[j]:
                        Max = max(Max,ResultList[j]+1)
                ResultList.append(Max)
            return max(ResultList)

Explanation: ResultList[k] stores the length of the longest increasing subsequence that ends at element k of the sequence. For the next element, we check each previous element: if the new element is larger, we can append it to the subsequence ending there, which corresponds to ResultList[j] + 1 in the code, and we take the Max among all the possible choices. If it is not larger than any previous element, we set its entry in the list to 1.

Then we just need to return the maximum among all the results in the list.
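
Here is a compact sketch of the same quadratic recurrence, with the per-position table printed so the values can be followed by hand. The helper names lis_table and length_of_lis and the sample list are my own.

```python
from typing import List

def lis_table(nums: List[int]) -> List[int]:
    """ends_at[i] = length of the longest increasing subsequence ending at i."""
    ends_at: List[int] = []
    for i, x in enumerate(nums):
        # Best subsequence this element can extend; 0 when there is none.
        longest_prev = max((ends_at[j] for j in range(i) if nums[j] < x), default=0)
        ends_at.append(longest_prev + 1)
    return ends_at

def length_of_lis(nums: List[int]) -> int:
    return max(lis_table(nums), default=0)

sample = [10, 9, 2, 5, 3, 7, 101, 18]
print(lis_table(sample))      # [1, 1, 1, 2, 2, 3, 4, 4]
print(length_of_lis(sample))  # 4
```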

Question 494. Target Sum

Link:

 

My Code:

from typing import List

class Solution:
    def findTargetSumWays(self, nums: List[int], S: int) -> int:
        LargestSum = sum(nums)
        SmallestSum = -LargestSum
        if LargestSum<S or SmallestSum>S:
            return 0
        else:
            ResultMat = []
            for l in range(len(nums)):
                List = [0]*(LargestSum*2+1)
                ResultMat.append(List)
            ResultMat[0][nums[0]] = ResultMat[0][nums[0]]+1
            ResultMat[0][-nums[0]] = ResultMat[0][-nums[0]]+1
            for i in range(1,len(nums)):
                for j in range(-LargestSum,LargestSum+1,1):
                    Index1 = j-nums[i]
                    Index2 = j+nums[i]
                    if Index1 <-LargestSum:
                        Value1 =0
                    else:
                        Value1 = ResultMat[i-1][Index1]
                    if Index2> LargestSum:
                        Value2 = 0
                    else:
                        Value2 = ResultMat[i-1][Index2]
                    Value = Value1+Value2
                    ResultMat[i][j] = Value
            return ResultMat[len(nums)-1][S]

Explanation: I use a two-dimensional matrix to store the data. The row i is the index into the original list, the column j is the sum, and the value stored in matrix[i][j] is the number of ways to reach sum j using the elements up to index i. For index i+1, the only way to reach sum j is to have reached j + nums[i+1] or j - nums[i+1] at index i, so we add those two values together to get the number of ways to reach j at i+1. Note that negative sums rely on Python's negative list indexing: each row has 2*LargestSum + 1 cells, so column -j is the j-th cell from the end and never collides with a non-negative sum.
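
The negative-index trick is compact but easy to misread. Here is a hedged alternative that shifts every sum by an explicit offset and keeps only one row at a time. The function name find_target_sum_ways is my own, and the sketch assumes all numbers are non-negative, as in the problem statement.

```python
from typing import List

def find_target_sum_ways(nums: List[int], target: int) -> int:
    """Count +/- sign assignments whose signed sum equals target."""
    total = sum(nums)  # assumes every element is non-negative
    if abs(target) > total:
        return 0
    width = 2 * total + 1
    ways = [0] * width  # ways[s + total] = number of ways to reach sum s
    ways[total] = 1     # sum 0 before any element is used
    for x in nums:
        nxt = [0] * width
        for col, w in enumerate(ways):
            if w:
                # Push each reachable sum forward by +x and by -x.
                nxt[col + x] += w
                nxt[col - x] += w
        ways = nxt
    return ways[target + total]

print(find_target_sum_ways([1, 1, 1, 1, 1], 3))  # 5
```

This version pushes counts forward from each reachable sum instead of pulling from j - nums[i] and j + nums[i]. The two directions are equivalent, and pushing needs no bounds checks because a reachable sum never leaves the range from -total to total.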

Question 279. Perfect Squares

Link:

https://leetcode.com/problems/perfect-squares/

My Code:

import math

class Solution:
    def numSquares(self, n: int) -> int:
        if n ==1:
            return 1
        else:
            NumList = [0]
            for Number in range(1,n+1):
                Min = 2**31
                SquareNum = math.floor(math.sqrt(Number))
                for j in range(1,(SquareNum+1)):
                    Min = min(Min,NumList[Number-j**2])
                NumList.append((Min+1))
            return NumList[n]

Explanation

For a number a, we look up the answers already computed for a - k^2, for every k with k^2 <= a, take the minimum among them, and add one for the square k^2 itself. A typical dynamic programming problem.
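
Written out, the recurrence is fewest(a) = 1 + min over k of fewest(a - k^2), with fewest(0) = 0. Here is a minimal sketch of it; the function name num_squares is my own, and math.isqrt (Python 3.8+) is swapped in for math.floor(math.sqrt(...)) to stay in integer arithmetic.

```python
from math import isqrt

def num_squares(n: int) -> int:
    """Fewest perfect squares that sum to n."""
    fewest = [0] * (n + 1)  # fewest[0] = 0: nothing is needed to make 0
    for a in range(1, n + 1):
        # Try every square k*k <= a as the last square in the sum.
        fewest[a] = 1 + min(fewest[a - k * k] for k in range(1, isqrt(a) + 1))
    return fewest[n]

print(num_squares(12))  # 3  (4 + 4 + 4)
print(num_squares(13))  # 2  (4 + 9)
```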