Here’s a Quick Way to Automatically Fetch Data from the SEC Website

One thing I really struggled with was pre-market trading. Not because of a lack of edge; my edge in pre-market is solid and 100% defined. The problem was getting a full picture of a gapper's fundamentals before, not after, the setup was gone. For my setup, I need to know very quickly what the stock's market cap is, the burn rate, warrant prices and a lot more. Going over recent filings to collect those things takes time, and time is exactly what I didn't have. When time is the problem, automating things with code is usually the solution. In this post I will help you create your own fundamentals fetcher that gets you all the fundamentals you need in a few seconds.

Python as the programming language

To crawl the SEC website and scrape fundamental data, I use Python as the programming language. For those of you who don't know Python at all, I recommend going through one of the full Python courses on YouTube. In my opinion, learning Python gives you a huge edge in the stock market, anywhere from getting data, through analysis and backtesting, to automating daily routines. In this guide we will use BeautifulSoup and urllib.request. If you don't know how to code in Python, there's no point in reading further.
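
Throughout the post I'll assume the following imports and aliases. The uReq helper is my own wrapper, not a library function: the SEC's automated-access guidelines ask clients to identify themselves with a User-Agent header, so the wrapper adds one to every request (replace the placeholder contact details with your own):

from urllib.request import urlopen, Request
from bs4 import BeautifulSoup as soup

def uReq(url):
    # The SEC expects a descriptive User-Agent on automated requests;
    # the name and email below are placeholders.
    headers = {"User-Agent": "YourName yourname@example.com"}
    return urlopen(Request(url, headers=headers))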

Handling the SEC website

Every stock has its own page on the SEC website. You can find almost all symbols at the following URL:

https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=SYMBOL&type=&dateb=&owner=exclude&count=40

Just replace “SYMBOL” with the symbol you want to get. For example, if you want AAPL, you should simply type in your browser:

https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=AAPL&type=&dateb=&owner=exclude&count=40  

You can also get the company by its CIK, but it's a bit more complicated as you need a database that maps CIKs to stock symbols.
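
If you do want to go the CIK route, the SEC publishes a ticker-to-CIK mapping at https://www.sec.gov/files/company_tickers.json. Here is a rough sketch of a lookup; the field names below are what that file uses at the time of writing, so double-check them before relying on this:

import json

def getCik(symbol):
    # company_tickers.json is a dict of rows like
    # {"0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."}, ...}
    data = json.load(uReq("https://www.sec.gov/files/company_tickers.json"))
    for row in data.values():
        if row["ticker"] == symbol.upper():
            return str(row["cik_str"]).zfill(10)  # CIKs are zero-padded to 10 digits
    return None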

By default the request returns 40 rows of filings, but you can replace the 40 (after count=) with up to 100 to get 100 rows. Also, by default Form 4s are excluded; you can include them by replacing "exclude" with "include".

So a query for ACHV, with 100 rows and Form 4s included, will be:

https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=achv&type=&dateb=&owner=include&count=100
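
In Python you can assemble that query string from its parameters. A trivial sketch, using only the parameters described above (the function name is mine):

def buildEdgarUrl(symbol, count=40, owner="exclude"):
    # Builds the EDGAR company-browse URL from the pieces described above
    return ("https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany"
            "&CIK=" + symbol + "&type=&dateb=&owner=" + owner + "&count=" + str(count))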

After calling the URL of the desired stock, you will want to go over the recent filings and look for a 10-K or 10-Q (or any other form type you need to scrape data from). Note that you will also need to handle 20-F, which is the equivalent for foreign companies.

To get the link, you will need to write something like:

symbol = "TheSymbolYouWant"
# output=atom returns the filing list as an Atom feed, which is easy to parse
url = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" + symbol + "&type=&dateb=&owner=exclude&start=0&count=80&output=atom"
uClient = uReq(url)
page_html = uClient.read()
uClient.close()
html = soup(page_html, 'html.parser')
entries = html.findAll("entry")

shouldContinue = True
link = ""
for entry in entries:
    # Take the first (most recent) 10-K, 10-Q or 20-F in the feed
    if shouldContinue and entry.find("category")["term"].lower() in ("10-k", "10-q", "20-f"):
        firstUrl = entry.find("link")["href"]

The URL of the filing's index page can now be found in the firstUrl variable.

Then, you will want to get the exact link of the document itself from that index page.

To get it, you will need to target the documents table, find the cell labeled "XBRL INSTANCE", and grab the link in the cell right after it, as written here:

uClientFirstUrl = uReq(firstUrl)
page_html_firstUrl = uClientFirstUrl.read()
uClientFirstUrl.close()
htmlFirstUrl = soup(page_html_firstUrl, 'html.parser')

# The second table on the index page lists the filing's documents;
# the link we want sits in the cell right after the "XBRL INSTANCE" label
tds = htmlFirstUrl.findAll("table")[1].findAll("td")
foundtd = False
for td in tds:
    if foundtd:
        link = "https://www.sec.gov" + td.find("a")["href"]
        foundtd = False
    if "xbrl instance" in td.text.lower():
        foundtd = True

# We only want the most recent matching filing, so stop after the first one
shouldContinue = False

So the whole function (I'll call it getFilingXmlLink, the name is arbitrary), where the symbol is the input, will look like the following:

def getFilingXmlLink(symbol):
    url = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" + symbol + "&type=&dateb=&owner=exclude&start=0&count=80&output=atom"

    uClient = uReq(url)
    page_html = uClient.read()
    uClient.close()
    html = soup(page_html, 'html.parser')
    entries = html.findAll("entry")

    shouldContinue = True
    link = ""
    for entry in entries:
        # Take the first (most recent) 10-K, 10-Q or 20-F in the feed
        if shouldContinue and entry.find("category")["term"].lower() in ("10-k", "10-q", "20-f"):

            firstUrl = entry.find("link")["href"]
            uClientFirstUrl = uReq(firstUrl)
            page_html_firstUrl = uClientFirstUrl.read()
            uClientFirstUrl.close()
            htmlFirstUrl = soup(page_html_firstUrl, 'html.parser')

            # The second table on the index page lists the filing's documents;
            # the XBRL instance link sits right after the "XBRL INSTANCE" label
            tds = htmlFirstUrl.findAll("table")[1].findAll("td")
            foundtd = False
            for td in tds:
                if foundtd:
                    link = "https://www.sec.gov" + td.find("a")["href"]
                    foundtd = False
                if "xbrl instance" in td.text.lower():
                    foundtd = True

            # Stop after the first matching filing
            shouldContinue = False

    return link

The link variable now holds the actual link you need for the report. It should look something like the following:

https://www.sec.gov/Archives/edgar/data/1445283/000156459016028797/pti-20160930.xml
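
A quick sanity check (the symbol here is just an example):

link = getFilingXmlLink("ACHV")
print(link)  # should print a .xml URL like the one above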

Now you can go ahead and start working on extracting the fundamentals you want from the filing.

Extracting data from SEC filings

The link you now have in hand points to an XML file containing XBRL. XBRL is a business reporting language, which means you can find all the data inside if you use the right tags. You will need to do a little digging to figure out all the tags you need. Let's see, for example, how we can grab the company's cash. Cash is represented by one of the following tags:

us-gaap:CashAndCashEquivalentsAtCarryingValue
ifrs-full:Cash
us-gaap:CashCashEquivalentsRestrictedCashAndRestrictedCashEquivalents 
us-gaap:Cash 
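
For reference, a fact inside the instance document looks roughly like this (the context name and value below are made up for illustration):

<us-gaap:Cash contextRef="AsOf2016-09-30" unitRef="usd" decimals="-3">12345000</us-gaap:Cash>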

So to get cash, you can do something like:

def getCash(url, symbol):
    uClient = uReq(url)
    page_html = uClient.read()
    uClient.close()

    # Parse as XML so the namespaced XBRL tags are preserved
    # (the 'xml' parser requires lxml to be installed)
    xml = soup(page_html, 'xml')

    # Try each known cash tag in order until one of them matches
    cash = xml.findAll("us-gaap:CashAndCashEquivalentsAtCarryingValue")
    if len(cash) == 0:
        cash = xml.findAll("ifrs-full:Cash")
        if len(cash) == 0:
            cash = xml.findAll("us-gaap:CashCashEquivalentsRestrictedCashAndRestrictedCashEquivalents")
            if len(cash) == 0:
                cash = xml.findAll("us-gaap:Cash")

    return cash

Note that the url argument we pass to the function is the URL of the XML file we extracted earlier. The company's cash is now returned in the cash variable, as a list of matching tags whose text holds the reported values. You can easily grab any other stat you need simply by looking up its tag name.
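
Putting the two pieces together (the function names are the ones defined above, and the symbol is just an example):

symbol = "ACHV"
link = getFilingXmlLink(symbol)
if link:
    cash = getCash(link, symbol)
    if cash:
        print(symbol, "cash:", cash[0].text)  # .text holds the reported value

From here, adding burn rate, share count or warrant terms is just a matter of finding the right tags.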
