How to import various dataset formats (Python)

In this post, I share easy ways to import the following file formats into a Python environment for analysis: .csv, .xlsx, .dat, .data, .parquet, .json, HTML, SQLite, and .txt.
## you may need to install the following libraries to get started ##
# pip install pandas
# pip install pyarrow   # parquet files
# pip install xlrd      # legacy .xls files (.xlsx files need openpyxl instead)
# pip install html5lib  # pd.read_html; this parser also needs beautifulsoup4


# import parquet file
# first install pyarrow as shown above; pandas uses it as the parquet engine
import pandas as pd  # every example in this post uses pd
parquet_df = pd.read_parquet("data.parquet")
parquet_df


# import json file
# the sample data format is shown in the picture for this post
import json
with open("data.json") as json_data:
    d = json.load(json_data)
# flatten each 'monthlySales' record and carry 'category' and 'region' along
# (pd.json_normalize replaces the deprecated pandas.io.json.json_normalize import)
json_df = pd.json_normalize(d['contents'], 'monthlySales', ['category', 'region'])
json_df
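
For readers without the sample file, here is a minimal sketch of the kind of nested structure that call expects. Only the keys 'contents', 'monthlySales', 'category' and 'region' come from the code above; the 'month' and 'sales' field names and all values are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical data shaped like the nested JSON the call above expects.
d = {
    "contents": [
        {
            "category": "Furniture",
            "region": "West",
            "monthlySales": [
                {"month": 20130101, "sales": 38},
                {"month": 20130201, "sales": 35},
            ],
        },
        {
            "category": "Technology",
            "region": "East",
            "monthlySales": [{"month": 20130101, "sales": 54}],
        },
    ]
}

# One output row per 'monthlySales' record; 'category' and 'region'
# are repeated on each row that came from the same parent item.
json_df = pd.json_normalize(d["contents"], "monthlySales", ["category", "region"])
print(json_df)
```

The second argument names the nested list to expand into rows, and the third lists the parent-level fields to copy onto each of those rows.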


# import csv file
csv_df = pd.read_csv("data.csv")
csv_df


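# import .txt file
# read_csv assumes comma-separated values; pass sep="\t" (or another delimiter) if the file differs
text_df = pd.read_csv('data.txt')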
text_df


# import excel file
# install the xlrd module as shown above (it reads .xls; .xlsx files need openpyxl instead)
excel_df = pd.read_excel("data.xls")
excel_df


# import sqlite file
import sqlite3 as sql
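connection = sql.connect("database.sqlite")  # create your connection
sqlite_df = pd.read_sql_query("SELECT * FROM table_name", connection)  # replace table_name with your table
connection.close()  # close the connection once the data is loaded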
sqlite_df
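
If you do not know which table names a SQLite file contains, you can ask the database itself. The sketch below uses only the standard library and an in-memory database with a made-up `sales` table standing in for database.sqlite.

```python
import sqlite3

# In-memory database with a hypothetical table, standing in for database.sqlite.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
connection.execute("INSERT INTO sales VALUES ('east', 100)")

# sqlite_master is SQLite's built-in catalog of tables, indexes and views.
rows = connection.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
table_names = [name for (name,) in rows]
print(table_names)

connection.close()
```

Any name from that list can then take the place of `table_name` in the query passed to `pd.read_sql_query`.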



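# import data from an html page
# install html5lib as shown above
html_tables = pd.read_html('url')  # returns a list of DataFrames, one per <table> found on the page
html_df = html_tables[0]  # pick the table you need; index 0 is the first one
html_df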


# import .dat file
# read_table assumes tab-separated values; pass sep=... if the file uses another delimiter
dat_df = pd.read_table("data.dat")
dat_df


# import .data file
data_df = pd.read_table("data.data")
data_df


"""
Note: some data formats may need additional parameters to load correctly.
These include, but are not limited to, the following ones I have had to use:
"""


"https://data.file_format" # from website
delim_whitespace=True # whitespace as delimiter (deprecated since pandas 2.2; sep=r"\s+" does the same)
delimiter="," # delimiter; may also be a tab ("\t"), etc.
header=None # data has no header row, so specify column names as shown below
skiprows=number_of_rows_to_skip # skip a certain number of rows
names=['col_1', 'col_2'] # create column names
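
As a rough illustration of how these parameters combine, the sketch below reads a whitespace-delimited text with two leading lines to skip and no header row. The text content is invented, and `io.StringIO` stands in for a file path or URL so the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical whitespace-delimited file: two note lines, then data with no header.
raw_text = """generated by some instrument
units: cm kg
1.5   10
2.0   12
3.25  15
"""

df = pd.read_csv(
    io.StringIO(raw_text),     # a file path or URL works here too
    sep=r"\s+",                # one or more whitespace characters as delimiter
    header=None,               # the data has no header row
    skiprows=2,                # skip the two note lines at the top
    names=["col_1", "col_2"],  # supply the column names ourselves
)
print(df)
```

The same keyword arguments work with `pd.read_table`, which differs from `pd.read_csv` mainly in its default delimiter (tab instead of comma).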

