Saturday, August 26, 2017

Developing ElliottBrowser for Quandl.Com - 1


Downloading / Storing Chart Data Incrementally - 1


    If we get chart data locally, i.e., from a local file system or a database, we probably do not have to worry about performance. However, since ElliottBrowser for Quandl.Com fetches the data it needs for charting from Quandl's web site, the amount of data requested on each connection becomes a real concern.

    Assuming we need a daily chart covering the last 3 years, should ElliottBrowser download 600 or more records every time it connects? What if we need the data not once but very frequently?

▷ Functional Design


    1.  JSON rather than XML - Quandl.Com supports both, but we use JSON.   ※ Note: JSON vs XML

    2.  Incremental Download - Once the chart data for a given stock has been downloaded, the next time ElliottBrowser connects to Quandl it downloads only the latest records, i.e., those that need to be updated or added.

    3.  Database essential to Incremental Download - The downloaded data is stored in a database, which serves as the reliable local data store that incremental download depends on.
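
    To make the incremental idea in item 2 concrete, here is a minimal sketch of how the first date of the next request could be chosen. The class and method names (IncrementalRange, NextStartDate) and the 3-year default are assumptions made for illustration only; they are not part of ElliottBrowser or of the Quandl API. The sketch re-requests the last stored day on purpose, since the latest record may need to be updated as well as new ones added.

```csharp
using System;
using System.Globalization;

public static class IncrementalRange
{
    // Quandl dates in the response shown in this post use the yyyy-MM-dd form.
    private const string DateFormat = "yyyy-MM-dd";

    // Hypothetical helper: returns the first date to ask for on the next download.
    // lastStoredDate is the newest date already held locally, or null/empty if none.
    public static string NextStartDate(string lastStoredDate, DateTime today, int fullHistoryYears = 3)
    {
        if (string.IsNullOrEmpty(lastStoredDate))
        {
            // Nothing stored yet: fall back to the full history window.
            return today.AddYears(-fullHistoryYears).ToString(DateFormat, CultureInfo.InvariantCulture);
        }

        // Parsing validates the stored value; the same day is requested again
        // so that a record that changed after the last download gets refreshed.
        DateTime last = DateTime.ParseExact(lastStoredDate, DateFormat, CultureInfo.InvariantCulture);
        return last.ToString(DateFormat, CultureInfo.InvariantCulture);
    }
}

public static class IncrementalRangeDemo
{
    public static void Main()
    {
        DateTime today = new DateTime(2017, 8, 26);
        Console.WriteLine(IncrementalRange.NextStartDate(null, today));         // first run: full window
        Console.WriteLine(IncrementalRange.NextStartDate("2017-08-23", today)); // later runs: from the last stored day
    }
}
```

    With this approach a later run asks only for the few days since the last stored record instead of the whole 3-year window.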


▷ Some Technical Specifics


    Let's take a look at Quandl.Com's JSON response format for chart data requests. ※ Note: Check the HTTP request URL format for Quandl.Com


Fig 1. Quandl's JSON Response Format for Chart Data

    We need to transform the JSON response string into a data type that fits our purpose. Of course, our goal is to store the time series data contained in the string in a database or display it as a chart with some additional analysis data.



    1. Deserializing JSON string using Json.NET

    We will use one of the JsonSerializer.Deserialize methods, which deserializes the JSON structure contained in a reader into an instance of the specified type. See the API document.

 public Object Deserialize(
  JsonReader reader,
  Type objectType
 )

    Let's define a data type representing the JSON string returned by Quandl. As shown in Fig 1, the whole response is a JSON object with a single member: a key-value pair whose key is 'dataset_data' and whose value is in turn an object containing the time-series 'data'. We map the root-level object to the 'QuandlDataWrapper' type, and the value of the key-value pair to the 'QuandlData' type. See below.

    public class QuandlDataWrapper
    {
        public QuandlData dataset_data
        {
            get { return _dataset_data; }
            set { _dataset_data = value; }
        }

        private QuandlData _dataset_data;
    }


    public class QuandlData
    {
        public string limit
        {
            get { return _limit; }
            set { _limit = value; }
        }

        public string transform
        {
            get { return _transform; }
            set { _transform = value; }
        }

        public int[] column_index
        {
            get { return _column_index; }
            set { _column_index = value; }
        }

        public string[] column_names
        {
            get { return _column_names; }
            set { _column_names = value; }
        }

        public string start_date
        {
            get { return _start_date; }
            set { _start_date = value; }
        }

        public string end_date
        {
            get { return _end_date; }
            set { _end_date = value; }
        }

        public string frequency
        {
            get { return _frequency; }
            set { _frequency = value; }
        }
        
        //
        // The property representing the time-series which is labeled with 'data'
        // "data":
        // [
        //  ["2017-08-23",159.07,160.47,158.88,159.98,19198189.0],
        //  ["2017-08-22",158.23,160.0,158.02,159.78,21297812.0],
        //  ["2017-08-21",157.5,157.89,155.1101,157.21,26145653.0]
        // ],
        //
        public object[] data
        {
            get { return _data; }
            set { _data = value; }
        }
        
        public string collapse
        {
            get { return _collapse; }
            set { _collapse = value; }
        }

        public string order
        {
            get { return _order; }
            set { _order = value; }
        }

        private string _limit;
        private string _transform;
        private int[] _column_index;
        private string[] _column_names;
        private string _start_date;
        private string _end_date;
        private string _frequency;
        private object[] _data;
        private string _collapse;
        private string _order;
    }


 
    The 'data' property of QuandlData represents the time-series chart data. For now it is an array of objects, each element of which is itself an array of mixed types, i.e., a string for the date followed by several numbers.

    These two type definitions, QuandlDataWrapper and QuandlData, are enough to deserialize Quandl.Com's JSON string. But at run time, ElliottBrowser would have to convert each time-series element, which is actually a JArray, into an array of strings in order to iterate over it and get the numbers that make up the quote for a specific date. By defining a simple JsonConverter and a type, let's say 'TimeSeries', we can avoid that hassle: we change the type of the 'data' property from an array of objects to an array of TimeSeries.

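    // Required namespaces for the converter below:
    // using System; using System.Linq;
    // using Newtonsoft.Json; using Newtonsoft.Json.Linq;

    [JsonConverter(typeof(TimeSeriesConverter))]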
    public class TimeSeries
    {
        public string Date { get; set; }
        public double[] Values { get; set; }
    }

    class TimeSeriesConverter : JsonConverter
    {
        public override bool CanConvert(Type objectType)
        {
            return (objectType == typeof(TimeSeries));
        }

        public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
        {
            JArray ja = JArray.Load(reader);
            TimeSeries ts = new TimeSeries();
            ts.Date = (string)ja[0]; ja.RemoveAt(0);
            ts.Values = ja.Select(jv => (double)jv).ToArray();
            return ts;
        }

        public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
        {
            JArray ja = new JArray();
            TimeSeries ts = (TimeSeries)value;
            ja.Add(ts.Date);
            ja.Add(ts.Values);
            ja.WriteTo(writer);
        }
    }
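
    As a quick check of the converter, the following sketch deserializes a one-row sample taken from the comment in QuandlData above. It assumes the Json.NET (Newtonsoft.Json) package is referenced. The TimeSeries types are repeated in compact form only so that the snippet compiles on its own, and 'DataOnly' is a hypothetical, trimmed-down stand-in for QuandlData with 'data' already typed as an array of TimeSeries. ReadJson here uses Skip(1) instead of RemoveAt(0), which is just a small variation on the code above.

```csharp
using System;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

[JsonConverter(typeof(TimeSeriesConverter))]
public class TimeSeries
{
    public string Date { get; set; }
    public double[] Values { get; set; }
}

public class TimeSeriesConverter : JsonConverter
{
    public override bool CanConvert(Type objectType)
    {
        return objectType == typeof(TimeSeries);
    }

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        // Each row is a JSON array: the date string first, then the numbers.
        JArray ja = JArray.Load(reader);
        return new TimeSeries
        {
            Date = (string)ja[0],
            Values = ja.Skip(1).Select(jv => (double)jv).ToArray()
        };
    }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        // Write the row back in the same flat form: [date, n1, n2, ...].
        TimeSeries ts = (TimeSeries)value;
        JArray ja = new JArray();
        ja.Add(ts.Date);
        foreach (double d in ts.Values) ja.Add(d);
        ja.WriteTo(writer);
    }
}

// Hypothetical stand-in for QuandlData, keeping only the 'data' property.
public class DataOnly
{
    public TimeSeries[] data { get; set; }
}

public static class TimeSeriesDemo
{
    public static TimeSeries[] Parse(string json)
    {
        return JsonConvert.DeserializeObject<DataOnly>(json).data;
    }

    public static void Main()
    {
        string json = "{\"data\":[[\"2017-08-23\",159.07,160.47,158.88,159.98,19198189.0]]}";
        TimeSeries[] rows = Parse(json);
        // Values[3] is the close, given the date/open/high/low/close/volume order of the sample.
        Console.WriteLine("{0} close={1}", rows[0].Date, rows[0].Values[3]);
    }
}
```

    Note that the explicit (double) conversion would throw on a null entry, so a production version would probably need to decide how to treat missing values.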


 
    Now that the required data types are defined, let's get our feet wet. First, we deserialize the response string,

    public static object GetJSONObjects(string url, Type t)
    {
        WebRequest request = default(WebRequest);
        WebResponse response = default(WebResponse);
        object result = null;
        JsonSerializer s = new JsonSerializer();

        request = WebRequest.Create(url);
        response = request.GetResponse();

        // Compare the status code rather than the free-form description text.
        if (((HttpWebResponse)response).StatusCode != HttpStatusCode.OK)
        {
            Console.WriteLine(((HttpWebResponse)response).StatusDescription);
        }

        using (Stream dataStream = response.GetResponseStream())
        {
            using (StreamReader reader = new StreamReader(dataStream))
            {
                using (JsonReader jrdr = new JsonTextReader(reader))
                {
                    // Pass the Type directly; Type.GetType(t.FullName) can return null
                    // when the type lives in a different assembly.
                    result = s.Deserialize(jrdr, t);
                }
            }
        }

        response.Close();
        return result;
    }


    and then store the time-series data in a DataTable object.

    public QuandlError GetCandlesTable(string fstr, string end_date, out DataTable dic, string tname = "")
    {
        QuandlError err = default(QuandlError);
        QuandlDataWrapper result = default(QuandlDataWrapper);
        DataRow r = default(DataRow);
        string req_url = string.Format(fstr, end_date);
        int ccnt = 0, rcnt = 0;
        dic = _candles_table(tname);

        try
        {
            result = (QuandlDataWrapper)WnFElliottBrowser.GetJSONObjects(req_url, typeof(QuandlDataWrapper));
        }
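        catch (WebException ex)
        {
            // ex.Response is null when no HTTP response was received (e.g., a connection failure).
            HttpWebResponse http = ex.Response as HttpWebResponse;
            err = GetQuandlError(ex);
            Console.WriteLine("Exception at QuandlAPI.GetCandlesTable() GetJSONObjects returned {0}",
                http != null ? ((int)http.StatusCode).ToString() : ex.Status.ToString());
            return err; // result is still null here, so skip the row-building loop below
        }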

        rcnt += 1;
        ccnt = result.dataset_data.data.Length;
        for (int i = result.dataset_data.data.Length-1; i>= 0; i--)
        {
            TimeSeries v = result.dataset_data.data[i];
            r = dic.NewRow();
            r["DateTime"] = v.Date.Replace("-", "/");
            r["Open"] = v.Values[0];
            r["High"] = v.Values[1];
            r["Low"] = v.Values[2];
            r["Close"] = v.Values[3];
            r["Volume"] = v.Values[4];
            dic.Rows.Add(r);
        }

        result = null;
        return err;
    }

    private DataTable _candles_table(string tname = "")
    {
        DataColumn col;
        DataTable dohlcv = new DataTable();
        if (string.IsNullOrEmpty(tname))
        {
            col = new DataColumn("row_num", typeof(Int32));
            col.AutoIncrement = true;
            col.AutoIncrementSeed = 0;
            dohlcv.Columns.Add(col);
        }
        else
            dohlcv.TableName = tname;

        dohlcv.Columns.Add("DateTime", typeof(string));
        dohlcv.Columns.Add("Open", typeof(double));
        dohlcv.Columns.Add("High", typeof(double));
        dohlcv.Columns.Add("Low", typeof(double));
        dohlcv.Columns.Add("Close", typeof(double));
        dohlcv.Columns.Add("Volume", typeof(double));
        dohlcv.PrimaryKey = new DataColumn[] { dohlcv.Columns["DateTime"] };
        return dohlcv;
    }
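
    Because the table uses "DateTime" as its primary key, newly downloaded rows can be folded into rows that are already held in memory. The sketch below shows one way to do this; 'CandleMerge', 'NewCandlesTable', 'AddCandle' and 'Upsert' are hypothetical names for illustration, the table layout mirrors the named variant of _candles_table (no row_num column), and the 2017/08/24 figures and the changed 2017/08/23 close are made-up sample values.

```csharp
using System;
using System.Data;

public static class CandleMerge
{
    private static readonly string[] ValueColumns = { "Open", "High", "Low", "Close", "Volume" };

    // Same columns and primary key as the named variant of _candles_table.
    public static DataTable NewCandlesTable(string name)
    {
        DataTable t = new DataTable(name);
        t.Columns.Add("DateTime", typeof(string));
        foreach (string c in ValueColumns) t.Columns.Add(c, typeof(double));
        t.PrimaryKey = new DataColumn[] { t.Columns["DateTime"] };
        return t;
    }

    public static void AddCandle(DataTable t, string date, double o, double h, double l, double c, double v)
    {
        t.Rows.Add(date, o, h, l, c, v);
    }

    // Updates rows whose date already exists in target and appends the others.
    // Returns the number of rows that were appended.
    public static int Upsert(DataTable target, DataTable fresh)
    {
        int added = 0;
        foreach (DataRow src in fresh.Rows)
        {
            DataRow dst = target.Rows.Find(src["DateTime"]); // lookup by primary key
            if (dst == null)
            {
                target.Rows.Add(src.ItemArray);
                added++;
            }
            else
            {
                foreach (string c in ValueColumns) dst[c] = src[c];
            }
        }
        return added;
    }
}

public static class CandleMergeDemo
{
    public static void Main()
    {
        DataTable stored = CandleMerge.NewCandlesTable("stored");
        CandleMerge.AddCandle(stored, "2017/08/22", 158.23, 160.0, 158.02, 159.78, 21297812.0);
        CandleMerge.AddCandle(stored, "2017/08/23", 159.07, 160.47, 158.88, 159.98, 19198189.0);

        DataTable fresh = CandleMerge.NewCandlesTable("fresh");
        CandleMerge.AddCandle(fresh, "2017/08/23", 159.07, 160.47, 158.88, 160.0, 19198189.0); // made-up revised close
        CandleMerge.AddCandle(fresh, "2017/08/24", 160.0, 161.0, 159.0, 160.5, 1000000.0);     // made-up new day

        int added = CandleMerge.Upsert(stored, fresh);
        Console.WriteLine("added={0}, total={1}", added, stored.Rows.Count);
    }
}
```

    Since GetCandlesTable adds rows oldest first, appending the newer dates at the end keeps the table in chronological order.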



    ※ The next blog post will cover the database-related part.

 
