Stupid Date Formats…

I have the annoying task of converting a text file into data (sufficiently annoying that I am finally motivated to start my long-delayed blog!).

Within the same file there were two date entries:

  • Mon Oct 06 05:19:38 EDT 2014
  • 201486

The first is simple enough, though a quick search shows that EDT could refer to either Eastern Daylight Time (which it does in this case) or Australian Eastern Daylight Time. Sigh. Fortunately, in my case I only need the date, but it is still an annoying way to start.
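
Since only the date matters here, one way to sidestep the EDT question entirely is to drop the time and zone fields before parsing. The following is a minimal sketch, assuming the fields are always space-separated in the order shown above; the class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;

public static class UnixStyleDate
{
    // Hypothetical helper: pulls only the calendar date out of a string like
    // "Mon Oct 06 05:19:38 EDT 2014". The time and the zone abbreviation are
    // deliberately ignored, so the ambiguity of "EDT" never comes into play.
    public static DateTime ParseDateOnly(string text)
    {
        string[] parts = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 6)
            throw new FormatException("Expected 6 space-separated fields: " + text);

        // parts: [0] day name, [1] month, [2] day, [3] time, [4] zone, [5] year
        string dateOnly = parts[1] + " " + parts[2] + " " + parts[5];
        return DateTime.ParseExact(dateOnly, "MMM d yyyy", CultureInfo.InvariantCulture);
    }
}
```

Calling UnixStyleDate.ParseDateOnly("Mon Oct 06 05:19:38 EDT 2014") gives 6 October 2014 at midnight. If the time of day ever matters, the zone abbreviation would have to be resolved properly instead of discarded.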

Now the second case…

It is in yyyyMd format. No leading zeros and no separator. Why on Earth anyone would use this format is beyond me, but I have no choice but to deal with it.

So a little data analysis.

Fortunately I don’t need to deal with dates prior to 1000 AD, so the first 4 characters are definitely the year and I can deal with those instantly.

If it is a 6 or 8 character string then we are done: it is either yyyyMd or yyyyMMdd, both of which are simple.

So onto the hard case: 7 characters long. This could be either yyyyMMd or yyyyMdd.

  • Two character months are only October (10), November (11) or December (12).
  • This again gives us a nice shortcut: if the fifth character is not a 1 then we must be in yyyyMdd format and we are done.
  • If the sixth character is not a 0, 1 or 2 then it cannot be yyyyMMd format, so it must be yyyyMdd.
  • As we know that the day will not have a leading zero, if the sixth character is a 0 then we know we are in yyyyMMd format (October).
  • One more small case we can eliminate is when the last character is a 0. As there is no 0th day of the month, it must be the 10th or 20th, i.e. we are in yyyyMdd format.

This leaves 18 cases which we cannot distinguish:

  • January 11th – 19th versus November 1st – 9th
  • January 21st – 29th versus December 1st – 9th
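
As a sanity check on that count, here is a small sketch that formats every day of a year as yyyyMd and looks for strings produced by more than one date. The class and method names are my own invention for illustration; nothing here depends on the decoder itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AmbiguityCheck
{
    // Formats every date in the given year without leading zeros or
    // separators and returns the strings that map to more than one date.
    public static List<string> AmbiguousStrings(int year)
    {
        int daysInYear = DateTime.IsLeapYear(year) ? 366 : 365;
        return Enumerable.Range(0, daysInYear)
            .Select(offset => new DateTime(year, 1, 1).AddDays(offset))
            .GroupBy(d => $"{d.Year}{d.Month}{d.Day}")
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .ToList();
    }
}
```

For 2014 this returns 18 strings, for example "2014111", which is both 11 January and 1 November.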

In the case I was dealing with I can apply some heuristics to try to guess which one is correct. But what a pain in the neck.
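
To give a flavour of what such a heuristic might look like, here is a rough sketch that lists every valid reading of a 7-character string and then picks the one closest to some reference date. The "closest to a reference date" rule is purely an assumption for illustration (for instance, if a record carried another trustworthy date), not necessarily the heuristic that fits any particular file, and all the names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class YyyyMdCandidates
{
    // Returns every valid reading of a 7-character yyyyMd-style string.
    public static List<DateTime> Candidates(string s)
    {
        var results = new List<DateTime>();
        if (s.Length != 7 || !s.All(c => c >= '0' && c <= '9')) return results;
        int year = int.Parse(s.Substring(0, 4));

        // Reading 1: yyyyMdd. Neither the month nor the day may start with a zero.
        if (s[4] != '0' && s[5] != '0')
            TryAdd(results, year, int.Parse(s.Substring(4, 1)), int.Parse(s.Substring(5, 2)));

        // Reading 2: yyyyMMd. The month may not start with a zero and the day cannot be 0.
        if (s[4] != '0' && s[6] != '0')
            TryAdd(results, year, int.Parse(s.Substring(4, 2)), int.Parse(s.Substring(6, 1)));

        return results;
    }

    private static void TryAdd(List<DateTime> results, int year, int month, int day)
    {
        if (year >= 1 && month >= 1 && month <= 12
            && day >= 1 && day <= DateTime.DaysInMonth(year, month))
        {
            results.Add(new DateTime(year, month, day));
        }
    }

    // Assumed heuristic: choose the reading nearest to a reference date.
    // Returns null when the string has no valid reading at all.
    public static DateTime? PickNearest(string s, DateTime reference)
    {
        List<DateTime> candidates = Candidates(s);
        if (candidates.Count == 0) return null;
        return candidates.OrderBy(d => Math.Abs((d - reference).Ticks)).First();
    }
}
```

With a reference date of 6 October 2014, PickNearest("2014111", ...) chooses 1 November over 11 January. Whether that is the right guess depends entirely on the data, so treat it as a starting point only.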

FWIW, the C# to decode it is below (it needs using System; plus using System.Reflection; for AmbiguousMatchException):

        public static DateTime ConvertYYYYMD(string dateString)
        {
            if (dateString.Length < 6) return DateTime.MinValue;
            if (dateString.Length > 8) return DateTime.MinValue;

            int year = int.Parse(dateString.Substring(0, 4));
            if (dateString.Length == 6)
            {
                return new DateTime(
                    year,
                    int.Parse(dateString.Substring(4, 1)),
                    int.Parse(dateString.Substring(5, 1)));
            }

            if (dateString.Length == 8)
            {
                return new DateTime(
                    year,
                    int.Parse(dateString.Substring(4, 2)),
                    int.Parse(dateString.Substring(6, 2)));
            }

            // 7 characters: these three tests each rule out yyyyMMd, leaving yyyyMdd.
            if (dateString[4] != '1' || "012".IndexOf(dateString[5]) < 0 || dateString[6] == '0')
            {
                return new DateTime(
                    year,
                    int.Parse(dateString.Substring(4, 1)),
                    int.Parse(dateString.Substring(5, 2)));
            }

            // A day never has a leading zero, so a 0 here must belong to month 10 (yyyyMMd).
            if (dateString[5] == '0')
            {
                return new DateTime(
                    year,
                    int.Parse(dateString.Substring(4, 2)),
                    int.Parse(dateString.Substring(6, 1)));
            }

            throw new AmbiguousMatchException("Multiple Possible Valid Date");
        }
