
Sending email from Azure Data Factory

From time to time, it’s a good idea to send someone an email. And sometimes it is necessary to automate the sending of emails.
If you have data processing jobs in Azure Data Factory, you might want to send an email about how their processing turned out.
It would be logical to expect that if you already use ADF, it would handle emails as well. But no, ADF cannot do this on its own. We’ll have to help it.

I won’t drag it out. Logic Apps can send emails quite easily. In addition, they can be triggered by a simple HTTP request. And ADF can create and send HTTP requests. So sending the email itself will not be a problem. The problem will be creating a nice email :-).

Creating a Logic App for sending emails

1. Create a new Logic App. Choose your subscription and resource group and pick a name for your app.

2. On the “welcome” screen, choose “When a HTTP request is received” as the first step.

3. Now you need some idea of your email structure.
You can’t do without these fields: EmailTo (the email recipient), Subject, and Message (the body of the email itself).
Other attributes are up to you. I put the body of the email into the Message attribute already in HTML format, because I don’t want to compose the HTML form of the email here in the Logic App. Alternatively, you can send emails as plain text.
The definition of the attributes that the HTTP trigger can process is written as a JSON schema.

{
    "properties": {
        "DataFactoryName": {
            "type": "string"
        },
        "EmailTo": {
            "type": "string"
        },
        "Footer": {
            "type": "string"
        },
        "Header": {
            "type": "string"
        },
        "Message": {
            "type": "string"
        },
        "PipelineName": {
            "type": "string"
        },
        "Subject": {
            "type": "string"
        }
    },
    "type": "object"
}

4. Paste the JSON into the Request Body JSON Schema field.

5. Add the next step: Send an email. You can choose Office 365 Outlook for that, or an SMTP server, to send your emails.

6. Now use dynamic content to add the attributes from the HTTP request to the Send an email (V2) action.

7. If you want to use an SMTP server for sending your emails, you will need additional information about the server (host, login, password). You can pass this info in the JSON of the HTTP request, or you can store it directly in the Logic App (in that particular step).

8. After the initial setup, the Logic App should look like this and you are ready to run. You can use this app from any service that can produce an HTTP request (sent to the HTTP POST URL of this app).

Calling Logic App from Azure Data Factory

Now you need to set up Azure Data Factory. This is easier than the previous step. Just open any pipeline and add a Web activity (1).
Fill in the URL (2) of the Logic App and choose the right method (we are using POST). Don’t forget the Content-Type (3). The Body (4) section includes all the attributes that the Logic App needs to send an email. Of course in JSON format.
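
A body matching the schema above might look something like this. The values are only illustrative; @{pipeline().DataFactory} and @{pipeline().Pipeline} are ADF system variables resolved at run time.

{
    "DataFactoryName": "@{pipeline().DataFactory}",
    "PipelineName": "@{pipeline().Pipeline}",
    "EmailTo": "someone@example.com",
    "Subject": "ADF pipeline finished",
    "Message": "<p>The pipeline has finished successfully.</p>",
    "Header": "Pipeline run report",
    "Footer": "Sent automatically from Azure Data Factory"
}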

And it’s done. Now you just need to test it.
Fill in the parameters in ADF and run your pipeline.

It’s a simple solution and can be improved and enhanced, but for a basic idea of how to send an email from the ADF, it’s definitely enough.


Databricks function for generating XML file

In the previous blog post I mentioned a Databricks (Scala) function for generating an XML file. Here it is.

// imports required by the function
import org.apache.spark.sql.DataFrame
import scala.io.Source
import java.io.File
import scala.reflect.io.Directory

def createXMLFile
(
  parInputDataDataframe: DataFrame,   // dataframe with the main data, column names are used as element names
  parXmlDeclaration: String,          // XML declaration (version, encoding, etc.)
  parXmlHeader: String,               // XML header containing any elements that come before the main data part
  parXmlMainElementStart: String,     // start root tag
  parXmlMainElementEnd: String,       // end root tag
  parDestinationFileName: String,     // XML file name only
  parDestinationFilePath: String,     // path to the XML file (without the file name), with "/" at the end
  parXmlRootTag: String,              // root tag for the main data part
  parXmlRowTag: String                // tag for the rows of the main data
)
{
  try
  {
    // mount the ADLS storage used for saving the XML document (helper function defined elsewhere)
    mountDestinationStorage("/mnt/export")

    // create a temp XML file with the main data using the spark-xml library; we then read the file back as plain text,
    // because the XML library can't save more complex XML files
    parInputDataDataframe.write
      .format("xml")
      .option("rootTag", parXmlRootTag)
      .option("rowTag", parXmlRowTag)
      .mode("overwrite")
      .save("dbfs:/mnt/export/" + parDestinationFilePath + parDestinationFileName)

    // read the generated XML file back as plain text
    val fileContents = Source.fromFile("/dbfs/mnt/export/" + parDestinationFilePath + parDestinationFileName + "/part-00000").mkString

    // compose the whole final XML file
    val txtFinal = parXmlDeclaration + parXmlMainElementStart + parXmlHeader + "\r\n" + fileContents + parXmlMainElementEnd

    // final save of the XML file (as plain text, without the XML library)
    dbutils.fs.put("dbfs:/mnt/export/" + parDestinationFilePath + parDestinationFileName + ".xml", txtFinal, true)

    // deletion of the temp directory
    val directory = new Directory(new File("/dbfs/mnt/export/" + parDestinationFilePath + parDestinationFileName + "/"))
    directory.deleteRecursively()

    // unmount the ADLS storage (helper function defined elsewhere)
    unmountDestinationStorage("/mnt/export")
  }
  catch
  {
    case e: java.rmi.RemoteException => {
      println("Directory is Already Mounted")
      dbutils.fs.unmount("/mnt/export")
    }
    case e: Exception => {
      println("There was some other error.")
    }
  }
}
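
A call might then look something like this. It is only a sketch: the source table inquiries is illustrative, and the tag and header values mirror the XML export example from the related post.

val dfInquiries = spark.table("inquiries")   // illustrative source dataframe

createXMLFile(
  parInputDataDataframe  = dfInquiries,
  parXmlDeclaration      = "<?xml version='1.0' encoding='UTF-8' standalone='yes'?>",
  parXmlHeader           = "<header><version>1.1</version></header>",
  parXmlMainElementStart = "<mainRoot>",
  parXmlMainElementEnd   = "</mainRoot>",
  parDestinationFileName = "inquiries",
  parDestinationFilePath = "data/",
  parXmlRootTag          = "inquiries",
  parXmlRowTag           = "inquiry"
)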

Difference between val and var (Azure Databricks)

(error: reassignment to val)

If you’ve already run into the same error message as me, the error is between the chair and the keyboard, namely in not knowing the difference between val and var.

The difference between val and var is that val makes a variable immutable and var makes a variable mutable. Because val fields can’t vary, some people refer to them as values rather than variables.

Immutable: in object-oriented and functional programming, an immutable object is an object whose state cannot be modified after it is created.
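
A quick illustration of the difference in a Scala notebook cell:

val a = 10
a = 20      // does not compile: error: reassignment to val

var b = 10
b = 20      // fine, b is now 20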


Create XML export from Azure SQL database

Task: Export data from an Azure SQL database to an XML file.

We have an Azure SQL database and we need to export data from it in the form of an XML file.

There are a couple of ways to create the XML file:

  • SQL Server: SQL Server can nicely return the result of a query as XML, but then saving the file to ADLS is difficult. I didn’t find an easy way to do it.
  • Azure Function: Undoubtedly the best way. A simple script for generating XML would suffice. But it would cost me quite a lot of research and I didn’t want that :-).
  • ADF: Unfortunately, XML is not a supported format. JSON, for example, is supported.
  • Logic App: JSON can be converted to XML there, so we could combine ADF and a Logic App.
  • Databricks: Yes, there is a Scala library that can convert a dataframe to XML and save it. But it is not almighty: it supports only one root node and then row nodes on a single level. However, the XML can be composed in plain text form and then saved as XML (via the plain text save method).

So the final solution is in Databricks. At least for now.

Azure Databricks
Although there is a library for working with XML files, if you need to create a more complex document, one that does not consist only of the root element and then of individual row elements, some manual work must be added.
The main idea is to prepare all the important parts of the XML separately: the declaration, the root element tag, the header, the main data part itself and the footer.

We’ll need a few libraries for all the fun. The main one is com.databricks:spark-xml_2.11:0.9.0, which you can find in Maven.
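
Besides spark-xml itself, the snippets below also rely on a few imports (assuming an ordinary Databricks Scala notebook):

import scala.io.Source              // reading the generated part file back as plain text
import java.io.File
import scala.reflect.io.Directory   // deleting the temporary directory at the end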

Declaration
The XML declaration is a processing instruction that identifies the document as being XML.
Example: <?xml version="1.0" encoding="UTF-8" standalone="no" ?>

The XML declaration does not contain any variables, so we can use a classic string value. The same goes for the XML root tag, which is formed by the start tag parXmlMainElementStart and the end tag parXmlMainElementEnd.
parXmlRootTag is the root tag for the main data and parXmlRowTag is the tag for rows in the main dataset.

val parXmlDeclaration: String = "<?xml version='1.0' encoding='UTF-8' standalone='yes'?>"
val parXmlMainElementStart: String = "<mainRoot>"
val parXmlMainElementEnd: String = "</mainRoot>"
val parXmlRootTag: String = "inquiries"
val parXmlRowTag: String = "inquiry"

The document header is also a constant value.

val parXmlHeader =
<header>
<version>1.1</version>
<firmNumber>NY01017T</firmNumber>
<datetime>20200815</datetime>
</header>

We now have the constant values ready. Now we just need to prepare the row data itself. You need to have the data loaded in a dataframe. Then all you have to do is call the write method.

parDfInputData.write
  .format("xml")
  .option("rootTag", parXmlRootTag)
  .option("rowTag", parXmlRowTag)
  .mode("overwrite")
  .save("dbfs:/mnt/export/data")

This saved an XML document with the row data itself. A new directory has also been created that contains the XML data in a file named part-00000. You only need to take the contents of this file and glue it all together with the rest.
So we load the just created XML document as an ordinary text file into a string value.

val fileContents = Source.fromFile("/dbfs/mnt/export/data/part-00000").mkString

And let’s put it all together.

val txtFinal = parXmlDeclaration + parXmlMainElementStart + parXmlHeader + "\r\n" + fileContents + parXmlMainElementEnd

Now we will save XML document as a plain text file.

dbutils.fs.put("dbfs:/mnt/export/data" + ".xml", txtFinal, true)

And finally we clean up the unwanted directory after ourselves.

val directory = new Directory(new File("/dbfs/mnt/export/data/"))
directory.deleteRecursively()

In the end, you can wrap all of this into a function for creating XML documents from the SQL database. But that will be in the next post :-).


Previous working day in Azure Data Factory

Task: Identify the previous working day relative to today’s date.

A classic task. To load CSV data from the source file system, we need to identify the data of the previous working day.

I.e. on Thursday we download data from Wednesday, and on Monday we download data from Friday of last week.
It sounds simple, but you need to identify the previous business day relative to the current day in Azure Data Factory. And that is where I hit a bit of a snag.


A clean solution would be to build a conditional expression for the date in the Add dynamic content section.

A little dirtier, but faster, solution is to turn to good old T-SQL.

SET DATEFIRST 1;
SELECT
    cast(
        format(
            DATEADD(DAY,
                CASE (DATEPART(WEEKDAY, GETDATE()) + @@DATEFIRST) % 7
                    WHEN 1 THEN -2
                    WHEN 2 THEN -3
                    ELSE -1
                END,
                DATEDIFF(DAY, 0, GETDATE())), 'yyMMdd') as nvarchar)
    as PreviousBusinessDate
In the end, one Lookup activity against the SQL database is enough and it’s done. That is, assuming you have an SQL database. If not, you will have nothing left but “Add dynamic content”.