I'm importing data from a Comma Separated Value (CSV) file into Cloudant, using the most excellent CouchDB tools provided by my IBM colleague, Glynn Bird.
Having created a CSV: -
vi cartoon.csv
id,givenName,familyName
1,Maggie,Simpson
2,Lisa,Simpson
3,Bart,Simpson
4,Homer,Simpson
5,Fred,Flintstone
6,Wilma,Flintstone
7,Barney,Rubble
8,Betty,Rubble
( with due respect to the creators and owners of The Simpsons and The Flintstones )
I set up my environment: -
export ACCOUNT=0e5c777542c5e2cc2418013429e0824f-bluemix:d088ff753c9e258add92e45128cd161dacbffedbcec0c8f78b216368ba0503ab
export HOST=d088ff753c9e258add92e45128cd161d-bluemix.cloudant.com
export COUCH_URL=https://$ACCOUNT@$HOST
export COUCH_DATABASE="CARTOON"
export COUCH_DATABASE=`echo $COUCH_DATABASE | tr '[:upper:]' '[:lower:]'`
export COUCH_DELIMITER=","
and created my database: -
curl -X PUT $COUCH_URL/$COUCH_DATABASE
and populated it: -
cat $COUCH_DATABASE.csv | couchimport
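As a rough mental model of what the import does with the file, the header row supplies the field names and each following row becomes one JSON document, with every value kept as a string. The sketch below is only an illustration of that mapping, not couchimport's actual code; csvToDocs is a hypothetical helper and it naively splits on the delimiter without handling quoted fields: -

```javascript
// Hypothetical helper: turn CSV text with a header row into an array of documents.
// Naive split on the delimiter; quoted fields are NOT handled.
function csvToDocs(csvText, delimiter = ",") {
  // Drop blank lines so a trailing newline does not create an empty document
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== "");
  // The first line holds the field names
  const fields = lines[0].split(delimiter);
  // Every remaining line becomes one document, values left as strings
  return lines.slice(1).map((line) => {
    const values = line.split(delimiter);
    const doc = {};
    fields.forEach((field, i) => {
      doc[field] = values[i];
    });
    return doc;
  });
}

const sample = "id,givenName,familyName\n1,Maggie,Simpson\n2,Lisa,Simpson\n";
console.log(csvToDocs(sample));
```

Note that nothing here sets _id, so the database has to generate one for each document.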
This worked well, but my data had a system-generated _id field, whereas I wanted to use my own id field: -
{
"_id": "e143bcd25bc620e6aa8f2adc206cf21c",
"_rev": "1-0152a3e6867ad34da6e882a80f0fbeff",
"id": "1",
"givenName": "Maggie",
"familyName": "Simpson"
}
{
"_id": "82c1068c830759a904cfdd02ab41b980",
"_rev": "1-6bbb94301323a3c3f6ff54f1c3c765e5",
"id": "2",
"givenName": "Lisa",
"familyName": "Simpson"
}
Thankfully, Glynn kindly advised me how to use a JavaScript function to mitigate this: -
vi ~/transform_cartoon.js
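var transform = function(doc) {
  // use the id column from the CSV as the document's _id
  doc._id = doc.id;
  // drop the original id field
  delete doc.id;
  return doc;
}
module.exports = transform;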
which effectively assigns the value of the id field ( as taken from the CSV ) to the _id field and then drops the original id field.
I dropped the DB: -
curl -X DELETE $COUCH_URL/$COUCH_DATABASE
and recreated it: -
curl -X PUT $COUCH_URL/$COUCH_DATABASE
and then repopulated it: -
cat $COUCH_DATABASE.csv | couchimport --transform ~/transform_cartoon.js
and now we have this: -
{
"_id": "1",
"_rev": "1-0e77dbadefba2a95e5cde5bda2ecd695",
"givenName": "Maggie",
"familyName": "Simpson"
}
{
"_id": "2",
"_rev": "1-fc746edc394ac98b013b7788cc1cca5d",
"givenName": "Lisa",
"familyName": "Simpson"
}
If needed, I could modify my transform to avoid dropping the original id field, which would give me this: -
{
"_id": "1",
"_rev": "1-0152a3e6867ad34da6e882a80f0fbeff",
"id": "1",
"givenName": "Maggie",
"familyName": "Simpson"
}
{
"_id": "2",
"_rev": "1-6bbb94301323a3c3f6ff54f1c3c765e5",
"id": "2",
"givenName": "Lisa",
"familyName": "Simpson"
}
so I have choices :-)
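The modified transform itself is not reproduced above. A minimal sketch, assuming it simply copies id into _id without deleting the original field, and assuming the file exports the function via module.exports, might look like this: -

```javascript
// Sketch of a transform that keeps the original id field (an assumption,
// not the exact file used for the import above).
var transform = function(doc) {
  // use the id column from the CSV as the document's _id
  doc._id = doc.id;
  // unlike the first version, doc.id is deliberately left in place
  return doc;
};

module.exports = transform;
```

It would be passed to couchimport with the same --transform option as before.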
For more insights, please go here: -